Developing CloudMade’s Adaptive Framework: Technical Challenges
How the CloudMade team was inspired to create an industry-leading cloud and SDK product for collecting and analysing automotive data sets, and the difficulties that were solved along the way.
Nazar Sheremeta, Solutions Architect at CloudMade
CloudMade’s Adaptive Framework grew out of a period when several of our teams were working on separate projects independently of each other.
We kept running into the same problems whenever we tried to collaborate on a single product with multiple features, and we realized that most of our customers faced the same challenge.
The main hurdles were:
- Each product worked with its own data source and data format
- Every product worked with its own machine learning pipeline without re-using existing components
- Each pipeline had its own signal requirements, and the way machine learning models were distributed was also different
Combining the products was going to be a tall order, so we needed a framework that would help manage multiple machine learning solutions.
The initial version of the Adaptive Framework included data ingestion, feature engineering, machine learning pipelines and APIs, and much more. Since then, we have added many more features to the Framework and have spent several years building and perfecting it, anticipating the problems that were not always visible.
Before work on the framework began, we already had multiple automotive projects that needed to fit under a single framework umbrella. This meant figuring out what they had in common, what could be reused, and how this could be hosted under a single environment and data lake.
As a result, project requirements conflicted with each other, and we had to spend time redesigning some components from scratch so that multiple products could fit together. Solutions built for a single product no longer worked for multiple products.
Collecting and handling data
CloudMade provides data collection capabilities across different edge devices: vehicles, mobile phones, etc. We have implemented a mobile data collection algorithm that takes user activity and movement into account to decide when to collect data. The aim is that users should not see extra battery drain because of the product. Ideally, you want to collect only the data you need.
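As an illustration of what activity-aware collection can look like, here is a minimal sketch. It is not CloudMade's algorithm: the activity classes, the interval values, the low-battery threshold and the function name `sampling_interval` are all assumptions made up for this example.

```python
from enum import Enum
from typing import Optional


class Activity(Enum):
    STILL = "still"
    WALKING = "walking"
    IN_VEHICLE = "in_vehicle"


# Hypothetical base sampling intervals in seconds; None means "do not sample".
BASE_INTERVAL_S = {
    Activity.STILL: None,
    Activity.WALKING: 60,
    Activity.IN_VEHICLE: 5,
}

LOW_BATTERY_PCT = 20      # assumed threshold, not a documented value
LOW_BATTERY_BACKOFF = 4   # assumed multiplier applied when battery is low


def sampling_interval(activity: Activity, battery_pct: int,
                      charging: bool = False) -> Optional[int]:
    """Return seconds between samples, or None to pause collection."""
    base = BASE_INTERVAL_S[activity]
    if base is None:
        return None
    # Sample less often on a low, non-charging battery to limit drain.
    if not charging and battery_pct < LOW_BATTERY_PCT:
        return base * LOW_BATTERY_BACKOFF
    return base
```

The point of the sketch is only that collection frequency is a function of what the user is doing and of the device state, so that nothing is collected when nothing useful is happening.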
The biggest challenge was handling data that arrived late or was lost. Sometimes a critical signal could be delivered to the cloud a week after a data batch was sent. Because the signal is critical, you cannot simply assume it was lost – you need to process it as if it had arrived a week earlier, and include it in all the relevant aggregates.
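One common way to handle late data is to key aggregates by the time a signal was recorded on the device (event time) rather than the time it reached the cloud. The sketch below shows that idea under stated assumptions; the class `DailyAggregates`, its methods and the signal fields are hypothetical and do not describe the framework's actual implementation.

```python
from collections import defaultdict
from datetime import date, datetime, timezone


class DailyAggregates:
    """Per-vehicle daily totals keyed by event time, not arrival time."""

    def __init__(self):
        # (vehicle_id, event_date) -> running count and total
        self._buckets = defaultdict(lambda: {"count": 0, "total": 0.0})
        # Signal ids already applied, so redelivered signals are ignored.
        self._seen = set()

    def ingest(self, signal_id, vehicle_id, event_time, value):
        """Add a signal to the bucket for the day it was recorded."""
        if signal_id in self._seen:
            return False  # duplicate delivery
        self._seen.add(signal_id)
        bucket = self._buckets[(vehicle_id, event_time.date())]
        bucket["count"] += 1
        bucket["total"] += value
        return True

    def daily_total(self, vehicle_id, day):
        bucket = self._buckets.get((vehicle_id, day))
        return bucket["total"] if bucket else 0.0


agg = DailyAggregates()
# A recent signal arrives first...
agg.ingest("s2", "car-1", datetime(2024, 3, 8, 9, 0, tzinfo=timezone.utc), 5.0)
# ...then one recorded a week earlier arrives late and still lands in its own day.
agg.ingest("s1", "car-1", datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc), 12.5)
```

Because the late signal carries its original timestamp, it updates the aggregate for the day it belongs to instead of the day it happened to arrive.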
Also, if a single vehicle is operated by multiple drivers, how do you make sure that each driver profile is populated only with data from that driver? There are multiple ways of solving the issue – there are existing DriverID solutions on the market. But it is also possible to split the dataset without one if you treat mobile phones as a data source. Based on mobile activity, you can infer whether the user was present in the vehicle when the trip was made.
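A simple way to picture phone-based attribution is to compare each trip's time window with the periods when each user's phone reported in-vehicle activity. The sketch below is an assumption-laden illustration, not the framework's method: `Interval`, `attribute_trip` and the 80% coverage threshold are hypothetical, and each user's presence intervals are assumed not to overlap one another.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Interval:
    start: datetime
    end: datetime


def overlap_seconds(a: Interval, b: Interval) -> float:
    """Length in seconds of the overlap between two intervals (0 if none)."""
    latest_start = max(a.start, b.start)
    earliest_end = min(a.end, b.end)
    return max(0.0, (earliest_end - latest_start).total_seconds())


def attribute_trip(trip, phone_presence, min_coverage=0.8):
    """Return the user whose phone covers most of the trip, or None.

    phone_presence maps a user id to a list of Interval objects during
    which that user's phone reported in-vehicle activity.
    """
    trip_len = (trip.end - trip.start).total_seconds()
    if trip_len <= 0:
        return None
    best_user, best_cov = None, 0.0
    for user_id, intervals in phone_presence.items():
        covered = sum(overlap_seconds(trip, i) for i in intervals)
        coverage = covered / trip_len
        if coverage > best_cov:
            best_user, best_cov = user_id, coverage
    # Leave the trip unattributed when no phone covers enough of it.
    return best_user if best_cov >= min_coverage else None
```

Note that overlap alone cannot tell a driver from a passenger when two phones are in the vehicle for the whole trip, so a real solution would need additional signals.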
Building analytics and insights
We needed to build a proper data lake with useful data inside. Our team of data scientists has helped us make sure that the data is easily accessible, which keeps the analytics process hassle-free. Their input has helped us tailor the data lake specifically for machine learning and analytics.
Our APIs also give easy access to our data analytics capabilities. These APIs are used by both customers and internal teams, so we constantly check that they are easy to use.
Keeping data secure
We take data security seriously to guard against data breaches. All data channels are encrypted, and the framework is compliant with data regulations such as GDPR. Data collection and management features, such as incognito mode, are also provided so that users are in control of their own data collection at any time.
Making sure that the framework is easy to adopt is vital, which is why our internal team uses the same framework: if they are not comfortable with it, chances are that the customer would not be either, and the framework needs reviewing. With each project and customer engagement, we analyze how the framework is used and how it could be made easier for the customer. This includes user-friendly tools, rich documentation, and customer support.
Our Software Development Lifecycle ensures that all software components are built in accordance with the original requirements and that the components themselves meet a high level of quality. Software features are also tested with our internal users before being released to the public, which helps ensure that they will be well received by customers.
Our most distinctive component is the SDK, which is written in C++ and is lightweight enough to run in environments ranging from vehicle components to mobile phone applications. It has all the necessary client-side capabilities, including data collection, offline machine learning predictions, on-board learning and much more.
A personalized learning platform is also provided, which includes model management and support for multiple versions of your algorithm simultaneously.
The Adaptive Framework components
Building intelligent mobility solutions faster and with more flexibility
When it comes to what I’m most proud of, it’s always the moment when a client of our framework realizes that the time-to-market for the feature they wanted is significantly shorter than they initially expected.
We spent a lot of effort perfecting data access APIs, model management tools and other minor things to make it happen, and it’s rewarding every time a new feature is deployed using the framework.