Case study

Collecting over 300 distinct signals from a fleet of vehicles, and running them through our ML toolchain to generate insights about the drivers’ behavioral patterns.

Industry: Automotive

Customer: Eastern Europe car maker

Competence domains:

  • Data analysis & AI solutions
  • Cloud architecture & Big Data processing


Our client, the innovation department of a major Eastern European car maker, lacked a feedback loop for tracking driving patterns, HMI interactions and car feature usage in general. Such statistics are crucial when conceiving and developing new features, because they show what end users actually need and help prioritize those needs.

To close this gap, they commissioned a data collection and analysis program on a fleet of vehicles.


Signals collected

Over 125 CAN signals; 250 BAP signals for Climate, Navigation, Drive Modes and Adaptive Cruise Control; and 4 types of HMI events.

In-vehicle signal collection

  • Setup

    A Vehicle Telematics Computer (VTC) 1910 was integrated into the vehicle CAN buses and connected to the MIB unit over Ethernet. The VTC software included the CloudMade Smart Data SDK running on Yocto Linux, with support for over-the-air (OTA) updates.

  • Observe

    First, we observe signals in the vehicle. This includes, but is not limited to, vehicle telematics and the driver's interactions with the vehicle.
    Once the signals are stored, a CloudMade component called Signal Collector downsamples the data. Why is this necessary? The seat belt signal, for example, circulates on the CAN bus at 10 Hz (10 times per second), while for learning we only need it “on event change”, that is, when it switches on or off. The downsampling can use any formula, including but not limited to average (mean), maximum and on event change.
    Signal Collector can be configured remotely from the cloud, which even makes it possible to run A/B tests if needed.

  • Sync

    The sync process sends data from the vehicle to the cloud. It respects security policies (authentication and authorization), uses HTTP(S) as the communication channel, and sends data in compressed batches (GZIP at minimum) to reduce the traffic load.
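
The Observe step describes reducing a 10 Hz seat belt signal to “on event change” records, or to windowed aggregates such as mean or max. The sketch below illustrates both under simple assumptions: samples arrive as (timestamp in seconds, value) pairs, and windows are fixed and non-overlapping. The function names `on_change` and `windowed` are hypothetical; the case study does not describe Signal Collector's actual interface.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)

def on_change(samples: List[Sample]) -> List[Sample]:
    """Keep a sample only when its value differs from the last kept value."""
    kept: List[Sample] = []
    for ts, value in samples:
        if not kept or value != kept[-1][1]:
            kept.append((ts, value))
    return kept

def windowed(samples: List[Sample], window_s: float,
             reducer: Callable[[List[float]], float]) -> List[Sample]:
    """Reduce samples in fixed, non-overlapping windows of `window_s` seconds."""
    buckets: Dict[int, List[float]] = {}
    for ts, value in samples:
        buckets.setdefault(int(ts // window_s), []).append(value)
    # Each output sample is stamped with the start time of its window.
    return [(key * window_s, reducer(values)) for key, values in sorted(buckets.items())]

# Simulated 10 Hz seat belt signal over 3 s: unbuckled (0), then buckled (1) from 1.2 s.
seat_belt = [(i / 10, 0 if i < 12 else 1) for i in range(30)]
print(on_change(seat_belt))           # 30 samples reduced to the state changes only
print(windowed(seat_belt, 1.0, max))  # one value per second
print(windowed(seat_belt, 1.0, mean))
```

On-change encoding suits binary or slowly changing states such as the seat belt; windowed reducers suit continuous signals where a per-window summary is enough for learning.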
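
Because Signal Collector is configured from the cloud, an A/B test amounts to sending different configurations to different groups of vehicles. The case study does not say how vehicles are assigned to groups; one common approach, sketched here purely as an assumption, is deterministic hash bucketing, so a vehicle always lands in the same variant without a stored assignment table. The helper `assign_variant`, the experiment label and the vehicle IDs are hypothetical.

```python
import hashlib
from typing import Sequence

def assign_variant(vehicle_id: str, experiment: str,
                   variants: Sequence[str], weights: Sequence[float]) -> str:
    """Deterministically map a vehicle to a variant (hypothetical helper).

    Hashing experiment + vehicle ID gives a stable pseudo-random point in [0, 1),
    which is then located in the cumulative weight distribution.
    """
    if not variants or len(variants) != len(weights):
        raise ValueError("variants and weights must be non-empty and the same length")
    digest = hashlib.sha256(f"{experiment}:{vehicle_id}".encode("utf-8")).digest()
    # Keep 53 bits so the division is exact and the point is strictly below 1.0.
    point = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    total = float(sum(weights))
    cumulative = 0.0
    for variant, weight in zip(variants, weights):
        cumulative += weight / total
        if point < cumulative:
            return variant
    return variants[-1]  # guards against floating-point rounding in the cumulative sum

print(assign_variant("VIN00042", "climate-sampling-v2", ["A", "B"], [1, 1]))
```

Including the experiment name in the hash keeps assignments independent across experiments, so the same vehicles are not always grouped together.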
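
The Sync step combines batching, compression and an authenticated HTTP(S) upload. A minimal sketch using only the Python standard library follows. The endpoint URL, the bearer-token `Authorization` header and the JSON record layout are assumptions for illustration; the case study names only HTTP(S), authentication/authorization and GZIP.

```python
import gzip
import json
import urllib.request
from typing import Iterable, Iterator, List

def batches(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group records into lists of at most `size` items."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def encode_batch(batch: List[dict]) -> bytes:
    """Serialize a batch as compact JSON and GZIP-compress it."""
    payload = json.dumps(batch, separators=(",", ":")).encode("utf-8")
    return gzip.compress(payload)

def build_upload_request(url: str, token: str, body: bytes) -> urllib.request.Request:
    """Prepare (but do not send) an HTTPS POST for one compressed batch."""
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",
            "Authorization": f"Bearer {token}",  # assumed token-based auth
        },
        method="POST",
    )

records = [{"signal": "vehicle_speed_kmh", "ts": t, "value": 50.0} for t in range(250)]
for batch in batches(records, 100):
    req = build_upload_request("https://ingest.example.com/v1/signals", "TOKEN", encode_batch(batch))
    # urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Batching amortizes the per-request overhead, and repetitive signal records compress well, which is why compression noticeably reduces traffic.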

Cloud signal processing

CloudMade’s platform was used to collect, verify, store and interpret the data. The stages of the processing are listed below.

  • Observation

    Once data is stored in the cloud, CloudMade’s components inspect the transferred signals and validate them. Only validated signals are used for aggregation and machine learning. This stage removes noise from the data and handles cloud-side security.

  • Aggregation

    Aggregation is the main data-preparation stage before machine learning. It combines all data for one user into user journeys, where each journey is a driver’s trip from point A to point B, including all of his or her interactions with the vehicle and phone.
    At this stage, CloudMade also enriches the data.
    The first enrichment step is to map-match each journey to the road network, since GPS traces usually contain noise and jumps.
    Later steps can add weather, traffic, calendar and other context.

  • Learning

    The learning stage works with both aggregated and raw data and produces a mathematical model, which we call an Inference Model. A Profile is the set of mathematical models for one specific driver; it can include any number of models to support all use cases. Profiles are kept in profile storage (usually an RDBMS).
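
The Observation stage validates incoming signals before they reach aggregation. The case study does not list its validation rules, so the sketch below assumes three plausible ones: the signal must be known, its value must be a number inside a physical range, and exact duplicates (same vehicle, signal and timestamp) are dropped. The signal names and limits in `LIMITS` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical physical limits per signal; real limits would come from the signal catalogue.
LIMITS: Dict[str, Tuple[float, float]] = {
    "vehicle_speed_kmh": (0.0, 300.0),
    "outside_temp_c": (-60.0, 70.0),
}

@dataclass(frozen=True)
class Reading:
    vehicle_id: str
    signal: str
    ts: float
    value: float

def validate(readings: List[Reading]) -> Tuple[List[Reading], List[Reading]]:
    """Split readings into (valid, rejected) using the assumed rules."""
    valid: List[Reading] = []
    rejected: List[Reading] = []
    seen: Set[Tuple[str, str, float]] = set()
    for r in readings:
        limits = LIMITS.get(r.signal)
        key = (r.vehicle_id, r.signal, r.ts)
        if (limits is None                                # unknown signal
                or math.isnan(r.value)                    # corrupted value
                or not limits[0] <= r.value <= limits[1]  # outside physical range
                or key in seen):                          # duplicate delivery
            rejected.append(r)
        else:
            seen.add(key)
            valid.append(r)
    return valid, rejected
```

Returning the rejected readings instead of silently discarding them makes it possible to monitor data quality per vehicle.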
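
The Aggregation stage groups all data for one user into journeys. The case study does not say how journey boundaries are detected; this sketch assumes a simple rule that starts a new journey whenever two consecutive GPS fixes from the same driver are more than `max_gap_s` seconds apart (10 minutes by default). A production system might rely on ignition on/off events instead. `build_journeys` and the record layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (driver_id, timestamp_s, lat, lon): hypothetical record layout
Record = Tuple[str, float, float, float]

def build_journeys(records: List[Record],
                   max_gap_s: float = 600.0) -> Dict[str, List[List[Record]]]:
    """Group records per driver, then split each driver's timeline at long gaps."""
    by_driver: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        by_driver[record[0]].append(record)

    journeys: Dict[str, List[List[Record]]] = {}
    for driver, recs in by_driver.items():
        recs.sort(key=lambda r: r[1])  # chronological order
        trips: List[List[Record]] = []
        for rec in recs:
            # Continue the current trip if the gap since its last fix is small enough.
            if trips and rec[1] - trips[-1][-1][1] <= max_gap_s:
                trips[-1].append(rec)
            else:
                trips.append([rec])
        journeys[driver] = trips
    return journeys
```

HMI events and other interactions could then be attached to whichever journey's time span contains them.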
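
Map-matching snaps noisy GPS fixes onto the road network. Production map matchers usually score whole sequences of fixes (for example with a hidden Markov model); the sketch below swaps in much simpler point-by-point nearest-segment snapping, only to show the core geometry. It assumes coordinates are already projected to a local metric plane (metres) and rejects fixes farther than `max_dist` from any road, a crude way to discard GPS jumps. The road IDs and geometry are hypothetical.

```python
import math
from typing import List, Optional, Tuple

XY = Tuple[float, float]      # local metric coordinates (metres), assumed pre-projected
Segment = Tuple[str, XY, XY]  # (road_id, start, end)

def project_onto_segment(p: XY, a: XY, b: XY) -> Tuple[XY, float]:
    """Closest point to p on segment a-b, and its distance from p."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Parameter of the orthogonal projection, clamped to the segment.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    q = (ax + t * dx, ay + t * dy)
    return q, math.hypot(px - q[0], py - q[1])

def snap(p: XY, roads: List[Segment],
         max_dist: float = 30.0) -> Optional[Tuple[str, XY, float]]:
    """Snap a fix to the nearest road segment, or return None if it is too far away."""
    best: Optional[Tuple[str, XY, float]] = None
    for road_id, a, b in roads:
        q, d = project_onto_segment(p, a, b)
        if d <= max_dist and (best is None or d < best[2]):
            best = (road_id, q, d)
    return best
```

Point-by-point snapping can jump between parallel roads; sequence-based matchers avoid this by also scoring how plausible each transition between road segments is.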
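
A Profile is described as a set of inference models for one driver, kept in an RDBMS. The sketch below models it as a dataclass keyed by use case and persists it with SQLite from the Python standard library. The toy `PreferredValueModel` (the mean of the values a driver chose), the `profile_model` table and its JSON parameter column are all assumptions; the case study does not describe the model types or the schema.

```python
import json
import sqlite3
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List

@dataclass
class PreferredValueModel:
    """Toy inference model: predicts the mean of the values a driver chose."""
    observations: List[float]

    def predict(self) -> float:
        return mean(self.observations)

@dataclass
class Profile:
    """All inference models for one driver, keyed by use case (any number of them)."""
    driver_id: str
    models: Dict[str, PreferredValueModel] = field(default_factory=dict)

def save_profile(conn: sqlite3.Connection, profile: Profile) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profile_model ("
        "driver_id TEXT, use_case TEXT, params TEXT, "
        "PRIMARY KEY (driver_id, use_case))"
    )
    for use_case, model in profile.models.items():
        # One row per (driver, use case); model parameters are stored as JSON.
        conn.execute(
            "INSERT OR REPLACE INTO profile_model VALUES (?, ?, ?)",
            (profile.driver_id, use_case, json.dumps(model.observations)),
        )
    conn.commit()

def load_profile(conn: sqlite3.Connection, driver_id: str) -> Profile:
    rows = conn.execute(
        "SELECT use_case, params FROM profile_model WHERE driver_id = ?", (driver_id,)
    ).fetchall()
    return Profile(driver_id, {uc: PreferredValueModel(json.loads(p)) for uc, p in rows})
```

Keying rows by (driver, use case) lets new use cases add models to a profile without schema changes.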


In-vehicle deployment was done on a VTC-1010 (Nexcom) running Yocto OS. CloudMade developed and deployed OTA updates for the C++ SDK and its configuration plugins, which made it possible to configure which signals to collect directly from CAN, MOST and Ethernet, which sampling frequency to use for each during data collection, and which downsampling algorithm to apply.
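
The configuration plugins decide which signals to collect from CAN, MOST and Ethernet, at what frequency, and with which downsampling algorithm. The case study does not show the configuration format, so the sketch below assumes a hypothetical JSON document and validates it into typed rules before they would be applied. The field names, signal names and algorithm names are illustrative only.

```python
import json
from dataclasses import dataclass
from typing import List

BUSES = {"CAN", "MOST", "Ethernet"}
# Hypothetical algorithm names, mirroring the downsampling options in the Observe step.
ALGORITHMS = {"raw", "on_change", "mean", "max"}

@dataclass(frozen=True)
class SignalRule:
    name: str
    bus: str
    sample_hz: float
    downsample: str

def parse_config(text: str) -> List[SignalRule]:
    """Parse and validate a collector configuration; raise ValueError on bad entries."""
    rules: List[SignalRule] = []
    for entry in json.loads(text)["signals"]:
        rule = SignalRule(entry["name"], entry["bus"], float(entry["sample_hz"]), entry["downsample"])
        if rule.bus not in BUSES:
            raise ValueError(f"{rule.name}: unknown bus {rule.bus!r}")
        if rule.sample_hz <= 0:
            raise ValueError(f"{rule.name}: sample_hz must be positive")
        if rule.downsample not in ALGORITHMS:
            raise ValueError(f"{rule.name}: unknown algorithm {rule.downsample!r}")
        rules.append(rule)
    return rules

EXAMPLE = """
{"signals": [
  {"name": "seat_belt_driver", "bus": "CAN", "sample_hz": 10, "downsample": "on_change"},
  {"name": "nav_route_active", "bus": "Ethernet", "sample_hz": 1, "downsample": "on_change"},
  {"name": "vehicle_speed_kmh", "bus": "CAN", "sample_hz": 50, "downsample": "mean"}
]}
"""
print(parse_config(EXAMPLE))
```

Validating the whole document before applying it means a malformed OTA configuration can be rejected as a unit instead of leaving the collector half-configured.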

The in-vehicle data collection module was developed in C++ and is fully compatible with both the previous and the new generation of the vehicle for production use.

All data from the vehicles was transferred to the cloud in both raw and downsampled form and stored separately in cloud blob storage.

The cloud solution was developed with Java, Scala and Hadoop, using MapReduce and Spark. This stack was used to analyze and visualize the vehicle data.
