News 01.14.2021

Predicting Vehicle Maintenance using on-board and off-board data

In this article, we explain how we use data science to develop connected car applications, focusing on predictive maintenance based on diagnostic trouble codes (DTCs) and available OBD car data.


Predictive maintenance aims to identify vehicle maintenance issues before they occur by combining data from warranty repairs with current vehicle sensor data. Predictive data analytics can find meaningful correlations that would be difficult for a human to discover. A performance anomaly that appears insignificant on a single vehicle can be a red flag when aggregated with data from hundreds or thousands of other vehicles that show the same problem. Predictive maintenance is currently best suited to commercial vehicles but is expected to become the standard for all vehicles in the near future.

Here are some of the use cases for vehicle predictive maintenance:

  • More accurate prediction of part failures (faults are identified months earlier than with classic diagnostics)
  • Optimized part repair and replacement schedules, making maintenance and other service offerings more cost-efficient and more valuable to customers
  • Less vehicle downtime from unscheduled repairs, shorter time to resolution, and longer vehicle life, which leads to increased brand equity
  • Lower fuel consumption and CO2 emissions by keeping vehicles in good shape
  • A streamlined supply chain and better planning of spare parts production
  • Long-term retention of clients in the official workshops
  • Development of new Maintenance as a Service models in which the OEM guarantees vehicle uptime
  • Identification of potential recalls in early phases
  • Support for the development of new prototypes

Two different approaches to predictive maintenance are possible. The first is based on a combination of DTCs and available OBD car data, while the other uses general signal deviations of units within a population or fleet of vehicles.
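
To make the second approach more concrete, here is a minimal sketch of flagging a vehicle whose signal deviates from the rest of the fleet. It is only an illustration and not the method used in our applications: the function name `fleet_outliers`, the vehicle ids, the readings and the threshold are all assumptions, and the score is the commonly used modified z-score based on the median and the median absolute deviation (MAD).

```python
from statistics import median


def fleet_outliers(readings, threshold=3.5):
    """Return ids of vehicles whose reading deviates strongly from the fleet median.

    `readings` maps a vehicle id to one signal value (hypothetical data).
    """
    values = list(readings.values())
    fleet_median = median(values)
    # Median absolute deviation: a spread measure that a single outlier barely moves.
    mad = median(abs(v - fleet_median) for v in values)
    if mad == 0:
        return []  # all vehicles report (almost) the same value
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        vehicle_id
        for vehicle_id, v in readings.items()
        if 0.6745 * abs(v - fleet_median) / mad > threshold
    ]


# Hypothetical oil temperatures (degrees Celsius) for six vehicles of one model.
oil_temp = {
    "veh-01": 90.1, "veh-02": 89.7, "veh-03": 90.4,
    "veh-04": 91.0, "veh-05": 104.2, "veh-06": 90.2,
}
flagged = fleet_outliers(oil_temp)
print(flagged)
```

A median-based score is used here because one badly behaving vehicle would inflate a mean and standard deviation and partly hide itself.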


The classic predictive maintenance problem is illustrated in the figure below:

Given data streamed from a vehicle, such as diagnostic trouble codes (DTCs) and other vehicle parameters recorded when the trouble codes occur (e.g., vehicle speed, mileage reading, engine and oil temperature, torque, RPM), can we predict an ensuing repair or maintenance job on the vehicle? DTCs are alphanumeric codes emitted by the car’s on-board diagnostic systems, and they typically signal that a vehicle sensor is reporting values outside the normal or accepted range. They do not always indicate a need for repair or maintenance. A DTC may also report a fault when the only real problem is a faulty sensor.
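
As an illustration of what such input data can look like, the sketch below parses a standardized five-character OBD-II code and merges it with the parameters recorded at the time of occurrence. The system letters (P, B, C, U) follow the usual OBD-II convention; the function names and the parameter names (`speed_kmh`, `mileage_km`, and so on) are hypothetical and chosen only for this example.

```python
import re

# System letters used by standardized OBD-II trouble codes.
SYSTEMS = {"P": "powertrain", "B": "body", "C": "chassis", "U": "network"}
DTC_PATTERN = re.compile(r"[PBCU][0-3][0-9A-F]{3}")


def parse_dtc(code):
    """Validate a DTC such as 'P0301' and name the vehicle system it concerns."""
    code = code.strip().upper()
    if not DTC_PATTERN.fullmatch(code):
        raise ValueError(f"not a five-character DTC: {code!r}")
    return {"code": code, "system": SYSTEMS[code[0]]}


def to_features(code, parameters):
    """Merge a parsed DTC with the vehicle parameters recorded when it occurred."""
    features = parse_dtc(code)
    features.update(parameters)
    return features


# Hypothetical parameter names and values, for illustration only.
example = to_features(
    "p0301",
    {"speed_kmh": 87.0, "mileage_km": 124500, "engine_temp_c": 96.5, "rpm": 2900},
)
print(example)
```

Each such record is one observation; pairing it with the repair that followed gives a labeled example for model training.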

Our hypothesis for predictive maintenance is: given a sufficiently large number of past repairs performed by humans and the DTCs leading up to those repairs, it is possible to create a machine learning model that discovers relationships between DTCs and repairs. In the first application, we predict “calls to action” based on DTCs and their associated freeze frames (snapshots of vehicle parameters captured when a DTC is set). The following steps are taken during model creation:

  1. Prepare the data (cleaning, normalization, etc.)
  2. Separate out a validation dataset
  3. Set up the test harness to use 10-fold cross-validation
  4. Build multiple different models to predict the “call to action” from DTCs and freeze frames
  5. Select the best model
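
The five steps above can be sketched end to end on toy data. This is a minimal illustration under stated assumptions, not our production pipeline: the records are synthetic, the DTC-to-action rules in `TOY_RULES` are invented, freeze-frame values are left out for brevity, and the two candidate models are deliberately simple. Only the Python standard library is used so the example stays self-contained; in practice a machine learning library would normally provide the models and the cross-validation.

```python
import random
from collections import Counter, defaultdict

# Invented rules, used only to generate toy data.
TOY_RULES = {"P0301": "engine", "P0700": "transmission", "C1234": "suspension"}


def make_toy_records(n, rng):
    """Return (dtc_codes, call_to_action) pairs; about 15% of labels are drawn at random."""
    actions = sorted(set(TOY_RULES.values()))
    records = []
    for _ in range(n):
        code = rng.choice(sorted(TOY_RULES))
        action = TOY_RULES[code] if rng.random() > 0.15 else rng.choice(actions)
        records.append(([f" {code.lower()} "], action))  # deliberately messy formatting
    return records


def prepare(records):
    """Step 1: clean and normalize the code formatting."""
    return [(tuple(c.strip().upper() for c in codes), y) for codes, y in records]


class MajorityModel:
    """Baseline: always predict the most frequent action seen in training."""

    def fit(self, rows):
        self.default = Counter(y for _, y in rows).most_common(1)[0][0]
        return self

    def predict(self, codes):
        return self.default


class DtcVoteModel(MajorityModel):
    """Predict the action most often seen together with the given DTCs."""

    def fit(self, rows):
        super().fit(rows)
        self.votes = defaultdict(Counter)
        for codes, y in rows:
            for c in codes:
                self.votes[c][y] += 1
        return self

    def predict(self, codes):
        total = Counter()
        for c in codes:
            total.update(self.votes.get(c, {}))
        return total.most_common(1)[0][0] if total else self.default


def accuracy(model, rows):
    return sum(model.predict(x) == y for x, y in rows) / len(rows)


def cross_val_accuracy(model_cls, rows, k=10):
    """Step 3: k-fold cross-validation; returns the mean accuracy over the folds."""
    scores = []
    for i in range(k):
        test = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        scores.append(accuracy(model_cls().fit(train), test))
    return sum(scores) / k


rng = random.Random(0)
rows = prepare(make_toy_records(500, rng))
rng.shuffle(rows)
split = int(0.8 * len(rows))
train_rows, validation_rows = rows[:split], rows[split:]  # step 2

candidates = {"majority": MajorityModel, "dtc_vote": DtcVoteModel}  # step 4
cv_scores = {name: cross_val_accuracy(cls, train_rows) for name, cls in candidates.items()}
best_name = max(cv_scores, key=cv_scores.get)  # step 5
best_model = candidates[best_name]().fit(train_rows)
print(cv_scores, best_name, accuracy(best_model, validation_rows))
```

The validation set is touched only once, after model selection, so its accuracy is an honest estimate of how the chosen model behaves on unseen records.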


As repair descriptions get more granular (assuming data cleanliness and quality are maintained), the modeling problem becomes harder. For example, if we restrict the problem to identifying whether a car is going to have an unscheduled transmission, engine, or suspension repair, the problem is much more manageable, because the whole data set can be used to train the model. If we try to identify the specifics of those repairs (e.g., the subsystems or parts that need to be worked on), we must partition the available data by repair type and consequently train on a smaller set for each classification.
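
The effect of granularity on the amount of training data per class can be seen by simply counting labels at both levels. The repair records below are made up for illustration.

```python
from collections import Counter

# Hypothetical repair records: (subsystem, part).
repairs = [
    ("engine", "spark plug"), ("engine", "spark plug"), ("engine", "injector"),
    ("engine", "turbocharger"), ("transmission", "clutch"), ("transmission", "clutch"),
    ("transmission", "gear selector"), ("suspension", "shock absorber"),
    ("suspension", "control arm"), ("suspension", "shock absorber"),
]

coarse = Counter(subsystem for subsystem, _ in repairs)  # label = subsystem
fine = Counter(repairs)                                  # label = subsystem + part

print(len(coarse), min(coarse.values()))  # number of classes, smallest class size
print(len(fine), min(fine.values()))
```

With the same ten records, the part-level labeling has more classes and fewer examples in each, which is why more data and more vehicles are needed for detailed fault prediction.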


Some of the benefits of this model are:

  • All the data needed for the approach can be retrieved directly from any existing OBD device and from OEM databases
  • No further investment in hardware is necessary
  • High predictability of subsystem failure (subsystems specific to transmission, engine, suspension, etc.) within a reasonable period of time
  • It is the easiest and cheapest approach

On the other hand, there are some shortcomings. More data and more vehicles are needed to go into more detail about specific faults in the subsystems. The approach improves current technology and maintenance procedures by linking different DTCs to possible repairs with high accuracy, but it still does not solve the problem of unanticipated faults that might occur in vehicles.

© 2021 EVOLVE . All rights reserved.