Seven pilots are challenging the EVOLVE large-scale testbed with varying needs and goals. The applications support actual business cases in diverse domains such as agriculture, preventive machine maintenance, public transport, maritime surveillance, mobility services, image change detection, and automotive data-driven systems.
But what are these challenges?
Different types of data and computation patterns demand different technologies, combining big data, HPC, and cloud. Individually, these technologies are available to support pilot demands. However, speedup and performance improvement are not the only goals. Business owners need to be able to experiment with different technological options; they require faster reactions in the product development process, and they also need confidence when they commit to a particular technology.
EVOLVE needs to demonstrate that it can provide the means for pilots to minimize the effort and investment required to improve their applications and meet their scaling needs, and to perform experiments and adapt their data pipelines under fluid conditions. Reduced time-to-market is of utmost importance for EVOLVE to demonstrate through these pilots.
Business-wise, this is the most significant achievement for EVOLVE to reach: to become appealing by not requiring deep, detailed technology knowledge.
The EVOLVE pilots do not share the same level of maturity and readiness, nor do they need a common technology stack. However, they will co-exist on the same cluster and share cloud resources and data infrastructures, and the cluster must be able to accommodate all of them, each with its own restrictions and goals.
A brief overview of these pilots presents their aspects and needs, including their current status with respect to EVOLVE adoption; naturally, their migration to the new cluster and the development of the EVOLVE cluster itself proceed as two parallel processes with feedback between them.
Image Processing Pilots - Three image processing applications exploit the Copernicus Open Access Hub (SciHub) to obtain image data that can offer valuable information under various operational scenarios:
- Maritime surveillance using observation data, historic metadata, and classification models
- Radiometric correction and change detection on satellite images
- Optimizing agricultural production yield using numerical models and massive historic data
Existing applications face problems related to slow response times, limited coverage in terms of area size, and an inability to respond efficiently to operational needs. In all cases, it is imperative to bring HPC and big-data frameworks into place to serve the data-processing flows. Measurable performance indicators set the goals:
- Maritime surveillance: process larger areas in less time
- Correction and change detection on satellite images: improve the spatio-temporal window
- Optimizing agricultural production yield: cover larger areas in less time with lower error rates
The involved processing will trigger various components and technologies within EVOLVE, mainly TensorFlow, Python, MPI, Dask, and GPUs. All three pilots have already started testing and optimizing on the first versions of the EVOLVE platform. For all pilots, the first necessary step was to containerise the mature processing steps. The containers are now installed on the EVOLVE platform. In a second step, the communication between containers is being realized, and finally the various stages of the pipeline will be improved.
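As an illustration of the change-detection step, the sketch below flags pixels whose normalized difference between two co-registered acquisitions exceeds a threshold. The arrays and the threshold value are illustrative stand-ins, not the pilot's actual data or parameters:

```python
import numpy as np

def detect_changes(before, after, threshold=0.2):
    """Flag pixels whose normalized difference exceeds a threshold.

    `before` / `after` are co-registered single-band images (2-D arrays);
    the normalization reduces sensitivity to overall brightness shifts.
    """
    before = before.astype(np.float64)
    after = after.astype(np.float64)
    diff = np.abs(after - before) / (np.abs(after) + np.abs(before) + 1e-9)
    return diff > threshold

# Toy 2x2 "images": only the top-right pixel changes markedly
before = np.array([[10.0, 10.0], [10.0, 10.0]])
after = np.array([[10.0, 30.0], [10.0, 10.0]])
mask = detect_changes(before, after)
```

In the pilot setting, the same per-pixel operation would be applied chunk-wise (e.g., via Dask arrays) so that large satellite scenes are processed in parallel.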
Automotive Industry - Two pilots are working to improve the automotive industry:
- Predictive maintenance application
- Support for data-driven vehicle engineering processes
Data sources for the predictive maintenance problem are a combination of structured data (e.g., vehicle data comprising fields such as year, make, and model; warranty parts and claims) and unstructured data (repair-order narratives, and time series of diagnostic test codes and vehicle parameters such as odometer reading, speed, engine temperature, engine torque, and acceleration). The pilot is building a supervised model, based on a classical machine-learning setup (a set of training data and a set of test data), that can produce information about possible future warnings/errors. In short, it tries to find mappings between input variables (diagnostic test codes) and output variables (possible warnings/errors). The technology stack includes the R and Python programming languages, MongoDB, and the machine-learning libraries of R and Python.
The final goal is to gain the ability to process as many vehicles as possible within a specific time interval. The reason for using EVOLVE as a testbed is primarily the large amount of data the pilot will process (up to 100 million records for hundreds or even thousands of vehicles) and the need to retrain the models as new data arrives on a weekly basis.
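The supervised setup described above can be sketched as follows, assuming scikit-learn as the Python machine-learning library; the toy binary feature vectors merely stand in for the pilot's diagnostic test codes and vehicle parameters:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in data: each row is a feature vector (e.g., encoded
# diagnostic test codes); label 1 means a warning/error is expected.
X = [[0, 1], [1, 0], [0, 0], [1, 1]] * 25
y = [1, 0, 0, 1] * 25

# Classical supervised split into training and test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

Weekly retraining then amounts to re-running `fit` on the grown dataset; the scaling challenge EVOLVE addresses is doing this over up to 100 million records.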
In the world of electrification and Advanced Driver Assistance Systems, novel business opportunities are opening up, ranging from the data-driven improvement of engineering services to novel data-driven business models. A frequent problem is the detection of patterns in time series representing physical realities (e.g., camera sensors, cylinder pressures, speeds, locations, temperatures, vibrations). The second automotive pilot has illustrated the requirements and major concepts for time-series pattern detection using the cylinder pressure of a combustion engine. Taking this as an initial setting, it sketches the workflow describing the major processing steps. Furthermore, it has presented selected datasets and the state of the practice in pattern detection. The presented approaches offer the advantage of being completely agnostic to the time-series data points, i.e., the employed techniques exploit patterns rather than the specific physical meaning of the underlying signals. EVOLVE will contribute to the big-data technology stack in two ways. First, it will enable the transition from customer-specific implementations of time-series pattern detection to a general data-driven architecture. Second, the performance improvements achieved in EVOLVE will bring service providers closer to on-the-fly processing. Together, both contributions will provide the technological underpinning for novel automotive services exploiting the capabilities of time-series analysis.
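A minimal sketch of such signal-agnostic pattern detection: a query pattern is matched against a series using z-normalized Euclidean distance, so only the shape of the signal matters, not its absolute scale or offset. The series and pattern below are toy values, not pilot data, and this is only one common technique, not necessarily the pilot's actual method:

```python
import numpy as np

def znorm(x):
    """Z-normalize a window so scale and offset are factored out."""
    s = x.std()
    return (x - x.mean()) / s if s > 0 else x - x.mean()

def find_pattern(series, pattern):
    """Return the start index where `pattern` best matches `series`,
    using z-normalized Euclidean distance between sliding windows."""
    series = np.asarray(series, dtype=float)
    q = znorm(np.asarray(pattern, dtype=float))
    m = len(q)
    dists = [np.linalg.norm(znorm(series[i:i + m]) - q)
             for i in range(len(series) - m + 1)]
    return int(np.argmin(dists))

# A scaled and shifted copy of the pattern's shape hides at index 3
series = [0, 0, 0, 2, 6, 2, 0, 0, 0]
pattern = [1, 3, 1]
idx = find_pattern(series, pattern)
```

Because the match is shape-based, the same routine works on cylinder pressures, temperatures, or vibrations without knowing what the signal represents.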
Both automotive pilots have put their core processing steps into containers; the predictive maintenance pilot is already on board EVOLVE and has delivered indicators showing significant improvement. The data-driven engineering services pilot has devoted effort to preparing improved (anonymized) datasets that will be more challenging for the platform, and has also re-engineered its code before containerization. It is now moving towards deploying its algorithms on the EVOLVE platform.
Public Transport Operator Pilot - This case requires continuous evaluation of the achieved quality of service and bus performance. Planned data are correlated with actual data, including data corresponding to framing conditions (traffic, weather, demand). The analysis needs to be applied to longer time periods and visualised in many different ways within acceptable response times. Apart from batch procedures, there is also the real-time identification of cases where delays occur, with corresponding visualisation. In all cases, large datasets need to be ingested into the system, transformed, and visualised. Currently deployed technologies need to be replaced with Spark processing on dataset flows, and there is a strong need to adopt new datasets from other sources in the future as well as to accommodate much larger datasets. The end-to-end pipeline from data ingestion, through Spark processing, to visualisation is now on the EVOLVE cluster; work now targets measuring performance levels and adapting the second part of the pilot case, the real-time workflow.
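The core batch step, correlating planned with actual data to derive delays, can be sketched as follows; pandas stands in here for the Spark job for brevity, and all column names, timestamps, and the delay threshold are illustrative:

```python
import pandas as pd

# Toy stand-ins for the planned timetable and the actual vehicle feed
planned = pd.DataFrame({
    "trip_id": ["T1", "T2", "T3"],
    "stop_id": ["S1", "S1", "S2"],
    "planned_ts": pd.to_datetime(["2020-05-01 08:00",
                                  "2020-05-01 08:10",
                                  "2020-05-01 08:20"]),
})
actual = pd.DataFrame({
    "trip_id": ["T1", "T2", "T3"],
    "stop_id": ["S1", "S1", "S2"],
    "actual_ts": pd.to_datetime(["2020-05-01 08:03",
                                 "2020-05-01 08:10",
                                 "2020-05-01 08:27"]),
})

# Join planned vs. actual and compute the delay per stop event
merged = planned.merge(actual, on=["trip_id", "stop_id"])
merged["delay_min"] = (
    merged["actual_ts"] - merged["planned_ts"]).dt.total_seconds() / 60

# Flag events exceeding an (illustrative) 5-minute delay threshold
late = merged[merged["delay_min"] > 5]
```

In Spark the same join/aggregate runs distributed over the full history, which is what makes longer time periods tractable within acceptable response times.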
Performing Research on Demand Mobility Services - On-demand mobility is a hot research topic, even for the car industry. There is no reference system yet; this pilot is part of ongoing research into the most feasible options for such algorithms to meet the needs imposed by large areas. EVOLVE offers an appealing large-scale testbed on which different technologies and algorithms can be applied, tested, and measured. In EVOLVE, a car-passenger matching algorithm and vehicle routing algorithms for ride hailing are being examined, aiming at the ability to calculate 1k-100k routes per 30 seconds for a metropolitan area. The involved processing in these two pilots will trigger various components and technologies within EVOLVE, mainly Kafka, Spark, and machine-learning processing. Switching technologies and platforms in the data pipeline is expected to become relatively easy within the EVOLVE testbed framework.
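A car-passenger matching step can be sketched as a greedy nearest-vehicle assignment over great-circle distances. This only illustrates the shape of the problem, not the pilot's actual algorithm, and all coordinates are made up:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def match_rides(passengers, vehicles):
    """Greedily assign each passenger the nearest still-free vehicle.

    `passengers` and `vehicles` map an id to a (lat, lon) position;
    returns {passenger_id: vehicle_id}.
    """
    free = dict(vehicles)
    assignment = {}
    for pid, (plat, plon) in passengers.items():
        if not free:
            break
        vid = min(free, key=lambda v: haversine_km(plat, plon, *free[v]))
        assignment[pid] = vid
        del free[vid]
    return assignment

# Two passengers and two vehicles around (fictional) city coordinates
passengers = {"p1": (48.14, 11.58), "p2": (48.20, 11.60)}
vehicles = {"v1": (48.15, 11.58), "v2": (48.21, 11.61)}
matches = match_rides(passengers, vehicles)
```

At pilot scale, the positions would stream in via Kafka and the matching/routing step would run as a Spark job; the 1k-100k routes per 30 seconds target is precisely what such a distributed setup is meant to reach.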
All pilots are already deployed on the EVOLVE cluster, partly or fully, and are on the way to exhibiting their first key performance indicators. Table 1 lists all pilots, their main technology components, and their status. The technology stack can always change, as experimentation with other technology options is facilitated, also via the notebook facility for editing and modifying workflows. COVID-19 has not significantly affected progress: all contributors switched to telework mode, and on the platform side it provided a good test of adapting the platform's network exposure efficiently to different client setups.