Exploring Efficient Data Management in Robotics Software with Antoine Bassoul

Paris, France – November 22nd, 2024

Antoine Bassoul - Robotics Software Engineer

In the first part of our interview with Antoine Bassoul, we explored his fascinating career journey through robotics software development, from his work on self-driving cars at Continental to leading autonomy teams at Navya and Exail. Antoine also shared insights into the vital role data management plays in scaling robotics operations, emphasizing the importance of testing, validation, and automation.

1. Current Challenges in Data Management

  • Question 1: What are some of the most pressing data management challenges you're currently facing in robotics software development and operations?

There are many challenges across the board, as data management is still in its infancy in the robotics sector. We’re far from where the tech industry is: there aren’t many tools available, and robotics engineers often have more of an electrical engineering background than a software one. In my experience, the most pressing challenge in data management is building solid foundations. That means automating data collection, basic curation, and tagging, so that data isn’t lost because it was never uploaded or because it became so disorganized that old data is impossible to navigate. A high-performance robotic database should enable quick queries across different indexes without requiring hundreds of gigabytes to be downloaded upfront. These include vertical queries (e.g., a time point or time range), horizontal queries (e.g., a set of signals), and metadata-based queries (e.g., all data for a given spatial range, set of devices, or situations). The goal is a steadily growing database that doesn’t spiral out of control with 'junk' data (highly redundant, invalid, or lacking the metadata needed for basic searches), since all other tools will rely on this foundation.
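To make the three query types concrete, here is a minimal Python sketch that assumes a hypothetical in-memory index of per-signal chunks. The `Chunk` and `RecordingIndex` names are illustrative and do not come from Foxglove, Heex, or any other tool mentioned in this interview. Only lightweight index entries are scanned, so in a real system the heavy payloads would be fetched only for the chunks that match:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

@dataclass
class Chunk:
    """Hypothetical index entry: one signal of one recording over a time span."""
    recording_id: str
    signal: str        # e.g. "/lidar_front"
    t_start: float     # seconds
    t_end: float
    metadata: Dict[str, str] = field(default_factory=dict)

class RecordingIndex:
    """Minimal in-memory index; a real system would back this with a database."""

    def __init__(self) -> None:
        self.chunks: List[Chunk] = []

    def add(self, chunk: Chunk) -> None:
        self.chunks.append(chunk)

    def query(
        self,
        time_range: Optional[Tuple[float, float]] = None,  # vertical query
        signals: Optional[Set[str]] = None,                 # horizontal query
        where: Optional[Dict[str, str]] = None,             # metadata query
    ) -> List[Chunk]:
        results = []
        for c in self.chunks:
            if time_range is not None:
                lo, hi = time_range
                # Keep only chunks that overlap [lo, hi].
                if c.t_end < lo or c.t_start > hi:
                    continue
            if signals is not None and c.signal not in signals:
                continue
            if where is not None and any(c.metadata.get(k) != v for k, v in where.items()):
                continue
            results.append(c)
        return results

index = RecordingIndex()
index.add(Chunk("rec1", "/lidar_front", 0.0, 10.0, {"vehicle": "v1", "site": "port"}))
index.add(Chunk("rec1", "/camera_left", 0.0, 10.0, {"vehicle": "v1", "site": "port"}))
index.add(Chunk("rec2", "/lidar_front", 100.0, 110.0, {"vehicle": "v2", "site": "depot"}))

# Combine all three query types: time range, signal set, and metadata filter.
hits = index.query(time_range=(5.0, 50.0), signals={"/lidar_front"}, where={"vehicle": "v1"})
print([(c.recording_id, c.signal) for c in hits])  # [('rec1', '/lidar_front')]
```

The linear scan keeps the sketch short. A production index would more likely use interval trees or database indexes on time, signal, and metadata keys.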

When working in small robotics teams that lack the resources to build a complete data management system, it’s challenging to create and maintain a high-quality data repository that can be expanded over time. I’d say the four biggest pain points in data management are: 1. uploading all relevant data, 2. curating this data (including cutting, editing, and copying recordings), 3. organizing it with comprehensive metadata, and 4. running easy and efficient queries that integrate with existing data tools. More generally, the most urgent challenge for robotics software development is automated testing, which needs to be built collaboratively and incrementally across all teams.

  • Question 2: How do these challenges affect the efficiency and outcomes of your projects?

Without proper data management in place, it becomes extremely difficult to automate the many tasks that robotics engineers need to perform to develop and maintain a robotic software system. As a robotics company grows, more time is spent on validation, customer support, and issue resolution, which increases friction and makes it harder to ship new software and innovate. It also becomes much less enjoyable for engineers when they are bogged down by friction and 'menial tasks' instead of delivering new code and features.

  • Question 3: What data management tools have you been using to date? What are some of the limitations or pain points you've encountered with these tools?

I don’t think there are many tools specifically designed for the robotics sector yet, and the need for better software infrastructure to lower the barrier to entry and simplify development is a hot topic in robotics right now.
I’ve used Foxglove, which is great for creating robotics dashboards and for inspecting and visualizing robotic data in a web-based, collaborative way. It’s also quite good for storing recordings and running metadata-based queries. It offers tools to automate data uploads and metadata creation, but data curation still has to be done manually, and it may lack a GUI or API for cutting, copying, editing, and organizing recordings.
I haven't used it personally, but I’ve heard of Rerun, which is used for creating visualizations and also aims to serve as a multimodal time-series database.
At a previous company, we used Databricks for analytics, though we had to develop a lot of custom ETL processes to ingest our robotic data, and it was better suited for relatively lightweight time-series data from telemetry.
The teams I worked with mostly used Jenkins for continuous integration and testing, combined with a unit test framework, but I feel that this setup isn’t particularly well-suited for robotics.

  • Question 4: How do these limitations hinder you?

I’ve always had challenges with data uploads because robotic data is so large and connectivity is often poor, which makes uploading a tedious manual process that often involves physical storage devices. Even then, the data still needs to be uploaded, and tools are required to inspect it. There’s also the problem of having both too much and too little data: you might run a week-long, multi-vehicle data collection campaign that produces a huge amount of data that’s difficult to curate. If it isn’t curated, it simply sits on a server, forgotten, and after a while it becomes obsolete anyway as the hardware or software architecture changes.
When you need specific data, such as from a particular vehicle, hardware variant, software version, sensor set, or scenario, it’s often hard or even impossible to find, forcing you to launch another data collection campaign.
I think we lack the tools to curate data both on the device and in the cloud in a collaborative way: tools that allow inspection of large datasets, cutting recordings, editing or adding data, attaching metadata, and maintaining data traceability (e.g., did it come from a customer ticket, a validation phase, or a data collection campaign?).
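As an illustration of two of these curation operations, cutting a recording and keeping it traceable, here is a small Python sketch. The `Recording` structure, the `Source` categories (taken from the three origins named above), and the `cut` helper are all hypothetical and do not describe any particular tool’s API:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple

class Source(Enum):
    """Hypothetical traceability tag: where a recording came from."""
    CUSTOMER_TICKET = "customer_ticket"
    VALIDATION = "validation"
    COLLECTION_CAMPAIGN = "collection_campaign"

@dataclass
class Recording:
    recording_id: str
    source: Source
    messages: List[Tuple[float, str, bytes]]  # (timestamp s, topic, payload)
    metadata: Dict[str, str] = field(default_factory=dict)
    parent_id: str = ""  # set when this recording was cut from another one

def cut(rec: Recording, t0: float, t1: float, new_id: str) -> Recording:
    """Return a new recording with only the messages where t0 <= t < t1.

    The source and a copy of the metadata are carried over, and parent_id
    records the lineage, so the clip stays traceable to its origin.
    """
    kept = [m for m in rec.messages if t0 <= m[0] < t1]
    return Recording(
        recording_id=new_id,
        source=rec.source,
        messages=kept,
        metadata=dict(rec.metadata),  # copy, so edits don't leak back
        parent_id=rec.recording_id,
    )

full = Recording(
    "rec42",
    Source.CUSTOMER_TICKET,
    [(0.0, "/imu", b"a"), (1.5, "/imu", b"b"), (3.0, "/imu", b"c")],
    {"ticket": "T-0001"},  # hypothetical ticket id
)
clip = cut(full, 1.0, 3.0, "rec42-clip")
clip.metadata["scenario"] = "sharp_turn"  # attach extra metadata to the clip only
```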
Even when data is fairly well organized in the cloud, it’s often hard to access. Robotic data is stored in its own formats, so it usually has to be downloaded first to be accessed efficiently. For example, if you’re a small team, it’s difficult to connect data labeling tools to your cloud data because several transformations are needed. Accessing the data directly through the cloud interface is often inefficient.

If connectivity is poor (here, for instance, with only 4G) and the amount of data is large, it may be best to connect to the robot physically.

2. Differentiators of Heex Technologies

  • Question 5: You’ve identified several key differentiators in Heex Technologies' product. Could you elaborate on what stood out to you the most and why?

The key feature of Heex’s product is the automation of data upload and on-device curation. This significantly reduces the amount of data that needs to be uploaded, analyzed, and stored. For me, this means I end up with more usable data, and as a developer, I have much more control over what data gets recorded and when. For example, if I realize I need more data of close, static objects when my vehicle is turning, I can easily capture that without needing to set up a measurement campaign or ask operators or customers to spend extra time recording, tagging, and uploading those specific scenarios.
Similarly, if I know a specific operation, like working around an offshore wind turbine, is going to occur, I can configure the system to record snapshots of data around the object without burdening the operator. This way, I get more data that’s both generally relevant and specifically tailored to my needs, with the added benefit of automated uploading and initial curation, which saves a lot of time.
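One common way to record a "snapshot around" an event on a device is a rolling pre-trigger buffer plus a post-trigger window. The following Python sketch assumes this technique. It is not a description of how Heex implements it, and the `SnapshotRecorder` class and its window parameters are hypothetical:

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Message = Tuple[float, str]  # (timestamp in seconds, payload)

class SnapshotRecorder:
    """Keep the last pre_s seconds in memory; on trigger, also keep post_s seconds."""

    def __init__(self, pre_s: float, post_s: float) -> None:
        self.pre_s = pre_s
        self.post_s = post_s
        self.buffer: Deque[Message] = deque()
        self.active: Optional[List[Message]] = None  # snapshot being filled
        self.deadline = 0.0
        self.snapshots: List[List[Message]] = []

    def push(self, msg: Message) -> None:
        t = msg[0]
        if self.active is not None:
            if t <= self.deadline:
                self.active.append(msg)
            else:
                # The post-trigger window has ended: emit the snapshot.
                self.snapshots.append(self.active)
                self.active = None
        self.buffer.append(msg)
        # Drop messages that fall outside the pre-trigger window.
        while self.buffer and self.buffer[0][0] < t - self.pre_s:
            self.buffer.popleft()

    def trigger(self, t: float) -> None:
        # Ignore new triggers while a snapshot is still being filled.
        if self.active is None:
            self.active = list(self.buffer)  # copy the pre-trigger history
            self.deadline = t + self.post_s

rec = SnapshotRecorder(pre_s=2.0, post_s=1.0)
for t in range(7):
    rec.push((float(t), f"scan{t}"))
    if t == 4:
        rec.trigger(float(t))  # e.g. "close static object while turning"
print([m[0] for m in rec.snapshots[0]])  # [2.0, 3.0, 4.0, 5.0]
```

Only the snapshot leaves the device. Everything else is discarded as the buffer rolls forward, which is what keeps the uploaded volume small.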
The overall data experience is greatly improved because the tool is collaborative. Once the integration is done, configurations can be managed by non-experts, and there are no software modifications required on the device, which eliminates risk. It’s seamless for anyone to customize data collection to their specific needs and inspect the generated events.

  • How do features like “Automatic trigger configuration and OTA deployment” streamline your operations? 

It makes it easy for anyone to configure what data will be recorded and when, and since no new software needs to be deployed, there’s no risk involved. For example, what is recorded can be reconfigured before a critical operation around an offshore wind turbine without any concern.
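The idea that triggers can change without redeploying software comes down to treating them as data rather than code. Here is a minimal Python sketch, assuming a made-up JSON schema (this is not Heex’s configuration format). The device software only knows how to interpret the schema, so pushing a new JSON document over the air changes what fires without touching the binary:

```python
import json
import operator
from typing import Callable, Dict, List

Predicate = Callable[[Dict[str, float]], bool]

# Comparison operators allowed in the hypothetical trigger schema.
OPS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}

def load_triggers(config_json: str) -> Dict[str, Predicate]:
    """Turn a JSON list of {name, signal, op, value} into predicates."""
    triggers: Dict[str, Predicate] = {}
    for spec in json.loads(config_json):
        op = OPS[spec["op"]]
        signal = spec["signal"]
        threshold = float(spec["value"])

        # Default arguments bind the current loop values (avoids late binding).
        def predicate(signals: Dict[str, float], op=op, signal=signal,
                      threshold=threshold) -> bool:
            # A trigger never fires on a signal that is missing.
            return signal in signals and op(signals[signal], threshold)

        triggers[spec["name"]] = predicate
    return triggers

def fired(triggers: Dict[str, Predicate], signals: Dict[str, float]) -> List[str]:
    """Names of the triggers whose condition holds for the latest signal values."""
    return [name for name, pred in triggers.items() if pred(signals)]

config = """[
  {"name": "sharp_turn", "signal": "yaw_rate", "op": ">", "value": 0.5},
  {"name": "near_turbine", "signal": "dist_to_turbine_m", "op": "<", "value": 30}
]"""
triggers = load_triggers(config)
print(fired(triggers, {"yaw_rate": 0.8, "dist_to_turbine_m": 120.0}))  # ['sharp_turn']
```

Because any signal name can appear in the config, users can define their own triggers without asking for a software change.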

  • How does "data upload optimization" benefit your team?

Data upload is generally a hassle in many ways, so automating this process saves a significant amount of time.
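Part of what makes uploads a hassle is flaky connectivity, such as the 4G links mentioned earlier. Automation typically includes retrying on failure. The sketch below shows retry with exponential backoff, a generic technique and not a claim about how Heex uploads data. The `send` callable stands in for whatever transport is actually used, and the file name is hypothetical:

```python
import time
from typing import Callable

def upload_with_retry(
    send: Callable[[str], None],
    path: str,
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Try to upload `path`, doubling the wait after each connection failure.

    Returns the number of attempts used; re-raises the last error if all fail.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            send(path)
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise RuntimeError("unreachable: the loop always returns or raises")

# Simulated link that drops the first two attempts.
calls = []
def flaky_send(path: str) -> None:
    calls.append(path)
    if len(calls) < 3:
        raise ConnectionError("link dropped")

delays = []
attempts = upload_with_retry(flaky_send, "event_0001.mcap", sleep=delays.append)
print(attempts, delays)  # 3 [1.0, 2.0]
```

Injecting `sleep` keeps the function testable without real waiting.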

  • What impact do tools that only record the topics needed for each event have on your data workflows?

There are several benefits. First, reducing the amount of data uploaded can lower storage costs, but more importantly, it makes the upload process easier and reduces the need to physically connect to the device to transfer data.
Minimizing the amount of data uploaded also reduces the effort required to analyze and organize it, which is often a bottleneck and a source of data loss: if data isn’t analyzed or organized quickly, it tends to get lost. And since the data is associated with an event and there’s a programmatic interface for adding metadata, tagging and organizing are also partially automated.
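To show why recording only an event’s topics shrinks uploads, here is a Python sketch that filters a message stream by topic and time window and attaches metadata to the event. The `EventSpec` structure is hypothetical, and the message sizes are made-up illustrative numbers, not measurements:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Message = Tuple[float, str, int]  # (timestamp s, topic, size in bytes)

@dataclass
class EventSpec:
    """Hypothetical event definition: topics to keep, time window, and tags."""
    name: str
    topics: Set[str]
    t_start: float
    t_end: float
    metadata: Dict[str, str] = field(default_factory=dict)

def extract_event(stream: List[Message], spec: EventSpec) -> List[Message]:
    """Keep only messages on the event's topics inside its time window."""
    return [m for m in stream
            if m[1] in spec.topics and spec.t_start <= m[0] <= spec.t_end]

def total_bytes(msgs: List[Message]) -> int:
    return sum(m[2] for m in msgs)

# Illustrative sizes only: lidar 2 MB, camera 0.5 MB, IMU 100 B per message.
stream = [
    (0.0, "/lidar", 2_000_000), (0.0, "/camera", 500_000), (0.0, "/imu", 100),
    (1.0, "/lidar", 2_000_000), (1.0, "/camera", 500_000), (1.0, "/imu", 100),
    (2.0, "/lidar", 2_000_000), (2.0, "/camera", 500_000), (2.0, "/imu", 100),
]
spec = EventSpec("close_object_turning", {"/lidar", "/imu"}, 1.0, 2.0,
                 {"weather": "rain"})  # metadata travels with the event
kept = extract_event(stream, spec)
print(total_bytes(kept), "of", total_bytes(stream))  # 4000200 of 7500300
```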

  • How do event triggers based on any signal help you monitor your systems more efficiently?

You can never precisely predict what everyone will need, so it's best to allow users to configure triggers themselves, giving them the freedom to be creative in deciding how and based on which signals they create triggers.

Running a data collection campaign to build an object detection validation dataset: with three vehicles on different hardware, each fitted with six lidars and other sensors, processing the terabytes of data produced is a lot of work.

3. Business Case Rationale for Using Heex Technologies

  • Question 6: From a business perspective, what is the rationale for adopting a tool like Heex Technologies for your activities?

For me, the rationale is to support the ongoing building and maintenance of a robotic dataset, which is absolutely critical for efficiently producing reliable software, by recording more data (don’t underestimate how much data is 'trashed' on-premise) and ensuring the data is more relevant. Automating data recording and uploads frees up time for operators and reduces the need for data collection campaigns. Perhaps most importantly, it enables more collaborative data management through a web platform, encouraging more data initiatives from developers and reducing friction in their daily work.
In general, thanks to its ease of use, graphical interface, and collaborative nature, Heex’s product enhances data awareness within the team, boosts initiative, engagement, and innovation, and helps promote a vision of better software infrastructure and automation, all without needing to secure large amounts of resources upfront.

  • Question 7: Could you highlight the specific cost savings and efficiency improvements you’ve seen (or expect to see) by using Heex Technologies compared to traditional methods like continuous recording and on-premise data curation?

I would expect much less friction in delivering data to developers, along with less data loss and larger, higher-quality datasets. I would also expect less uncurated, useless data to be stored in the cloud.

  • Question 8: Finally, how do you see Heex Technologies evolving or continuing to add value to the robotics software industry in the future?

I believe that in the future, it will be mandatory for every robotics company, big or small, to have a large collection of high-quality data points, and Heex’s product is a great step in that direction. There likely isn’t a one-size-fits-all solution for data management in robotics, and companies will need to use a variety of tools. These tools must be open, customizable, extendable, and able to integrate with others.

Antoine Bassoul’s insights shed light on the critical challenges and innovations in robotics data management. From foundational issues like data curation and tagging to the transformative impact of tools like Heex Technologies, his expertise highlights the growing importance of efficient, collaborative, and automated data workflows in advancing robotics software development. As the industry evolves, solutions that streamline data management and foster innovation will be key to unlocking the full potential of robotics.