The 1st Workshop on Foundation Models for Autonomous Driving (FMAD) at IEEE ITSC brought together leading researchers, engineers, and practitioners to identify the challenges and opportunities that foundation models present for autonomous driving and to discuss how they may reshape future systems. The workshop examined applications of foundation models in perception, prediction, planning, and decision-making, and addressed the challenges of deploying them in safety-critical automotive environments.
The integration of foundation models into Autonomous Driving (AD) systems has the potential to revolutionize the field. Built on high-capacity architectures, foundation models can exploit vast data collections through self-supervised learning. This enables them to achieve remarkable performance across diverse tasks in different domains. Models such as DALL-E, CLIP, and SAM are poised to become cornerstones for a wide range of solutions in the AD domain and have initiated new research directions.
The deployment of AD systems provides the opportunity to collect large amounts of data, as the vehicles are equipped with high-quality, multi-modal sensor suites. While manual annotation of this data is costly and time-intensive, foundation models make it possible not only to tap into this data source but also to adapt rapidly to new tasks. Among other challenges, integrating different sensor modalities, accounting for the spatial and temporal nature of the data, and determining how to use it for planning and prediction remain open research questions.
Additionally, issues like limited availability of computational resources and the importance of safety considerations are inherent to AD platforms and need to be addressed. Overall, research interest is increasing in the field of foundation models for AD, which we want to explore with the following topics:
September 24, 2024
IEEE ITSC 2024, Edmonton, Canada
In-Person
We invite contributions on a broad range of topics related to foundation models and their application in autonomous driving systems.
Foundation models for camera-based perception, LiDAR processing, sensor fusion, and multi-modal understanding.
Motion forecasting, trajectory prediction, behavior modeling, and planning with foundation models.
Vision-language models, multi-modal approaches, and end-to-end systems for autonomous navigation.
Safety validation, robustness testing, adversarial scenarios, and verification methodologies.
Large-scale datasets, self-supervised learning, transfer learning, and data-efficient methods.
Real-time inference, edge computing, model compression, and practical deployment challenges.
| Time | Speaker | Talk / Event |
|---|---|---|
| 08:30 | | Greetings and Introduction |
| 08:35 | Zhixiang Wei, University of Science and Technology of China | Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Generalized Urban Scene Segmentation |
| 08:50 | Yun Li, University of Tokyo | Large Language Models for Human-like Autonomous Driving: A Survey |
| 09:20 | Rares Ambrus, Toyota Research Institute | Visual Foundation Models for Embodied Applications |
| 09:50 | | Coffee break |
| 10:20 | Aleksandr Petiushko, Gatik | Middle Mile and Foundation Models |
| 10:50 | Gilles Puy, Valeo.ai | Leveraging image foundation models to pretrain lidar networks |
| 11:20 | | Short break |
| 11:30 | Long Chen, Wayve | Building Foundation Models for Autonomous Driving |
| 12:00 | Royden Wagner, Karlsruhe Institute of Technology | Representation Learning for Motion Forecasting |
The workshop was organized by: