LiDAR-Camera Fusion for Autonomous Navigation: A Review of Deep Learning Approaches

Sam Goundar, Sakthivel Velusamy

Abstract


LiDAR sensors provide precise geometric information, while cameras capture rich semantic and textural detail; these complementary strengths have motivated extensive research in sensor fusion for autonomous navigation, with deep learning emerging as the dominant paradigm for exploiting the two modalities. This review examines deep learning approaches to LiDAR-camera fusion across the perception tasks critical to autonomous navigation, including object detection, semantic segmentation, depth estimation, and localization. We systematically categorize fusion architectures by integration stage: early fusion, which concatenates raw sensor data before feature extraction; middle fusion, which merges intermediate feature representations; and late fusion, which combines independent modality-specific predictions. For each stage we analyze the trade-offs among computational efficiency, robustness to sensor failures, and accuracy. The review then examines prominent fusion network architectures, including multi-view feature pyramids that project LiDAR points onto image planes, voxel-based representations that enable 3D convolutions on fused data, point-based networks that process irregular LiDAR geometry augmented with learned image features, and transformer-based architectures built on cross-modal attention.

Particular emphasis is placed on the spatial alignment challenges arising from differing sensor coordinate frames, on temporal synchronization between sensors with differing acquisition rates, and on calibration sensitivity, where small errors in the extrinsic parameters significantly degrade fusion performance. We analyze training strategies for fusion networks, including end-to-end joint training, modality-specific pre-training followed by fusion fine-tuning, and self-supervised learning that exploits geometric consistency between modalities. The review also surveys benchmark datasets (KITTI, nuScenes, Waymo Open Dataset, A2D2) and evaluation metrics, identifying inconsistencies that complicate cross-study comparisons. Application-specific fusion approaches are analyzed for distinct navigation scenarios: urban driving, which demands robust pedestrian and vehicle detection; off-road navigation, which emphasizes terrain classification and traversability assessment; and indoor navigation, where LiDAR provides structural mapping while cameras enable semantic understanding.

We critically evaluate robustness to adverse conditions, including camera degradation in low light or bad weather, where LiDAR remains functional, and LiDAR sparsity at long range, where camera information supplies complementary cues. The paper examines computational efficiency for real-time deployment, comparing architectures by inference latency, memory footprint, and energy consumption on embedded platforms (NVIDIA Jetson, Intel Myriad). Advanced techniques are reviewed, including uncertainty-aware fusion that weights modalities by prediction confidence, attention mechanisms that dynamically focus on informative regions, and multi-task learning frameworks that jointly optimize multiple perception objectives. Emerging directions are analyzed as well, including 4D radar-camera-LiDAR fusion that adds velocity information, event-camera integration for high-dynamic-range scenarios, and neural radiance fields (NeRF) for dense scene reconstruction from sparse multi-modal data. Finally, the review identifies critical research gaps: limited generalization across sensor configurations without retraining, insufficient robustness guarantees for safety-critical applications, and a scarcity of diverse training data covering edge cases and sensor degradation modes.
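
Illustrative Code Sketches

To make the concepts above concrete, the following minimal Python sketches illustrate three recurring building blocks of LiDAR-camera fusion. All function names, parameters, and dimensions are illustrative assumptions, not drawn from any specific system.

The first sketch shows the geometric core of projection-based fusion: mapping LiDAR points onto the image plane, assuming a pinhole camera model with intrinsic matrix K and known LiDAR-to-camera extrinsics (R, t). It also makes the calibration sensitivity discussed above concrete, since any error in R or t displaces every projected point and pairs LiDAR returns with the wrong pixels.

import numpy as np

def project_lidar_to_image(points_lidar, K, R, t, image_size):
    """Project (N, 3) LiDAR points into pixel coordinates.

    points_lidar: (N, 3) points in the LiDAR frame.
    K:            (3, 3) camera intrinsic matrix.
    R, t:         rotation (3, 3) and translation (3,) taking the
                  LiDAR frame to the camera frame (the extrinsics).
    image_size:   (width, height), used to discard out-of-view points.
    Returns pixel coordinates and depths of the visible points.
    """
    # Transform into the camera frame; errors in R or t shift every point.
    points_cam = points_lidar @ R.T + t

    # Keep only points in front of the camera.
    points_cam = points_cam[points_cam[:, 2] > 0.1]

    # Perspective projection: apply intrinsics, then divide by depth.
    pixels_h = points_cam @ K.T
    pixels = pixels_h[:, :2] / pixels_h[:, 2:3]

    # Discard projections that fall outside the image bounds.
    w, h = image_size
    in_view = ((pixels[:, 0] >= 0) & (pixels[:, 0] < w) &
               (pixels[:, 1] >= 0) & (pixels[:, 1] < h))
    return pixels[in_view], points_cam[in_view, 2]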
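
The second sketch outlines the cross-modal attention pattern behind transformer-based fusion: LiDAR-derived queries attend over flattened camera feature-map tokens. The module below is a minimal PyTorch sketch; the dimensions and the residual design are illustrative assumptions.

import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim=128, num_heads=4):
        super().__init__()
        # Queries come from the LiDAR stream; keys and values from the camera.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lidar_feats, camera_feats):
        # lidar_feats:  (B, N, dim), e.g. N object proposals or voxel features
        # camera_feats: (B, HW, dim), a flattened image feature map
        attended, _ = self.attn(lidar_feats, camera_feats, camera_feats)
        # The residual keeps the LiDAR stream usable even when the camera
        # contributes little information (e.g. in darkness).
        return self.norm(lidar_feats + attended)

# Usage: 2 scenes, 50 LiDAR proposals, a 16x16 camera feature map.
fusion = CrossModalAttention(dim=128, num_heads=4)
out = fusion(torch.randn(2, 50, 128), torch.randn(2, 256, 128))
print(out.shape)  # torch.Size([2, 50, 128])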
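
The third sketch illustrates uncertainty-aware late fusion: each modality head emits class probabilities, and the fused prediction weights each head by a confidence derived from predictive entropy. This is one simple weighting scheme among many, offered only as an illustration of the idea.

import numpy as np

def entropy_confidence(probs, eps=1e-8):
    """Map a probability vector to a confidence in [0, 1]; a peaked
    (low-entropy) distribution yields confidence near 1."""
    p = np.clip(probs, eps, 1.0)
    entropy = -np.sum(p * np.log(p))
    return 1.0 - entropy / np.log(len(p))

def fuse_predictions(camera_probs, lidar_probs):
    """Weight each modality's class probabilities by its confidence."""
    w_cam = entropy_confidence(camera_probs)
    w_lid = entropy_confidence(lidar_probs)
    fused = w_cam * camera_probs + w_lid * lidar_probs
    return fused / fused.sum()  # renormalize to a valid distribution

# A degraded camera at night yields a nearly flat distribution, while the
# LiDAR head stays peaked, so the fused output tracks the LiDAR prediction.
camera_probs = np.array([0.40, 0.35, 0.25])  # uncertain
lidar_probs = np.array([0.90, 0.05, 0.05])   # confident
print(fuse_predictions(camera_probs, lidar_probs))

In this usage example, the camera's near-uniform distribution receives a weight close to zero, so the fused distribution follows the LiDAR head, mirroring the low-light robustness behavior discussed above.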

Keywords


LiDAR-camera fusion, deep learning, autonomous navigation, multi-modal perception.

This work is licensed under a Creative Commons Attribution 3.0 License.