Semantic Scene Understanding for Autonomous Vehicles: A Comprehensive Review of Vision Transformers
Abstract
Keywords
References
Chen, Wuyang, Xianzhi Du, Fan Yang, Lucas Beyer, Xiaohua Zhai, Tsung-Yi Lin, Huizhong Chen, et al. “A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation”. arXiv [Cs.CV], 2022. arXiv. http://arxiv.org/abs/2112.09747.
Chu, Xiangxiang, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, and Chunhua Shen. “Twins: Revisiting the Design of Spatial Attention in Vision Transformers”. arXiv [Cs.CV], 2021. arXiv. http://arxiv.org/abs/2104.13840.
Dong, Xiaoyi, Jianmin Bao, Dongdong Chen, Weiming Zhang, Nenghai Yu, Lu Yuan, Dong Chen, and Baining Guo. “CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows”. arXiv [Cs.CV], 2022. arXiv. http://arxiv.org/abs/2107.00652.
Dosovitskiy, Alexey, et al. “An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale”. arXiv [Cs.CV], 2020. arXiv. http://arxiv.org/abs/2010.11929.
Gu, Jiaqi, Hyoukjun Kwon, Dilin Wang, Wei Ye, Meng Li, Yu-Hsin Chen, Liangzhen Lai, Vikas Chandra, and David Z. Pan. “Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation”. arXiv [Cs.CV], 2021. arXiv. http://arxiv.org/abs/2111.01236.
Lee, Youngwan, Jonghee Kim, Jeff Willette, and Sung Ju Hwang. “MPViT: Multi-Path Vision Transformer for Dense Prediction”. arXiv [Cs.CV], 2021. arXiv. http://arxiv.org/abs/2112.11010.
Liu, Ze, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. “Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows”. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 10012–22, 2021. Available at https://openaccess.thecvf.com/content/ICCV2021/html/Liu_Swin_Transformer_Hierarchical_Vision_Transformer_Using_Shifted_Windows_ICCV_2021_paper.html.
Naseer, Muzammal, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. “Intriguing Properties of Vision Transformers”. arXiv [Cs.CV], 2021. arXiv. http://arxiv.org/abs/2105.10497.
Wu, Dong, Man-Wen Liao, Wei-Tian Zhang, Xing-Gang Wang, Xiang Bai, Wen-Qing Cheng, and Wen-Yu Liu. “YOLOP: You Only Look Once for Panoptic Driving Perception”. Machine Intelligence Research 19, no. 6 (1 December 2022): 550–62. https://doi.org/10.1007/s11633-022-1339-y.
Xie, Enze, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, and Ping Luo. “SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers”. In Advances in Neural Information Processing Systems, edited by M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, and J. Wortman Vaughan, 34:12077–90. Curran Associates, Inc., 2021. https://proceedings.neurips.cc/paper_files/paper/2021/file/64f1f27bf1b4ec22924fd0acb550c235-Paper.pdf.
Yang, Michael. “Visual Transformer for Object Detection”. arXiv [Cs.CV], 2022. arXiv. http://arxiv.org/abs/2206.06323.
Zhu, Xizhou, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. “Deformable DETR: Deformable Transformers for End-to-End Object Detection”. In International Conference on Learning Representations, 2021. https://openreview.net/forum?id=gZ9hCDWe6ke.
This work is licensed under a Creative Commons Attribution 3.0 License.