Foundation Items: National Key Research and Development Program of China (2019YFB1600502); Major Research and Development Program of the Shaanxi Science and Technology Department (2018ZDXM-GY-047); National Natural Science Foundation of China (62072053); Key Research and Development Program of Shaanxi Province (2020GY-027); National Innovation and Entrepreneurship Training Program for College Students (G202210710036); Provincial Innovation and Entrepreneurship Training Program for College Students (S202210710340)
Received: 2022-01-08

Accurate Perception of Three-dimensional Vehicle Form in Roadside Monocular Perspective Based on CenterNet
WANG Wei,TANG Xin-yao,CUI Hua,SONG Huan-sheng,LI Ying.Accurate Perception of Three-dimensional Vehicle Form in Roadside Monocular Perspective Based on CenterNet[J].China Journal of Highway and Transport,2022,35(9):104-118.
Authors:WANG Wei  TANG Xin-yao  CUI Hua  SONG Huan-sheng  LI Ying
Institution:School of Information Engineering, Chang'an University, Xi'an 710064, Shaanxi, China
Abstract:Accurate real-time perception of three-dimensional (3D) vehicle form is very important for many applications such as vehicle behavior analysis and traffic flow parameter estimation in intelligent transportation system (ITS) and autonomous driving. Among them, how to overcome the limitation of perspective projection and perceive 3D vehicle form by roadside monocular cameras is becoming one of the challenges in ITS. In order to solve this problem, we adopted deep convolution neural network (DCNN) to extract projection features, and combined geometric constraints in calibration space model to reconstruct 3D vehicle form from two-dimensional (2D) projection to 3D space. Firstly, based on our previous work, calibration space model was constructed for roadside camera to obtain the 2D-3D mapping matrix in perspective space. Then, based on the current popular deep network CenterNet, a simple and efficient DCNN, we designed the detection network of 3D vehicle form projection features with multi-scale feature fusion module integrated to optimize the detection of vehicles of different scales under perspective projection. At the same time, Gaussian convex hull heatmap was optimized to enhance vehicle feature detection. Prior geometric constraints in the enhanced loss function were also leveraged to accelerate the convergence of training. Finally, through the established geometric constraint model of 3D vehicle form, feature projection points were decoded from network outputs to construct complete 3D vehicle form information. The experiments were carried out on the public BrnoCompSpeed dataset and self-made dataset collected from roadside perspective. We manually labeled all the samples in the dataset which meet the requirements of the experiment and used data augmentation to simulate the variable camera perspective and environment. 
In the evaluation of the experimental results, we assessed the network detection results and the final constructed 3D vehicle forms separately. For the network detection results, the average precision (AP) of the projection convex hull constructed from the projection features was chosen as an evaluation metric. With the 2D IoU threshold set to 0.7, the AP on the BrnoCompSpeed test dataset was 87.35%, while the recall and precision were 87.39% and 90.78%, respectively. In addition, ablation experiments were designed to verify the effectiveness of the improved network modules. For the 3D vehicle form results, we defined metrics for 3D spatial localization, dimensions, deflection angle, and 3D IoU, and used 3D IoU to verify the impact of the improved modules and of different perspectives on the final accuracy. The average 3D IoU on the BrnoCompSpeed test dataset reached 0.738, and the designed network runs at 27 frames per second (FPS), which meets real-time requirements.
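The 3D IoU metric used above can be sketched for the simplified axis-aligned case. The paper's metric also accounts for the vehicle's deflection angle, which requires polygon clipping of rotated footprints and is omitted here; boxes are given as (x_min, y_min, z_min, x_max, y_max, z_max).

```python
def iou_3d(a, b):
    """3D IoU of two axis-aligned boxes (simplified: no rotation)."""
    # Overlap along each axis; zero if the boxes are disjoint on that axis.
    dx = max(0.0, min(a[3], b[3]) - max(a[0], b[0]))
    dy = max(0.0, min(a[4], b[4]) - max(a[1], b[1]))
    dz = max(0.0, min(a[5], b[5]) - max(a[2], b[2]))
    inter = dx * dy * dz
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)
```

Identical boxes score 1.0 and disjoint boxes score 0.0, so the reported average of 0.738 indicates substantial overlap between the reconstructed and ground-truth 3D forms.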
Keywords:traffic engineering  3D detection in monocular roadside perspective  improved CenterNet  3D vehicle form perception  cooperative vehicle infrastructure  