Multi-task perception algorithm of autonomous driving based on temporal fusion

Citation: LIU Zhan-wen, FAN Song-hua, QI Ming-yuan, DONG Ming, WANG Pin, ZHAO Xiang-mo. Multi-task perception algorithm of autonomous driving based on temporal fusion[J]. Journal of Traffic and Transportation Engineering, 2021, 21(4): 223-234.

Authors: LIU Zhan-wen, FAN Song-hua, QI Ming-yuan, DONG Ming, WANG Pin, ZHAO Xiang-mo

Affiliations: 1. School of Information Engineering, Chang'an University, Xi'an 710064, Shaanxi, China; 2. University of California, Berkeley, Berkeley 94804-4648, California, USA

Foundation items: National Natural Science Foundation of China (U1864204); National Key Research and Development Program of China (2019YFB1600103); Key Research and Development Program of Shaanxi Province (2018ZDXM-GY-044)

Abstract: Sequential image frames were taken as input to mine the temporal information shared among consecutive frames, and a multi-task joint visual perception algorithm for the driving environment was constructed by fusing this temporal information; through multi-task supervision and joint optimization, the algorithm rapidly detects traffic participants while simultaneously extracting the drivable area. ResNet50 served as the backbone network, within which a cascaded feature fusion module was built to capture non-local long-range dependences among different frames; high-resolution images were downsampled by convolution to accelerate feature extraction across frames and balance the accuracy and speed of the algorithm. To eliminate the influence of object motion displacement on feature fusion across frames, while still exploiting the non-local information shared by different frames, a temporal feature fusion module was constructed to temporally align and match the feature maps of the individual frames, forming a fused global feature. On top of the parameter-sharing backbone, keypoint heatmaps were generated to detect the positions of pedestrians, vehicles, and traffic lights on the road, and a semantic segmentation sub-network provided drivable-area information for autonomous vehicles. Results show that the proposed algorithm takes multiple frames instead of a single frame as input, making effective use of the sequential nature of the imagery, and that downsampling in the cascaded feature fusion module reduces the computational complexity to 1/16 of the original. Compared with other mainstream models such as CornerNet and ICNet, the detection precision improves by 6% on average and the segmentation performance by 5% on average, while a processing speed of 12 frames per second is maintained, so the algorithm has clear advantages in both the speed and the accuracy of detection and segmentation. 6 tabs, 9 figs, 31 refs.

Keywords: traffic information engineering; environment perception; temporal fusion; object detection; semantic segmentation

Received: 2021-02-03
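The 1/16 complexity figure follows from the quadratic cost of non-local (all-pairs) affinity between frame features: halving each spatial dimension shrinks the (HW) x (HW) affinity matrix by a factor of 16. A minimal NumPy sketch illustrates the arithmetic; the function names, shapes, and average-pool downsampling are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def nonlocal_affinity(feat_a, feat_b):
    """All-pairs (non-local) affinity between spatial positions of two
    frame feature maps of shape (C, H, W); cost is O((H*W)^2 * C)."""
    C, H, W = feat_a.shape
    qa = feat_a.reshape(C, H * W)                  # (C, HW)
    kb = feat_b.reshape(C, H * W)                  # (C, HW)
    aff = qa.T @ kb                                # (HW, HW) affinity matrix
    aff = np.exp(aff - aff.max(axis=1, keepdims=True))
    return aff / aff.sum(axis=1, keepdims=True)    # row-wise softmax

def downsample2x(feat):
    """2x2 average pooling: halves H and W, so the (HW)^2 affinity
    matrix shrinks by a factor of 16."""
    C, H, W = feat.shape
    return feat.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 8, 16, 16))             # two frames, C=8, 16x16
full = nonlocal_affinity(a, b)                     # (256, 256)
ds = nonlocal_affinity(downsample2x(a), downsample2x(b))  # (64, 64)
print(full.size // ds.size)                        # -> 16
```

The softmax-normalized rows can then weight the second frame's features, which is the usual way such an affinity is turned into an aligned, fused feature.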

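The detection head localizes pedestrians, vehicles, and traffic lights via keypoint heatmaps. A hedged NumPy sketch of the standard Gaussian encode/decode for such heatmaps (the sigma, grid size, and helper names are assumptions for illustration, not taken from the paper):

```python
import numpy as np

def gaussian_heatmap(h, w, center, sigma=2.0):
    """Render one keypoint as a 2-D Gaussian peak on an h x w grid;
    detectors keep one such channel per class (pedestrian, vehicle, light)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def decode_peak(heatmap):
    """Recover the keypoint as the (x, y) location of the heatmap maximum."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(x), int(y)

hm = gaussian_heatmap(64, 64, center=(40, 20))
print(decode_peak(hm))  # -> (40, 20)
```

Training regresses the predicted heatmap toward these Gaussian targets; at inference, peaks above a confidence threshold become detected object positions.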
This article is indexed by CNKI, Wanfang Data, and other databases.