Funding:National Natural Science Foundation of China (51505354)
Received:2020-03-03

Lane Changing of Autonomous Vehicle Based on TD3 Algorithm in Human-machine Hybrid Driving Environment
PEI Xiao-fei,MO Shuo-jie,CHEN Zhen-fu,YANG Bo. Lane Changing of Autonomous Vehicle Based on TD3 Algorithm in Human-machine Hybrid Driving Environment[J]. China Journal of Highway and Transport, 2021, 34(11): 246-254. DOI: 10.19721/j.cnki.1001-7372.2021.11.020
Authors:PEI Xiao-fei  MO Shuo-jie  CHEN Zhen-fu  YANG Bo
Affiliation:1. Hubei Key Laboratory of Advanced Technology of Automotive Components, Wuhan University of Technology, Wuhan 430070, Hubei, China;2. Hubei Collaborative Innovation Center of Automotive Components Technology, Wuhan University of Technology, Wuhan 430070, Hubei, China
Abstract:Improving acceptance by human drivers is an important direction for autonomous vehicles, and deep reinforcement learning is a key enabling technology. To solve the lane-changing decision problem in mixed traffic flow with both human-driven and autonomous vehicles, this study used the deep reinforcement learning algorithm Twin Delayed Deep Deterministic Policy Gradient (TD3) to realize autonomous lane-changing behavior. First, the theoretical framework of reinforcement learning based on the Markov decision process was introduced. Then, using real-world driving data from the NGSIM dataset, a one-way six-lane simulation scene with moderate traffic congestion was built in the autonomous-driving simulator NGSIM-ENV; the non-autonomous vehicles were driven according to the trajectories recorded in NGSIM. For lane-changing decision making in a continuous action space, the TD3 algorithm was used to construct a lane-changing model that controls the driving behavior of the autonomous vehicle. In the proposed model, a state space containing ego-vehicle and surrounding-environment information and an action space comprising the controlled vehicle's acceleration and heading angle were defined, and the reward function was designed to jointly account for safety, driving efficiency, and comfort. Finally, on the NGSIM-ENV simulation platform, the lane-changing behavior of the TD3-controlled autonomous vehicle was compared with the driving data of human drivers. The results show that the average speed of the TD3-controlled vehicle is 4.8% higher than that of human drivers, with improvements in safety and comfort as well. The simulation results verify the effectiveness of the trained TD3 lane-changing model, which can realize safe, comfortable, and smooth lane changes in a complex traffic environment.
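The abstract describes a reward function combining safety, driving efficiency, and comfort terms. The paper's actual formulation is not reproduced here; the following is a minimal sketch of that structure, where the weights, desired speed, time-to-collision threshold, and jerk bound are all assumed placeholder values, not the authors' parameters:

```python
# Hypothetical weights and thresholds -- illustrative only, not the
# reward design from the paper.
W_EFF, W_SAFE, W_COMF = 0.4, 0.4, 0.2
V_DESIRED = 15.0   # m/s, assumed target speed
TTC_MIN = 3.0      # s, assumed time-to-collision threshold
JERK_MAX = 10.0    # m/s^3, assumed comfort bound

def lane_change_reward(ego_speed: float, ttc_front: float, jerk: float) -> float:
    """Weighted sum of efficiency, safety, and comfort terms."""
    # Efficiency: reward driving close to the desired speed.
    r_eff = 1.0 - abs(ego_speed - V_DESIRED) / V_DESIRED
    # Safety: penalize a small time-to-collision to the leading vehicle.
    r_safe = 0.0 if ttc_front >= TTC_MIN else -(TTC_MIN - ttc_front) / TTC_MIN
    # Comfort: penalize large jerk (rate of change of acceleration).
    r_comf = -min(abs(jerk) / JERK_MAX, 1.0)
    return W_EFF * r_eff + W_SAFE * r_safe + W_COMF * r_comf
```

A state at the desired speed with a safe gap and no jerk scores the maximum, while slow, close-following, high-jerk states score lower, which is the qualitative behavior the abstract's reward design aims for.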
Keywords:traffic engineering  autonomous vehicle  reinforcement learning  lane-changing model  reward function  human-machine hybrid driving  mixed traffic flow  