Trajectory Tracking Control of Intelligent Vehicle Based on DDPG Method of Reinforcement Learning
Cite this article: HE Yi-lin, SONG Ruo-yang, MA Jian. Trajectory Tracking Control of Intelligent Vehicle Based on DDPG Method of Reinforcement Learning[J]. China Journal of Highway and Transport, 2021, 34(11): 335-348. DOI: 10.19721/j.cnki.1001-7372.2021.11.026
Authors: HE Yi-lin, SONG Ruo-yang, MA Jian
Affiliation: School of Automobile, Chang'an University, Xi'an 710064, Shaanxi, China
Foundation items: National Key Research and Development Program of China (2018YFB1600700); Shaanxi Province Key Industrial Innovation Chain (Cluster) Project (2019ZDLGY15-01); Fundamental Research Funds for the Central Universities (300102220103)
Abstract: To address the lateral control problem of an intelligent vehicle during trajectory tracking, a trajectory tracking control method based on the deep deterministic policy gradient (DDPG) algorithm of reinforcement learning is proposed. First, the tracking control of the intelligent vehicle is formulated as a reinforcement learning process based on a Markov decision process (MDP): the learning agent is an actor-critic framework composed of an actor neural network and a critic neural network, and the environment consists of the vehicle model, the tracking model, the road model, and a reward function. Second, the agent is updated with the DDPG method, in which a replay buffer is used to break sample correlation and structurally identical copies of the actor and critic networks (target networks) are used to prevent the updates from diverging. Finally, the proposed method is trained and validated in different scenarios and compared with the deep Q-learning (DQN) and model predictive control (MPC) methods. The results show that the DDPG-based reinforcement learning method requires a short learning time, keeps the lateral and angular deviations small during trajectory tracking, and meets the tracking requirements at different vehicle speeds. Both the DDPG and DQN reinforcement learning methods reach the maximum cumulative reward of a training episode in the different scenarios. In the two simulation scenarios, the total learning time of DDPG is 9.53% and 44.19% of that of DQN, respectively, and the learning time of a single episode is only 20.28% and 22.09% of that of DQN. When DDPG, DQN, and MPC are used for control, in Scenario 1 the maximum lateral deviation of DDPG is 87.5% and 50% of that of DQN and MPC, respectively, and the simulation time is 12.88% and 53.45% of that of DQN and MPC; in Scenario 2 the maximum lateral deviation of DDPG is 75% and 21.34% of that of DQN and MPC, respectively, and the simulation time is 20.64% and 58.60% of that of DQN and MPC. (A minimal illustrative DDPG sketch follows the record below.)

Keywords: automotive engineering; trajectory tracking; DDPG; intelligent vehicle; reinforcement learning; neural network
Received: 2020-06-21
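The abstract above lists the standard DDPG ingredients: an actor-critic agent, a replay buffer that breaks sample correlation, and structurally identical network copies that keep the update from diverging, all interacting with a vehicle-tracking environment whose reward penalizes lateral and angular deviations. The PyTorch sketch below is a minimal illustration of that loop, not the authors' implementation: the toy straight-road kinematic environment, network sizes, reward weights, steering limit, and every hyper-parameter are assumptions chosen only for the example.

# Minimal DDPG sketch for lateral trajectory-tracking control (PyTorch).
# All models and hyper-parameters are illustrative assumptions, not the paper's.
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn


class LateralTrackingEnv:
    """Toy stand-in for the vehicle/tracking/road models: state = [lateral
    deviation e_y (m), heading deviation e_psi (rad)] from a straight reference
    path; action = normalized steering command in [-1, 1]."""

    def __init__(self, dt=0.05, speed=10.0, wheelbase=2.7):
        self.dt, self.speed, self.wheelbase = dt, speed, wheelbase

    def reset(self):
        self.state = np.array([0.5, 0.0], dtype=np.float32)
        return self.state.copy()

    def step(self, action):
        e_y, e_psi = self.state
        steer = 0.5 * float(np.clip(action, -1.0, 1.0).item())          # assumed 0.5 rad limit
        e_psi += self.speed / self.wheelbase * np.tan(steer) * self.dt   # kinematic yaw
        e_y += self.speed * np.sin(e_psi) * self.dt
        self.state = np.array([e_y, e_psi], dtype=np.float32)
        reward = -(e_y ** 2 + 0.1 * e_psi ** 2)       # penalize lateral/heading error
        done = abs(e_y) > 3.0                         # leaving the lane ends the episode
        return self.state.copy(), reward, done


def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, out_dim))


class DDPGAgent:
    def __init__(self, s_dim=2, a_dim=1, gamma=0.99, tau=0.005):
        self.actor, self.critic = mlp(s_dim, a_dim), mlp(s_dim + a_dim, 1)
        # Structurally identical copies ("target" networks) stabilize the update.
        self.actor_t, self.critic_t = mlp(s_dim, a_dim), mlp(s_dim + a_dim, 1)
        self.actor_t.load_state_dict(self.actor.state_dict())
        self.critic_t.load_state_dict(self.critic.state_dict())
        self.opt_a = torch.optim.Adam(self.actor.parameters(), lr=1e-3)
        self.opt_c = torch.optim.Adam(self.critic.parameters(), lr=1e-3)
        self.buffer = deque(maxlen=100_000)           # replay buffer breaks correlation
        self.gamma, self.tau = gamma, tau

    def act(self, s, noise=0.1):
        with torch.no_grad():
            a = torch.tanh(self.actor(torch.as_tensor(s))).numpy()
        return np.clip(a + noise * np.random.randn(*a.shape), -1.0, 1.0)

    def update(self, batch=64):
        if len(self.buffer) < batch:
            return
        s, a, r, s2, d = map(np.array, zip(*random.sample(self.buffer, batch)))
        s, a, s2 = (torch.as_tensor(x, dtype=torch.float32) for x in (s, a, s2))
        r = torch.as_tensor(r, dtype=torch.float32).unsqueeze(1)
        d = torch.as_tensor(d, dtype=torch.float32).unsqueeze(1)
        with torch.no_grad():                         # bootstrap target from the copies
            a2 = torch.tanh(self.actor_t(s2))
            y = r + self.gamma * (1.0 - d) * self.critic_t(torch.cat([s2, a2], dim=1))
        critic_loss = nn.functional.mse_loss(self.critic(torch.cat([s, a], dim=1)), y)
        self.opt_c.zero_grad(); critic_loss.backward(); self.opt_c.step()
        actor_loss = -self.critic(torch.cat([s, torch.tanh(self.actor(s))], dim=1)).mean()
        self.opt_a.zero_grad(); actor_loss.backward(); self.opt_a.step()
        for net, tgt in ((self.actor, self.actor_t), (self.critic, self.critic_t)):
            for p, pt in zip(net.parameters(), tgt.parameters()):
                pt.data.mul_(1.0 - self.tau).add_(self.tau * p.data)    # soft update


if __name__ == "__main__":
    env, agent = LateralTrackingEnv(), DDPGAgent()
    for episode in range(50):
        s, ret = env.reset(), 0.0
        for _ in range(200):                          # cap episode length
            a = agent.act(s)
            s2, r, done = env.step(a)
            agent.buffer.append((s, a, r, s2, float(done)))
            agent.update()
            s, ret = s2, ret + r
            if done:
                break
        print(f"episode {episode:3d}  return {ret:8.2f}")

Running this sketch only demonstrates the mechanics (replay sampling, target-network soft updates, deterministic policy gradient); reproducing the paper's results would require its own vehicle, tracking, and road models, reward function, and training scenarios.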
