Adaptive Traffic Signal Control Based on Dueling Recurrent Double Q Network

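Cite this article:	LU Li-ping, CHENG Ken, CHU Duan-feng, WU Chao-zhong, QIU Yu-jie. Adaptive Traffic Signal Control Based on Dueling Recurrent Double Q Network[J]. China Journal of Highway and Transport, 2022, 35(8): 267-277.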

Authors:	LU Li-ping, CHENG Ken, CHU Duan-feng, WU Chao-zhong, QIU Yu-jie

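Affiliations:	1. School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, Hubei, China; 2. Intelligent Transportation Systems Research Center, Wuhan University of Technology, Wuhan 430063, Hubei, China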

Funding:	National Key Research and Development Program of China (2021YFB2501104)

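Abstract:	To coordinate traffic flow adaptively, more effectively, and more reliably, with the goal of reducing the time vehicles spend stopped and waiting, the 3DRQN (Dueling Double Deep Recurrent Q Network) algorithm is proposed for traffic signal control. The algorithm builds on the deep Q network and uses a dueling architecture, a double Q network, and a target network to improve learning performance. An LSTM network is introduced to encode historical state information, which reduces the algorithm's dependence on the state at the current time step and makes it more robust. In addition, to address practical problems such as low positioning accuracy and the difficulty of obtaining vehicle waiting times, a low-resolution state space and a reward function based on traffic pressure are designed. A traffic flow model of the intersection was built in SUMO and tested with traffic data collected at an intersection in Chibi, Hubei Province, and the algorithm was compared with Webster's fixed-time strategy, a fully actuated signal control strategy, and an adaptive control strategy based on 3DQN (Dueling Double Deep Q Network). The results show that the proposed 3DRQN algorithm reduces the average vehicle waiting time by more than 25% relative to these three methods. Across scenarios with different traffic volumes and left-turn ratios, the average vehicle waiting time under 3DRQN rises markedly as the left-turn ratio and traffic volume increase, but the algorithm still performs well: with a traffic volume of 1 800 pcu·h⁻¹ and a left-turn ratio of 50%, the average vehicle waiting time under 3DRQN is about 15% lower than under 3DQN, about 24% lower than under actuated control, and about 33% lower than under fixed-time control. In special scenarios such as traffic surges, restricted road capacity, and sensor failure, the algorithm adapts well; even in the extreme case in which 50% of the sensors fail, it outperforms the fixed-time strategy by more than 10%. These results indicate that 3DRQN provides good control performance, effectively reduces the time vehicles spend stopped and waiting, and is robust.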
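The abstract names a dueling architecture, a double Q network, and a target network without giving formulas. As a rough illustration only, the sketch below shows the textbook forms of those two ideas in plain Python: the mean-subtracted dueling aggregation and the double Q-learning target. The function names, argument names, and the discount factor `gamma` are assumptions made for this example, not details from the paper, and the LSTM encoder is left out entirely.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) using the common
    mean-subtracted form: Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)).
    (Assumed textbook form; the paper's exact variant is not stated.)"""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + (adv - mean_adv) for adv in advantages]


def double_q_target(
    reward: float,
    gamma: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    done: bool,
) -> float:
    """Double Q-learning target: the online network selects the next action,
    and the target network evaluates that action."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical numbers, for illustration only.
q_values = dueling_q_values(1.0, [0.0, 2.0, 4.0])
target = double_q_target(1.0, 0.5, [0.2, 0.8], [5.0, 3.0], done=False)
```

Here the online estimates prefer action 1, so the target bootstraps from the target network's value for action 1 (3.0) rather than from its own maximum (5.0). Decoupling action selection from action evaluation in this way is what double Q-learning uses to reduce overestimation.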
Keywords:	traffic engineering; intersection signal control; deep reinforcement learning; deep Q network
Received:	2020-12-11
Adaptive Traffic Signal Control Based on Dueling Recurrent Double Q Network

LU Li-ping, CHENG Ken, CHU Duan-feng, WU Chao-zhong, QIU Yu-jie. Adaptive Traffic Signal Control Based on Dueling Recurrent Double Q Network[J]. China Journal of Highway and Transport, 2022, 35(8): 267-277.

Authors:	LU Li-ping, CHENG Ken, CHU Duan-feng, WU Chao-zhong, QIU Yu-jie

Institution:	1. School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, Hubei, China; 2. Intelligent Transportation Systems Research Center, Wuhan University of Technology, Wuhan 430063, Hubei, China

Abstract:	Most research on reinforcement-learning-based traffic signal control lacks robustness verification in special scenarios, such as restricted road capacity, and relies on information that is difficult to obtain in real deployments, such as high-precision positioning data. To adaptively coordinate traffic flow more effectively and reliably, the dueling double deep recurrent Q network (3DRQN) algorithm is proposed, with the goal of reducing vehicle waiting time. The algorithm is based on the deep Q network and uses a dueling architecture, a double Q network, and a target network to improve learning performance. It also incorporates a long short-term memory (LSTM) network to encode historical state information, which reduces dependence on the current state information and makes the algorithm more robust. Furthermore, to address low positioning accuracy and the difficulty of obtaining vehicle waiting times in practical applications, a low-resolution state space and a reward function based on traffic pressure were designed. The intersection model was built in the SUMO simulator, and traffic data collected at an intersection in the city of Chibi, Hubei Province, were used for testing. The method was compared with Webster's fixed-time method, a fully actuated signal control strategy, and the dueling double deep Q network (3DQN) algorithm. Compared with these three methods, 3DRQN reduced the average vehicle waiting time by more than 25%. As traffic volumes and left-turn ratios increased, the average vehicle waiting time under 3DRQN increased significantly, but the algorithm still maintained good performance. In the scenario with a traffic volume of 1 800 pcu·h⁻¹ and a left-turn ratio of 50%, the average vehicle waiting time under 3DRQN was approximately 15% lower than under 3DQN, approximately 24% lower than under actuated signal control, and approximately 33% lower than under the fixed-time method. The 3DRQN method adapts well to special scenarios such as traffic surges, restricted road capacity, and sensor failure. Even in the extreme scenario in which 50% of the sensors failed, it outperformed the fixed-time strategy by more than 10%. The experimental results show that the 3DRQN algorithm effectively reduces vehicle waiting time and provides good control performance and robustness.
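The abstract mentions a reward function based on traffic pressure but does not define it. One commonly used formulation, shown here purely as an assumption, takes the pressure of an intersection to be the number of vehicles on its incoming lanes minus the number on its outgoing lanes, and rewards the agent with the negative absolute pressure. The function names and the per-lane vehicle counts below are hypothetical. The appeal of a reward of this kind is that it needs only coarse vehicle counts, not per-vehicle waiting times or precise positions.

```python
from typing import Sequence


def intersection_pressure(
    incoming_counts: Sequence[int], outgoing_counts: Sequence[int]
) -> int:
    """Pressure = vehicles on incoming lanes minus vehicles on outgoing lanes.
    (Assumed definition; the paper's exact formula is not given in the abstract.)"""
    return sum(incoming_counts) - sum(outgoing_counts)


def pressure_reward(
    incoming_counts: Sequence[int], outgoing_counts: Sequence[int]
) -> float:
    """Reward is the negative absolute pressure, so a balanced intersection
    (pressure near zero) receives the highest reward."""
    return -float(abs(intersection_pressure(incoming_counts, outgoing_counts)))


# Hypothetical lane counts for a four-approach intersection.
incoming = [4, 6, 2, 3]   # vehicles detected on each incoming approach
outgoing = [1, 0, 5, 2]   # vehicles detected on each outgoing approach
reward = pressure_reward(incoming, outgoing)
```

With these counts, the incoming lanes hold 15 vehicles and the outgoing lanes hold 8, so the pressure is 7 and the reward is -7.0.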

Keywords:	traffic engineering; intersection signal control; deep reinforcement learning; deep Q network
