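Signal Phase and Timing Optimization Method for Intersection Based on Hybrid Proximal Policy Optimization
Cite this article:CHEN Xi-qun, ZHU Yi-zhang, LV Chao-feng. Signal Phase and Timing Optimization Method for Intersection Based on Hybrid Proximal Policy Optimization[J]. Transportation Systems Engineering and Information, 2023, 23(1): 106-113.
Authors:CHEN Xi-qun  ZHU Yi-zhang  LV Chao-feng
Affiliation:Zhejiang University, a. College of Civil Engineering and Architecture, Institute of Intelligent Transportation Systems; b. Polytechnic Institute, Institute of Intelligent Transportation Systems; c. College of Civil Engineering and Architecture, Hangzhou 310058
Funding:National Natural Science Foundation of China (72171210); Key Project of the Zhejiang Provincial Natural Science Foundation (LZ23E080002)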
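Abstract:Optimized traffic signal control is an important supply-side measure for alleviating urban traffic congestion. With the development of traffic big data technology, signal control using deep reinforcement learning has become a key research direction. Most existing control frameworks perform discrete phase selection, in which phase duration is obtained by accumulating decision intervals, and this may conflict with the agent's exploration of better actions. To address this, this paper proposes an intersection signal phase and timing optimization method based on Hybrid Proximal Policy Optimization (HPPO). First, the signal control action is defined as a parameterized action, subject to the boundary constraints on phase duration that apply in practice. Traffic flow state information is then extracted and fed into a dual-policy network, which adaptively generates the next phase and its duration; the reward is evaluated from the change in traffic state after the action is executed, so that the agent learns the intrinsic relationship between phase and phase duration. A simulation platform is built, and the new method is tested and compared against other algorithms using real traffic flow data as input. The results show that, compared with discrete control, the new method has a lower decision frequency and better control performance: the average vehicle travel time and the average lane queue length are reduced by 27.65% and 23.65%, respectively.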

Keywords:intelligent transportation  hybrid action space  deep reinforcement learning  hybrid proximal policy optimization  agent design
Received:2022-08-10

Signal Phase and Timing Optimization Method for Intersection Based on Hybrid Proximal Policy Optimization
CHEN Xi-qun, ZHU Yi-zhang, LV Chao-feng. Signal Phase and Timing Optimization Method for Intersection Based on Hybrid Proximal Policy Optimization[J]. Transportation Systems Engineering and Information, 2023, 23(1): 106-113.
Authors:CHEN Xi-qun  ZHU Yi-zhang  LV Chao-feng
Institution:a. Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture; b. Polytechnic Institute & Institute of Intelligent Transportation Systems; c. College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, China
Abstract:Traffic signal timing is a critical supply-side measure for alleviating urban traffic congestion. With the development of traffic big data technology, traffic signal control based on deep reinforcement learning has become a key research direction. Most existing control frameworks use discrete phase selection, in which the duration of a phase is obtained by accumulating decision intervals; this can conflict with the agent's exploration of better actions. This paper therefore proposes a signal phase and timing optimization method for intersections based on hybrid proximal policy optimization (HPPO). The study first defines a signal control action as a parameterized action, subject to the boundary constraints that apply to phase duration in practice. State information is then extracted and fed into a bi-policy network, which adaptively generates the next phase and its associated duration. The reward for an action is evaluated from the resulting change in the state of the road network, so that the agent learns the intrinsic connection between a phase and its duration. A simulation platform is built to test the proposed method and compare it with other algorithms on real traffic flow data. The results show that, compared with discrete control, the proposed method achieves a lower decision frequency and better control performance: the average vehicle travel time and the average lane queue length are reduced by 27.65% and 23.65%, respectively.
Keywords:intelligent transportation  hybrid action space  deep reinforcement learning  hybrid proximal policy optimization  agent design
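
The parameterized action described in the abstract, a discrete next phase paired with a continuous, bounded phase duration, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the duration bounds `MIN_GREEN` and `MAX_GREEN`, the Gaussian exploration, and the assumption that the continuous policy outputs a value in [0, 1] per phase are all hypothetical choices made for illustration. The abstract gives no network details or numeric bounds. `ppo_clip_term` is the standard PPO clipped surrogate for a single sample; how the paper combines it across the two policies is not stated here.

```python
import math
import random

# Hypothetical bounds on phase duration in seconds. The abstract mentions
# practical boundary constraints but does not give their values.
MIN_GREEN = 10.0
MAX_GREEN = 60.0


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_hybrid_action(phase_logits, duration_means, duration_std, rng):
    """Sample one parameterized action: (phase index, duration in seconds).

    phase_logits: outputs of the discrete policy, one per candidate phase.
    duration_means: outputs of the continuous policy, one per phase,
        assumed here to lie in [0, 1].
    """
    probs = softmax(phase_logits)
    phase = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Gaussian exploration around the mean of the chosen phase only.
    raw = rng.gauss(duration_means[phase], duration_std)
    # Clamp to [0, 1], then map onto the allowed duration range.
    unit = min(1.0, max(0.0, raw))
    duration = MIN_GREEN + unit * (MAX_GREEN - MIN_GREEN)
    return phase, duration


def ppo_clip_term(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective for a single sample."""
    clipped = min(1.0 + eps, max(1.0 - eps, ratio))
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    rng = random.Random(0)
    phase, duration = sample_hybrid_action(
        phase_logits=[0.2, 1.5, -0.3, 0.0],
        duration_means=[0.3, 0.6, 0.5, 0.4],
        duration_std=0.1,
        rng=rng,
    )
    print(phase, round(duration, 1))
```

Because the agent picks a phase and its duration in a single decision, it acts once per phase rather than once per fixed decision interval, which is consistent with the lower decision frequency reported in the abstract.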