Distributed Signal Control of Multi-agent Reinforcement Learning Based on Game

Cite this article:	QU Zhao-wei, PAN Zhao-tian, CHEN Yong-heng, LI Hai-tao, WANG Xin. Distributed Signal Control of Multi-agent Reinforcement Learning Based on Game[J]. Transportation Systems Engineering and Information (交通运输系统工程与信息), 2020, 20(2): 76-82.

Authors:	QU Zhao-wei; PAN Zhao-tian; CHEN Yong-heng; LI Hai-tao; WANG Xin

Institution:	College of Transportation, Jilin University, Changchun 130022, China

Funding:	National Natural Science Foundation of China (51705196).

Received:	2019-12-10

Abstract:	Unbalanced and fluctuating traffic demand makes distributed signal control harder to optimize. Because existing independent-action multi-agent reinforcement learning (IA-MARL) makes decisions based only on each agent's own historical experience, distributed signal control based on IA-MARL struggles to mitigate the effects of unbalanced and fluctuating demand in a timely manner. This paper incorporates the mixed-strategy Nash equilibrium concept from game theory to improve the IA-MARL decision process and proposes a game-based multi-agent reinforcement learning (G-MARL) framework. Distributed control methods based on IA-MARL and G-MARL were each simulated numerically on a grid network with unbalanced traffic inputs following Poisson arrival rates, yielding curves of unit travel time and unit average vehicle delay. The results show that, compared with IA-MARL, G-MARL reduces unit travel time by 59.94% and unit average vehicle delay by 81.45%. These results demonstrate that G-MARL is suitable for distributed signal control under unsaturated conditions with unbalanced and fluctuating traffic demand.

Keywords:	intelligent transportation; distributed traffic signal control; multi-agent reinforcement learning; urban road network under unbalanced demand; game theory; numerical simulation
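
The abstract names the mixed-strategy Nash equilibrium but does not give the G-MARL decision rule. As a minimal illustration of that concept only, the sketch below solves a generic 2x2 two-player game with the standard indifference conditions. The function name `mixed_nash_2x2` and the payoff matrices are hypothetical and are not taken from the paper; integer payoffs are assumed.

```python
from fractions import Fraction


def mixed_nash_2x2(row_payoff, col_payoff):
    """Return (p, q) for the fully mixed equilibrium of a 2x2 game, or None.

    p is the probability that the row player picks action 0,
    q is the probability that the column player picks action 0.
    Payoffs are assumed to be integers (illustrative assumption).
    """
    a, b = row_payoff, col_payoff
    # q must leave the row player indifferent between its two actions.
    q_den = a[0][0] - a[0][1] - a[1][0] + a[1][1]
    # p must leave the column player indifferent between its two actions.
    p_den = b[0][0] - b[0][1] - b[1][0] + b[1][1]
    if q_den == 0 or p_den == 0:
        return None  # no unique interior solution
    q = Fraction(a[1][1] - a[0][1], q_den)
    p = Fraction(b[1][1] - b[1][0], p_den)
    if not (0 < p < 1 and 0 < q < 1):
        return None  # solution is not a valid fully mixed strategy
    return p, q


if __name__ == "__main__":
    # Hypothetical zero-sum payoffs (matching pennies), not traffic data.
    ROW = [[1, -1], [-1, 1]]
    COL = [[-1, 1], [1, -1]]
    print(mixed_nash_2x2(ROW, COL))  # (Fraction(1, 2), Fraction(1, 2))
```

In a signal-control setting, the two actions might stand for two candidate phases at neighbouring intersections, but how the paper maps payoffs to traffic states is not stated in the abstract.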
This article is indexed in CNKI, Wanfang Data, and other databases.