A Twin Delayed Deep Deterministic Policy Gradient Method for Collision Avoidance of Autonomous Ships

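Cite this article:	LIU Zhao, ZHOU Zhuangzhuang, ZHANG Mingyang, LIU Jingxian. A Twin Delayed Deep Deterministic Policy Gradient Method for Collision Avoidance of Autonomous Ships[J]. Journal of Transport Information and Safety (交通信息与安全), 2022, 40(3): 60-74.

Authors:	LIU Zhao, ZHOU Zhuangzhuang, ZHANG Mingyang, LIU Jingxian

Affiliation:	1. School of Navigation, Wuhan University of Technology, Wuhan 430063

Funding:	National Natural Science Foundation of China, project 52171351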

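Abstract:	To meet the development needs of autonomous navigation for intelligent ships and to address the low learning efficiency, weak generalization, and poor robustness in complex encounter scenarios of reinforcement-learning-based collision avoidance decision-making methods, an autonomous ship collision avoidance method based on twin delayed deep deterministic policy gradient (TD3) is studied. The method targets the high dimensionality of collision avoidance decision information and the continuity of actions, and takes the rationality and real-time performance of decisions into account. Based on the relative motion information and collision risk information between ships, a state space containing target-ship information at multiple consecutive time steps is constructed from a global perspective; a continuous decision action space is designed according to ship maneuverability; a reward function for ship motion is designed by jointly considering goal orientation, course keeping, collision risk, the Convention on the International Regulations for Preventing Collisions at Sea, 1972 (COLREGs), and good seamanship. Based on the TD3 algorithm and the structure of the state space, an autonomous collision avoidance network model is designed with an Actor-Critic structure, combining long short-term memory (LSTM) networks and one-dimensional convolutional networks; network training is stabilized through twin value network learning, target policy smoothing, and delayed policy network updates, and accelerated through frame skipping and dynamically increasing the batch size and the number of update iterations. To address weak model generalization, a TD3-based training procedure with random ship encounter scenarios is proposed, enabling multi-scenario transfer of the collision avoidance model. The trained model is validated by simulation and compared with an improved artificial potential field (APF) algorithm. The results show that the proposed method learns efficiently and converges quickly and smoothly; the trained model enables the ship to pass at a safe distance in both two-ship and multi-ship encounter scenarios, and achieves a higher collision avoidance success rate than the improved APF algorithm in complex encounter scenarios, reaching 99.233% when avoiding 2~4 target ships, 97.600% for 5~7 target ships, and 94.166% for 8~10 target ships; the proposed method copes effectively with uncoordinated actions of approaching ships, with high real-time performance, safe and reasonable decisions, fast and smooth course changes with little oscillation, and smooth avoidance paths, outperforming the improved APF method.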
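Keywords:	traffic information engineering; ship collision avoidance; intelligent decision-making; deep reinforcement learning; twin delayed deep deterministic policy gradient
Received:	2022-02-16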
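
The abstract names twin value network learning and target policy smoothing as two of the mechanisms that stabilize TD3 training. The following is a minimal NumPy sketch of how a generic TD3 critic target combines them; it is not the authors' implementation. The function name `td3_target`, the toy target networks, and all hyperparameter values (`gamma`, `policy_noise`, `noise_clip`, `max_action`) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def td3_target(reward, done, next_state, actor_target, critic1_target,
               critic2_target, rng, gamma=0.99, policy_noise=0.2,
               noise_clip=0.5, max_action=1.0):
    """Compute a generic TD3 critic target for a batch of transitions.

    All hyperparameter defaults are placeholders, not values from the paper.
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    next_action = actor_target(next_state)
    noise = np.clip(rng.normal(0.0, policy_noise, size=next_action.shape),
                    -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -max_action, max_action)
    # Twin value networks: take the smaller of the two target Q estimates.
    q1 = critic1_target(next_state, next_action)
    q2 = critic2_target(next_state, next_action)
    q_min = np.minimum(q1, q2)
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * q_min

# Toy stand-ins for the target networks (hypothetical, for illustration only).
rng = np.random.default_rng(0)
next_state = np.zeros((4, 3))
actor = lambda s: np.zeros((s.shape[0], 1))
critic_a = lambda s, a: np.full((s.shape[0], 1), 2.0)
critic_b = lambda s, a: np.full((s.shape[0], 1), 5.0)
reward = np.ones((4, 1))
done = np.array([[0.0], [0.0], [1.0], [0.0]])

y = td3_target(reward, done, next_state, actor, critic_a, critic_b, rng)
```

Because the two toy critics return constant values, the target uses the smaller estimate (2.0) for non-terminal transitions and reduces to the reward alone for the terminal one. Taking the minimum is what counteracts the value overestimation that a single critic tends to accumulate.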
A Twin Delayed Deep Deterministic Policy Gradient Method for Collision Avoidance of Autonomous Ships

Affiliation:	1. School of Navigation, Wuhan University of Technology, Wuhan 430063, China; 2. Hubei Key Laboratory of Inland Shipping Technology, Wuhan University of Technology, Wuhan 430063, China; 3. National Engineering Research Center for Water Transport Safety, Wuhan University of Technology, Wuhan 430063, China; 4. School of Engineering, Department of Mechanical Engineering, Aalto University, Espoo 20110, Finland

Abstract:	To meet the development needs of autonomous navigation for intelligent ships and to address the low learning efficiency, weak generalization ability, and poor robustness of reinforcement-learning-based decision-making methods for collision avoidance, an autonomous collision avoidance method based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is proposed. The method accounts for the high dimensionality of the information processed during collision avoidance and the continuous nature of ship maneuvers, as well as the rationality and real-time performance of decision-making. Using the relative motion and collision risk information between ships, the state space of the collision avoidance model is constructed from a global perspective and contains target-ship information over multiple consecutive time steps. The continuous decision-making action space is designed according to ship maneuverability. A reward function for ship motion is designed considering goal orientation, course keeping, collision risk, the COLREGs, and good seamanship.
Based on the TD3 algorithm and the structure of the state space, the autonomous collision avoidance network model is designed with an Actor-Critic structure, combining Long Short-Term Memory (LSTM) networks and 1D convolutional networks. Network training is stabilized by clipped double Q-learning, target policy smoothing, and delayed policy updates, and accelerated by frame skipping and by dynamically increasing the batch size and the number of update iterations. To address the weak generalization ability of the model, a TD3-based training process with random ship encounter scenarios is proposed to achieve multi-scenario transfer of the model. The trained model is verified by simulation and compared with the modified Artificial Potential Field (APF) method. The results show that the proposed method has high learning efficiency and fast, stable convergence. The trained model enables ships to pass at a safe distance in both two-ship and multi-ship encounter scenarios. In complex encounter scenarios, the success rate of collision avoidance reaches 99.233% when avoiding 2~4 target ships, 97.600% for 5~7 target ships, and 94.166% for 8~10 target ships, which is higher than that of the modified APF algorithm. The proposed method responds effectively to uncoordinated actions of approaching ships, with high real-time performance and safe, reasonable decision-making. Course changes are fast and stable with little oscillation, and the avoidance path is smooth; overall, the method outperforms the modified APF method.
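
The abstract also mentions delayed policy updates, frame skipping, and a batch size that grows during training, without giving the schedules. The sketch below shows one plausible way such pieces could be written; it is an assumption-laden illustration, not the paper's procedure. The helper names (`batch_size_at`, `should_update_actor`, `act_with_frame_skip`), the toy `CountingEnv`, and every numeric value (base batch size, growth step, cap, delay, skip length) are hypothetical.

```python
def batch_size_at(episode, base=64, growth=64, every=100, cap=512):
    """Batch size that grows stepwise with the episode index, up to a cap.

    The schedule and all numbers are hypothetical placeholders.
    """
    return min(cap, base + growth * (episode // every))

def should_update_actor(update_step, policy_delay=2):
    """Delayed policy update: update the actor once per `policy_delay` critic updates."""
    return update_step % policy_delay == 0

def act_with_frame_skip(step_env, policy, state, skip=4):
    """Hold one action for up to `skip` simulation steps and sum the rewards."""
    action = policy(state)
    total_reward, done = 0.0, False
    for _ in range(skip):
        state, reward, done = step_env(action)
        total_reward += reward
        if done:  # stop early if the episode ends during the skipped frames
            break
    return state, total_reward, done

class CountingEnv:
    """Toy environment (hypothetical): state is a step counter, reward is 1 per step."""
    def __init__(self, horizon):
        self.t = 0
        self.horizon = horizon

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon

long_env = CountingEnv(horizon=10)
short_env = CountingEnv(horizon=3)
result_long = act_with_frame_skip(long_env.step, lambda s: 0.0, 0)
result_short = act_with_frame_skip(short_env.step, lambda s: 0.0, 0)
actor_updates = [should_update_actor(i) for i in range(1, 5)]
```

With these placeholder settings, the actor is updated on every second critic update, and one decision is held for four simulation steps unless the episode ends first. Holding an action over several steps reduces how often the network must be queried, which is one way frame skipping can shorten training time.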

