
Data-Driven Q-Learning in Dynamic Environment
Cite this article: SHEN Yuanxia, WANG Guoyin. Data-Driven Q-Learning in Dynamic Environment[J]. Journal of Southwest Jiaotong University, 2009, 44(6). DOI: 10.3969/j.issn.0258-2724.2009.06.014
Authors: SHEN Yuanxia  WANG Guoyin
Affiliations: 1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, Sichuan, China; Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; School of Computer Science, Chongqing University of Arts and Sciences, Chongqing 402160, China
2. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, Sichuan, China; Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Foundation items: Supported by the National Natural Science Foundation of China and the Natural Science Foundation of Chongqing
Abstract: In dynamic environments, reinforcement learning has difficulty balancing the exploration of untested actions against the exploitation of known optimal actions. To address this problem, a data-driven Q-learning algorithm is proposed. The algorithm first constructs a behavior information system for the agent, and then builds an environment trigger mechanism from the uncertainty of the knowledge in that behavior information system. Using dynamic information that tracks environmental changes, the trigger mechanism adaptively controls exploration of the new environment, so that the algorithm balances the exploration of untested actions against the exploitation of known optimal actions. Simulation results on maze navigation tasks in dynamic environments show that the average number of steps the proposed algorithm takes to reach the goal is 7.79% to 84.7% shorter than that of the Q-learning, simulated annealing Q-learning, and recency-based exploration Q-learning algorithms.
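
For context, the tabular Q-learning update underlying both the proposed method and the compared baselines is the standard rule (general background, not quoted from this record), in LaTeX notation:

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]

where \alpha is the learning rate and \gamma is the discount factor. The abstract suggests the data-driven variant changes how actions are selected (adaptive exploration control), rather than this update itself.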

Keywords: reinforcement learning  data-driven  Q-learning  uncertainty

Data-Driven Q-Learning in Dynamic Environment
SHEN Yuanxia, WANG Guoyin. Data-Driven Q-Learning in Dynamic Environment[J]. Journal of Southwest Jiaotong University, 2009, 44(6). DOI: 10.3969/j.issn.0258-2724.2009.06.014
Authors: SHEN Yuanxia  WANG Guoyin
Abstract: It is difficult for reinforcement learning to balance the exploration of untested actions against the exploitation of known optimal actions in a dynamic environment. To address this problem, a data-driven Q-learning algorithm is proposed. In this algorithm, a behavior information system is constructed for each agent, and a trigger mechanism is then built from the uncertainty of the knowledge in the behavior information system to trace environmental changes. The dynamic information of the environment is used by the trigger mechanism to adaptively control exploration of the new environment, achieving a balance between the exploration of untested actions and the exploitation of known optimal actions. The proposed algorithm was applied to grid-world navigation tasks. Simulation results show that, compared with the Q-learning, simulated annealing Q-learning (SAQ), and recency-based exploration (RBE) Q-learning algorithms, the proposed algorithm achieves higher learning efficiency.
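
This record does not include the paper's algorithmic details, so the following is a minimal Python sketch of the idea described in the abstract, under stated assumptions: the behavior information system is reduced to a per-state running average of absolute TD errors, and that signal stands in for the uncertainty-based trigger that re-opens exploration when the environment appears to have changed. All names and constants (TRIGGER, EPS_MAX, the decay rates) are illustrative, not taken from the paper.

import random
from collections import defaultdict

# Hypothetical sketch of a data-driven Q-learning loop. The paper's exact
# behavior information system and trigger mechanism are not given in this
# record; here "uncertainty" is approximated by a running average of
# absolute TD errors per state, and exploration is re-triggered when that
# signal spikes (suggesting the environment has changed).

ALPHA, GAMMA = 0.1, 0.95          # learning rate, discount factor
EPS_MIN, EPS_MAX = 0.05, 0.5      # exploration-rate bounds (assumed values)
TRIGGER = 0.5                     # uncertainty level that re-opens exploration

Q = defaultdict(float)            # Q[(state, action)] table
uncertainty = defaultdict(float)  # running |TD error| per state
epsilon = EPS_MAX                 # current exploration rate

def choose_action(state, actions):
    """Epsilon-greedy selection: explore with probability epsilon."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    """One Q-learning step plus the assumed uncertainty-driven trigger."""
    global epsilon
    td_error = (reward
                + GAMMA * max(Q[(next_state, a)] for a in actions)
                - Q[(state, action)])
    Q[(state, action)] += ALPHA * td_error

    # Track how surprising outcomes in this state have been lately.
    uncertainty[state] = 0.9 * uncertainty[state] + 0.1 * abs(td_error)

    # Environment-change trigger: raise exploration when uncertainty spikes,
    # otherwise decay toward exploitation of the learned policy.
    if uncertainty[state] > TRIGGER:
        epsilon = EPS_MAX
    else:
        epsilon = max(EPS_MIN, epsilon * 0.999)

In a dynamic grid-world task like the one described in the abstract, update() would be called once per transition; after the maze changes, a spike in TD errors pushes epsilon back up and restores exploration, which is the balancing behavior the abstract attributes to the trigger mechanism.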
Keywords: reinforcement learning  data-driven  Q-learning  uncertainty