The Cooperative Multi-agent Learning with Random Reward Values
Authors: ZHANG Hua-xiang, HUANG Shang-teng
Institution: Dept. of Computer Science and Eng., Shanghai Jiaotong Univ., Shanghai 200030, China
Abstract: This paper investigates how to learn optimal action policies in cooperative multi-agent systems when the agents' rewards are random variables, and proposes a general two-stage learning algorithm for cooperative multi-agent decision processes. The algorithm first estimates the average immediate rewards, then treats these learned averages as the agents' immediate action rewards when learning the optimal action policies. The algorithm is proved to find the optimal policies in a stochastic environment. An extension of the algorithm to stochastic Markov decision processes is also discussed.
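The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single-state team game in which both agents receive the same reward, Gaussian reward noise, and a fixed sample budget per joint action. The names `TRUE_MEANS`, `sample_reward`, `estimate_mean_rewards`, and `greedy_joint_action` are hypothetical and chosen only for illustration.

```python
import random
from itertools import product

# Hypothetical 2-agent, 2-action team game (an assumption, not from the paper):
# each joint action has a true mean reward shared by both agents.
TRUE_MEANS = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}


def sample_reward(joint_action, rng):
    """Draw one noisy reward; unit-variance Gaussian noise is an assumption."""
    return rng.gauss(TRUE_MEANS[joint_action], 1.0)


def estimate_mean_rewards(reward_fn, joint_actions, n_samples, rng):
    """Stage 1: average n_samples random reward draws for each joint action."""
    means = {}
    for action in joint_actions:
        total = sum(reward_fn(action, rng) for _ in range(n_samples))
        means[action] = total / n_samples
    return means


def greedy_joint_action(mean_rewards):
    """Stage 2: treat the averaged rewards as the immediate rewards
    and select the joint action with the highest estimate."""
    return max(mean_rewards, key=mean_rewards.get)


rng = random.Random(0)  # fixed seed so the run is repeatable
joint_actions = list(product(range(2), repeat=2))
means = estimate_mean_rewards(sample_reward, joint_actions, 2000, rng)
best = greedy_joint_action(means)
print(best, round(means[best], 2))
```

In a multi-state setting, the second stage would presumably run a standard policy-learning method (for example, value iteration or Q-learning) on the averaged rewards instead of a single argmax; the abstract does not specify which method is used.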
Keywords: reinforcement learning; game; random reward
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.