The Cooperative Multi-agent Learning with Random Reward Values

Authors:	ZHANG Hua-xiang, HUANG Shang-teng

Institution:	Dept. of Computer Science and Eng., Shanghai Jiaotong Univ., Shanghai 200030, China

Abstract:	This paper investigates how to learn optimal action policies in cooperative multi-agent systems when the agents' rewards are random variables, and proposes a general two-stage learning algorithm for cooperative multi-agent decision processes. The algorithm first estimates the average immediate rewards, then treats these learned averages as the agents' immediate action rewards when learning the optimal action policies. The algorithm is proved to find the optimal policies in a stochastic environment. An extension of the algorithm to stochastic Markov decision processes is also discussed.
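The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details the abstract does not give. It assumes a single-stage, two-agent team game in which both agents receive the same noisy reward. The names `estimate_mean_rewards`, `greedy_joint_policy`, `noisy_team_reward`, `run_demo`, and the payoff table `TRUE_MEANS` are all hypothetical and chosen only for illustration.

```python
import itertools
import random


def estimate_mean_rewards(sample_reward, joint_actions, samples_per_action, rng):
    """Stage 1: average the noisy immediate rewards of each joint action."""
    means = {}
    for joint in joint_actions:
        total = 0.0
        for _ in range(samples_per_action):
            total += sample_reward(joint, rng)
        means[joint] = total / samples_per_action
    return means


def greedy_joint_policy(mean_rewards):
    """Stage 2: treat the averages as the rewards and pick the best joint action."""
    return max(mean_rewards, key=mean_rewards.get)


# Assumed payoff table for the illustration (not taken from the paper).
TRUE_MEANS = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}


def noisy_team_reward(joint, rng):
    # The reward is a random variable: true mean plus zero-mean Gaussian noise.
    return TRUE_MEANS[joint] + rng.gauss(0.0, 1.0)


def run_demo(seed=0, samples_per_action=2000):
    rng = random.Random(seed)
    joint_actions = list(itertools.product([0, 1], repeat=2))
    means = estimate_mean_rewards(
        noisy_team_reward, joint_actions, samples_per_action, rng
    )
    return means, greedy_joint_policy(means)
```

Calling `run_demo()` returns the estimated mean rewards and the joint action selected from them. In a multi-stage setting, the second stage would presumably run a reinforcement learning update on the averaged rewards rather than a single greedy choice; that part is left out here because the abstract does not specify it.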

Keywords:	reinforcement learning; game; random reward
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.