

Special one-armed Bandit reward process considering random sampling times

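Citation:	ZOU Jie-zhong, LIANG You. Special one-armed Bandit reward process considering random sampling times[J]. Journal of Railway Science and Engineering, 2006, 3(6): 87-90.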

Authors:	ZOU Jie-zhong, LIANG You

Affiliation:	School of Mathematical Science and Computing Technology, Central South University, Changsha, Hunan 410075

Funding:	Supported by the National Natural Science Foundation of China (10671212)

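Abstract:	Using dynamic-programming backward induction and the Bayesian approach, this paper studies the optimal decision problem for a class of special one-armed Bandit reward processes. In this model, the unknown arm is a Bandit reward process whose sampling time intervals follow a negative exponential distribution, whose sampling values follow an Erlang(2) distribution, and which allows switching at any time. The monotonicity of the Gittins index of this class of Bandit reward processes is discussed. On this basis, the optimal decision problem for a one-armed Bandit reward process containing such a process is reduced to an optimal stopping problem, and an algorithm for computing the optimal stopping time of the process is constructed.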
Keywords:	Bayesian approach; special one-armed Bandit reward process; Gittins index; Erlang(2) distribution
Article ID:	1672-7029(2006)06-0087-04
Revised:	1 September 2006
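The abstract combines Bayesian updating of the unknown arm with backward induction for an optimal stopping problem. The Python sketch below illustrates those two ingredients under assumptions that are ours, not the paper's: a Gamma(a, b) prior (shape a, rate b) on the Erlang(2) rate θ, which is conjugate, so each observed value x updates (a, b) to (a + 2, b + x) and the posterior mean reward is 2b/(a - 1); a finite horizon of m sampling epochs; a known arm paying a hypothetical constant λ per epoch; and a per-epoch discount factor β. If rewards are discounted continuously at rate α and the inter-sampling times are exponential with rate μ, the expected discount over one interval is μ/(μ + α), which is how the hypothetical `discount_factor` helper turns the exponential sampling times into β. The predictive expectation uses the substitution U = X/(b + X), under which U ~ Beta(2, a), evaluated with Gauss-Legendre quadrature. All function names are illustrative, and this is a generic finite-horizon backward induction, not the authors' Gittins-index-based algorithm.

```python
import numpy as np

# Gauss-Legendre rule on [-1, 1], mapped to (0, 1) for integrals over u.
_T, _W = np.polynomial.legendre.leggauss(40)
_NODES = ((_T + 1.0) / 2.0).tolist()
_WEIGHTS = (_W / 2.0).tolist()


def discount_factor(mu, alpha):
    """Hypothetical helper: E[exp(-alpha * T)] for T ~ Exponential(rate mu)."""
    return mu / (mu + alpha)


def posterior_update(a, b, x):
    """Assumed Gamma(shape a, rate b) prior on the Erlang(2) rate theta.

    The likelihood theta^2 * x * exp(-theta * x) is conjugate, so the
    posterior after observing x is Gamma(a + 2, b + x).
    """
    return a + 2.0, b + x


def posterior_mean_reward(a, b):
    """E[X] = E[2 / theta] = 2b / (a - 1) under Gamma(a, b); requires a > 1."""
    return 2.0 * b / (a - 1.0)


def predictive_expectation(g, a, b):
    """Approximate E[g(X)] for the next sample X under the posterior predictive.

    With U = X / (b + X), U ~ Beta(2, a) with density a(a+1) u (1-u)^(a-1),
    and X = b U / (1 - U).
    """
    total = 0.0
    for u, w in zip(_NODES, _WEIGHTS):
        density = a * (a + 1.0) * u * (1.0 - u) ** (a - 1.0)
        total += w * density * g(b * u / (1.0 - u))
    return total


def stop_value(m, lam, beta):
    """Switch to the known arm (hypothetical reward lam per epoch) for the last m epochs.

    Playing the known arm yields no information, so after switching it is
    never optimal to return. Requires beta < 1.
    """
    return lam * (1.0 - beta ** m) / (1.0 - beta)


def continue_value(m, a, b, lam, beta):
    """Sample the unknown arm once, then act optimally for the remaining m - 1 epochs."""
    value_now = posterior_mean_reward(a, b)
    if m > 1:
        def future(x):
            a2, b2 = posterior_update(a, b, x)
            return value(m - 1, a2, b2, lam, beta)
        value_now += beta * predictive_expectation(future, a, b)
    return value_now


def value(m, a, b, lam, beta):
    """Backward induction: V_m = max(stop, continue), with V_0 = 0."""
    if m == 0:
        return 0.0
    return max(stop_value(m, lam, beta), continue_value(m, a, b, lam, beta))


def should_continue(m, a, b, lam, beta):
    """True if sampling the unknown arm beats switching to the known arm now."""
    return continue_value(m, a, b, lam, beta) > stop_value(m, lam, beta)
```

For example, with a = 3, b = 2, λ = 1, β = 0.9 and four epochs left, the posterior mean reward 2b/(a - 1) = 2 already exceeds λ, and `should_continue(4, 3.0, 2.0, 1.0, 0.9)` returns `True`. The recursion makes on the order of 40^(m-1) function evaluations, so this sketch is only practical for short horizons.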
This article is indexed in CNKI, VIP, Wanfang Data and other databases.
