
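A special one-armed Bandit reward process considering sampling time intervals
Cite this article: ZOU Jie-zhong, LIANG You. A special one-armed Bandit reward process considering sampling time intervals[J]. Journal of Railway Science and Engineering, 2006, 3(6): 87-90.
Authors: ZOU Jie-zhong  LIANG You
Affiliation: School of Mathematical Science and Computing Technology, Central South University, Changsha, Hunan 410075, China
Funding: National Natural Science Foundation of China (10671212)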
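Abstract: Using dynamic programming backward induction and the Bayesian approach, this paper studies the optimal decision problem for a class of special one-armed Bandit reward processes. In this model, the unknown Bandit process is a Bandit reward process whose sampling time intervals follow a negative exponential distribution, whose sample values follow an Erlang(2) distribution, and which may be switched at any time. The monotonicity of the Gittins index of this class of Bandit reward processes is discussed. On that basis, the optimal decision problem for a one-armed Bandit reward process containing such a process is reduced to an optimal stopping problem, and an algorithm is constructed to compute the optimal stopping time of the process.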

Keywords: Bayesian approach  special one-armed Bandit reward process  Gittins index  Erlang(2) distribution
Article ID: 1672-7029(2006)06-0087-04
Revised: 1 September 2006

Special one-armed Bandit reward process considering random sampling times
ZOU Jie-zhong, LIANG You. Special one-armed Bandit reward process considering random sampling times[J]. Journal of Railway Science and Engineering, 2006, 3(6): 87-90.
Authors: ZOU Jie-zhong  LIANG You
Abstract: The optimal decision problem of a special one-armed Bandit reward process was investigated using dynamic programming backward induction and the Bayesian approach. The model includes an unknown Bandit reward process with unrestricted switching times, whose random sampling intervals follow a negative exponential distribution and whose sample values follow an Erlang(2) distribution. The monotonicity of the Gittins index of this Bandit reward process was discussed. Based on this monotonicity, the optimal decision problem of a special one-armed Bandit reward process including this unknown process was reduced to an optimal stopping problem, and an algorithm was constructed to compute the optimal stopping time.
Keywords: Bayesian approach  special one-armed Bandit reward process  Gittins index  Erlang(2) distribution
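
The reward stream described in the abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the authors' algorithm: the abstract gives no formulas, so the continuous discount factor exp(-beta * t), the parameter names `lam`, `theta`, `beta`, and both function names are hypothetical. All parameters are treated as known here, so neither the Bayesian updating nor the Gittins index is computed.

```python
import math
import random


def simulate_discounted_reward(lam, theta, beta, horizon, rng):
    """Total discounted reward of one sample path up to time `horizon`.

    Assumptions (not taken from the paper): gaps between samples are
    exponential with rate `lam`, sample values are Erlang(2) with rate
    `theta`, and a reward received at time t is discounted by exp(-beta * t).
    """
    t = 0.0
    total = 0.0
    while True:
        t += rng.expovariate(lam)  # exponential sampling interval
        if t > horizon:
            return total
        # Erlang(2) with rate theta is Gamma(shape=2, scale=1/theta).
        x = rng.gammavariate(2.0, 1.0 / theta)
        total += math.exp(-beta * t) * x


def expected_discounted_reward(lam, theta, beta):
    """Infinite-horizon mean under the same assumptions.

    E[X] = 2/theta, and the i-th sampling time T_i satisfies
    E[exp(-beta * T_i)] = (lam / (lam + beta)) ** i, whose sum over
    i >= 1 is lam / beta.
    """
    return 2.0 * lam / (theta * beta)


if __name__ == "__main__":
    rng = random.Random(0)
    n_paths = 5000
    estimate = sum(
        simulate_discounted_reward(2.0, 1.0, 0.5, 40.0, rng)
        for _ in range(n_paths)
    ) / n_paths
    print("Monte Carlo estimate:", estimate)
    print("Closed-form mean:    ", expected_discounted_reward(2.0, 1.0, 0.5))
```

With known parameters the mean reduces to a closed form, which makes the simulation easy to check. In the setting of the abstract the process is unknown, so this quantity would have to be replaced by a posterior expectation that changes as samples arrive.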
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.