Article information

Title: Multi-armed Bandit Online Learning Based on POMDP in Cognitive Radio
Authors: Juan Zhang; Hesong-Jiang; Hong Jiang et al.
Journal: International Journal of Smart Home
Print ISSN: 1975-4094
Year: 2014
Volume: 8
Issue: 3
Pages: 151-162
DOI: 10.14257/ijsh.2014.8.3.14
Publisher: SERSC
Abstract: In cognitive radio, most existing research on spectrum sharing has two weaknesses. First, the problem is largely formulated as a Markov decision process (MDP), which requires complete knowledge of the channel. Second, most studies perform online learning based on the sensed channel. To address these problems, this paper proposes a new algorithm: if an authorized user is present in the current channel, the secondary user transmits conservatively at a low rate; otherwise it transmits aggressively. When transmitting conservatively, the state of the channel is not directly observable, so the problem becomes a Partially Observable Markov Decision Process (POMDP). We first establish the optimal threshold when the channel is known, then consider optimal transmission when the channel is unknown and model it as a multi-armed bandit. We obtain the optimal K-conservative policy through the UCB algorithm and improve the convergence speed with the UCB-TUNED algorithm. Simulation and analysis results show that multi-armed bandit online learning under a not fully known channel yields the same K-conservative policy as the optimal threshold policy under a known channel, and that the UCB-TUNED algorithm converges faster.
Keywords: spectrum sharing; multi-armed bandit; online learning; Partially Observable Markov Decision Process
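
The abstract names the UCB and UCB-TUNED algorithms but gives no formulas. The following is a minimal sketch, assuming the standard UCB1 and UCB1-Tuned indices of Auer et al. and treating each candidate K-conservative policy as one arm. The Bernoulli reward model, the arm means, and the function names (`ucb1_index`, `ucb1_tuned_index`, `run_bandit`) are illustrative assumptions, not the paper's channel model or implementation.

```python
import math
import random


def ucb1_index(mean, count, t, sq_mean=None):
    """UCB1 index: empirical mean plus an exploration bonus.

    sq_mean is unused; it is accepted so both index functions share a signature.
    """
    return mean + math.sqrt(2.0 * math.log(t) / count)


def ucb1_tuned_index(mean, count, t, sq_mean):
    """UCB1-Tuned index: bonus scaled by a variance upper bound, capped at 1/4."""
    variance_bound = sq_mean - mean ** 2 + math.sqrt(2.0 * math.log(t) / count)
    return mean + math.sqrt((math.log(t) / count) * min(0.25, variance_bound))


def run_bandit(arm_means, horizon, index_fn, seed=0):
    """Play `horizon` rounds on Bernoulli arms and return the pull count per arm.

    Assumption: each arm stands for one candidate policy whose reward is a
    Bernoulli draw with the given (hypothetical) mean.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms      # number of pulls per arm
    sums = [0.0] * n_arms      # sum of rewards per arm
    sq_sums = [0.0] * n_arms   # sum of squared rewards per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: pull every arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda i: index_fn(
                    sums[i] / counts[i], counts[i], t, sq_sums[i] / counts[i]
                ),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward ** 2
    return counts


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # hypothetical mean rewards of three policies
    print("UCB1 pulls:      ", run_bandit(means, 5000, ucb1_index))
    print("UCB1-Tuned pulls:", run_bandit(means, 5000, ucb1_tuned_index))
```

Under these assumptions, both index rules should concentrate their pulls on the arm with the highest mean. UCB1-Tuned typically spends fewer rounds on the weaker arms because its bonus shrinks with the observed variance, which matches the faster convergence the abstract attributes to UCB-TUNED.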