Abstract: In cognitive radio, most existing research on spectrum sharing has two weaknesses. First, the problem is largely formulated as a Markov decision process (MDP), which requires complete knowledge of the channel. Second, most studies rely on online learning over the perceived channel state. To address these problems, a new algorithm is proposed in this paper: if the authorized (primary) user is present on the current channel, the secondary user transmits conservatively at a low rate; otherwise it transmits aggressively. When transmitting conservatively, the channel state is not directly observable, so the problem becomes a partially observable Markov decision process (POMDP). We first establish the optimal threshold when the channel is known, then consider optimal transmission when the channel is unknown and model it as a multi-armed bandit. We obtain the optimal K-conservative policy through the UCB algorithm and improve the convergence speed with the UCB-Tuned algorithm. Simulation and analysis show that multi-armed-bandit online learning under an incompletely known channel yields the same K-conservative policy as the optimal threshold policy under a known channel, and that the UCB-Tuned algorithm accelerates convergence.
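To make the bandit machinery concrete, the following is a minimal illustrative sketch (not the paper's actual system model) of the UCB1 index and the UCB-Tuned variant it mentions, applied to a toy Bernoulli bandit; the function name `run_ucb`, the arm success probabilities, and the Bernoulli reward assumption are all invented here for illustration.

```python
import math
import random

def run_ucb(arm_probs, horizon, tuned, seed=0):
    """Play a Bernoulli multi-armed bandit with UCB1 or UCB-Tuned and
    return the fraction of pulls given to the best arm."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts, sums, sq_sums = [0] * k, [0.0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull each arm once
        else:
            def index(i):
                mean = sums[i] / counts[i]
                if tuned:
                    # UCB-Tuned: cap the exploration width by an empirical
                    # variance estimate plus a slack term, instead of the
                    # fixed constant 2 used by UCB1
                    var = sq_sums[i] / counts[i] - mean ** 2
                    v = var + math.sqrt(2.0 * math.log(t) / counts[i])
                    return mean + math.sqrt(
                        math.log(t) / counts[i] * min(0.25, v))
                # UCB1: sample mean plus sqrt(2 ln t / n_i) confidence width
                return mean + math.sqrt(2.0 * math.log(t) / counts[i])
            arm = max(range(k), key=index)
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        sq_sums[arm] += reward ** 2
    best = max(range(k), key=lambda i: arm_probs[i])
    return counts[best] / horizon
```

In the paper's setting each "arm" would correspond to a candidate conservative transmission policy rather than a Bernoulli arm, but the index computation is the same; UCB-Tuned typically shrinks the exploration bonus when the observed reward variance is small, which is the source of its faster convergence.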