Abstract: In this paper, we propose a co-ranking algorithm that trains listwise ranking functions on unlabeled data together with a small amount of labeled data. The algorithm is based on the co-training paradigm, a widely used scheme in semi-supervised classification. First, we use two listwise ranking methods to construct a base ranker and an assistant ranker, respectively, by learning from the current labeled set. Second, we score the documents of each query in the unlabeled set with both rankers, so that every newly labeled query receives two ideal document permutations, one from each ranking function. Third, we employ likelihood loss to evaluate the similarity of the two permutations. Finally, we move the queries whose document permutations have lower likelihood loss from the unlabeled set to the labeled set. These steps are iterated until the ranking performance of the base ranker begins to decrease on the validation set. The method assumes that the unlabeled data follow the same generative distribution as the labeled data. Experimental results on the LETOR benchmark datasets demonstrate the effectiveness of the proposed co-ranking algorithm.
Keywords: information retrieval; learning to rank; semi-supervised learning; unlabeled data; listwise; likelihood loss
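
The query-selection step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: likelihood loss is taken to be the negative Plackett-Luce log-likelihood (as in ListMLE), the assistant ranker's permutation is evaluated under the base ranker's scores, and a fixed number `k` of queries is moved per iteration. The names `likelihood_loss`, `ranking`, and `select_agreeing_queries` are hypothetical and chosen only for illustration.

```python
import math


def likelihood_loss(scores, permutation):
    """Negative Plackett-Luce log-likelihood of `permutation` under `scores`.

    Assumption: this ListMLE-style form stands in for the paper's likelihood loss.
    """
    ordered = [scores[i] for i in permutation]
    loss = 0.0
    for i in range(len(ordered)):
        tail = ordered[i:]
        m = max(tail)  # subtract the max before exponentiating for stability
        log_z = m + math.log(sum(math.exp(s - m) for s in tail))
        loss += log_z - ordered[i]
    return loss


def ranking(scores):
    """Document indices sorted by descending score (the 'ideal' permutation)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def select_agreeing_queries(base_scores, assistant_scores, k):
    """Return the k query ids on which the two rankers agree most closely.

    Both arguments map a query id to a list of per-document scores.
    Assumption: agreement is measured by the loss of the assistant's
    permutation under the base ranker's scores.
    """
    losses = {}
    for qid in base_scores:
        perm_assistant = ranking(assistant_scores[qid])
        losses[qid] = likelihood_loss(base_scores[qid], perm_assistant)
    return sorted(losses, key=losses.get)[:k]
```

In a full co-ranking loop, the selected queries would be added to the labeled set with their predicted permutations, both rankers would be retrained, and the loop would stop once the base ranker's validation performance starts to drop. Those parts depend on the concrete listwise learners and are left out of this sketch.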