Article Information

Title: A novel logistic regression model combining semi-supervised learning and active learning for disease classification
Authors: Hua Chai; Yong Liang; Sai Wang et al.
Journal: Scientific Reports
Electronic ISSN: 2045-2322
Publication year: 2018
Volume: 8
Issue: 1
Page: 13009
DOI: 10.1038/s41598-018-31395-5
Language: English
Publisher: Springer Nature
Abstract: Traditional supervised learning classifiers need many labeled samples to achieve good performance; however, in many biological datasets only a small number of samples are labeled and the rest are unlabeled. Labeling these unlabeled samples manually is difficult or expensive. Techniques such as active learning and semi-supervised learning have been proposed to exploit the unlabeled samples and improve model performance. However, in active learning the model tends to be short-sighted or biased, and some manual labeling is still needed, while semi-supervised learning methods are easily affected by noisy samples. In this paper we propose a novel logistic regression model based on the complementarity of active learning and semi-supervised learning, which uses the unlabeled samples at the least cost to improve disease classification accuracy. In addition, a mechanism for updating pseudo-labeled samples is designed to reduce the number of false pseudo-labels. The experimental results show that the new model performs better than widely used semi-supervised learning and active learning methods in disease classification and gene selection.
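Illustrative sketch: the abstract does not give the algorithm itself, so the following is only a minimal, hypothetical illustration of how active learning and pseudo-labeling could be combined around a logistic regression classifier. It is not the authors' method. All names (`train_combined`, `oracle`, `threshold`, `rounds`) and the specific rules are assumptions: the least confident unlabeled sample is sent to a human-like `oracle`, confident samples receive pseudo-labels, and the pseudo-labels are recomputed from scratch each round so that earlier wrong ones can be dropped.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    # Probability of class 1; w[0] is the bias term.
    return sigmoid(w[0] + sum(wj * v for wj, v in zip(w[1:], x)))

def fit_logreg(X, y, lr=0.5, epochs=300):
    # Plain batch gradient descent on the logistic loss (no regularization).
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def train_combined(X_lab, y_lab, X_unl, oracle, rounds=5, threshold=0.9):
    # Hypothetical combination loop (an assumption, not the paper's procedure).
    X_lab, y_lab = list(X_lab), list(y_lab)
    pool = list(X_unl)
    w = fit_logreg(X_lab, y_lab)
    for _ in range(rounds):
        if not pool:
            break
        probs = [predict_proba(w, x) for x in pool]
        # Active learning step: query the sample closest to the decision boundary.
        q = min(range(len(pool)), key=lambda i: abs(probs[i] - 0.5))
        xq = pool.pop(q)
        probs.pop(q)
        X_lab.append(xq)
        y_lab.append(oracle(xq))
        # Semi-supervised step: pseudo-label only confident samples. The list is
        # rebuilt every round, so a pseudo-label that is no longer confident
        # under the current model is simply not carried over.
        pseudo = [(x, 1 if p >= 0.5 else 0)
                  for x, p in zip(pool, probs) if max(p, 1.0 - p) >= threshold]
        w = fit_logreg(X_lab + [x for x, _ in pseudo],
                       y_lab + [label for _, label in pseudo])
    return w
```

In this sketch the oracle is called once per round, which keeps the manual labeling cost at `rounds` queries, while the confidence threshold controls how aggressively unlabeled samples are trusted.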