Article Information

  • Title: Performance evaluation of different machine learning algorithms in presence of outliers using gene expression data
  • Authors: M Shahjaman; MM Rashid; MI Asifuzzaman
  • Journal: Journal of Bio-Science
  • Print ISSN: 1023-8654
  • Year of publication: 2020
  • Volume: 28
  • Pages: 69-80
  • DOI: 10.3329/jbs.v28i0.44712
  • Language: English
  • Publisher: Institute of Biological Sciences, Rajshahi University
  • Abstract: Classifying samples into one of two or more populations is one of the main objectives of gene expression data (GED) analysis. Several studies have employed machine learning algorithms for this task, but they did not consider the problem of outliers. GEDs are often contaminated by outliers because of the several steps involved in the data-generating process, from hybridization of DNA samples to image analysis. Most algorithms produce more false positives and lower accuracy in the presence of outliers, particularly when the number of replicates per biological condition is small. This paper therefore presents a comprehensive comparison of five popular machine learning algorithms (SVM, RF, Naïve Bayes, k-NN and LDA) on both simulated and real gene expression datasets, in the absence and in the presence of outliers. Three outlier rates (5%, 10% and 50%) and six performance indices (TPR, FPR, TNR, FNR, FDR and AUC) were used to assess the five algorithms. The results on both simulated and real GED showed that SVM performed comparatively better than the other four algorithms (RF, Naïve Bayes, k-NN and LDA) for both small and large sample sizes.
  • Keywords: Classification; DE gene; GED; Outliers; Robustness
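
The six performance indices named in the abstract follow standard definitions based on the confusion matrix and on classifier scores. The sketch below is illustrative only and is not taken from the paper: it assumes binary labels coded as 1 (positive) and 0 (negative), and the function names `confusion_rates` and `auc` are hypothetical.

```python
def confusion_rates(y_true, y_pred):
    """Return TPR, FPR, TNR, FNR and FDR for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Undefined rates (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),  # sensitivity
        "FPR": ratio(fp, fp + tn),
        "TNR": ratio(tn, fp + tn),  # specificity
        "FNR": ratio(fn, tp + fn),
        "FDR": ratio(fp, fp + tp),
    }


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels, predictions and scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
rates = confusion_rates(y_true, y_pred)
area = auc(y_true, scores)
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based computation would be preferable for large datasets.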