首页    期刊浏览 2024年11月29日 星期五
登录注册

文章基本信息

  • 标题:Spalter: A Meta Machine Learning Approach to Distinguish True DNA Variants from Sequencing Artefacts
  • 本地全文:下载
  • 作者:Till Hartmann ; Sven Rahmann
  • 期刊名称:LIPIcs : Leibniz International Proceedings in Informatics
  • 电子版ISSN:1868-8969
  • 出版年度:2018
  • 卷号:113
  • 页码:1-8
  • DOI:10.4230/LIPIcs.WABI.2018.13
  • 出版社:Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • 摘要:Being able to distinguish between true DNA variants and technical sequencing artefacts is a fundamental task in whole genome, exome or targeted gene analysis. Variant calling tools provide diagnostic parameters, such as strand bias or an aggregated overall quality for each called variant, to help users make an informed choice about which variants to accept or discard. Having several such quality indicators poses a problem for the users of variant callers because they need to set or adjust thresholds for each such indicator. Alternatively, machine learning methods can be used to train a classifier based on these indicators. This approach needs large sets of labeled training data, which is not easily available. The new approach presented here relies on the idea that a true DNA variant exists independently of technical features of the read in which it appears (e.g. base quality, strand, position in the read). Therefore the nucleotide separability classification problem - predicting the nucleotide state of each read in a given pileup based on technical features only - should be near impossible to solve for true variants. Nucleotide separability, i.e. achievable classification accuracy, can either be used to distinguish between true variants and technical artefacts directly, using a thresholding approach, or it can be used as a meta-feature to train a separability-based classifier. This article explores both possibilities with promising results, showing accuracies around 90%.
  • 关键词:variant calling; sequencing error; technical artefact; meta machine learning; classification
国家哲学社会科学文献中心版权所有