首页    期刊浏览 2024年07月08日 星期一
登录注册

文章基本信息

  • 标题:Validating Paired-End Read Alignments in Sequence Graphs
  • 本地全文:下载
  • 作者:Chirag Jain ; Haowen Zhang ; Alexander Dilthey
  • 期刊名称:LIPIcs : Leibniz International Proceedings in Informatics
  • 电子版ISSN:1868-8969
  • 出版年度:2019
  • 卷号:143
  • 页码:1-13
  • DOI:10.4230/LIPIcs.WABI.2019.17
  • 出版社:Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • 摘要:Graph based non-linear reference structures such as variation graphs and colored de Bruijn graphs enable incorporation of full genomic diversity within a population. However, transitioning from a simple string-based reference to graphs requires addressing many computational challenges, one of which concerns accurately mapping sequencing read sets to graphs. Paired-end Illumina sequencing is a commonly used sequencing platform in genomics, where the paired-end distance constraints allow disambiguation of repeats. Many recent works have explored provably good index-based and alignment-based strategies for mapping individual reads to graphs. However, validating distance constraints efficiently over graphs is not trivial, and existing sequence to graph mappers rely on heuristics. We introduce a mathematical formulation of the problem, and provide a new algorithm to solve it exactly. We take advantage of the high sparsity of reference graphs, and use sparse matrix-matrix multiplications (SpGEMM) to build an index which can be queried efficiently by a mapping algorithm for validating the distance constraints. Effectiveness of the algorithm is demonstrated using real reference graphs, including a human MHC variation graph, and a pan-genome de-Bruijn graph built using genomes of 20 B. anthracis strains. While the one-time indexing time can vary from a few minutes to a few hours using our algorithm, answering a million distance queries takes less than a second.
  • 关键词:Sequence graphs; read mapping; index; sparse matrix-matrix multiplication
国家哲学社会科学文献中心版权所有