首页    期刊浏览 2024年09月18日 星期三
登录注册

文章基本信息

  • 标题:Prefix-Free Parsing for Building Big BWTs
  • 本地全文:下载
  • 作者:Christina Boucher ; Travis Gagie ; Alan Kuhnle
  • 期刊名称:LIPIcs : Leibniz International Proceedings in Informatics
  • 电子版ISSN:1868-8969
  • 出版年度:2018
  • 卷号:113
  • 页码:1-16
  • DOI:10.4230/LIPIcs.WABI.2018.2
  • 出版社:Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • 摘要:High-throughput sequencing technologies have led to explosive growth of genomic databases; one of which will soon reach hundreds of terabytes. For many applications we want to build and store indexes of these databases but constructing such indexes is a challenge. Fortunately, many of these genomic databases are highly-repetitive - a characteristic that can be exploited and enable the computation of the Burrows-Wheeler Transform (BWT), which underlies many popular indexes. In this paper, we introduce a preprocessing algorithm, referred to as prefix-free parsing, that takes a text T as input, and in one-pass generates a dictionary D and a parse P of T with the property that the BWT of T can be constructed from D and P using workspace proportional to their total size and O(|T|)-time. Our experiments show that D and P are significantly smaller than T in practice, and thus, can fit in a reasonable internal memory even when T is very large. Therefore, prefix-free parsing eases BWT construction, which is pertinent to many bioinformatics applications.
  • 关键词:Burrows-Wheeler Transform; prefix-free parsing; compression-aware algorithms; genomic databases
国家哲学社会科学文献中心版权所有