首页    期刊浏览 2024年08月31日 星期六
登录注册

文章基本信息

  • 标题:Indexing Compressed Text: A Tale of Time and Space (Invited Talk)
  • 本地全文:下载
  • 作者:Nicola Prezza
  • 期刊名称:LIPIcs : Leibniz International Proceedings in Informatics
  • 电子版ISSN:1868-8969
  • 出版年度:2020
  • 卷号:160
  • 页码:3:1-3:2
  • DOI:10.4230/LIPIcs.SEA.2020.3
  • 出版社:Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • 摘要:Text indexing is a classical algorithmic problem that has been studied for over four decades. The earliest optimal-time solution to the problem, the suffix tree [Weiner, 1973], dates back to 1973 and requires up to two orders of magnitude more space than the text to be stored. In the year 2000, two breakthrough works [Grossi and Vitter, 2000; Ferragina and Manzini, 2000] showed that this space overhead is not necessary: both the index and the text can be stored in a space proportional to the text’s entropy. These contributions had an enormous impact in bioinformatics: nowadays, the two most widely-used DNA aligners employ compressed indexes [Li and Durbin, 2009; Langmead et al., 2009]. In recent years, it became apparent that entropy had reached its limits: modern datasets (for example, collections of thousands of human genomes) are extremely large but very repetitive and, by its very definition, entropy cannot compress repetitive texts [S. Kreft and G. Navarro, 2013]. To overcome this problem, a new generation of indexes based on dictionary compressors (for example, LZ77 and run-length BWT) emerged [S. Kreft and G. Navarro, 2013; Gagie et al., 2020; F. Claude and G. Navarro, 2012], together with generalizations of the indexing problem to labeled graphs [Ferragina et al., 2009; Sirén et al., 2014; Travis Gagie et al., 2017]. This talk is a short and friendly survey of the landmarks of this fascinating path that took us from suffix trees to the most modern compressed indexes on labeled graphs.
  • 关键词:Compressed Text Indexing
国家哲学社会科学文献中心版权所有