首页    期刊浏览 2024年11月08日 星期五
登录注册

文章基本信息

  • 标题:Fast matching statistics in small space
  • 作者:Djamal Belazzougui ; Fabio Cunial ; Olgert Denas
  • 期刊名称:LIPIcs : Leibniz International Proceedings in Informatics
  • 电子版ISSN:1868-8969
  • 出版年度:2018
  • 卷号:103
  • 页码:17:1-17:14
  • DOI:10.4230/LIPIcs.SEA.2018.17
  • 出版社:Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • 摘要:Computing the matching statistics of a string S with respect to a string T on an alphabet of size sigma is a fundamental primitive for a number of large-scale string analysis applications, including the comparison of entire genomes, for which space is a pressing issue. This paper takes from theory to practice an existing algorithm that uses just O(|T|log{sigma}) bits of space, and that computes a compact encoding of the matching statistics array in O(|S|log{sigma}) time. The techniques used to speed up the algorithm are of general interest, since they optimize queries on the existence of a Weiner link from a node of the suffix tree, and parent operations after unsuccessful Weiner links. Thus, they can be applied to other matching statistics algorithms, as well as to any suffix tree traversal that relies on such calls. Some of our optimizations yield a matching statistics implementation that is up to three times faster than a plain version of the algorithm, depending on the similarity between S and T. In genomic datasets of practical significance we achieve speedups of up to 1.8, but our fastest implementations take on average twice the time of an existing code based on the LCP array. The key advantage is that our implementations need between one half and one fifth of the competitor's memory, and they approach comparable running times when S and T are very similar.
  • 关键词:Matching statistics; maximal repeat; Burrows-Wheeler transform; wavelet tree; suffix tree topology
Loading...
联系我们|关于我们|网站声明
国家哲学社会科学文献中心版权所有