文章基本信息

标题：Fast matching statistics in small space
作者：Djamal Belazzougui ; Fabio Cunial ; Olgert Denas 等
期刊名称：LIPIcs : Leibniz International Proceedings in Informatics
电子版ISSN：1868-8969
出版年度：2018
卷号：103
页码：17:1-17:14
DOI：10.4230/LIPIcs.SEA.2018.17
出版社：Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
摘要：Computing the matching statistics of a string S with respect to a string T on an alphabet of size sigma is a fundamental primitive for a number of large-scale string analysis applications, including the comparison of entire genomes, for which space is a pressing issue. This paper takes from theory to practice an existing algorithm that uses just O(|T|log{sigma}) bits of space, and that computes a compact encoding of the matching statistics array in O(|S|log{sigma}) time. The techniques used to speed up the algorithm are of general interest, since they optimize queries on the existence of a Weiner link from a node of the suffix tree, and parent operations after unsuccessful Weiner links. Thus, they can be applied to other matching statistics algorithms, as well as to any suffix tree traversal that relies on such calls. Some of our optimizations yield a matching statistics implementation that is up to three times faster than a plain version of the algorithm, depending on the similarity between S and T. In genomic datasets of practical significance we achieve speedups of up to 1.8, but our fastest implementations take on average twice the time of an existing code based on the LCP array. The key advantage is that our implementations need between one half and one fifth of the competitor's memory, and they approach comparable running times when S and T are very similar.
关键词：Matching statistics; maximal repeat; Burrows-Wheeler transform; wavelet tree; suffix tree topology