文章基本信息

标题：FM-Index Reveals the Reverse Suffix Array
本地全文：下载
作者：Arnab Ganguly ; Daniel Gibney ; Sahar Hooshmand 等
期刊名称：LIPIcs : Leibniz International Proceedings in Informatics
电子版ISSN：1868-8969
出版年度：2020
卷号：161
页码：13:1-13:14
DOI：10.4230/LIPIcs.CPM.2020.13
出版社：Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
摘要：Given a text T[1,n] over an alphabet Î£ of size Ïf, the suffix array of T stores the lexicographic order of the suffixes of T. The suffix array needs Î~(nlog n) bits of space compared to the n log Ïf bits needed to store T itself. A major breakthrough [FM - Index, FOCS'00] in the last two decades has been encoding the suffix array in near-optimal number of bits (â^ log Ïf bits per character). One can decode a suffix array value using the FM-Index in log^{O(1)} n time. We study an extension of the problem in which we have to also decode the suffix array values of the reverse text. This problem has numerous applications such as in approximate pattern matching [Lam et al., BIBM' 09]. Known approaches maintain the FM - Index of both the forward and the reverse text which drives up the space occupancy to 2nlog Ïf bits (plus lower order terms). This brings in the natural question of whether we can decode the suffix array values of both the forward and the reverse text, but by using nlog Ïf bits (plus lower order terms). We answer this question positively, and show that given the FM - Index of the forward text, we can decode the suffix array value of the reverse text in near logarithmic average time. Additionally, our experimental results are competitive when compared to the standard approach of maintaining the FM - Index for both the forward and the reverse text. We believe that applications that require both the forward and reverse text will benefit from our approach.
关键词：Data Structures; Suffix Trees; String Algorithms; Compression; Burrows - Wheeler transform; FM-Index