首页    期刊浏览 2024年11月30日 星期六
登录注册

文章基本信息

  • 标题:FM-Index Reveals the Reverse Suffix Array
  • 本地全文:下载
  • 作者:Arnab Ganguly ; Daniel Gibney ; Sahar Hooshmand
  • 期刊名称:LIPIcs : Leibniz International Proceedings in Informatics
  • 电子版ISSN:1868-8969
  • 出版年度:2020
  • 卷号:161
  • 页码:13:1-13:14
  • DOI:10.4230/LIPIcs.CPM.2020.13
  • 出版社:Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • 摘要:Given a text T[1,n] over an alphabet Σ of size Ïf, the suffix array of T stores the lexicographic order of the suffixes of T. The suffix array needs Î~(nlog n) bits of space compared to the n log Ïf bits needed to store T itself. A major breakthrough [FM - Index, FOCS'00] in the last two decades has been encoding the suffix array in near-optimal number of bits (â‰^ log Ïf bits per character). One can decode a suffix array value using the FM-Index in log^{O(1)} n time. We study an extension of the problem in which we have to also decode the suffix array values of the reverse text. This problem has numerous applications such as in approximate pattern matching [Lam et al., BIBM' 09]. Known approaches maintain the FM - Index of both the forward and the reverse text which drives up the space occupancy to 2nlog Ïf bits (plus lower order terms). This brings in the natural question of whether we can decode the suffix array values of both the forward and the reverse text, but by using nlog Ïf bits (plus lower order terms). We answer this question positively, and show that given the FM - Index of the forward text, we can decode the suffix array value of the reverse text in near logarithmic average time. Additionally, our experimental results are competitive when compared to the standard approach of maintaining the FM - Index for both the forward and the reverse text. We believe that applications that require both the forward and reverse text will benefit from our approach.
  • 关键词:Data Structures; Suffix Trees; String Algorithms; Compression; Burrows - Wheeler transform; FM-Index
国家哲学社会科学文献中心版权所有