首页    期刊浏览 2024年07月19日 星期五
登录注册

文章基本信息

  • 标题:String Indexing for Top-k Close Consecutive Occurrences
  • 本地全文:下载
  • 作者:Philip Bille ; Gørtz, Inge Li ; Max Rishøj Pedersen
  • 期刊名称:LIPIcs : Leibniz International Proceedings in Informatics
  • 电子版ISSN:1868-8969
  • 出版年度:2020
  • 卷号:182
  • 页码:1-17
  • DOI:10.4230/LIPIcs.FSTTCS.2020.14
  • 出版社:Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • 摘要:The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string P, report all occurrences of P within S. In this paper, we study a basic and natural extension of string indexing called the string indexing for top-k close consecutive occurrences problem (Sitcco). Here, a consecutive occurrence is a pair (i,j), i < j, such that P occurs at positions i and j in S and there is no occurrence of P between i and j, and their distance is defined as j-i. Given a pattern P and a parameter k, the goal is to report the top-k consecutive occurrences of P in S of minimal distance. The challenge is to compactly represent S while supporting queries in time close to the length of P and k. We give two time-space trade-offs for the problem. Let n be the length of S, m the length of P, and ε â^^ (0,1]. Our first result achieves O(nlog n) space and optimal query time of O(m k), and our second result achieves linear space and query time O(m k^{1 ε}). Along the way, we develop several techniques of independent interest, including a new translation of the problem into a line segment intersection problem and a new recursive clustering technique for trees.
  • 关键词:String indexing; pattern matching; consecutive occurrences
国家哲学社会科学文献中心版权所有