Basic article information

Title: Internal Dictionary Matching
Authors: Panagiotis Charalampopoulos; Tomasz Kociumaka; Manal Mohamed et al.
Journal: LIPIcs: Leibniz International Proceedings in Informatics
Electronic ISSN: 1868-8969
Publication year: 2019
Volume: 149
Pages: 1-17
DOI: 10.4230/LIPIcs.ISAAC.2019.22
Publisher: Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
Abstract: We introduce data structures answering queries concerning the occurrences of patterns from a given dictionary D in fragments of a given string T of length n. The dictionary is internal in the sense that each pattern in D is given as a fragment of T. This way, D takes space proportional to the number of patterns d = |D| rather than their total length, which could be Theta(n * d). In particular, we consider the following types of queries: reporting and counting all occurrences of patterns from D in a fragment T[i..j] (operations Report(i,j) and Count(i,j) below, as well as operation Exists(i,j), which returns true iff Count(i,j) > 0) and reporting distinct patterns from D that occur in T[i..j] (operation ReportDistinct(i,j)). We show how to construct, in O((n + d) log^{O(1)} n) time, a data structure that answers each of these queries in time O(log^{O(1)} n + |output|); see the table below for specific time and space complexities.

Query                 Preprocessing time                            Space           Query time
Exists(i,j)           O(n + d)                                      O(n)            O(1)
Report(i,j)           O(n + d)                                      O(n + d)        O(1 + |output|)
ReportDistinct(i,j)   O(n log n + d)                                O(n + d)        O(log n + |output|)
Count(i,j)            O((n log n)/(log log n) + d log^{3/2} n)      O(n + d log n)  O((log^2 n)/(log log n))

The case of counting patterns is much more involved and needs a combination of a locally consistent parsing with orthogonal range searching. Reporting distinct patterns, on the other hand, uses the structure of maximal repetitions in strings. Finally, we provide upper and lower bounds for the case of a dynamic dictionary that are tight up to subpolynomial factors.
Keywords: string algorithms; dictionary matching; internal pattern matching
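
To make the four query types named in the abstract concrete, the following is a minimal brute-force sketch of their semantics. It is not the data structure from the paper and does not achieve any of the stated bounds; it only spells out what each query returns. The function names, the 0-based inclusive indexing, and the representation of D as a list of (start, end) fragments of T are assumptions made for this illustration.

```python
from typing import List, Set, Tuple

# Assumption: a fragment is an inclusive, 0-based (start, end) pair of positions in T.
Fragment = Tuple[int, int]


def report(T: str, D: List[Fragment], i: int, j: int) -> List[Tuple[int, int]]:
    """Return (pattern_index, start) for every occurrence of a pattern of D
    that lies entirely inside T[i..j]."""
    hits = []
    for k, (a, b) in enumerate(D):
        pattern = T[a:b + 1]
        m = len(pattern)
        # An occurrence starting at s ends at s + m - 1, which must be <= j.
        for s in range(i, j - m + 2):
            if T[s:s + m] == pattern:
                hits.append((k, s))
    return hits


def count(T: str, D: List[Fragment], i: int, j: int) -> int:
    """Number of occurrences of patterns of D inside T[i..j]."""
    return len(report(T, D, i, j))


def exists(T: str, D: List[Fragment], i: int, j: int) -> bool:
    """True iff count(i, j) > 0."""
    return count(T, D, i, j) > 0


def report_distinct(T: str, D: List[Fragment], i: int, j: int) -> Set[str]:
    """Distinct patterns of D (as strings) that occur inside T[i..j]."""
    return {T[D[k][0]:D[k][1] + 1] for k, _ in report(T, D, i, j)}
```

For example, with T = "abaab" and D = [(0, 1), (0, 0), (2, 3)] (the patterns "ab", "a", and "aa"), the dictionary is stored as three index pairs rather than as copied strings, which is the sense in which it is internal.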