摘要:Many sequential and temporal data have dependency relationships among their elements, which can be represented as a sequence of pointers. In this paper, we introduce a new string matching problem with a particular type of strings, which we call isodirectional pointer sequence, in which each entry has a pointer to another entry. The proposed problem is not only a formalization of real-world dependency matching problems, but also a generalization of variants of the string matching problem such as parameterized pattern matching and Cartesian tree matching. We present a 2nlgÏf 2n o(n)-bit index that preprocesses the text T[1:n] so as to count the number of occurrences of pattern P[1:m] in ð'ª(mlgÏf) where Ïf is the number of distinct lengths of pointers in T. Our index is also easily implementable in practice because it consists of wavelet trees and range maximum query index, which are widely used building blocks in many other compact data structures. By compressing the wavelet trees, the index can also be stored into 2nH^*â,(T) 2n o(n) bits where H^*â,(T) is the 0-th order empirical entropy of the distribution of pointer lengths of T.
关键词:String Matching; Suffix Array; FM-index; Wavelet Tree; Range Minimum Query; Parameterized String Matching; Cartesian Tree Matching