摘要:Summary Compressed pattern matching (CPM) refers to the task of locating all the occurrences of a pattern (or set of patterns) inside the body of compressed text. In this type of matching, pattern may or may not be compressed. CPM is very useful in handling large volume of data especially over the network. It has many applications in computational biology, where it is useful in finding similar trends in DNA sequences; intrusion detection over the networks, big data analytics etc. Various solutions have been provided by researchers where pattern is matched directly over the uncompressed text. Such solution requires lot of space and consumes lot of time when handling the big data. Various researchers have proposed the efficient solutions for compression but very few exist for pattern matching over the compressed text. Considering the future trend where data size is increasing exponentially day-by-day, CPM has become a desirable task. This paper presents a critical review on the recent techniques on the compressed pattern matching. The covered techniques includes: Word based Huffman codes, Word Based Tagged Codes; Wavelet Tree Based Indexing. We have presented a comparative analysis of all the techniques mentioned above and highlighted their advantages and disadvantages.
关键词:Pattern matching; Compressed pattern matching; Wavelet tree; Word based tagged code and big data;