期刊名称:Journal of Theoretical and Applied Information Technology
印刷版ISSN:1992-8645
电子版ISSN:1817-3195
出版年度:2020
卷号:98
期号:11
页码:1948-1956
出版社:Journal of Theoretical and Applied
摘要:Approximate multiple string matching comprises of finding all appearances of a set of patterns within a larger text with at most k-mismatches or k-differences. It is a process that consumes large amounts of computational resources and has applications in many domains of computer science such as correction of spelling, spam filtering and computational biology to name a few. However, these applications require a fast algorithm for string matching which depends on the construction of an accurate shift table. When a mismatch occurs while searching a pattern, a shift table specifies the maximum number of characters that can be skipped in text while ensuring that a possible match is not overlooked. This study presents a simple and efficient algorithm that is a variant of Tarhio-Ukkonen algorithm generalized to approximate multiple string matching. The algorithm searches for patterns of different lengths with at most k-mismatches by skipping some of the characters, while not missing the occurrences of any pattern. While the space and time complexity for construction of shift table is and , the time complexity of search algorithm is , where is the number of patterns, is the number of characters in the alphabet, is the length of the text string and is the total length of all patterns. To evaluate the efficacy of the algorithm, experiments are conducted on random text and random patterns of different lengths and different values of k.