文章基本信息

标题：An Efficient Composite-Alphabet Transform for String Matching under a Restricted Alphabet Set
本地全文：下载
作者：Kim, Sung-Hwan ; Ock, Chang-Seok ; Cho, Hwan-Gue 等
期刊名称：Journal of Computers
印刷版ISSN：1796-203X
出版年度：2013
卷号：8
期号：7
页码：1804-1809
DOI：10.4304/jcp.8.7.1804-1809
语种：English
出版社：Academy Publisher
摘要：String matching is a problem of finding all occurrences of a short pattern on a relatively long reference string. While a number of methods have been presented, most published implementations assume several restrictions due to some practical issues. We focus on the restriction of the alphabet size, which is usually set to be 256 in many string matching libraries. When strings must be handled over an alphabet with a size greater than the limit provided by the given implementation, each character should be represented as a composite alphabet which involves combinations of two or more characters in the restricted alphabet set. In this case, potential false positives can sometimes occur, which may cause a decline in the performance in output-sensitive string matching systems, such as the FM-index. In this paper, we empirically compare various configurations of composite alphabets using FM-index, and show how they affect the performance in terms of the number of false positives and the searching time.
关键词：Alphabet Size Limit;Composite Alphabet;FM-index;Output Sensitive Function;String Matching