摘要:String matching is a problem of finding all occurrences of a short pattern on a relatively long reference string. While a number of methods have been presented, most published implementations assume several restrictions due to some practical issues. We focus on the restriction of the alphabet size, which is usually set to be 256 in many string matching libraries. When strings must be handled over an alphabet with a size greater than the limit provided by the given implementation, each character should be represented as a composite alphabet which involves combinations of two or more characters in the restricted alphabet set. In this case, potential false positives can sometimes occur, which may cause a decline in the performance in output-sensitive string matching systems, such as the FM-index. In this paper, we empirically compare various configurations of composite alphabets using FM-index, and show how they affect the performance in terms of the number of false positives and the searching time.