Journal: International Journal of Hybrid Information Technology
Print ISSN: 1738-9968
Year: 2016
Volume: 9
Issue: 1
Pages: 221-232
DOI: 10.14257/ijhit.2016.9.1.19
Publisher: SERSC
Abstract: Fast and exact searching for sequences similar to a query sequence in genomic databases remains a challenging task in molecular biology. This paper considers the problem of finding all e-matches in a large genomic database, i.e. all local alignments over a given length w with an error rate of at most e. A new database searching algorithm called QFLA is designed to solve this problem. The proposed algorithm is a full-sensitivity algorithm: a refined q-gram filter implemented on a q-gram index. First, new features are extracted from match-regions by logically partitioning both the query sequence and the genomic database. Second, these features are used during the search to quickly eliminate a large share of irrelevant subsequences. Last, the unfiltered regions are verified by the well-known Smith-Waterman algorithm. The experimental results demonstrate that the algorithm saves time by improving filtration efficiency while keeping filtration time short.
Keywords: sequence comparison; local alignment; filter algorithm; q-gram filter; q-gram index
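
The abstract describes a filter-then-verify pipeline: index q-grams, discard regions that cannot contain a match, and verify the rest with Smith-Waterman. The sketch below illustrates only the classical q-gram counting filter that such methods build on; it is not QFLA and does not reproduce the partition-based features of the paper. All function names and parameter defaults are hypothetical. It also counts hits on exact diagonals, so it is lossless only for matches that differ by substitutions; tolerating insertions and deletions would require counting over diagonal bands.

```python
from collections import defaultdict


def build_qgram_index(text, q):
    """Map each q-gram to the list of start positions where it occurs in text."""
    index = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return index


def qgram_threshold(w, q, e):
    """q-gram lemma: a length-w match with at most k = floor(e * w) errors
    still shares at least w + 1 - q * (k + 1) q-grams with its counterpart."""
    k = int(e * w)
    return w + 1 - q * (k + 1)


def candidate_diagonals(query, index, q, threshold):
    """Count q-gram hits per diagonal (database position minus query position)
    and keep the diagonals whose count reaches the threshold."""
    hits = defaultdict(int)
    for j in range(len(query) - q + 1):
        for i in index.get(query[j:j + q], ()):
            hits[i - j] += 1
    return sorted(d for d, count in hits.items() if count >= threshold)


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score with a linear gap penalty (assumed scoring)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Toy data (made up for illustration): the query differs from
# database[4:14] by a single substitution.
database = "TTTTACGTACGTAATTTT"
query = "ACGTACCTAA"
q, w, e = 3, 10, 0.1

index = build_qgram_index(database, q)
threshold = qgram_threshold(w, q, e)
candidates = candidate_diagonals(query, index, q, threshold)

# Only the surviving diagonals are passed to the expensive verification step.
scores = {d: smith_waterman_score(query, database[d:d + len(query)])
          for d in candidates}
```

In this toy case the threshold is 5 shared 3-grams, and only diagonal 4 reaches it, so a single region is verified instead of the whole database. The filtration efficiency the abstract refers to is the fraction of the database that never reaches the verification step.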