Abstract: Finding approximate tandem repeats in DNA sequences is an important problem in gene analysis. SUA_SATR is one of the latest methods for finding such repeats, but it suffers from high runtime cost and poor result quality. To detect approximate tandem repeats in genomic sequences more efficiently, this paper proposes a new model based on a novel algorithm, MSATR, and an optimized variant, mMSATR. The model uses the Motif-Divide method to improve performance, which yields the MSATR algorithm. By introducing the definition of CASM to reduce the search scope and by optimizing the original mechanism adopted by MSATR, mMSATR makes the detection process more efficient and improves result quality. Theoretical analysis and experimental results indicate that MSATR and mMSATR find more results in less runtime. These algorithms are superior to other methods in finding results and greatly reduce runtime cost, which is beneficial as gene data sets grow larger.
Keywords: DNA sequence mining; approximate tandem repeat; motif-similarity