Journal: International Journal of Advances in Soft Computing and Its Applications
Print ISSN: 2074-8523
Publication year: 2019
Volume: 11
Issue: 1
Pages: 94-111
Publisher: International Center for Scientific Research and Studies
Abstract: Genomic repeat analysis, which finds repeating base-pair patterns in deoxyribonucleic acid (DNA) sequences, can be used to detect genetic diseases by identifying repetition counts that exceed normal limits. Because this task has a very high computation cost, this research builds a parallel-computing model and an implementation to address it. The approach modifies the Knuth-Morris-Pratt (KMP) algorithm and implements it with the R high-performance-computing package 'pbdMPI'. It comprises the following steps: preprocessing and splitting the DNA sequence, running KMP in parallel with 'pbdMPI', combining all indices, and calculating genomic repeats. To validate the model and implementation, 114 experiments involving human DNA sequences are conducted in standalone and parallel-computing scenarios. The results show that the proposed system reduces the computation cost, running more than 100 times faster than standalone computing. Comparisons of the computation cost in terms of the number of batches and the number of cores are presented alongside existing research. In summary, the proposed model provides a significant improvement in computational cost.
Keywords: DNA; human genome; genomic repeats; string matching; Knuth-Morris-Pratt; high-performance computing.
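
The four steps named in the abstract (preprocess and split, KMP per batch, combine indices, calculate repeats) can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the authors' R/'pbdMPI' implementation: the function names, the overlap-based splitting, and the use of a thread pool in place of MPI ranks are all hypothetical choices made here. In particular, `longest_tandem_run` is only one guess at what "calculating genomic repeats" might involve (counting back-to-back copies of a motif), and threads in CPython illustrate the structure rather than deliver a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def build_failure_table(pattern):
    """KMP failure table: fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern, fail):
    """Return the start index of every match, overlapping ones included."""
    hits = []
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]
    return hits


def split_with_overlap(sequence, n_batches, overlap):
    """Split into roughly equal batches as (offset, chunk) pairs. Each chunk
    is extended by `overlap` characters so that a match straddling a batch
    boundary is still found, and found by exactly one batch."""
    size = max(1, -(-len(sequence) // n_batches))  # ceiling division
    return [(start, sequence[start:start + size + overlap])
            for start in range(0, len(sequence), size)]


def find_motif(sequence, motif, n_batches=4):
    """Preprocess, split, search each batch, and combine global indices."""
    if not motif:
        raise ValueError("motif must be non-empty")
    sequence, motif = sequence.upper(), motif.upper()  # minimal preprocessing
    fail = build_failure_table(motif)
    batches = split_with_overlap(sequence, n_batches, len(motif) - 1)

    def search_batch(batch):
        offset, chunk = batch
        # Shift batch-local indices back to positions in the full sequence.
        return [offset + i for i in kmp_search(chunk, motif, fail)]

    # Assumption: a thread pool stands in for the MPI ranks of the paper.
    with ThreadPoolExecutor(max_workers=n_batches) as pool:
        per_batch = list(pool.map(search_batch, batches))
    return sorted(i for hits in per_batch for i in hits)


def longest_tandem_run(indices, motif_len):
    """Longest run of back-to-back motif copies, i.e. hits that are
    exactly motif_len apart (an assumed definition of a 'repeat')."""
    index_set = set(indices)
    best = 0
    for i in indices:
        if i - motif_len in index_set:
            continue  # not the first copy of its run
        run = 1
        while i + run * motif_len in index_set:
            run += 1
        best = max(best, run)
    return best
```

For instance, `find_motif("TTCAGCAGCAGTTCAGAA", "CAG")` returns `[2, 5, 8, 13]`, and `longest_tandem_run([2, 5, 8, 13], 3)` gives `3`. Extending each batch by `len(motif) - 1` characters is the design choice that keeps the per-batch searches independent: no match is lost at a boundary, and none is reported twice.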