Article information

  • Title: A fast algorithm for constructing suffix arrays for DNA alphabets
  • Authors: Zeinab Rabea; Sara El-Metwally; Samir Elmougy
  • Journal: Journal of King Saud University - Computer and Information Sciences
  • Print ISSN: 1319-1578
  • Publication year: 2022
  • Volume: 34
  • Issue: 7
  • Pages: 4659-4668
  • Language: English
  • Publisher: Elsevier
  • Abstract: The continuous improvement of sequencing technologies has been paralleled by the development of efficient algorithms and data structures for analyzing and processing sequencing data. The suffix array is one of the data structures used to construct the Burrows-Wheeler transform (BWT) for long genomes. Building a suffix array is itself a resource-intensive process, since the computation is dominated by sorting suffixes in lexicographic order. Most suffix array construction algorithms target general and integer alphabets and do not take advantage of fixed-size alphabets such as the DNA alphabet. In this paper, we exploit the four-letter DNA alphabet and its predefined lexicographic ordering to construct suffix arrays for genomic data correctly and efficiently. The suffix array construction algorithm for DNA alphabets is evaluated on three real data sets of different lengths, ranging from the small E. coli genome to long Homo sapiens GRCh38.p13 chromosomes. For long genomes, the sequence is divided into parts (i.e., reads) with a minimum overlap length, the suffix array is computed for each part separately, and finally all partially computed arrays are merged into a single one. We studied the effect of varying the read and overlap lengths on the running time of the proposed suffix array construction algorithm and conclude that the minimum overlap length should equal the average length of the longest common prefix between adjacent parts.
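
The ideas named in the abstract (a fixed ordering of the DNA alphabet, per-part suffix arrays that are merged, and deriving the BWT from the suffix array) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it uses plain comparison sorting, and the names RANK, suffix_array, suffix_array_by_parts and bwt_from_suffix_array are hypothetical. It also assumes the text ends with a sentinel '$' that sorts before every base, and it compares complete suffixes of the full text in both steps, so it does not model the overlap-length trade-off that the abstract studies.

```python
import heapq

# Assumed fixed ordering: sentinel '$' first, then A < C < G < T.
RANK = {"$": 0, "A": 1, "C": 2, "G": 3, "T": 4}


def encode(text):
    """Map each symbol to its small integer rank."""
    return [RANK[c] for c in text]


def suffix_array(text):
    """Naive suffix array: sort start positions by their encoded suffix."""
    codes = encode(text)
    return sorted(range(len(codes)), key=lambda i: codes[i:])


def suffix_array_by_parts(text, part_len):
    """Sort the start positions of each part separately, then merge.

    Simplification: every comparison looks at the suffix of the full
    text, so no explicit overlap between parts is needed here.
    """
    codes = encode(text)

    def key(i):
        return codes[i:]

    partial = [
        sorted(range(start, min(start + part_len, len(codes))), key=key)
        for start in range(0, len(codes), part_len)
    ]
    # Each partial list is sorted by the same key, so a k-way merge
    # yields the globally sorted order.
    return list(heapq.merge(*partial, key=key))


def bwt_from_suffix_array(text, sa):
    """BWT symbol for suffix i is the character preceding it (cyclically)."""
    return "".join(text[i - 1] for i in sa)


sa = suffix_array("GATTACA$")
print(sa)                                     # [7, 6, 4, 1, 5, 0, 3, 2]
print(suffix_array_by_parts("GATTACA$", 3))   # same order as above
print(bwt_from_suffix_array("GATTACA$", sa))  # ACTGA$TA
```

The naive sort costs far more than a purpose-built construction on real genomes; the sketch is only meant to make the data flow (encode, sort per part, merge, read off the BWT) concrete.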