
Article Information

  • Title: A Survey on Data Compression Methods for Biological Sequences
  • Authors: Morteza Hosseini; Diogo Pratas
  • Journal: Information
  • eISSN: 2078-2489
  • Year: 2016
  • Volume: 7
  • Issue: 4
  • Pages: 56
  • DOI: 10.3390/info7040056
  • Language: English
  • Publisher: MDPI Publishing
  • Abstract: The ever-increasing production of high-throughput sequencing data poses a serious challenge to the storage, processing and transmission of these data. As is frequently stated, it is a data deluge. Compression is essential to address this challenge: it reduces storage space and processing costs, and it speeds up data transmission. In this paper, we provide a comprehensive survey of existing compression approaches that are specialized for biological data, including protein and DNA sequences. We also devote an important part of the paper to the approaches proposed for compressing different file formats, such as FASTA, as well as FASTQ and SAM/BAM, which contain quality scores and metadata in addition to the biological sequences. Then, we compare the performance of several methods in terms of compression ratio, memory usage and compression/decompression time. Finally, we present some suggestions for future research on biological data compression.
  • Keywords: protein sequence; DNA sequence; reference-free compression; reference-based compression; FASTA; Multi-FASTA; FASTQ; SAM; BAM
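For context on the compression ratios the survey compares, here is a minimal sketch of the simplest reference-free baseline for DNA. It packs each nucleotide into 2 bits instead of an 8-bit ASCII character, which gives a ratio of about 4. This is an illustrative assumption, not a method taken from the paper. The function names (`pack_2bit`, `unpack_2bit`, `compression_ratio`) are hypothetical. The sketch assumes the input contains only the uppercase symbols A, C, G and T, so symbols such as `N` or lowercase bases raise a `KeyError`.

```python
# Hypothetical 2-bit baseline: assumes input contains only uppercase A, C, G, T.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack a nucleotide string into bytes, four bases per byte (2 bits each)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            # The first base of each group occupies the two most significant bits.
            byte |= CODE[base] << (6 - 2 * j)
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Decode packed bytes; `length` is needed because the last byte may be padded."""
    bases = []
    for byte in data:
        for j in range(4):
            bases.append(BASES[(byte >> (6 - 2 * j)) & 0b11])
    return "".join(bases[:length])


def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Ratio of original size to compressed size (both in bytes)."""
    return original_size / compressed_size


if __name__ == "__main__":
    seq = "ACGTACGTTTGA"
    packed = pack_2bit(seq)
    assert unpack_2bit(packed, len(seq)) == seq  # lossless round trip
    print(compression_ratio(len(seq), len(packed)))  # 4.0
```

The sequence length must be stored alongside the packed bytes so that padding in the final byte can be discarded on decoding. Specialized tools aim to do better than this fixed 2 bits per base by modeling the statistical structure of the data, or by encoding differences against a reference.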