Article information

Title: Aligning Flowgrams to DNA Sequences
Authors: Marcel Martin; Sven Rahmann
Journal: OASIcs : OpenAccess Series in Informatics
Electronic ISSN: 2190-6807
Year of publication: 2013
Volume: 34
Pages: 125-135
DOI: 10.4230/OASIcs.GCB.2013.125
Publisher: Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
Abstract: A read from 454 or Ion Torrent sequencers is natively represented as a flowgram, which is a sequence of pairs of a nucleotide and its (fractional) intensity. Recent work has focused on improving the accuracy of base calling (conversion of flowgrams to DNA sequences) in order to facilitate read mapping and downstream analysis of sequence variants. However, base calling always incurs a loss of information by discarding fractional intensity information. We argue that base calling can be avoided entirely by directly aligning the flowgrams to DNA sequences. We introduce an algorithm for flowgram-string alignment based on dynamic programming, but covering more cases than standard local or global sequence alignment. We also propose a scoring scheme that takes into account sequence variations (from substitutions, insertions, deletions) and sequencing errors (flow intensities contradicting the homopolymer length) separately. This allows fractional intensities, ambiguous homopolymer lengths and editing events to be resolved at alignment time by choosing the most likely read sequence given both the nucleotide intensities and the reference sequence. We provide a proof-of-concept implementation and demonstrate the advantages of flowgram-string alignment compared to base-called alignments.
Keywords: flowgram; sequencing; alignment algorithm; scoring scheme
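
Illustration (not part of the original record): the abstract describes a flowgram as a sequence of (nucleotide, intensity) pairs and notes that base calling discards the fractional part of each intensity. The sketch below shows that loss using the simplest conceivable base caller, which rounds each intensity to the nearest integer. This rounding rule, the function name `naive_base_call`, and the intensity values are assumptions made for illustration only; they are not the authors' method or data.

```python
from typing import List, Tuple

# A flowgram as described in the abstract: pairs of (nucleotide, intensity).
Flowgram = List[Tuple[str, float]]


def naive_base_call(flowgram: Flowgram) -> str:
    """Round each intensity to a homopolymer length (illustrative assumption).

    int(x + 0.5) rounds non-negative values half up; real base callers
    are considerably more sophisticated than this.
    """
    return "".join(nuc * int(intensity + 0.5) for nuc, intensity in flowgram)


# Hypothetical intensities: the C flow sits almost exactly between 2 and 3.
read_a = [("T", 1.04), ("A", 0.08), ("C", 2.48), ("G", 0.97)]
read_b = [("T", 1.04), ("A", 0.08), ("C", 2.52), ("G", 0.97)]

print(naive_base_call(read_a))  # TCCG
print(naive_base_call(read_b))  # TCCCG
```

The two reads differ by only 0.04 in one flow, yet rounding commits them to different homopolymer lengths and drops the evidence that either length was plausible. That ambiguity is what the abstract proposes to resolve at alignment time instead.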