Abstract: Several worldwide pandemics, such as those of influenza, human immunodeficiency virus, and coronavirus, are caused by viral quasispecies. Characterizing the quasispecies harbored by a host is essential to unveil the mechanisms underlying pathogen evolution, infection, and spread at the epidemic level. Next-generation sequencing (NGS) produces many thousands of sequence fragments from a single sample, allowing full-genome sequencing at high resolution. This work introduces an original approach for the de novo assembly (reconstruction of a full genome without the need for a reference genome) of NGS reads into the quasispecies present in the sample, using biased random walks over an overlap graph. The proposed framework is shown to reconstruct viral quasispecies successfully at different levels of diversity, using both simulated and empirical data. In addition, a broad set of measures describing the topological properties of the overlap graphs is examined, in order to highlight differences among the data sets and therefore among the population structures.
Keywords: Next-generation sequencing; genome assembly; quasispecies; complex network; random walk; de novo assembly
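To make the idea of a biased random walk over an overlap graph concrete, the following minimal Python sketch may help. It is an illustration under stated assumptions, not the method of this work: the function names (`overlap_len`, `build_overlap_graph`, `biased_walk`), the `min_overlap` threshold, the toy reads, and in particular the choice to bias each step in proportion to the suffix-prefix overlap length are all hypothetical, since the abstract does not specify how the bias is defined.

```python
import random


def overlap_len(a, b, min_overlap):
    """Longest suffix of `a` equal to a prefix of `b`; 0 if shorter than min_overlap."""
    for k in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_graph(reads, min_overlap=3):
    """Directed graph: edge i -> j weighted by the overlap length between reads i and j."""
    graph = {i: [] for i in range(len(reads))}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap_len(a, b, min_overlap)
                if k:
                    graph[i].append((j, k))
    return graph


def biased_walk(reads, graph, start, rng):
    """Walk from `start`, never revisiting a read, and merge the reads along the path.

    ASSUMPTION: the next read is drawn with probability proportional to its
    overlap length with the current read. This bias is illustrative only.
    """
    visited = {start}
    current = start
    contig = reads[start]
    while True:
        candidates = [(j, k) for j, k in graph[current] if j not in visited]
        if not candidates:
            return contig
        nodes, weights = zip(*candidates)
        nxt = rng.choices(nodes, weights=weights, k=1)[0]
        k = dict(candidates)[nxt]
        contig += reads[nxt][k:]  # append only the non-overlapping tail
        visited.add(nxt)
        current = nxt


# Toy reads (hypothetical), sampled from the sequence "ACGTTGCAAGCT".
reads = ["ACGTTG", "GTTGCA", "TGCAAG", "CAAGCT"]
graph = build_overlap_graph(reads, min_overlap=3)
print(biased_walk(reads, graph, start=0, rng=random.Random(0)))
```

On these toy reads the overlap graph is a single chain, so every walk started at the first read recovers the original sequence. In a diverse sample the graph branches, and repeated walks could then follow different paths that correspond to different candidate haplotypes.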