Journal: International Journal of Applied Mathematics and Computer Science
Electronic ISSN: 2083-8492
Publication year: 2003
Volume: 13
Issue: 3
Publisher: De Gruyter Open
Abstract: SNP sites are generally discovered by sequencing regions of the human genome in a limited number of individuals. As a result, SNP sites that are present in the region but carry rare mutant nucleotides may go undetected. Consequently, estimates of nucleotide diversity obtained from assays of detected SNP sites are biased. We present a statistical model of the SNP discovery process, which is used to evaluate the extent of this bias. The model involves a symmetric Beta distribution of variant frequencies at SNP sites, with an additional probability that there is no SNP at any given site. Under this model of allele frequency distributions at SNP sites, we show that nucleotide diversity is always underestimated. However, the extent of the bias does not seem to exceed 10–15% for the analyzed data. We find that our model of allele frequency distributions at SNP sites is consistent with SNP statistics derived from new SNP data at the ATM, BLM, RQL and WRN gene regions. Applying the theory to these new SNP data, as well as to literature data at the LPL gene region, indicates that, in spite of ascertainment biases, the observed differences in nucleotide diversity across these gene regions are real. This provides interesting evidence concerning the heterogeneity of the rates of nucleotide substitution across the genome.
Keywords: single nucleotide polymorphisms; ascertainment bias; nucleotide diversity; molecular evolution
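
Note: the kind of model described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' method: the function name `simulate_diversity`, the parameter values (`beta_shape`, `p_no_snp`, `panel_size`) and the simplified diversity measure (population heterozygosity 2x(1-x) summed over sites and divided by the number of sites) are all hypothetical choices made for illustration. Each site is monomorphic with probability `p_no_snp`; otherwise its variant frequency x is drawn from a symmetric Beta distribution. A site counts as discovered only if both alleles appear in a small panel of sampled chromosomes.

```python
import random


def simulate_diversity(n_sites=20000, panel_size=10, beta_shape=0.5,
                       p_no_snp=0.9, seed=0):
    """Return (true_diversity, ascertained_diversity) per site.

    All parameter values are illustrative assumptions, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    true_sum = 0.0      # heterozygosity summed over all polymorphic sites
    detected_sum = 0.0  # heterozygosity summed over discovered sites only

    for _ in range(n_sites):
        # With probability p_no_snp the site carries no SNP at all.
        if rng.random() < p_no_snp:
            continue

        # Variant frequency from a symmetric Beta(beta_shape, beta_shape).
        x = rng.betavariate(beta_shape, beta_shape)
        h = 2.0 * x * (1.0 - x)  # heterozygosity contributed by this site
        true_sum += h

        # Discovery panel: sample panel_size chromosomes and count variants.
        variant_count = sum(1 for _ in range(panel_size) if rng.random() < x)

        # The SNP is detected only if the panel shows both alleles.
        if 0 < variant_count < panel_size:
            detected_sum += h

    return true_sum / n_sites, detected_sum / n_sites


if __name__ == "__main__":
    true_div, ascertained_div = simulate_diversity()
    relative_bias = 1.0 - ascertained_div / true_div
    print(f"true diversity per site:        {true_div:.5f}")
    print(f"ascertained diversity per site: {ascertained_div:.5f}")
    print(f"relative underestimation:       {relative_bias:.1%}")
```

In this sketch the ascertained value can never exceed the true one, because every undetected site contributes a non-negative amount of heterozygosity that is simply dropped. Rare variants are the ones most often missed, and they contribute little heterozygosity each, which is one intuitive reading of why the bias can stay moderate; the size of the bias printed here depends entirely on the assumed parameters.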