Abstract: In a genome-wide association study with more than 100,000 (100K) and up to 1 million single nucleotide polymorphisms (SNPs), the first step is usually a genome-wide scan to identify candidate chromosome regions for further analysis. The goal of the genome-wide scan is to rank all the SNPs by their association test statistics or p-values and to select the top-ranked SNPs. A good ranking procedure places the SNPs with true associations as near to the top as possible, which increases the probability of selecting at least one SNP with a true association. However, if the disease-associated SNPs have only moderate genetic effects, then when more than 300K SNPs are screened there is a high probability that a large number of null SNPs will have extremely small p-values (or large test statistics). Therefore, when a small fraction of top SNPs is selected (usually less than 5%), the probability of selecting at least one SNP with a true association is usually less than 80% unless the sample size is large. Robust statistics (e.g., MAX3 and MIN2) have been proposed for ranking all the SNPs. In this article we consider genome-wide scans with genetic model selection and compare the proposed method with the existing approaches. Results from simulation studies are presented.
Keywords: case-control design; efficiency robustness; genetic model selection; genome-wide studies; MAX