Abstract: Gene Set Analysis (GSA) is a framework for testing the association between a set of genes and an outcome, e.g. disease status or treatment group. The method relies on computing a maxmean statistic and estimating the null distribution of the maxmean statistic via a restandardization procedure. In practice, genes within a pre-determined gene set are more strongly correlated with one another than with genes in other sets, which may bias the estimated null distribution. We derive an asymptotic null distribution of the maxmean statistic under a sparsity assumption. We propose a flexible two-group mixture model for the maxmean statistics, which allows us to estimate the null parameters empirically by maximum likelihood. We compare our empirical method with the restandardization procedure of GSA in simulations and show that it estimates the null density more accurately when the genes are strongly correlated within gene sets.
Keywords: Gene Set Analysis; Maxmean; Empirical Null; Mixture Model