期刊名称:Proceedings of the National Academy of Sciences
印刷版ISSN:0027-8424
电子版ISSN:1091-6490
出版年度:2003
卷号:100
期号:23
页码:13167-13172
DOI:10.1073/pnas.1733249100
语种:English
出版社:The National Academy of Sciences of the United States of America
摘要:In microarray data there are a number of biological samples, each assessed for the level of gene expression for a typically large number of genes. There is a need to examine these data with statistical techniques to help discern possible patterns in the data. Our technique applies a combination of mathematical and statistical methods to progressively take the data set apart so that different aspects can be examined for both general patterns and very specific effects. Unfortunately, these data tables are often corrupted with extreme values (outliers), missing values, and non-normal distributions that preclude standard analysis. We develop a robust analysis method to address these problems. The benefits of this robust analysis will be both the understanding of large-scale shifts in gene effects and the isolation of particular sample-by-gene effects that might be either unusual interactions or the result of experimental flaws. Our method requires a single pass and does not resort to complex "cleaning" or imputation of the data table before analysis. We illustrate the method with a commercial data set.