期刊名称:Proceedings of the National Academy of Sciences
印刷版ISSN:0027-8424
电子版ISSN:1091-6490
出版年度:2004
卷号:101
期号:29
页码:10523-10528
DOI:10.1073/pnas.0403564101
语种:English
出版社:The National Academy of Sciences of the United States of America
摘要:In this article, we address the problem of modeling generic features of structurally but not textually related DNA motifs, that is, motifs whose consensus sequences are entirely different but nevertheless share "metasequence features" reflecting similarities in the DNA-binding domains of their associated protein recognizers. We present MotifPrototyper, a profile Bayesian model that can capture structural properties typical of particular families of motifs. Each family corresponds to transcription regulatory proteins with similar types of structural signatures in their DNA-binding domains. We show how to train MotifPrototypers from biologically identified motifs categorized according to the TRANSFAC categorization of transcription factors and present empirical results of motif classification, motif parameter estimation, and de novo motif detection by using the learned profile models.
关键词:mixture model ; Dirichlet density ; hidden Markov model ; classification ; semi-unsupervised learning