
Article information

  • Title: Robust predictions of specialized metabolism genes through machine learning
  • Authors: Bethany M. Moore ; Peipei Wang
  • Journal: Proceedings of the National Academy of Sciences
  • Print ISSN: 0027-8424
  • Online ISSN: 1091-6490
  • Year of publication: 2019
  • Volume: 116
  • Issue: 6
  • Pages: 2344-2353
  • DOI: 10.1073/pnas.1817074116
  • Language: English
  • Publisher: The National Academy of Sciences of the United States of America
  • Abstract: Plant specialized metabolism (SM) enzymes produce lineage-specific metabolites with important ecological, evolutionary, and biotechnological implications. Using Arabidopsis thaliana as a model, we identified distinguishing characteristics of SM and GM (general metabolism, traditionally referred to as primary metabolism) genes through a detailed study of features including duplication pattern, sequence conservation, transcription, protein domain content, and gene network properties. Analysis of multiple sets of benchmark genes revealed that SM genes tend to be tandemly duplicated, coexpressed with their paralogs, narrowly expressed at lower levels, less conserved, and less well connected in gene networks relative to GM genes. Although the values of each of these features significantly differed between SM and GM genes, any single feature was ineffective at predicting SM from GM genes. Using machine learning methods to integrate all features, a prediction model was established with a true positive rate of 87% and a true negative rate of 71%. In addition, 86% of known SM genes not used to create the machine learning model were predicted. We also demonstrated that the model could be further improved when we distinguished between SM, GM, and junction genes responsible for reactions shared by SM and GM pathways, indicating that topological considerations may further improve the SM prediction model. Application of the prediction model led to the identification of 1,220 A. thaliana genes with previously unknown functions, each assigned a confidence measure called an SM score, providing a global estimate of SM gene content in a plant genome.
  • Keywords: specialized metabolism ; machine learning ; predictive biology ; data integration
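
The abstract reports the model's performance as a true positive rate (TPR) and a true negative rate (TNR). As a reading aid, the minimal sketch below shows how these two rates are conventionally computed for a binary classifier. The function name `tpr_tnr`, the label encoding (1 = SM, 0 = GM), and the toy labels are assumptions made for illustration; they are not the authors' code or data.

```python
def tpr_tnr(y_true, y_pred):
    """Return (TPR, TNR) for binary labels, assuming 1 = SM and 0 = GM."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # SM predicted as SM
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # SM predicted as GM
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # GM predicted as GM
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # GM predicted as SM
    # TPR = TP / (TP + FN); TNR = TN / (TN + FP). NaN if a class is absent.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


# Toy labels invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

tpr, tnr = tpr_tnr(y_true, y_pred)
print(f"TPR = {tpr:.2f}, TNR = {tnr:.2f}")
```

Reporting both rates matters when the two classes are not equally easy to predict: a single accuracy figure could hide a gap such as the one between the 87% and 71% quoted in the abstract.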