Basic article information

  • Title: Identifying DNA N4-methylcytosine sites in the Rosaceae genome with a deep learning model relying on distributed feature representation
  • Authors: Jhabindra Khanal ; Hilal Tayara ; Quan Zou
  • Journal: Computational and Structural Biotechnology Journal
  • Print ISSN: 2001-0370
  • Year of publication: 2021
  • Volume: 19
  • Pages: 1612-1619
  • DOI: 10.1016/j.csbj.2021.03.015
  • Publisher: Computational and Structural Biotechnology Journal
  • Abstract: DNA N4-methylcytosine (4mC), an epigenetic modification found in prokaryotic and eukaryotic species, is involved in numerous biological functions, including host defense, transcription regulation, gene expression, and DNA replication. To identify 4mC sites, previous computational studies mostly focused on finding hand-crafted features. This area of research, therefore, would benefit from the development of a computational approach that relies on automatic feature selection to identify relevant sites. We here report 4mC-w2vec, a computational method that learned automatic feature discrimination in the Rosaceae genomes, especially in Rosa chinensis (R. chinensis) and Fragaria vesca (F. vesca), based on distributed feature representation and through the word embedding technique ‘word2vec’. While a few bioinformatics tools are currently employed to identify 4mC sites in these genomes, their prediction performance is inadequate. Our system processed 4mC and non-4mC sites through a word embedding process, including sub-word information of its biological words through k-mers, which then served as features that were fed into a two-layer convolutional neural network (CNN) to classify whether the sample sequences contained 4mC or non-4mC sites. Our tool demonstrated performance superior to current tools that use the same genomic datasets. Additionally, 4mC-w2vec is effective for balanced and imbalanced class datasets alike, and the online web-server is currently available at: http://nsclbio.jbnu.ac.kr/tools/4mC-w2vec/
  • Keywords: Sequence analysis ; DNA N4-methylcytosine (4mC) ; Word embedding ; Convolutional Neural Network ; Web-server
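
The abstract describes treating DNA sequences as text made of k-mer "words" before a word embedding is learned. The sketch below illustrates only that tokenization step, as one plausible reading of it. The function names `kmer_words` and `build_vocabulary`, the choice of k = 3, and the stride of 1 are assumptions made for illustration; the record above does not state the values or code used by 4mC-w2vec, and the embedding and CNN stages are not shown.

```python
from typing import Dict, Iterable, List


def kmer_words(sequence: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a DNA sequence into overlapping k-mer 'words'.

    k=3 and stride=1 are illustrative assumptions, not values
    taken from the paper.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    # Slide a window of width k over the sequence, stepping by `stride`.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocabulary(sentences: Iterable[List[str]]) -> Dict[str, int]:
    """Assign each distinct k-mer an integer index in order of first appearance."""
    vocab: Dict[str, int] = {}
    for words in sentences:
        for word in words:
            vocab.setdefault(word, len(vocab))
    return vocab


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the paper.
    sequences = ["ACGTCA", "cgtcag"]
    sentences = [kmer_words(s, k=3) for s in sequences]
    print(sentences[0])
    print(build_vocabulary(sentences))
```

Each list of k-mers plays the role of a sentence, so a corpus of such lists could be passed to a word-embedding trainer, and the resulting vectors stacked into a matrix for a convolutional classifier.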