
Article information

  • Title: SOFT COMPUTING BASED IDENTIFICATION AND ASSESSMENT OF POTENTIAL DNA BARCODES OF SOLANACEOUS SPECIES USING cpDNA SEQUENCES
  • Authors: Bhupinder Pal Singh ; Ajay Kumar ; Harpreet Singh
  • Journal: Indian Journal of Computer Science and Engineering
  • Print ISSN: 2231-3850
  • Electronic ISSN: 0976-5166
  • Year of publication: 2021
  • Volume: 12
  • Issue: 3
  • Pages: 641-652
  • DOI: 10.21817/indjcse/2021/v12i3/211203126
  • Publisher: Engg Journals Publications
  • Abstract: DNA barcoding, a technique that uses short DNA sequences, has become a fast, economical, and accurate method for discovering and identifying organisms across the three main kingdoms of eukaryotes. In plants, a few coding and non-coding regions of the chloroplast genome have been tested for their ability to identify species, while other regions of the genome remain to be explored for their suitability as DNA barcodes. The present study identifies potential DNA barcodes from chloroplast DNA (cpDNA) sequences and assesses their ability to discriminate 133 plant species of the family Solanaceae, using different machine learning classification algorithms in WEKA and a distance-based method in SPIDER. Thirty-three hypervariable regions were identified on the basis of nucleotide diversity (π) using sliding window analysis of the aligned sequences of these species. These regions, along with the well-established markers matK and rbcL, were assessed for their discriminating potential at the genus level. A sequence richness regime was followed for six hypervariable regions, ‘ycf1’, ‘cemA, cemA-petA’, ‘rps12-clpP, clpP / rps12-psbB’, ‘petA, petA-psbJ, psbJ, psbJ-psbL’, ‘trnL-trnF, trnF, trnF-ndhJ’ and ‘ndhF, ndhF-rpl32, rpl32, rpl32-trnL’, using BLASTN, along with matK and rbcL, and these regions were tested for their discrimination potential at the genus and species levels. The distance-based method SPIDER and the machine learning algorithm SMO performed best compared with the other classification methods. The percentage of correct identifications increased with the number of sequences available for a given species. All hypervariable regions achieved the maximum correct identification rate (100%) at the genus level. However, the region ‘ndhF, ndhF-rpl32, rpl32, rpl32-trnL’ achieved the highest discrimination rate at the species level, 69%, which was better than that of matK and rbcL. The lower identification rates at the species level compared with the genus level were attributed to ambiguity within species for these regions. This study provides a valuable resource for the development of DNA barcodes for the family Solanaceae.
  • Keywords: DNA barcodes; Solanaceae; Machine learning algorithms; SPIDER
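
The abstract mentions finding hypervariable regions by nucleotide diversity (π) with sliding window analysis of an alignment. The sketch below illustrates that general idea only; it is not the authors' pipeline. The function names, the toy alignment, and the window and step sizes are all assumptions made for illustration, since the record does not state which tool or parameters were used. Here π is taken as the mean proportion of differing sites over all sequence pairs, skipping columns with gaps or ambiguous bases.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def pairwise_difference(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or an ambiguous base are skipped.
    """
    compared = 0
    differences = 0
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a in VALID_BASES and base_b in VALID_BASES:
            compared += 1
            if base_a != base_b:
                differences += 1
    return differences / compared if compared else 0.0


def nucleotide_diversity(sequences):
    """Mean pairwise difference per site (pi) over all sequence pairs."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


def sliding_window_pi(sequences, window, step):
    """Return (window_start, pi) for each window along the alignment."""
    length = len(sequences[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [seq[start:start + window] for seq in sequences]
        results.append((start, nucleotide_diversity(chunk)))
    return results


# Toy alignment (hypothetical data, not taken from the study).
alignment = ["ACGTACGT", "ACGTACGA", "ACGTTCGA"]
windows = sliding_window_pi(alignment, window=4, step=4)
for start, pi in windows:
    print(f"window starting at {start}: pi = {pi:.3f}")
```

Windows with the highest π would be the candidates for hypervariable regions; how a cutoff is chosen depends on the study and is not specified here.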