Article Information

  • Title: Adapting Decision Tree-Based Method to Index Large DNA-Protein Sequence Datasets
  • Authors: Khalid Mohammad Jaber; Rosni Abdullah; Nur'Aini Abdul Rashid
  • Journal: International Journal of ACM Jordan
  • Print ISSN: 2078-7952
  • Electronic ISSN: 2078-7960
  • Year of publication: 2011
  • Volume: II
  • Issue: I
  • Publisher: ACM Jordan ISWSA Professional Chapter
  • Abstract:

    Biological databases have grown significantly in size, along with the number of users and the rate of queries; some databases are now terabytes in size. Hence, there is an increasing need to access these databases as quickly as possible. For biologists, the need is for fast, scalable, and accurate searching of biological databases. This may seem a simple task given the speed of currently available processors. However, this is far from the truth, because the volume of data deposited into these databases keeps increasing, which makes searching them a difficult and time-consuming task. Here, computer scientists can help by organizing data in a way that allows biologists to search existing information quickly. In this paper, a decision tree indexing model for DNA and protein sequence datasets is proposed. This indexing method can effectively and rapidly retrieve all similar proteins from a large database for a given protein query. A theoretical and conceptual framework is derived from published works that use indexing techniques for different applications. The methodology was then validated through extensive experiments using 10 data sets of varying sizes for DNA and protein. The experimental results show that the proposed method reduced the searching space to an average of 97.9% for DNA and 98% for protein, compared to the Top Down Disk-based suffix tree methods currently in use. Furthermore, in terms of query processing time, the proposed method was about 2.35 times faster for DNA and 29 times faster for protein than the BLAST+ algorithm.
