Article Information

  • Title: Comparing MapReduce and Spark in Computing the PCC Matrix in Gene Co-expression Networks
  • Authors: Nagwan Abdel Samee; Nada Hassan Osman; Rania Ahmed Abdel Azeem Abul Seoud
  • Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
  • Print ISSN: 2158-107X
  • Online ISSN: 2156-5570
  • Year: 2021
  • Volume: 12
  • Issue: 9
  • DOI: 10.14569/IJACSA.2021.0120937
  • Language: English
  • Publisher: Science and Information Society (SAI)
  • Abstract: Correlating gene expression profiles across multiple samples to identify inter-gene interactions is a key technique in building gene co-expression networks. Because computing the Pearson's correlation coefficient (PCC) matrix is computationally intensive, it often takes a long time to complete. In this work, Big Data techniques, namely MapReduce and Spark, are therefore employed in a cloud environment to calculate the PCC matrix and find the dependencies between genes measured in high-throughput microarrays. The running time of each phase is compared between the MapReduce and Spark approaches. Both techniques can dramatically speed up the computation, allowing users to handle highly intensive processing. However, Spark outperforms MapReduce because it performs the processing in the main memory of the worker nodes and avoids unnecessary disk I/O. Spark achieved an 80-fold speedup in calculating the PCC for 22777 genes, whereas MapReduce attained barely an 8-fold speedup.
  • Keywords: Pearson's correlation; Hadoop; MapReduce; Spark; gene co-expression networks; GCN; Affymetrix microarrays
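
To make the workload in the abstract concrete, the following is a minimal single-machine sketch of a PCC matrix computation in plain Python. It is not the authors' MapReduce or Spark implementation: the function names `standardize` and `pcc_matrix` and the toy data are assumptions made for illustration. Each gene profile is centered and scaled to unit length, so the PCC of a gene pair reduces to a dot product. The loop over pairs is the part that grows quadratically (22777 genes give roughly 259 million pairs), and since each pair is computed independently of the others, it is the kind of work a distributed framework could plausibly partition across worker nodes.

```python
from itertools import combinations
from math import sqrt


def standardize(profile):
    """Center a profile and scale it to unit length, so that the dot
    product of two standardized profiles equals their PCC."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [x - mean for x in profile]
    norm = sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("constant profile: PCC is undefined")
    return [c / norm for c in centered]


def pcc_matrix(expr):
    """Return the symmetric gene-by-gene PCC matrix for `expr`,
    a list of per-gene expression profiles of equal length."""
    z = [standardize(p) for p in expr]
    g = len(z)
    # The diagonal stays 1.0: every gene correlates perfectly with itself.
    r = [[1.0] * g for _ in range(g)]
    # Each unordered pair (i, j) is independent of every other pair.
    for i, j in combinations(range(g), 2):
        r[i][j] = r[j][i] = sum(a * b for a, b in zip(z[i], z[j]))
    return r


# Hypothetical toy data: 3 genes measured across 4 samples.
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with gene 0
    [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with gene 0
]
r = pcc_matrix(expr)
```

Standardizing once per gene and then taking dot products avoids recomputing means and variances for every pair, which matters when the number of pairs is far larger than the number of genes.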