
Article Information

  • Title: Integration of Spark into Splunk for High-Scale Datasets to augment Performance, Scalability, Liability and Strength
  • Authors: Rohini More; Smita Konda
  • Journal: International Journal of Innovative Research in Science, Engineering and Technology
  • Print ISSN: 2347-6710
  • Electronic ISSN: 2319-8753
  • Year: 2017
  • Volume: 6
  • Issue: 6
  • Pages: 11194
  • DOI: 10.15680/IJIRSET.2017.0606200
  • Publisher: S&S Publications
  • Abstract: The past few years have seen growing interest in large-scale data analysis. Data volumes in both industry and research keep growing, outpacing the processing capacity of individual machines. To handle such large volumes of information, Google's MapReduce model and its open-source implementation, Hadoop, are widely used, along with Hadoop-based parallel data analysis tools such as Apache's Hive and Pig engines for SQL-style processing over large clusters. However, these tools have been optimized for one-pass batch processing of on-disk data, which makes them time-consuming for interactive data discovery and for the more complex multi-pass analytics algorithms that are becoming common. In this article we introduce Spark and Splunk. Spark is a cluster computing framework that can execute applications up to 40× faster than Hadoop; it keeps data in memory and can be used interactively to query huge datasets with sub-second latency. Splunk is a platform for machine data: it collects, indexes, and harnesses machine data generated by any IT system and infrastructure, whether virtual, physical, or in the cloud. Splunk laid its foundation helping IT teams find and fix problems faster. In this article we discuss how to use Splunk with Spark to perform analysis and machine learning on data.
  • Keywords: Apache Spark; Splunk; RDD; machine data.
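The abstract contrasts one-pass, on-disk MapReduce batch processing with Spark's in-memory model. As a toy illustration only (not from the paper, and in plain Python rather than on a Hadoop or Spark cluster), the word-count sketch below shows the map and reduce phases that these frameworks execute at cluster scale:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Reduce: sum the counts emitted for each distinct word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["Spark keeps data in memory", "Splunk indexes machine data"]
counts = reduce_phase(map_phase(lines))
print(counts["data"])  # "data" appears once in each of the two lines
```

In a real deployment the map and reduce phases run in parallel across cluster nodes; Spark's advantage, as the abstract notes, is that intermediate datasets (RDDs) can stay cached in memory between passes instead of being rewritten to disk after every job.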
Copyright: National Center for Philosophy and Social Sciences Documentation