Article Information

Title: Integration of Spark into Splunk for High-Scale Datasets to augment Performance, Scalability, Liability and Strength
Authors: Rohini More; Smita Konda
Journal: International Journal of Innovative Research in Science, Engineering and Technology
Print ISSN: 2347-6710
Online ISSN: 2319-8753
Year: 2017
Volume: 6
Issue: 6
Pages: 11194
DOI: 10.15680/IJIRSET.2017.0606200
Publisher: S&S Publications
Abstract: The past few years have seen great interest in large-scale data analysis. Data volumes in both industry and research keep growing, which calls for machines with high processing capacity. To handle such large collections of information, Google's MapReduce model and its open-source implementation, Hadoop, are used. We used various Hadoop parallel data analysis tools, such as Apache's Hive and Pig engines, for SQL processing and for handling large clusters. However, these tools are optimized for one-pass batch processing of on-disk data, which makes them slow for interactive data discovery and for the more complex multi-pass analytics algorithms that are becoming common. In this article, we introduce Spark and Splunk. Spark is a new cluster computing framework that can execute applications up to 40× faster than Hadoop. It keeps data in memory and can be used interactively to query huge datasets with sub-second latency. Splunk is a platform for machine data. It collects, indexes, and harnesses machine data generated by any IT system and infrastructure, whether virtual, physical, or in the cloud. Splunk laid its foundation by helping IT find and fix problems faster. In this article, we discuss how to use Splunk with Spark to perform analysis and machine learning on data.
Keywords: Apache Spark; Splunk; RDD; machine data.