Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Electronic ISSN: 2320-9801
Publication year: 2015
Volume: 3
Issue: 8
DOI: 10.15680/IJIRCCE.2015.0308117
Publisher: S&S Publications
Abstract: Cloud computing provides an authenticated virtual environment that supports on-demand information storage and retrieval. The cloud computing paradigm, together with software tools such as Google's MapReduce and Apache's Hadoop MapReduce framework, responds to the problem of processing big data by distributing computations across large sets of nodes. Hadoop is an open-source framework for big-data applications. In many scenarios the input data is geo-distributed across data centers, and moving all of it to a single data center before processing is expensive. This paper deals with executing sequences of MapReduce jobs on geo-distributed datasets. It analyzes the possible ways of executing such jobs and proposes data transformation graphs. Big data refers to very large-scale, geographically distributed data processing applications that operate on exceptionally large amounts of data. The MapReduce framework generates a large amount of intermediate data. This information is discarded after the tasks finish, because MapReduce is unable to reuse it. Dache acts as a cache in data centers that stores this data temporarily in the cloud. In this paper, Dache, a data-aware cache framework for big-data applications, is proposed. In Dache, tasks submit their intermediate results to the cache manager. Before executing its computation, a task queries the cache manager.
Keywords: Cloud Computing; Big Data; Hadoop; Data center; MapReduce; Dache; Geo-distributed
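
Illustrative note: the abstract describes a query-then-submit protocol between tasks and the cache manager. The following is a minimal single-process sketch of that idea, not the paper's implementation. The names `CacheManager`, `query`, `submit`, `run_task`, and `word_count`, and the choice of an (input identifier, operation name) pair as the cache key, are assumptions made for illustration only.

```python
from typing import Callable, Dict, Hashable, List, Tuple


class CacheManager:
    """Hypothetical in-memory cache manager keyed by (input id, operation)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[Hashable, str], object] = {}
        self.hits = 0
        self.misses = 0

    def query(self, input_id: Hashable, operation: str) -> Tuple[bool, object]:
        """Return (True, result) if a matching intermediate result is cached."""
        key = (input_id, operation)
        if key in self._store:
            self.hits += 1
            return True, self._store[key]
        self.misses += 1
        return False, None

    def submit(self, input_id: Hashable, operation: str, result: object) -> None:
        """Store an intermediate result so that later tasks can reuse it."""
        self._store[(input_id, operation)] = result


def run_task(
    cache: CacheManager,
    input_id: Hashable,
    operation: str,
    records: List[str],
    compute: Callable[[List[str]], object],
) -> object:
    """Query the cache first; compute and submit only on a miss."""
    found, cached = cache.query(input_id, operation)
    if found:
        return cached
    result = compute(records)
    cache.submit(input_id, operation, result)
    return result


def word_count(records: List[str]) -> Dict[str, int]:
    """Toy map-side computation: count word occurrences in the input lines."""
    counts: Dict[str, int] = {}
    for line in records:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts
```

In this sketch, running the same operation twice on the same input split performs the computation once; the second call is answered from the cache. A real deployment would also need cache eviction and a way to locate cached data across data centers, which the abstract does not detail.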