Journal: International Journal of Computer Science and Information Technologies
e-ISSN: 0975-9646
Year of publication: 2015
Volume: 6
Issue: 1
Pages: 127-132
Publisher: TechScience Publications
Abstract: In today's age of information technology, processing data is a very important issue. Nowadays even terabyte- and petabyte-scale storage is not sufficient for holding large databases. The data is too big, moves too fast, or does not fit the structures of current database architectures. Big Data typically refers to large volumes of unstructured and structured data created by various organized and unorganized applications and activities, such as emails, web logs, Facebook, etc. The main difficulties with Big Data include capture, storage, search, sharing, analysis, and visualization. Even very large data warehouses are unable to satisfy these storage and processing needs; hence companies today use a framework called Hadoop in their applications. Hadoop is designed to store large data sets reliably. It is open-source software that supports parallel and distributed data processing. Along with reliability and scalability, Hadoop also provides a fault-tolerance mechanism by which the system continues to function correctly even after some components fail. Fault tolerance is mainly achieved through data replication, i.e., storing copies of the same data blocks on two or more data nodes. MapReduce is a programming model and an associated implementation for processing and generating large datasets that is applicable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines.
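As an illustration of the map and reduce functions described in the abstract, below is a minimal sketch of the classic word-count job written against the Hadoop MapReduce Java API (org.apache.hadoop.mapreduce). This example is not taken from the paper; the class names (WordCount, TokenizerMapper, IntSumReducer) and input/output paths are illustrative assumptions.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: for each input line, emit (word, 1) for every token.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

A job like this would typically be packaged into a jar and submitted with a command of the form "hadoop jar wordcount.jar WordCount /input /output"; the runtime then splits the input, schedules map and reduce tasks across the cluster, and reruns tasks from replicated HDFS blocks if a node fails, which is the fault-tolerance behavior the abstract refers to.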