Journal: International Journal of Computer Science & Information Technology (IJCSIT)
Print ISSN: 0975-4660
Electronic ISSN: 0975-3826
Year of publication: 2015
Volume: 7
Issue: 5
Page: 1
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: While advanced analysis of large datasets is in high demand, data sizes have surpassed the capabilities of conventional software and hardware. The Hadoop framework distributes large datasets over multiple commodity servers and performs parallel computations. We discuss the I/O bottlenecks of the Hadoop framework and propose methods for enhancing I/O performance. A proven approach is to cache data to maximize the memory locality of all map tasks. We introduce an approach to optimize I/O, the in-node combining design, which extends the traditional combiner to the node level. The in-node combiner reduces the total number of intermediate results and curtails network traffic between mappers and reducers.
Keywords: Big Data; Hadoop; MapReduce; NoSQL; Data Management
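
Illustrative sketch (not taken from the paper): the abstract's claim that node-level combining reduces intermediate results can be shown with a toy word count. The code below is a plain-Python simulation, not Hadoop code and not the authors' implementation. The function names, the two-node layout, and the input splits are all hypothetical. It compares a traditional combiner, which merges the output of each map task separately, with a combiner that merges the output of every map task on the same node before anything is sent to the reducers.

```python
from collections import Counter

def map_task(split):
    """Emit (word, 1) pairs for one input split, as in word count."""
    return [(word, 1) for line in split for word in line.split()]

def combine(pairs):
    """Sum the values per key; reused for per-task and per-node combining."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def per_task_combining(node_splits):
    """Traditional combiner: each map task combines only its own output."""
    out = []
    for splits in node_splits:        # one entry per node
        for split in splits:          # one map task per split
            out.extend(combine(map_task(split)))
    return out

def in_node_combining(node_splits):
    """Node-level combining: merge all map output on a node, then combine."""
    out = []
    for splits in node_splits:
        node_pairs = []
        for split in splits:
            node_pairs.extend(map_task(split))
        out.extend(combine(node_pairs))
    return out

def reduce_all(pairs):
    """Final reduce step: sum the combined counts across all nodes."""
    return dict(combine(pairs))

# Hypothetical toy input: 2 nodes, each running 2 map tasks.
NODE_SPLITS = [
    [["big data hadoop"], ["hadoop big data"]],
    [["hadoop mapreduce"], ["mapreduce hadoop hadoop"]],
]

per_task = per_task_combining(NODE_SPLITS)
in_node = in_node_combining(NODE_SPLITS)

print("records shuffled, per-task combiner:", len(per_task))
print("records shuffled, in-node combiner: ", len(in_node))
print("same final counts:", reduce_all(per_task) == reduce_all(in_node))
```

In this toy input both strategies produce the same final counts, but the node-level variant sends fewer intermediate records to the reducers because keys repeated across map tasks on one node are merged once. The size of the saving in a real cluster depends on key distribution and on how many map tasks share a node, which this sketch does not model.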