Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2012
Volume: 3
Issue: 2
Pages: 3413-3418
Publisher: TechScience Publications
Abstract: MapReduce has become one of the indispensable programming frameworks for developing distributed data storage and information retrieval (IR) systems [1]. Efficient methods for mining data and retrieving it quickly have been a key concern for years. Various indexing mechanisms have been developed for the Hadoop MapReduce framework, an open-source implementation of Google's MapReduce. The framework consists of two basic functions: the map() function, which partitions the input into smaller sub-problems and distributes them to worker nodes, and the reduce() function, which aggregates the sub-outputs from the worker nodes to produce the final output. MapReduce offers certain benefits over traditional file systems, such as locality optimization and support for very large computations. The Hadoop Distributed File System (HDFS) uses B+ trees and various other indexing mechanisms, for which the storage and optimized retrieval of spatial data remain an issue [2]. This paper presents an intuitive approach to incorporating the Hilbert R-tree and the priority R-tree, two variants of the R-tree, for efficient indexing in a MapReduce framework. The priority R-tree can be considered a hybrid between a k-dimensional tree and an R-tree: it defines a given object's N-dimensional bounding volume as a point in N dimensions, represented by the ordered pair of the rectangles, which enables quick indexing [3]. The Hilbert R-tree, on the other hand, can be thought of as an extension of the B+ tree to multi-dimensional objects in spatial databases, achieving a high degree of space utilization and good response time. This is done by imposing an ordering on the R-tree nodes, sorting rectangles according to the Hilbert value of their centers. Given this ordering, every node has a well-defined set of sibling nodes, so deferred splitting can be used. By adjusting the split policy, the Hilbert R-tree can achieve as high a utilization as desired.
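
The Hilbert ordering described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the rectangle layout `(xmin, ymin, xmax, ymax)`, the integer grid of side `n` (a power of two), and the names `hilbert_value`, `center`, and `pack_leaves` are all assumptions made for illustration. The sketch only sorts rectangles by the Hilbert value of their centers and groups them into fixed-size leaves (a static packing); it does not model deferred splitting or a MapReduce job.

```python
from typing import List, Tuple

# Hypothetical rectangle layout: (xmin, ymin, xmax, ymax) on an integer grid.
Rect = Tuple[int, int, int, int]


def hilbert_value(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along the Hilbert curve, using the standard bitwise method."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def center(rect: Rect) -> Tuple[int, int]:
    """Integer center of a rectangle (rounded down to a grid cell)."""
    xmin, ymin, xmax, ymax = rect
    return (xmin + xmax) // 2, (ymin + ymax) // 2


def pack_leaves(rects: List[Rect], n: int, capacity: int) -> List[List[Rect]]:
    """Sort rectangles by the Hilbert value of their centers, then cut the
    sorted list into leaf nodes holding at most `capacity` rectangles."""
    ordered = sorted(rects, key=lambda r: hilbert_value(n, *center(r)))
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


if __name__ == "__main__":
    sample = [(0, 0, 1, 1), (6, 6, 7, 7), (0, 6, 1, 7), (6, 0, 7, 1)]
    for leaf in pack_leaves(sample, n=8, capacity=2):
        print(leaf)
```

Because neighbouring positions on the Hilbert curve are neighbouring grid cells, rectangles that end up in the same leaf tend to be spatially close, which is what gives each node a well-defined set of siblings to share entries with before a split becomes necessary.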