Journal: International Journal of Computer Science and Network Security
Print ISSN: 1738-7906
Publication year: 2009
Volume: 9
Issue: 5
Pages: 249-254
Publisher: International Journal of Computer Science and Network Security
Abstract: Data-intensive applications that rely on very large databases spend much of their time on search and retrieval, especially when a single server handles all retrieval from the database. This paper proposes a Beowulf cluster for fast query processing that distributes the database horizontally across nodes using load balancing. A mathematical model is proposed to partition the data optimally among the nodes. Nodes communicate through MPI (Message Passing Interface). A file system cache, built with the Apache Lucene API, further decreases query processing time; results are retrieved according to whether a query hits or misses the cache. The cache size is monitored, and when it exceeds a threshold value, entries are deleted using the LRU (least recently used) algorithm. Experimental results show that caching substantially reduces query processing time. The results can be improved further through query optimization, by indexing the attributes used in complex queries. Compared with a single overloaded server, this approach reduces query processing time many times over. With faster networks and highly available secondary storage, the approach is expected to perform even better in the future.
Keywords: Fast Query Processing; MPI; Load Balancing; File System Cache
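
The cache behaviour summarized in the abstract (serve a result on a hit, run the query on a miss, and delete least recently used entries once a size threshold is exceeded) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the paper builds a file system cache with the Apache Lucene API, whereas this sketch keeps entries in memory and counts entries rather than bytes. The names `LRUQueryCache`, `max_entries`, and `run_query` are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class LRUQueryCache:
    """In-memory stand-in for a query-result cache with LRU eviction.

    Assumption: the threshold is a maximum number of cached queries,
    not the on-disk size that a file system cache would monitor.
    """

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._entries = OrderedDict()  # oldest (least recently used) first
        self.hits = 0
        self.misses = 0

    def get(self, query, run_query):
        """Return the result for `query`, calling `run_query` only on a miss."""
        if query in self._entries:
            # Cache hit: mark the entry as most recently used.
            self.hits += 1
            self._entries.move_to_end(query)
            return self._entries[query]

        # Cache miss: run the (hypothetical) backend query and store the result.
        self.misses += 1
        result = run_query(query)
        self._entries[query] = result

        # Threshold exceeded: drop least recently used entries first.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
        return result

    def cached_queries(self):
        """Return cached queries ordered from least to most recently used."""
        return list(self._entries)
```

In a cluster such as the one the abstract describes, `run_query` would stand for whatever dispatches the query to the nodes; here it is just a callable supplied by the caller, so the sketch can be exercised with any function that maps a query string to a result.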