Article Information

Title: Clustered Distributed Index for Efficient Text Retrieval Using Threads
Authors: M. Basavaraju; R. Prabhakar
Journal: International Journal of Grid Computing & Applications
Print ISSN: 2229-3949
Electronic ISSN: 0976-9404
Publication year: 2010
Volume: 1
Issue: 2
DOI: 10.5121/ijgca.2010.12011
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: This paper presents a novel method of improving clustered distributed indices for efficient text retrieval using threads. In text retrieval, text search refers to a technique for searching stored documents or a database. In a full-text search, the search engine examines all the words in every stored document as it tries to match the search words supplied by the user. When dealing with a small number of documents, the full-text search engine performs a serial scan, in which it directly scans the contents of the documents for each query. When the number of documents to search is potentially large, or the number of search queries to perform is substantial, the problem of full-text search is often divided into two tasks: indexing and searching. The indexing stage scans the text of all the documents and builds a list of search terms, often called an index. In the search stage, when performing a specific query, only the index is referenced rather than the text of the original documents. With these considerations in mind, this paper aims to improve the search time on the index by clustering the index. Threads are used to perform a parallel search on each of these clusters. The algorithm, developed in C, has been tested on data and queries of various sizes and compared with the sequential search method. The results show that this approach improves the search time significantly and that the proposed method is effective and can be further implemented for real-time applications.
Keywords: Clustering; Distributed index; Threads; Text retrieval; Posting list; Query processing; Algorithms; Performance
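
Illustrative sketch: The abstract describes splitting the index into clusters and searching each cluster in its own thread. The following is a minimal C sketch of that general idea using POSIX threads, not the authors' implementation. The record does not say how clusters are formed or how each cluster is searched, so the linear scan per cluster, the fixed cluster limit, and all names (IndexEntry, Cluster, SearchTask, search_cluster, parallel_search, MAX_CLUSTERS) are assumptions made for illustration.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_CLUSTERS 16   /* assumed upper bound, chosen for this sketch */

/* One index entry: a search term and its posting list of document IDs. */
typedef struct {
    const char *term;
    const int  *postings;
    size_t      n_postings;
} IndexEntry;

/* One cluster: a slice of the full index. */
typedef struct {
    const IndexEntry *entries;
    size_t            n_entries;
} Cluster;

/* Work item handed to one thread. */
typedef struct {
    const Cluster    *cluster;
    const char       *query;
    const IndexEntry *hit;    /* set by the thread; NULL if the term is absent */
} SearchTask;

/* Thread body: scan a single cluster for the query term. */
static void *search_cluster(void *arg)
{
    SearchTask *task = arg;
    task->hit = NULL;
    for (size_t i = 0; i < task->cluster->n_entries; i++) {
        if (strcmp(task->cluster->entries[i].term, task->query) == 0) {
            task->hit = &task->cluster->entries[i];
            break;
        }
    }
    return NULL;
}

/* Search all clusters in parallel, one thread per cluster.
 * Returns the matching entry, or NULL if the term is not indexed
 * (also NULL if n_clusters exceeds MAX_CLUSTERS). */
const IndexEntry *parallel_search(const Cluster *clusters, size_t n_clusters,
                                  const char *query)
{
    pthread_t         threads[MAX_CLUSTERS];
    SearchTask        tasks[MAX_CLUSTERS];
    int               started[MAX_CLUSTERS];
    const IndexEntry *result = NULL;

    if (n_clusters > MAX_CLUSTERS)
        return NULL;

    for (size_t i = 0; i < n_clusters; i++) {
        tasks[i].cluster = &clusters[i];
        tasks[i].query   = query;
        tasks[i].hit     = NULL;
        started[i] = (pthread_create(&threads[i], NULL,
                                     search_cluster, &tasks[i]) == 0);
        if (!started[i])
            search_cluster(&tasks[i]);   /* fall back to a serial scan */
    }

    /* Wait for every thread, then keep the first hit in cluster order. */
    for (size_t i = 0; i < n_clusters; i++) {
        if (started[i])
            pthread_join(threads[i], NULL);
        if (result == NULL)
            result = tasks[i].hit;
    }
    return result;
}
```

A caller would build an array of Cluster values over its index and call parallel_search(clusters, n_clusters, "term"); each thread writes only to its own SearchTask, so no locking is needed in this sketch. On older toolchains the program may need to be compiled with the -pthread option.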