Journal: International Journal of Software Engineering and Its Applications
Print ISSN: 1738-9984
Publication year: 2013
Volume: 7
Issue: 3
Publisher: SERSC
Abstract: Because the volume of web documents has grown rapidly over the past decades, the efficiency of web search engines has become more crucial than ever. This efficiency can be assessed by two factors: the query relevance of the search results returned and the financial cost of query processing. Of the two, ways to improve the query relevance of web searches have been studied intensively in research topics such as hyperlink-based ranking, topic-sensitive document classification, and semantic awareness in rank evaluation. However, no studies have provided an efficient solution that cuts the financial cost of query processing while retaining high query relevance. In this light, we propose a distributed cache scheme and a server-clustering technique that can be used to reduce the query processing cost. Using these techniques to accelerate web query processing, we saved around 70% of the server cost of a commercial web search engine implemented in South Korea. We believe that our experience can offer valuable insight to anyone who wants to develop a large-scale search engine.
Keywords: Web search engine; distributed system; cache scheme; inverted index files