Publisher: University of Malaya * Faculty of Computer Science and Information Technology
Abstract: Traditional retrieval models were effective in the early stage of the Web; however, with the huge amount of information available on the Web today, further optimization is required to improve the performance of these models in extracting the most relevant information. Term proximity is one of the techniques that many researchers have introduced for this purpose. It assumes that the words in the user query are correlated, and thus that the proximity between them should be considered in the matching process. Density-based proximity is an effective type of term proximity measure that is still not fully exploited in retrieval models. In this paper we investigate the application of a recent density-based measure called CrossTerms, which has achieved significant scores when applied to the effective BM25 retrieval model. We applied CrossTerms to another effective retrieval model, the Language Modeling Approach. The performance of the enhanced language model was measured and evaluated through several experiments and metrics. Experimental results show that the CrossTerms measure improved the performance of the basic language model on all the applied evaluation metrics. The improvement reached +4% on the MAP metric and +8% on the P@5 and P@20 metrics.
Keywords: information retrieval; crossterms; kernel; proximity; language model