Journal title: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Print ISSN: 2278-1323
Publication year: 2013
Volume: 2
Issue: 7
Pages: 2252-2257
Publisher: Shri Pannalal Research Institute of Technology
Abstract: Semantic similarity measures play an important role in information retrieval, natural language processing and web mining applications such as community mining, relation detection, entity disambiguation and document clustering. This paper proposes the Page Count and Snippets Method (PCSM) to estimate the semantic similarity between any two words (or entities) from the page counts and text snippets returned by a web search engine. It defines five page-count-based co-occurrence measures and integrates them with lexical patterns extracted from text snippets. A lexical pattern extraction algorithm is proposed to identify the semantic relations that hold between a given pair of query words. The similarity scores from both sources are combined using a Support Vector Machine (SVM) to obtain the final similarity score. The proposed method is evaluated on the Miller and Charles (MC) benchmark dataset, and performance is measured by the Pearson correlation coefficient. PCSM achieves a correlation of 0.8960, which is higher than that of existing methods. PCSM also evaluates semantic relations between named entities, improving precision, recall and F-score.
Keywords: Community Mining; Information Retrieval; Lexical Patterns; Page Counts; Text Snippets; Correlation
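The abstract does not name its five page-count co-occurrence measures. As a rough illustration only, the sketch below implements four measures commonly used in earlier web-based similarity work (WebJaccard, WebOverlap, WebDice and WebPMI), plus the Pearson correlation coefficient used for evaluation. Here `p` and `q` are the page counts for each word queried alone and `pq` is the page count for the conjunctive query "P AND Q". The function names, the noise threshold `c = 5` and the assumed index size `n = 1e10` are illustrative assumptions, not values taken from this paper.

```python
import math


# Hypothetical page-count co-occurrence measures (assumed, not the paper's exact five).
# p, q: page counts for each word alone; pq: page count for "P AND Q".
# c: assumed noise threshold; pairs co-occurring on <= c pages score 0.

def web_jaccard(p: float, q: float, pq: float, c: float = 5) -> float:
    """Co-occurrence count over the size of the union of the two result sets."""
    if pq <= c:
        return 0.0
    return pq / (p + q - pq)


def web_overlap(p: float, q: float, pq: float, c: float = 5) -> float:
    """Co-occurrence count over the smaller of the two result sets."""
    if pq <= c:
        return 0.0
    return pq / min(p, q)


def web_dice(p: float, q: float, pq: float, c: float = 5) -> float:
    """Dice coefficient computed from page counts."""
    if pq <= c:
        return 0.0
    return 2 * pq / (p + q)


def web_pmi(p: float, q: float, pq: float, n: float = 1e10, c: float = 5) -> float:
    """Pointwise mutual information; n is an assumed number of indexed pages."""
    if pq <= c:
        return 0.0
    return math.log2((pq / n) / ((p / n) * (q / n)))


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient: unitless, always in [-1, 1]."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For example, `web_overlap(100, 50, 20)` returns `0.4`. The threshold zeroes out word pairs that co-occur on only a handful of pages, where raw counts are dominated by noise. In a PCSM-style system these scores would presumably serve as features, alongside lexical-pattern features, for the SVM. Because the Pearson coefficient is unitless, the result reported above is 0.8960, not a percentage.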