Journal: The International Arab Journal of Information Technology
Print ISSN: 1683-3198
Publication year: 2013
Volume: 10
Issue: 5
Publisher: Zarqa Private University
Abstract: Automatic thesaurus generation is used by search engines for query expansion. Search engine marketing companies use the same concept to suggest keyword terms that improve their clients’ ratings in different search engines. This paper presents and evaluates a corpus-based method for finding similar terms. The corpus is generated by scraping websites in different categories. A feature selection method is developed that rewards category-specific terms and penalizes terms shared by two or more categories. The similarity measure is decomposed into three distinct components: contextual, functional and lexical similarity. Contextual similarity identifies terms that appear in the same context. Functional similarity identifies terms on the basis of co-occurrence, while lexically similar terms share one or more words. An overall similarity measure combines the evidence from these three measures.
Keywords: Information retrieval; text mining; term similarity; search engine marketing
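
The abstract names three similarity components and a combined measure but gives no formulas. The following is a minimal sketch of how such a decomposition could look, not the paper's method. Everything in it is an assumption made for illustration: Jaccard overlap of words for lexical similarity, Jaccard overlap of document sets for functional (co-occurrence) similarity, cosine similarity of neighbouring-term counts for contextual similarity, an equal-weight average for the overall score, and the function names and toy corpus themselves.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of already-segmented terms.
example_docs = [
    ["cheap flights", "airline tickets", "hotel deals"],
    ["discount flights", "airline tickets", "hotel deals"],
    ["cheap flights", "travel insurance"],
]

def lexical_similarity(a, b):
    """Assumed form: Jaccard overlap of the words that make up each term."""
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

def functional_similarity(a, b, docs):
    """Assumed form: Jaccard overlap of the documents in which each term occurs."""
    docs_a = {i for i, d in enumerate(docs) if a in d}
    docs_b = {i for i, d in enumerate(docs) if b in d}
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0

def context_vector(term, docs, exclude):
    """Count the other terms that appear in the same documents as `term`."""
    counts = Counter()
    for d in docs:
        if term in d:
            counts.update(t for t in d if t not in exclude)
    return counts

def contextual_similarity(a, b, docs):
    """Assumed form: cosine similarity of the two terms' context vectors."""
    va = context_vector(a, docs, {a, b})
    vb = context_vector(b, docs, {a, b})
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * \
           math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def overall_similarity(a, b, docs, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Assumed combination: weighted average of the three component scores."""
    scores = (
        contextual_similarity(a, b, docs),
        functional_similarity(a, b, docs),
        lexical_similarity(a, b),
    )
    return sum(w * s for w, s in zip(weights, scores))
```

In this toy corpus, "cheap flights" and "discount flights" never occur in the same document, so their functional score is zero, yet they share a word and share neighbouring terms, so the lexical and contextual scores are positive. This illustrates why combining the three kinds of evidence can surface pairs that any single measure would miss.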