Journal: International Journal of Computer Science and Information Technologies
E-ISSN: 0975-9646
Year: 2012
Volume: 3
Issue: 3
Pages: 4553-4557
Publisher: TechScience Publications
Abstract: Retrieving information about famous personalities is a common task among internet users. Finding such information with web search engines becomes difficult when these people are referred to by other names, because many web pages mention them only by their aliases. Searching with the real name alone therefore does not retrieve the information associated with the aliases; this is the referential ambiguity problem. For this reason, precise identification of a person's aliases is important in many web-related tasks, such as information retrieval, identification of relations among entities, sentiment analysis, name disambiguation and semantic annotation. Previous approaches extracted aliases for a given person and achieved a high mean reciprocal rank (MRR) and an improvement in recall. To further improve MRR and recall over the previous approach, we propose a system that extracts aliases by considering not only first-order co-occurrences but also higher-order co-occurrences between a given name and its aliases in anchor texts; the extracted aliases can then be used to expand a query so that more relevant results are retrieved. The method ranks the retrieved aliases using several statistical scores calculated for a name and each corresponding alias in the retrieved anchor texts. The order of each co-occurrence is determined by constructing and mining an anchor text graph for a particular name and its associated aliases. We use two data sets, person names and location names, and rank the aliases with a ranking support vector machine.
Keywords: Information Retrieval; Referential Ambiguity; Anchor Text Co-occurrence Graph; Web crawler; Graph; Mining-excel
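The abstract distinguishes first-order from higher-order co-occurrences in an anchor text graph but does not define them. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that two anchor texts co-occur (first order) when they link to the same URL, and that the order of a co-occurrence is the shortest-path distance between the name and a candidate alias in that graph. The function names `build_anchor_graph` and `cooccurrence_orders` and the example links are hypothetical. The statistical scores and the ranking SVM are not shown.

```python
from collections import defaultdict
from itertools import combinations


def build_anchor_graph(links):
    """Build an undirected anchor text graph.

    Assumption: two anchor texts are joined by an edge (first-order
    co-occurrence) when both are used to link to the same URL.
    `links` is an iterable of (anchor_text, target_url) pairs.
    """
    anchors_by_url = defaultdict(set)
    for anchor, url in links:
        anchors_by_url[url].add(anchor)

    graph = defaultdict(set)
    for anchors in anchors_by_url.values():
        for a, b in combinations(sorted(anchors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def cooccurrence_orders(graph, name, max_order=2):
    """Return {candidate: order} for anchor texts reachable from `name`.

    Order 1 means a direct (first-order) co-occurrence, order 2 means the
    candidate shares a neighbour with `name`, and so on, computed by a
    breadth-first search limited to `max_order` hops.
    """
    orders = {}
    seen = {name}
    frontier = {name}
    for order in range(1, max_order + 1):
        next_frontier = set()
        for node in frontier:
            for neighbour in graph.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    orders[neighbour] = order
                    next_frontier.add(neighbour)
        frontier = next_frontier
    return orders


# Hypothetical links: "JD" never shares a URL with "Jane Doe" directly,
# so it is only found through a second-order co-occurrence via "J. Doe".
links = [
    ("Jane Doe", "http://example.org/p1"),
    ("J. Doe", "http://example.org/p1"),
    ("J. Doe", "http://example.org/p2"),
    ("JD", "http://example.org/p2"),
]
g = build_anchor_graph(links)
result = cooccurrence_orders(g, "Jane Doe")
print(result)
```

Restricting the search to first-order neighbours (`max_order=1`) would miss "JD" entirely. That is the kind of alias the abstract says higher-order co-occurrences are meant to recover.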