Journal: Bulletin of the Technical Committee on Data Engineering
Publication year: 2012
Volume: 35
Issue: 3
Publisher: IEEE Computer Society
Abstract: Many tasks in computational linguistics traditionally rely on hand-crafted or curated resources like thesauri or word-sense-annotated corpora. The availability of big data, from the Web and other sources, has changed this situation. Harnessing these assets requires scalable methods for data and text analytics. This paper gives an overview of our recent work that utilizes big data methods for enhancing semantics-centric tasks dealing with natural language texts. We demonstrate a virtuous cycle in harvesting knowledge from large data and text collections and leveraging this knowledge in order to improve the annotation and interpretation of language in Web pages and social media. Specifically, we show how to build large dictionaries of names and paraphrases for entities and relations, and how these help to disambiguate entity mentions in texts.