Article Information

  • Title: Rapid Induction of Multiple Taxonomies for Enhanced Faceted Text Browsing
  • Authors: Lawrence Muchemi; Gregory Grefenstette
  • Journal: International Journal of Artificial Intelligence & Applications (IJAIA)
  • Print ISSN: 0976-2191
  • Online ISSN: 0975-900X
  • Year: 2016
  • Volume: 7
  • Issue: 4
  • Page: 1
  • Publisher: Academy & Industry Research Collaboration Center (AIRCC)
  • Abstract: In this paper we present and compare two methodologies for rapidly inducing multiple subject-specific taxonomies from crawled data. The first method involves a sentence-level words co-occurrence frequency method for building the taxonomy, while the second involves the bootstrapping of a Word2Vec-based algorithm with a directed crawler. We exploit the multilingual open-content directory of the World Wide Web, DMOZ, to seed the crawl, and the domain name to direct the crawl. This domain corpus is then input to our algorithm that can automatically induce taxonomies. The induced taxonomies provide hierarchical semantic dimensions for the purposes of faceted browsing. As part of an ongoing personal semantics project, we applied the resulting taxonomies to personal social media data (Twitter, Gmail, Facebook, Instagram, Flickr) with an objective of enhancing an individual’s exploration of their personal information through faceted searching. We also perform a comprehensive corpus-based evaluation of the algorithms based on many datasets drawn from the fields of medicine (diseases) and leisure (hobbies) and show that the induced taxonomies are of high quality.
  • Keywords: Taxonomy; Automatic Taxonomy Induction; Word2vec; Distributional Semantics; Web-crawl; Faceted search; Personal semantics data