Article Information

  • Title: A Basic Frame-Work for Building Web Based Small Documents
  • Authors: Yakkala V Krishna Teja; M. Veera Kumari
  • Journal: International Journal of Advanced Research In Computer Science and Software Engineering
  • Print ISSN: 2277-6451
  • Online ISSN: 2277-128X
  • Year: 2012
  • Volume: 2
  • Issue: 8
  • Publisher: S.S. Mishra
  • Abstract: My work introduces a hidden topic-based framework for processing short and sparse documents (e.g., search result snippets, product descriptions, book/movie summaries, and advertising messages) on the Web. The framework focuses on solving two main challenges posed by these kinds of documents: 1) data sparseness and 2) synonyms/homonyms. The former leads to a lack of shared words and contexts among documents, while the latter are big linguistic obstacles in natural language processing (NLP) and information retrieval (IR). The underlying idea of the framework is that common hidden topics discovered from large external data sets (universal data sets), when included, can make short documents less sparse and more topic-oriented. Furthermore, hidden topics from universal data sets help handle unseen data better. The proposed framework can also be applied to different natural languages and data domains. We carefully evaluated the framework by carrying out two experiments for two important online applications (Web search result classification and matching/ranking for contextual advertising) with large-scale universal data sets, and we achieved significant results.
  • Keywords: Web mining; hidden topics; classification; sparse data