
Article Information

  • Title: A Web Page Summarization for Mobile Phones
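  • Authors: Takaaki Hasegawa ; Hitoshi Nishikawa ; Kenji Imamura
  • Journal: 人工知能学会論文誌 (Transactions of the Japanese Society for Artificial Intelligence)
  • Print ISSN: 1346-0714
  • Electronic ISSN: 1346-8030
  • Year of publication: 2010
  • Volume: 25
  • Issue: 1
  • Pages: 133-143
  • DOI: 10.1527/tjsai.25.133
  • Publisher: The Japanese Society for Artificial Intelligence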
  • Abstract: Web pages for mobile devices are now widespread on the Internet, and many people access web pages through search engines from mobile devices as well as personal computers. The summary of a retrieved web page is important because users rely on it to judge whether the page is relevant to their information need. In particular, when users retrieve information on a mobile phone with a small screen, the summary must be not only compact but also grammatical and meaningful. Most search engines seem to produce snippets based on the keyword-in-context (KWIC) method. However, this simple method cannot generate a refined summary suitable for mobile phones because of low grammaticality and content overlap with the page title. We propose a more suitable method that generates a snippet for mobile devices using sentence extraction and sentence compression. First, sentences are weighted according to whether they include the user's query terms or words relevant to the query, and whether they avoid overlapping with the page title, based on maximal marginal relevance (MMR). Second, the selected sentences are compressed based on their phrase coverage, measured by word scores, and their phrase connection probability, measured by a language model, according to the dependency structure derived from the sentence. The experimental results show that the proposed method outperformed the KWIC method in terms of relevance judgment, grammaticality, non-redundancy, and content coverage.
  • Keywords: summarization ; sentence extraction ; sentence compression ; snippets ; mobile phones
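
The extraction step described in the abstract can be illustrated with a minimal sketch of MMR-style sentence selection. This is not the authors' implementation: the tokenizer, the Jaccard similarity, the weight `lam`, and the function names `tokenize`, `jaccard`, and `mmr_select` are all assumptions chosen for illustration, and the compression step is not covered. Each candidate sentence is rewarded for overlap with the query and penalized for overlap with the page title or with sentences already selected.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a simple stand-in for real morphological analysis."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_select(sentences, query, title, k=2, lam=0.7):
    """Greedily pick up to k sentences, trading query relevance against
    redundancy with the page title and with sentences already chosen.

    lam is an assumed trade-off weight, not a value from the paper.
    """
    q = tokenize(query)
    covered = [tokenize(title)]  # the title counts as content already shown
    remaining = list(sentences)
    selected = []
    while remaining and len(selected) < k:
        def score(sentence):
            tokens = tokenize(sentence)
            relevance = jaccard(tokens, q)
            redundancy = max(jaccard(tokens, c) for c in covered)
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        covered.append(tokenize(best))
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical page content, made up for this example.
    page_title = "Cheap hotels in Kyoto"
    page_sentences = [
        "Cheap hotels in Kyoto",
        "Kyoto hotels near the station offer free breakfast",
        "Contact us by email",
    ]
    print(mmr_select(page_sentences, "kyoto hotels", page_title, k=1))
```

With these toy inputs, the sentence that merely repeats the title matches the query well but is heavily penalized for redundancy, so a sentence carrying new information is preferred.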