Basic Article Information

  • Title: Combining Lexical and Syntactic Features for Detecting Content-Dense Texts in News
  • Authors: Yinfei Yang; Ani Nenkova
  • Journal: Journal of Artificial Intelligence Research
  • Print ISSN: 1076-9757
  • Publication year: 2017
  • Volume: 60
  • Pages: 179-219
  • Publisher: American Association of Artificial
  • Abstract: Content-dense news report important factual information about an event in direct, succinct manner. Information seeking applications such as information extraction, question answering and summarization normally assume all text they deal with is content-dense. Here we empirically test this assumption on news articles from the business, U.S. international relations, sports and science journalism domains. Our findings clearly indicate that about half of the news texts in our study are in fact not content-dense and motivate the development of a supervised content-density detector. We heuristically label a large training corpus for the task and train a two-layer classifying model based on lexical and unlexicalized syntactic features. On manually annotated data, we compare the performance of domain-specific classifiers, trained on data only from a given news domain and a general classifier in which data from all four domains is pooled together. Our annotation and prediction experiments demonstrate that the concept of content density varies depending on the domain and that naive annotators provide judgement biased toward the stereotypical domain label. Domain-specific classifiers are more accurate for domains in which content-dense texts are typically fewer. Domain independent classifiers reproduce better naive crowdsourced judgements. Classification prediction is high across all conditions, around 80%.