Article Information

Title: A Hybrid Hierarchical Model for Multi-Document Summarization
Authors: Asli Celikyilmaz; Dilek Hakkani-Tur
Journal: Conference on European Chapter of the Association for Computational Linguistics (EACL)
Year: 2010
Volume: 2010
Publisher: ACL Anthology
Abstract: Scoring sentences in documents given abstract summaries created by humans is important in extractive multi-document summarization. In this paper, we formulate extractive summarization as a two step learning problem building a generative model for pattern discovery and a regression model for inference. We calculate scores for sentences in document clusters based on their latent characteristics using a hierarchical topic model. Then, using these scores, we train a regression model based on the lexical and structural characteristics of the sentences, and use the model to score sentences of new documents to form a summary. Our system advances current state-of-the-art improving ROUGE scores by 7%. Generated summaries are less redundant and more coherent based upon manual quality evaluations.