Journal name: Conference on European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2010
Volume: 2010
Publisher: ACL Anthology
Abstract: Scoring sentences in documents given abstract summaries created by humans is important in extractive multi-document summarization. In this paper, we formulate extractive summarization as a two-step learning problem, building a generative model for pattern discovery and a regression model for inference. We calculate scores for sentences in document clusters based on their latent characteristics using a hierarchical topic model. Then, using these scores, we train a regression model based on the lexical and structural characteristics of the sentences, and use the model to score sentences of new documents to form a summary. Our system advances the current state of the art, improving ROUGE scores by 7%. Based on manual quality evaluations, the generated summaries are less redundant and more coherent.
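
The two-step pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Step 1 here replaces the hierarchical topic model with a much simpler stand-in (word overlap with the human summary), and the features, function names, and hyperparameters are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def target_score(sentence: str, summary: str) -> float:
    """Step 1 stand-in (NOT a topic model): fraction of the sentence's
    words that also occur in the human-written summary."""
    words = sentence.lower().split()
    if not words:
        return 0.0
    summary_vocab = set(summary.lower().split())
    return sum(w in summary_vocab for w in words) / len(words)


def features(sentence: str, position: int, n_sentences: int) -> List[float]:
    """Hypothetical lexical/structural features:
    bias term, relative position in the document, scaled length."""
    rel_pos = position / max(n_sentences - 1, 1)
    return [1.0, rel_pos, len(sentence.split()) / 10.0]


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear model: dot product of weights and features."""
    return sum(wj * xj for wj, xj in zip(w, x))


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float],
               lr: float = 0.05, epochs: int = 2000) -> List[float]:
    """Step 2: least-squares regression fitted by batch gradient descent."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += 2.0 * err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def summarize(sentences: Sequence[str], w: Sequence[float], k: int = 1) -> List[str]:
    """Score sentences of a new document and return the top k in document order."""
    n = len(sentences)
    scored = [(predict(w, features(s, i, n)), i) for i, s in enumerate(sentences)]
    top = sorted(scored, reverse=True)[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda t: t[1])]
```

In use, one would compute `target_score` for every sentence in the training clusters, fit `fit_linear` on the corresponding `features` vectors, and then call `summarize` on unseen documents. This sketch does nothing to reduce redundancy between selected sentences, which the paper reports as a strength of its own system.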