Journal name: Conference on European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2011
Volume: 2011
Publisher: ACL Anthology
Abstract: We learn a joint model of sentence extraction
and compression for multi-document summarization.
Our model scores candidate summaries
according to a combined linear model
whose features factor over (1) the n-gram
types in the summary and (2) the compressions
used. We train the model using a margin-based
objective whose loss captures end summary
quality. Because of the exponentially
large set of candidate summaries, we use a
cutting-plane algorithm to incrementally detect
and add active constraints efficiently. Inference
in our model can be cast as an ILP
and thereby solved in reasonable time; we also
present a fast approximation scheme which
achieves similar performance. Our jointly
extracted and compressed summaries outperform
both unlearned baselines and our learned
extraction-only system on both ROUGE and
Pyramid, without a drop in judged linguistic
quality. We achieve the highest published
ROUGE results to date on the TAC 2008 data
set.
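
The scoring scheme described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the candidate format (`src`, `tokens`, `cuts`), the weight dictionaries `w_ngram` and `w_comp`, the choice of bigrams as the n-gram type, and the function names are all assumptions made for illustration, and brute-force enumeration stands in for the ILP inference the abstract mentions.

```python
from itertools import combinations


def bigrams(tokens):
    """Return the set of distinct bigram types in a token list."""
    return {(a, b) for a, b in zip(tokens, tokens[1:])}


def score(summary, w_ngram, w_comp):
    """Linear score: one term per distinct bigram type in the summary,
    plus one term per compression used (hypothetical feature names)."""
    types = set()
    for sent in summary:
        types |= bigrams(sent["tokens"])
    ngram_part = sum(w_ngram.get(t, 0.0) for t in types)
    comp_part = sum(w_comp.get(c, 0.0) for sent in summary for c in sent["cuts"])
    return ngram_part + comp_part


def best_summary(candidates, w_ngram, w_comp, budget):
    """Exhaustively search subsets of candidates (assumption: this replaces
    the ILP). Each source sentence may contribute at most one variant, and
    the total token count must not exceed the budget."""
    best, best_score = (), 0.0
    for k in range(1, len(candidates) + 1):
        for combo in combinations(candidates, k):
            sources = [c["src"] for c in combo]
            if len(set(sources)) != len(sources):
                continue  # two variants of the same source sentence
            if sum(len(c["tokens"]) for c in combo) > budget:
                continue  # over the length budget
            s = score(combo, w_ngram, w_comp)
            if s > best_score:
                best, best_score = combo, s
    return best, best_score
```

Because each bigram type is counted once per summary, redundant sentences add little to the score, while a compressed variant pays whatever weight its compression features carry but frees up budget for other sentences. The exhaustive search is exponential in the number of candidates and is only workable for toy inputs.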