Publisher: Information and Media Technologies Editorial Board
Abstract: Multi-document summarization is the task of generating a summary from multiple documents, and the generated summary is expected to retain much of the information contained in the original documents. Previous work approaches this by (i) formulating the task as a combinatorial optimization problem that simultaneously maximizes relevance and minimizes redundancy, or (ii) formulating the task as a graph-cut problem. This paper improves summary quality by combining these two approaches into a single optimization problem formulated as an Integer Linear Program (ILP). Although an ILP can be solved with an ILP solver, the problem is NP-hard, and obtaining an exact solution is difficult when immediate responses are needed. We therefore propose optimization heuristics that exploit Lagrangian relaxation to obtain good approximate solutions within feasible computation times. Experiments on the Document Understanding Conference 2004 (DUC'04) dataset show that our Lagrangian-relaxation-based heuristics complete in feasible computation time and achieve higher ROUGE scores than state-of-the-art approximate methods.
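The abstract does not give the paper's ILP or its heuristics, so the following is only a minimal sketch of the general idea of a Lagrangian relaxation heuristic, applied to a deliberately simplified stand-in problem: choose sentences to maximize a total relevance score under a summary-length budget. The function name `lagrangian_select`, the parameters, and the greedy repair step are all assumptions made for illustration; redundancy and graph-cut terms are omitted.

```python
from typing import List, Tuple


def lagrangian_select(scores: List[float], lengths: List[int], budget: int,
                      iterations: int = 50, step0: float = 1.0
                      ) -> Tuple[List[int], float]:
    """Illustrative only: approximately maximize sum(scores[i]) over chosen
    sentences subject to sum(lengths[i]) <= budget.

    This is NOT the paper's formulation; it is a knapsack-style stand-in.
    """
    n = len(scores)
    lam = 0.0  # Lagrange multiplier for the relaxed length constraint
    best_set: List[int] = []
    best_value = 0.0

    for t in range(1, iterations + 1):
        # With the length constraint moved into the objective, the problem
        # decomposes per sentence: take i iff its reduced score is positive.
        relaxed = [i for i in range(n) if scores[i] - lam * lengths[i] > 0]

        # Repair step (an assumption of this sketch): greedily keep the
        # relaxed picks, best reduced score first, while they fit the budget.
        order = sorted(relaxed,
                       key=lambda i: scores[i] - lam * lengths[i],
                       reverse=True)
        chosen: List[int] = []
        used = 0
        for i in order:
            if used + lengths[i] <= budget:
                chosen.append(i)
                used += lengths[i]

        # Remember the best feasible summary seen so far.
        value = sum(scores[i] for i in chosen)
        if value > best_value:
            best_set, best_value = sorted(chosen), value

        # Subgradient update: raise lam if the relaxed solution is too long,
        # lower it (never below zero) if there is slack.
        violation = sum(lengths[i] for i in relaxed) - budget
        lam = max(0.0, lam + (step0 / t) * violation)

    return best_set, best_value


if __name__ == "__main__":
    # Toy data (hypothetical): three sentences, length budget of 4.
    print(lagrangian_select([6.0, 5.0, 4.0], [3, 2, 2], budget=4))
```

Each iteration costs only a sort over the sentences, which is what makes this family of heuristics attractive when an exact ILP solve is too slow; the trade-off is that the returned summary is feasible but not guaranteed to be optimal.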