Basic Article Information

Title: Extraction Based Multi Document Summarization using Single Document Summary Cluster
Author: Shanmugasundaram Hariharan
Journal: International Journal of Advances in Soft Computing and Its Applications
Print ISSN: 2074-8523
Year of publication: 2010
Volume: 2
Issue: 1
Publisher: International Center for Scientific Research and Studies
Abstract: Multi document summarization has had a great impact on the research community ever since the growth of online information and its availability. Selecting the most important sentences from such a huge repository of data is quite a tricky and challenging task. While multiple documents pose some additional overhead in sentence selection, generating summaries for each individual document and merging the sentences in a coherent order would lend greater strength. The proposed approach was competitively better than the state-of-the-art MEAD summarizer at focused compression ratios. This paper focuses on three different studies: (i) finding the performance of a multi document summarizer built from a single document cluster (using MEAD); (ii) comparing our approach with MEAD's performance on the dataset considered; (iii) extracting sentences for multi document summarization at a 30% compression rate to obtain 100% efficiency using a 7-point summary sheet. An investigation carried out on an average of 22 documents shows that our system is promising.
Keywords: single document summarization; sentence extraction; multi document summarization; MEAD.

1 Introduction

Summarization is a reductive transformation of source text to summary text through content reduction by selection and/or generation on what is important in the source text [9]. The volume of documents of all kinds that need summarizing is continually increasing, and summarization has remained a steady subject of research over decades [1]. The process of automatic summarization deals with preprocessing documents, evaluating the importance of sentences, generating summaries, evaluating the summarization, and so on.

Multi document summarization produces a summary from multiple documents instead of a single one. It can be viewed either as an extension of single document summarization to a collection of documents covering the same topic, or as information extracted from several sources. Multi document summarization differs from single document summarization in the following ways: degree of redundancy, temporal dimension, compression ratio and the co-reference problem [10].

A variety of multi-document summarization methods have been developed recently. Generally speaking, those methods can be either extractive or abstractive. Extractive summarization involves assigning saliency scores to some units (e.g. sentences, paragraphs) of the documents and extracting those with the highest scores, while abstractive summarization usually needs information fusion, sentence compression and reformulation. Our work focuses on extractive summarization.
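
The extractive idea described above (score each sentence, keep the highest-scoring ones) can be illustrated with a minimal Python sketch. This is not MEAD and not the method proposed in this paper: the saliency score used here (mean word frequency), the naive sentence splitter, and the names `split_sentences` and `extract_summary` are all assumptions made purely for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_summary(text, compression_ratio=0.3):
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    # Lowercased alphabetic tokens per sentence.
    words = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    freq = Counter(w for ws in words for w in ws)
    # Hypothetical saliency score: mean document-wide frequency of the sentence's words.
    scores = [sum(freq[w] for w in ws) / len(ws) if ws else 0.0 for ws in words]
    # Number of sentences to keep at the given compression ratio (at least one).
    keep = max(1, round(len(sentences) * compression_ratio))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]
```

Calling `extract_summary(document_text, compression_ratio=0.3)` would keep roughly 30% of the sentences. A real system would typically combine several features (position, centroid similarity, length) rather than word frequency alone.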

The major challenge in multi-document summarization is that a document set may contain diverse information, which is either related or unrelated to the main topic, and hence we need effective summarization methods to analyze and extract the important information. Additionally, this information overlaps across documents, hence we need effective merging techniques to build the summary. In order to make the summary readable and its sentences inter-related, the function of cohesion is studied [2]. Cohesion relates one part of a text to another part of the same text and thereby lends continuity to the text. It also enables the reader or listener to follow the document without interruption.

The above issues motivate the investigation of multi document summarization. There exist two different approaches to building effective summaries from multi document clusters. The first approach extracts sentences directly from the multi document cluster, while the second merges sentences extracted by a single document approach. Consider an example to illustrate the importance of the proposed investigations. Suppose a cluster C1 has 10 documents, each having 10 sentences. If 10% compression ratio