Basic Article Information

Title: A Framework for Generating Extractive Summary from Multiple Malayalam Documents
Authors: K. Manju; S. David Peter; Sumam Mary Idicula et al.
Journal: Information
Electronic ISSN: 2078-2489
Publication year: 2021
Volume: 12
Issue: 1
Pages: 41
DOI: 10.3390/info12010041
Publisher: MDPI Publishing
Abstract: Automatic extractive text summarization selects a subset of sentences that represents the most notable content of an entire document. In the era of digital explosion, most data is unstructured text, and users need to understand huge amounts of it in a short time; this creates the need for an automatic text summarizer. From a summary, users get an idea of the entire content of a document and can decide whether to read it in full. This work focuses on generating a summary from multiple news documents. In this case, the summary helps to reduce redundant news from different newspapers. Multi-document summarization is more challenging than single-document summarization because it has to solve the problem of overlapping information among sentences from different documents. Extractive text summarization yields the most important part of the document by discarding irrelevant and redundant sentences. In this paper, we propose a framework for extracting a summary from multiple documents in the Malayalam language. Because multi-document summarization data sets are sparse, methods based on deep learning are difficult to apply. The proposed work discusses the performance of existing standard algorithms in multi-document summarization of the Malayalam language. We propose a sentence extraction algorithm that selects the top-ranked sentences with maximum diversity. The system is found to perform well in terms of precision, recall, and F-measure on multiple input documents.
Keywords: Malayalam language; extractive multi-document summarization; NLP; sentence encoding; TextRank; maximum marginal relevance