Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2019
Volume: 97
Issue: 23
Pages: 3475-3485
Publisher: Journal of Theoretical and Applied
Abstract: Text summarization has become important for extracting required information in a short time. Several extractive summarization techniques have been developed for English text, but few works have addressed the summarization of Bengali text. This paper proposes an improved extractive Bengali text summarization technique that enhances the word scoring process, the position value heuristic, and the summary generation procedure of our previously presented summarizer. In the word scoring procedure, each word is preprocessed through noise removal, tokenization, stop word removal, and stemming. A heuristic is then applied to calculate each word's score by checking it across all the input document(s). In addition, a modified heuristic is proposed for sentence scoring: it gives the highest priority to the middle sentence and lower priority to the sentences above and below it. Finally, the top k sentences are extracted from each cluster of sentences produced by the K-means clustering algorithm, and the extracted sentences are sorted by their order of appearance in the original document(s). The final summary is thus synchronized with the original document(s). Experimental results show that, compared with the existing method, the proposed technique produces summaries that better satisfy end users.
Keywords: Text Summarization; Extractive Summarization; Bengali Text Summarization; Heuristics; Synchronized Summary
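
The abstract describes two steps that are easy to picture in code: a position heuristic that favors the middle sentence, and a summary built by taking the top k sentences from each cluster and restoring their original order. The following is a minimal sketch of those two steps only, not the authors' implementation. The abstract gives no formula for the position value, so the linear decay in `position_score` is an assumption, and the function names, the example scores, and the cluster labels (which would come from K-means in the paper) are all hypothetical.

```python
from collections import defaultdict


def position_score(index, total):
    """Position value that peaks at the middle sentence.

    ASSUMPTION: linear decay toward both ends; the abstract does not
    state the actual formula used in the paper.
    """
    mid = (total - 1) / 2
    return 1 - abs(index - mid) / (mid + 1)


def select_summary(scores, labels, k):
    """Pick the top-k sentences per cluster, then sort by original order.

    scores: one score per sentence (hypothetical values here).
    labels: one cluster label per sentence (assumed to come from K-means).
    Returns the indices of the selected sentences in document order.
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    chosen = []
    for members in clusters.values():
        # Highest score first; ties broken by earlier position.
        ranked = sorted(members, key=lambda i: (-scores[i], i))
        chosen.extend(ranked[:k])

    # Restoring document order keeps the summary "synchronized"
    # with the source text.
    return sorted(chosen)


if __name__ == "__main__":
    scores = [0.2, 0.9, 0.5, 0.7, 0.1, 0.8]
    labels = [0, 0, 0, 1, 1, 1]
    print(select_summary(scores, labels, k=1))
    print([round(position_score(i, 5), 3) for i in range(5)])
```

Sorting the selected indices at the end is what makes the output follow the source document's sentence order regardless of which cluster each sentence came from.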