Basic Article Information

Title: Construction of an Automatic Bengali Text Summarizer Using Machine Learning Approaches
Authors: Busrat Jahan; Mahfuja Khatun; Zinat Ara Zabu; et al.
Journal: Journal of Data Analysis and Information Processing
Print ISSN: 2327-7211
Online ISSN: 2327-7203
Year of publication: 2022
Volume: 10
Issue: 1
Pages: 43-57
DOI: 10.4236/jdaip.2022.101003
Language: English
Publisher: Scientific Research Publishing
Abstract: In this study, we chose Python as the programming platform for building an automatic Bengali document summarizer. English has ample tools for processing and summarizing documents. However, no such tool is specifically applicable to Bengali, because Bengali is highly ambiguous and differs from English in grammar. Yet the language is important: it is spoken by 26 crore (260 million) people worldwide. We therefore propose a new method for summarizing Bengali documents. The proposed system consists of the following stages: pre-processing of the input document, word tagging, pronoun replacement, sentence ranking, and summary generation. Pronoun replacement is used to reduce the incidence of dangling pronouns in the output summary. Sentences are ranked based on sentence frequency, numerical figures, and pronoun replacement. The similarity between pairs of sentences is checked so that one of two similar sentences can be excluded, reducing duplication. We took 3000 data items from newspaper and book documents as input and used them to learn words appropriate to the syntax. To evaluate the performance of the summarizer, the system was tested on different documents. According to the assessment, recall, precision, and F-score were 0.70, 0.82, and 0.74, respectively. Pronoun replacement was correct in 72% of cases.
Keywords: Natural Language Processing; Formatting; Bangla Text Summarizer; Bengali Language Processing; Word Tagging; Pronoun Replacement; Sentence Ranking
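
Illustrative sketch: the sentence-ranking and duplicate-removal stages named in the abstract could look roughly like the Python below. This is not the authors' implementation. It assumes that "sentence frequency" means the average corpus frequency of a sentence's words, that a sentence containing a numeral receives a fixed bonus, and that similarity is measured with Jaccard overlap of word sets. All function names, the bonus value, and the similarity threshold are hypothetical, and the word-tagging and pronoun-replacement stages are omitted.

```python
import re
from collections import Counter

def split_sentences(text):
    # Bengali sentences usually end with the danda (।); ? and ! are also accepted.
    return [s.strip() for s in re.findall(r"[^।?!]+[।?!]?", text) if s.strip()]

def tokenize(sentence):
    # Split on whitespace and common punctuation; no stemming or stop-word removal.
    return re.findall(r"[^\s।?!,;:]+", sentence)

def has_number(sentence):
    # Matches ASCII digits and Bengali digits (০-৯).
    return re.search(r"[0-9০-৯]", sentence) is not None

def jaccard(a, b):
    # Jaccard similarity between two sets of words.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def summarize(text, k=2, numeric_bonus=0.5, sim_threshold=0.6):
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(w for toks in tokens for w in toks)

    # Score = average word frequency, plus a bonus if the sentence has a numeral.
    scores = []
    for sent, toks in zip(sentences, tokens):
        score = sum(freq[w] for w in toks) / len(toks) if toks else 0.0
        if has_number(sent):
            score += numeric_bonus
        scores.append(score)

    # Visit sentences from highest to lowest score (ties keep document order).
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))

    selected = []
    for i in ranked:
        if len(selected) == k:
            break
        words = set(tokens[i])
        # Skip a sentence that is too similar to one already selected.
        if any(jaccard(words, set(tokens[j])) > sim_threshold for j in selected):
            continue
        selected.append(i)

    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(selected)]
```

Calling `summarize(text, k=2)` on a document returns at most two sentences in document order, with near-duplicates of an already chosen sentence skipped. A fuller version would substitute pronouns before ranking, as the abstract describes.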