Abstract: Extractive text summarization selects the most important sentences from the original text. A summary lets readers grasp the content of an article more easily and quickly than reading the entire article. Summarizing involves covering as much of the source information as possible while presenting only the most important details as succinctly as possible. To address this problem, this study adopts a genetic algorithm to extract the sentences that form the summary. Summarization is treated as an optimization problem in which the optimal summary is selected from the sentences of the original document. The genetic algorithm optimizes sentence selection to obtain a summary that represents the main idea of the source document, and the compression rate determines the number of sentences selected. To represent the text and capture the connections between sentences, a graph is constructed and weighted with PageRank scores. The dataset consists of 60 news articles in Bahasa Indonesia from IndoSum. To evaluate the results, ROUGE-1 and cosine similarity are computed between the system-generated summary and the reference summary. The proposed method is also compared with five other methods: SumBasic, LexRank, LSA, TextRank, and KLSum. The proposed method yields better summaries than these methods, with an average ROUGE-1 recall of 0.641 and an average cosine similarity of 0.625 at a compression rate of 30%.
Keywords: automatic text summarization; extractive summarization; genetic algorithm; news article
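The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the fitness function, the genetic operators, or how the graph is built, so everything below is an assumption made for illustration. Edges are assumed to be weighted by bag-of-words cosine similarity, the fitness is assumed to be the sum of the PageRank scores of the selected sentences, and the names `select_sentences`, `pagerank`, `cosine`, and `rouge1_recall`, along with all default parameters, are hypothetical.

```python
import math
import random
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency bags."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def pagerank(weights, damping=0.85, iters=50):
    """Weighted PageRank by power iteration (isolated nodes pass on no score)."""
    n = len(weights)
    out = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out[j] * scores[j]
                            for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


def select_sentences(sentences, compression_rate=0.3, pop_size=30,
                     generations=40, seed=0):
    """Assumed GA: a chromosome is a set of k sentence indices."""
    rng = random.Random(seed)
    n = len(sentences)
    k = max(1, round(compression_rate * n))  # compression rate -> summary length
    bags = [Counter(s.lower().split()) for s in sentences]
    weights = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    scores = pagerank(weights)

    def fitness(chrom):  # assumed fitness: total PageRank of chosen sentences
        return sum(scores[i] for i in chrom)

    population = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw k indices from the union of both parents.
            child = set(rng.sample(sorted(set(a) | set(b)), k))
            if rng.random() < 0.2:  # mutation: swap one index for an unused one
                child.discard(rng.choice(sorted(child)))
                child.add(rng.choice([i for i in range(n) if i not in child]))
            children.append(tuple(sorted(child)))
        population = parents + children
    best = max(population, key=fitness)
    return [sentences[i] for i in best]  # kept in original document order


def rouge1_recall(candidate: str, reference: str) -> float:
    """Share of reference unigrams that also occur in the candidate."""
    c, r = Counter(candidate.lower().split()), Counter(reference.lower().split())
    overlap = sum(min(c[t], r[t]) for t in r)
    return overlap / sum(r.values()) if r else 0.0
```

With this simplified fitness the optimum is just the k highest-ranked sentences, so the genetic search here only demonstrates the mechanics. A realistic fitness would probably combine several terms, which the abstract does not describe. Whitespace tokenization is likewise a placeholder for proper Indonesian preprocessing.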