期刊名称:International Journal of Electrical and Computer Engineering
电子版ISSN:2088-8708
出版年度:2021
卷号:11
期号:3
页码:2386
DOI:10.11591/ijece.v11i3.pp2386-2392
出版社:Institute of Advanced Engineering and Science (IAES)
摘要:In this paper, we propose a web document ranking method using topic modeling for effective information collection and classification. The proposed method is applied to the document ranking technique to avoid duplicated crawling when crawling at high speed. Through the proposed document ranking technique, it is feasible to remove redundant documents, classify the documents efficiently, and confirm that the crawler service is running. The proposed method enables rapid collection of many web documents; the user can search the web pages with constant data update efficiently. In addition, the efficiency of data retrieval can be improved because new information can be automatically classified and transmitted. By expanding the scope of the method to big data based web pages and improving it for application to various websites, it is expected that more effective information retrieval will be possible.