Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Electronic ISSN: 2320-9801
Year of publication: 2017
Volume: 5
Issue: 3
Page: 5410
DOI: 10.15680/IJIRCCE.2017.0503323
Publisher: S&S Publications
Abstract: The purpose of this project is to crawl a website to find out the status of each of the links contained in its pages. Each website hosted on the Internet contains several links, and each of these links references a certain web page. Over time, these web pages are removed or become unavailable for various reasons, but the links referencing them still exist. This makes those links obsolete, and it becomes the responsibility of the website developers to periodically check all the links in all the web pages of their website. Moreover, obsolete or wrongly referenced links also harm the SEO ranking of the website in various search engines and in certain cases lead to a loss in revenue. This paper also discusses the necessary features a successful web crawler must possess.
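
The link-checking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `LinkExtractor`, `extract_links`, `classify_status`, and `check_link` are hypothetical, only the Python standard library is used, and the rule that 2xx and 3xx responses count as working links is an assumption made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
import urllib.error
import urllib.request


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                # Keep only HTTP(S) links, skipping duplicates.
                if (absolute.startswith(("http://", "https://"))
                        and absolute not in self.links):
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the unique absolute HTTP(S) links found in an HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def classify_status(code):
    """Map an HTTP status code (or None) to a coarse label (assumed rule)."""
    if code is None:
        return "unreachable"
    if 200 <= code < 400:
        return "ok"
    return "broken"


def check_link(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None
```

In such a sketch, a crawler would fetch each page, pass its HTML to `extract_links`, and then report `classify_status(check_link(url))` for every link found. Some servers reject `HEAD` requests, so a fuller version would probably fall back to `GET` before declaring a link broken.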