Journal: International Journal of Computer Science Issues
Print ISSN: 1694-0784
Electronic ISSN: 1694-0814
Year of publication: 2014
Volume: 11
Issue: 1
Publisher: IJCSI Press
Abstract: Web crawling is the process of downloading web pages. In this paper we validate the architecture of a migrating parallel web crawler using a finite state machine (FSM). The migrating parallel web crawling approach detects changes in the content and structure of pages. Domain-specific crawling yields high-quality pages: the crawling process migrates to a host or server in a specific domain and downloads pages within that domain. Incremental crawling keeps the pages in the local database fresh, thus increasing the quality of the downloaded pages. This crawling strategy makes the web crawling system more effective and efficient. Test cases are generated to validate the architecture. Generating test cases through the FSM is reliable and efficient and does not produce invalid test cases; only valid input strings are generated as test cases.
Keywords: Web crawling; parallel migrating web crawler; search engine; validation
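
The abstract's idea of deriving only valid input strings from an FSM can be illustrated with a minimal sketch. The states, event names, and transition table below are hypothetical and chosen only for illustration, since the abstract does not list the paper's actual state machine. The sketch enumerates, breadth-first, every event sequence up to a length bound that ends in an accepting state.

```python
from collections import deque

# Hypothetical crawler FSM (assumption): these states and events are
# illustrative and are not taken from the paper.
TRANSITIONS = {
    ("idle", "migrate"): "at_host",
    ("at_host", "download"): "downloading",
    ("downloading", "download"): "downloading",
    ("downloading", "detect_change"): "updating",
    ("updating", "store"): "done",
    ("downloading", "store"): "done",
}
START = "idle"
ACCEPTING = {"done"}


def accepts(events):
    """Return True if the event sequence drives the FSM to an accepting state."""
    state = START
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            return False  # undefined transition: the input string is invalid
        state = TRANSITIONS[key]
    return state in ACCEPTING


def generate_valid_cases(max_len):
    """Enumerate all accepted event sequences of at most max_len events."""
    cases = []
    queue = deque([(START, [])])
    while queue:
        state, path = queue.popleft()
        if state in ACCEPTING:
            cases.append(path)
        if len(path) == max_len:
            continue  # length bound reached, do not extend this path
        for (src, event), dst in TRANSITIONS.items():
            if src == state:
                queue.append((dst, path + [event]))
    return cases


if __name__ == "__main__":
    for case in generate_valid_cases(4):
        print(" -> ".join(case))
```

Because sequences are built only by following defined transitions, every generated case is accepted by construction, which mirrors the claim that invalid test cases are never produced.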