Journal: International Journal of Electronics Communication and Computer Engineering
Print ISSN: 2249-071X
Electronic ISSN: 2278-4209
Publication year: 2013
Volume: 4
Issue: 2
Pages: 427-431
Publisher: IJECCE
Abstract: Traditional search engines available on the internet search dynamically for relevant content across the web. They nevertheless face constraints, such as retrieving the requested data from varied sources where data relevance is exceptional. Web crawlers are designed to move only along a specific path of the web and are restricted from moving along other paths, either because those paths are secured or because of apprehension about threats. It is possible to design a web crawler capable of penetrating paths of the web that traditional web crawlers cannot reach, in order to obtain a better result in terms of data, time and relevance for a given search query. The paper uses a newer parser and indexer to arrive at a novel web crawler and a framework to support it. The proposed web crawler is designed to access Hypertext Transfer Protocol Secure (HTTPS) based websites and web pages that need authentication to view and index. The user fills in a search form, and his/her credentials are used by the web crawler to authenticate with the secure web server. Once it is indexed, the secure web server is inside the web crawler's accessible zone.
Keywords: Deep Web Crawler; Hidden Pages; Accessing Secured Databases; Indexing
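
The flow described in the abstract (the user supplies credentials, the crawler authenticates with the secure server, and the protected pages then become indexable) can be illustrated with a minimal sketch. This is not the paper's implementation: `FakeSecureSite`, `LinkAndTextParser`, `crawl`, the `example.test` URLs and the account data are all hypothetical, and the "server" is an in-memory stand-in so that the example runs without network access.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects href targets and visible words from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


@dataclass
class FakeSecureSite:
    """Hypothetical in-memory stand-in for an HTTPS server with a login area."""
    pages: dict        # URL -> HTML body
    protected: set     # URLs that require a valid session
    accounts: dict     # user -> password
    sessions: set = field(default_factory=set)

    def login(self, user, password):
        # Returns a session token on success, None on failure.
        if self.accounts.get(user) == password:
            token = f"session-{user}"
            self.sessions.add(token)
            return token
        return None

    def get(self, url, token=None):
        # Mimics HTTP status codes: 401 without a valid session, 404 if absent.
        if url in self.protected and token not in self.sessions:
            return 401, ""
        if url not in self.pages:
            return 404, ""
        return 200, self.pages[url]


def crawl(site, start_url, credentials=None):
    """Breadth-first crawl; returns an inverted index {word: set of URLs}."""
    token = site.login(*credentials) if credentials else None
    index, seen, queue = {}, {start_url}, [start_url]
    while queue:
        url = queue.pop(0)
        status, body = site.get(url, token)
        if status != 200:
            continue  # page is missing or not reachable with this session
        parser = LinkAndTextParser()
        parser.feed(body)
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        for href in parser.links:
            target = urljoin(url, href)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index


# Assumed demo data: one public page linking to one protected page.
site = FakeSecureSite(
    pages={
        "https://example.test/": '<a href="/private/report">report</a> public home',
        "https://example.test/private/report": "confidential quarterly figures",
    },
    protected={"https://example.test/private/report"},
    accounts={"alice": "s3cret"},
)
anonymous_index = crawl(site, "https://example.test/")
authenticated_index = crawl(site, "https://example.test/", ("alice", "s3cret"))
```

In this sketch the anonymous crawl indexes only the public page, while the authenticated crawl also indexes the words of the protected page. A real crawler would replace `FakeSecureSite` with an HTTPS client and the target site's actual login procedure.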