Article Information

  • Title: An Algorithm for Effective Web Crawling Mechanism of a Search Engine
  • Authors: B. Vijaya Babu; M. Surendra Prasad Babu; Y. Chetan Prasad
  • Journal: Oriental Journal of Computer Science and Technology
  • Print ISSN: 0974-6471
  • Year: 2008
  • Volume: 1
  • Issue: 1
  • Pages: 49-54
  • Language: English
  • Publisher: Oriental Scientific Publishing Company
  • Abstract: Broad web search engines, as well as many more specialized search tools, rely on web crawlers to acquire large collections of pages for indexing and analysis. Such a crawler may interact with millions of hosts over a period of weeks or months, so robustness, flexibility, and manageability are of major importance. In addition, I/O performance, network resources, and operating-system limits must be taken into account to achieve high performance at a reasonable cost. Current-day crawlers retrieve content only from the publicly indexable Web, i.e., the set of web pages reachable purely by following hypertext links, and ignore search forms and pages that require authorization or prior registration. In particular, they ignore the tremendous amount of high-quality content "hidden" behind search forms in large searchable electronic databases. Moreover, even when good data has been collected and indexed, the sites holding it can be viewed only while connected to the Internet. The days when hourly-billed connections were the main way to get online may be gone, and broadband, high-speed connections are now available to the common user. However, laptop usage is growing at the same pace, and a mobile user cannot always get online wherever he or she goes. In that case, important pages on a particular website are of no use, however well configured the user's machine is; and since it will still take time for Wi-Fi networks to come into full swing, saving every page of a website by hand is a tedious task. This paper provides a framework for addressing the problem of browsing the web even when offline.
  • Keywords: web crawler; search engine; indexer; frontier; crawl manager
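The abstract describes crawlers that reach pages "purely by following hypertext links" and lists a frontier among its keywords, but this record does not include the authors' algorithm. The following is a minimal sketch under our own assumptions, not the paper's method. It shows a breadth-first crawler that stays on one host and keeps every fetched page so the site can be browsed offline. The names `crawl_site`, `http_fetch`, `LinkExtractor`, and `max_pages`, the UTF-8 decoding, and the in-memory dict standing in for on-disk storage are all hypothetical choices made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url):
    """Download a page; assumes (hypothetically) that pages are UTF-8."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_site(seed, fetch=http_fetch, max_pages=100):
    """Breadth-first crawl restricted to the seed URL's host.

    Returns a dict mapping URL -> HTML, i.e. an offline copy of the site.
    """
    host = urlparse(seed).netloc
    frontier = deque([seed])  # URLs waiting to be fetched, in FIFO order
    seen = {seed}             # URLs already queued, so none is fetched twice
    saved = {}                # URL -> page content kept for offline reading
    while frontier and len(saved) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):  # unreachable or malformed URL: skip it
            continue
        saved[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links and drop any "#fragment" part.
            link, _ = urldefrag(urljoin(url, href))
            # Stay on the same website; links to other hosts are ignored.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                frontier.append(link)
    return saved
```

Passing `fetch` as a parameter lets the crawl loop be tried without a network connection, for example with a dict of canned pages. The FIFO frontier gives breadth-first order, so pages close to the seed are saved first when `max_pages` cuts the crawl short.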