Journal title: International Journal of Computer Science Issues
Print ISSN: 1694-0784
Electronic ISSN: 1694-0814
Publication year: 2012
Volume: 9
Issue: 1
Publisher: IJCSI Press
Abstract: This paper describes Mercator, a scalable, extensible web crawler written entirely in Java. Scalable web crawlers are an important component of many web services, but their design is not well documented in the literature. We enumerate the major components of any scalable web crawler, comment on alternatives and tradeoffs in their design, and describe the particular components used in Mercator. We also describe Mercator's support for extensibility and customizability. Finally, we comment on Mercator's performance, which we have found to be comparable to or better than that of other crawlers.
Keywords: Introduction; Related Work; Architecture; Components; Extensibility; Conclusions