Journal title: International Journal of Computer and Information Technology
Print ISSN: 2279-0764
Publication year: 2014
Volume: 3
Issue: 2
Page: 285
Publisher: International Journal of Computer and Information Technology
Abstract: Search engines must repeatedly re-index web pages to keep their indexes current whenever page content may have changed. This is a time- and resource-consuming operation, and it delays the appearance of new or modified content in search results. This paper proposes an approach called the Multilateral Web Indexing Model (MWIM), which addresses these issues by establishing closer collaboration between websites and search engines. The collaboration consists of exposing to search engines automatically generated metadata about the structure of a website and the contents of each web page. The Document Object Model (DOM) tree is used as an efficient representation of web page content, while the XML Path Language (XPath) provides a powerful syntax for addressing specific elements of the DOM tree and extracting metadata from an HTML web page. Consequently, the search engine no longer needs to retrieve the web page markup and perform the data extraction itself, because this stage is performed directly on the website's platform when a web page is created or modified. If every site provided reliable metadata alongside its files, the work of a search engine would be much easier and would consume less time, fewer resources, and less bandwidth; the reported percentage of savings is 66.49%.
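The site-side extraction step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the sample page, the `extract_metadata` helper, and the choice of metadata fields are all hypothetical, and the page is assumed to be well-formed XHTML so that Python's standard-library parser can build the tree. The standard library supports only a limited subset of XPath, so real-world HTML would likely need a dedicated HTML parser.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical page, assumed to be well-formed XHTML so that the
# standard-library XML parser can build an element tree from it.
PAGE = """<html>
  <head>
    <title>Example page</title>
    <meta name="description" content="A short demo page." />
    <meta name="keywords" content="indexing, metadata" />
  </head>
  <body>
    <h1>Welcome</h1>
    <p>First paragraph.</p>
    <a href="/about.html">About</a>
    <a href="/contact.html">Contact</a>
  </body>
</html>"""


def extract_metadata(markup: str) -> dict:
    """Parse the markup into a tree and pull selected fields out with
    XPath-style expressions (ElementTree's limited XPath subset)."""
    root = ET.fromstring(markup)

    def meta(name: str):
        # Select the <meta> element whose name attribute matches.
        node = root.find(f".//meta[@name='{name}']")
        return node.get("content") if node is not None else None

    return {
        "title": root.findtext(".//title"),
        "description": meta("description"),
        "keywords": meta("keywords"),
        "headings": [h.text for h in root.findall(".//h1")],
        "links": [a.get("href") for a in root.findall(".//a[@href]")],
    }


# The website would run this when a page is created or modified and
# expose the result, so the search engine need not parse the markup.
metadata = extract_metadata(PAGE)
print(json.dumps(metadata, indent=2))
```

In this sketch the metadata is serialized as JSON purely for illustration; the abstract does not state which exchange format MWIM uses.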