Journal: International Journal of Computer Trends and Technology
Publisher: Seventh Sense Research Group
Abstract: Web databases contain a large amount of information in the form of structured objects called data records. This work addresses the automatic extraction of the data records encoded in a query result page. These records are important because they present the essential information of their host pages, e.g., lists of products or services. A query result page contains not only the actual data but also other information, such as navigational panels, advertisements, comments, and information about the hosting site. The goal of web database data extraction is to remove irrelevant information from the query result page, extract the query result records (QRRs) from the page, and align the extracted QRRs into a table such that data values belonging to the same attribute are placed in the same table column. The proposed technique handles both attribute-based and content-based values retrieved from web pages containing structured and unstructured data.
Keywords: Web data records; data region identification; record alignment; wrapper; information integration
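
The alignment goal stated in the abstract (values of the same attribute placed in the same table column) can be illustrated with a minimal sketch. This is not the paper's method: it assumes the query result records have already been extracted and that each value carries an attribute label, and the names `Record` and `align_records` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input shape: each record is a list of (attribute label, value) pairs.
Record = List[Tuple[str, str]]


def align_records(records: List[Record]) -> Tuple[List[str], List[List[Optional[str]]]]:
    """Align records into a table with one column per attribute label."""
    columns: List[str] = []
    for record in records:
        for label, _ in record:
            if label not in columns:
                columns.append(label)  # keep first-seen column order

    rows: List[List[Optional[str]]] = []
    for record in records:
        values: Dict[str, str] = dict(record)
        # None marks an attribute that this record does not provide.
        rows.append([values.get(label) for label in columns])
    return columns, rows


# Two records with different attribute sets (made-up sample data).
records = [
    [("title", "Book A"), ("price", "$10")],
    [("title", "Book B"), ("author", "Lee"), ("price", "$12")],
]
columns, rows = align_records(records)
```

Records that lack an attribute get an empty cell rather than shifting later values left, which is what keeps each column tied to a single attribute.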