Journal: International Journal of Multimedia and Ubiquitous Engineering
Print ISSN: 1975-0080
Publication year: 2014
Volume: 9
Issue: 12
Pages: 405-420
DOI: 10.14257/ijmue.2014.9.12.35
Publisher: SERSC
Abstract: Web page filtering technology aims to filter out the large amount of repeated and theme-unrelated noise information and retain the useful information. Some web filtering methods cannot make full use of layout and visual features. In view of the now-mainstream "DIV+CSS" design style of modern commercial web sites, this paper observes that elements lying in the same div block share common semantic features, and proposes a DIV_FOREST model to represent web pages. In combination with the Vision-based Page Segmentation algorithm, a DVPS algorithm that considers both layout features and visual features is proposed to improve web page filtering efficiency.
Keywords: Web Page Data Filtering; Web Page Segmentation; DIV_FOREST Model; DVPS Algorithm