Journal: International Journal of Computer Science & Technology
Print ISSN: 2229-4333
Online ISSN: 0976-8491
Year: 2012
Volume: 3
Issue: 1 Ver 2
Publisher: Ayushmaan Technologies
Abstract: As the amount of content accessible on web pages keeps growing, users urgently need a way to separate useful information from unneeded information. Cleaning web pages for web data mining is therefore important for improving the performance of information retrieval and information extraction. Rather than extracting the important content from web pages directly, we study how to remove the different noise patterns found in different web pages so that the main content remains. In this work, we propose an approach for identifying multiple noise patterns and removing noise data from the web pages of any web site. A DOM tree is first constructed for each web page. Our approach is based on finding noise in the current web page, and on measuring web noise similarity, using a basic Case-Based Reasoning (CBR) approach. We also apply a back-propagation neural network algorithm to classify the stored noise patterns by matching them against the noise data in the current web page. We have implemented our approach on many commercial web sites and news web sites to evaluate its performance and improvement. The experimental results indicate that the approach achieves improved accuracy and effectiveness.
Keywords: Noise Detection; Noise Elimination; DOM; Information Extraction; Noise Patterns; Information Retrieval; Case Based Reasoning; Noise; Neural Network; Data Patterns