Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2014
Volume: 60
Issue: 2
Publisher: Journal of Theoretical and Applied
Abstract: The Internet is a rapidly growing technology that holds a vast and rich set of information stored on the web. To retrieve, share, and process this information, search engines were created, and they play an essential role for web users. When information is searched on web servers through search engines, many irrelevant and redundant documents are retrieved and presented to users along with the required information. This irrelevant and replicated information degrades search engine performance, because users waste time browsing documents that do not interest them. To make search more effective and improve its performance, many researchers have turned their attention to web mining, since the web is used in almost all areas. Web content mining is a subarea of web mining that extracts required and useful knowledge or information from web content. Most existing algorithms apply weightage only to the common terms in the documents, which reduces accuracy. The proposed approach, based on term frequency ranking for mining web contents, can improve the performance of a search engine.
Keywords: Correlation Coefficient; Search Engines; Term Frequency; Web Content Mining; Web Content Outliers.