Journal: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Print ISSN: 2278-1323
Publication year: 2014
Volume: 3
Issue: 10
Pages: 3461-3466
Publisher: Shri Pannalal Research Institute of Technology
Abstract: The goal of FoCUS is to crawl relevant forum content from the web with minimal overhead. Although forums have different layouts and are powered by different forum software packages, they always have similar implicit navigation paths, connected by specific URL types, that lead users from entry pages to thread pages. The web forum crawling problem is therefore reduced to a URL-type recognition problem, in which URLs are classified as index page, thread page, or page-flipping page. To address the scalability issue, the research proposes an edge-centric clustering scheme to extract sparse social dimensions; this approach can efficiently handle networks of millions of actors while delivering prediction performance comparable to other, non-scalable methods. In addition, the research incorporates sentiment analysis, which transforms the cases into a standard model of features and classes. It is developed in two stages: emotional polarity computation, and integrated sentiment analysis based on K-means clustering. The proposed unsupervised text-mining approach groups the forums into clusters, with the center of each cluster representing a hotspot forum within the current time span. As a result, the behavior of individuals is collected through their forum posts, which are then classified as positive or negative. A positive or negative value is assigned to each word and used to classify the words in the document.
Keywords: Entry index thread path; Forum crawling; Page classification; Page type; URL pattern learning; URL type.
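
Note: The word-level polarity step mentioned in the abstract (a positive or negative value assigned to each word, then posts classified as positive or negative) can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon entries, the function names, the summing rule, and the "neutral" label for ties are all assumptions made for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical polarity lexicon: +1 for positive words, -1 for negative words.
# The abstract does not give the actual word values used in the paper.
POLARITY = {
    "good": 1, "great": 1, "helpful": 1,
    "bad": -1, "broken": -1, "useless": -1,
}


def post_polarity(text, lexicon=POLARITY):
    """Sum the polarity values of the words in a post; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


def classify_post(text, lexicon=POLARITY):
    """Label a post by the sign of its summed polarity (tie handling is an assumption)."""
    score = post_polarity(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, under this toy lexicon, classify_post("This guide is great and helpful") yields "positive". Per-post scores of this kind could then serve as input features for the K-means stage the abstract describes, though how the paper combines the two stages is not stated here.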