Journal: International Journal of Computer Trends and Technology
Electronic ISSN: 2231-2803
Publication year: 2013
Volume: 4
Issue: 10-2
Publisher: Seventh Sense Research Group
Abstract: Big data is a serious problem for data handlers, because large amounts of data are created in day-to-day life. Online Social Networks (OSNs) are one source of big data. When a user issues a query against one particular database, the result comes from that database alone and is limited. If the data come from multiple Web databases, the query returns more results than a single database would, and the advantage of using multiple Web databases is that more relevant data are retrieved. To address the problem of record matching in the Web database scenario, this paper presents an unsupervised, online record matching method, UDD, which, for a given query, can efficiently identify duplicates from the query result records of multiple Web databases. After the same-source duplicates are eliminated, the “presumed” non-duplicate records from the same source can be used as training examples, relieving users of the burden of labeling them manually. Starting from the non-duplicate set, the method uses two cooperating classifiers, a weighted element similarity summing classifier and an SVM classifier, to iteratively identify duplicates in the query results from multiple Web databases. Experimental results demonstrate that UDD works well for the Web database scenario, where existing supervised methods do not apply. The experiments used two databases, Google and Faroo. With the advent of information technology, a user can access relevant information from the World Wide Web, which contains a vast amount of information, simply and quickly by entering search queries. In response, information is retrieved and delivered directly to the user.
Keywords: Big data; Web Databases; Query; Online Social Networks
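
The abstract names a weighted element similarity summing classifier as one of the two cooperating classifiers. The following is a minimal sketch of that idea only, under stated assumptions: the field names (`title`, `url`, `snippet`), the weights, the threshold, and the use of `difflib.SequenceMatcher` as the per-field similarity measure are all illustrative choices that do not come from the paper. The SVM classifier and the iterative cooperation between the two classifiers are not shown.

```python
from difflib import SequenceMatcher

# Hypothetical field weights; the abstract does not give any values.
FIELD_WEIGHTS = {"title": 0.6, "url": 0.2, "snippet": 0.2}
# Assumed cut-off for calling two records duplicates; not taken from the paper.
DUPLICATE_THRESHOLD = 0.85


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values (stand-in measure)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def weighted_similarity(r1: dict, r2: dict, weights: dict = FIELD_WEIGHTS) -> float:
    """Weighted sum of per-field similarities; weights are expected to sum to 1."""
    return sum(
        w * field_similarity(r1.get(field, ""), r2.get(field, ""))
        for field, w in weights.items()
    )


def find_duplicates(source_a, source_b, threshold=DUPLICATE_THRESHOLD):
    """Return index pairs (i, j) of cross-source records judged to be duplicates."""
    return [
        (i, j)
        for i, rec_a in enumerate(source_a)
        for j, rec_b in enumerate(source_b)
        if weighted_similarity(rec_a, rec_b) >= threshold
    ]
```

Calling `find_duplicates(results_a, results_b)` on two lists of record dictionaries, one per Web database, returns the index pairs whose weighted similarity reaches the threshold. In the method the abstract describes, such weights would be learned from the presumed non-duplicate set rather than fixed by hand as they are here.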