Journal: Bulletin of the Technical Committee on Data Engineering
Year: 2016
Volume: 39
Issue: 2
Page: 106
Publisher: IEEE Computer Society
Abstract: The quality of web sources has been traditionally evaluated using exogenous signals such as the hyperlink structure of the graph. We propose a new approach that relies on endogenous signals such as the correctness of factual information provided by the source: a source that has few false facts is considered to be trustworthy. The facts are automatically extracted from each source by information extraction methods commonly used to construct knowledge bases. We propose a way to distinguish errors made in the extraction process from factual errors in the web source per se, by using joint inference in a novel multi-layer probabilistic model. We call the trustworthiness score we computed Knowledge-Based Trust (KBT). We apply our method to a database of 2.8B facts extracted from the web, and thereby estimate the trustworthiness of 119M webpages. Manual evaluation of a subset of the results confirms the effectiveness of the method.