出版社:Information and Media Technologies Editorial Board
摘要:Web tracking sites or Web bugs are potential but serious threats to users' privacy during Web browsing. Web sites and their associated advertising sites surreptitiously gather the profiles of visitors and possibly abuse or improperly expose them, even if visitors are unaware their profiles are being utilized. In order to prevent such sites in a corporate network, most companies employ filters that rely on blacklists, but these lists are insufficient. In this paper, we propose Web tracking sites detection and blacklist generation based on temporal link analysis. Our proposal analyzes traffic at the network gateway so that it can monitor all tracking sites in the administrative network. The proposed algorithm constructs a graph between sites and their visited time in order to characterize each site. Then, the system classifies suspicious sites using machine learning. We confirm that public black lists contain at most 22-70% of the known tracking sites respectively. The machine learning can identify the blacklisted sites with true positive rate, 62-73%, which is more accurate than any single blacklist. Although the learning algorithm falsely identified 15% of unlisted sites, 96% of these are verified to be unknown tracking sites by means of a manual labeling. These unknown tracking sites can serve as good candidates for an entry of a new backlist.