Abstract: To be successful, cybercriminals must figure out how to scale their scams. They duplicate content on new websites, often staying one step ahead of defenders who shut down past schemes. For some scams, such as phishing and counterfeit-goods shops, the duplicated content remains nearly identical. In others, such as advance-fee fraud and online Ponzi schemes, the criminal must alter the content so that it appears different in order to evade detection by victims and law enforcement. Nevertheless, similarities in website structure or content often remain, since making truly unique copies does not scale well. In this paper, we present a novel optimized combined clustering method that links together replicated scam websites, even when the criminal has taken steps to hide the connections. We present automated methods to extract key website features, including rendered text, HTML structure, file structure, and screenshots. We describe a process that automatically identifies the combination of these attributes that most accurately clusters similar websites together. To demonstrate the method’s applicability to cybercrime, we evaluate its performance on two collected datasets of scam websites: fake escrow services and high-yield investment programs (HYIPs). We show that our method groups similar websites together more accurately than existing general-purpose consensus clustering methods.