Abstract: Botnets are collections of connected, malware-infected hosts that can be controlled by a remote
attacker. They are one of the most prominent threats in cybersecurity, as they can be used for a
wide variety of purposes, including denial-of-service attacks, spam, and bitcoin mining. We propose a
two-stage, machine-learning-based method for distinguishing between botnet and non-botnet
network traffic, with the aim of reducing false positives by examining both network-centric and
host-centric traffic characteristics. In the first stage, we examine network flow records generated
over limited time intervals, which provide a concise but partial summary of the complete network
traffic profile, and use supervised learning to classify flows as malicious or benign based on a set
of extracted statistical features. In the second stage, we perform unsupervised clustering on
internal hosts involved in previously identified malicious communications to determine which
hosts are most likely to be botnet-infected. Using existing datasets, we demonstrate the feasibility
of our method and implement a proof-of-concept, real-time detection system that aggregates the
results of multiple classifiers to identify infected hosts.
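
The two-stage idea can be illustrated with a minimal, standard-library-only sketch. It is not the system described above: the nearest-centroid classifier, the one-dimensional two-means clustering, the two flow features (duration and bytes per packet), the per-host "fraction of malicious flows" profile, and all host addresses and values are assumptions chosen purely for illustration, and a single classifier stands in for the aggregation of multiple classifiers.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical labelled flows: (host, (duration_s, bytes_per_packet), label).
TRAIN = [
    ("h1", (1.0, 60.0), "malicious"),
    ("h1", (1.2, 64.0), "malicious"),
    ("h2", (30.0, 900.0), "benign"),
    ("h3", (25.0, 1100.0), "benign"),
]

# Hypothetical unlabelled flows from internal hosts: (host, features).
LIVE = [
    ("10.0.0.5", (0.9, 58.0)),
    ("10.0.0.5", (1.1, 66.0)),
    ("10.0.0.5", (1.3, 61.0)),
    ("10.0.0.7", (28.0, 950.0)),
    ("10.0.0.7", (26.0, 1020.0)),
    ("10.0.0.7", (1.0, 63.0)),
    ("10.0.0.7", (31.0, 980.0)),
    ("10.0.0.9", (27.0, 1005.0)),
]


def train_centroids(flows):
    """Stage 1 (training): one mean feature vector per class."""
    by_label = defaultdict(list)
    for _, feats, label in flows:
        by_label[label].append(feats)
    return {lab: tuple(mean(col) for col in zip(*rows))
            for lab, rows in by_label.items()}


def classify(feats, centroids):
    """Stage 1 (prediction): label of the closest centroid."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(feats, centroids[label]))
    return min(centroids, key=sq_dist)


def host_profiles(flows, centroids):
    """Fraction of malicious flows per host, for hosts with at least one."""
    total = defaultdict(int)
    bad = defaultdict(int)
    for host, feats in flows:
        total[host] += 1
        if classify(feats, centroids) == "malicious":
            bad[host] += 1
    return {host: bad[host] / total[host] for host in bad}


def two_means(values, iters=20):
    """Stage 2: 1-D k-means with k=2; returns (low_center, high_center)."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all values identical: no second cluster to form
            break
        lo, hi = mean(low), mean(high)
    return lo, hi


def suspected_hosts(profiles):
    """Hosts whose profile is strictly closer to the high cluster center."""
    if not profiles:
        return []
    lo, hi = two_means(list(profiles.values()))
    return sorted(h for h, v in profiles.items() if abs(v - hi) < abs(v - lo))


centroids = train_centroids(TRAIN)
profiles = host_profiles(LIVE, centroids)
infected = suspected_hosts(profiles)
print(profiles)
print(infected)
```

In this toy data, the host with only benign flows never reaches the second stage, and the clustering step separates a host whose flows are all flagged from one with a single flagged flow. That is the false-positive reduction the host-centric stage is meant to provide. Because the features are left unscaled and the clustering needs at least two distinct profile values, this sketch would need feature normalization and a fallback rule before being applied to real flow records.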