The Influence of Feature Sets on the Detection of Advanced Persistent Threats

Publication from Digital, Policies
Datenanalyse und statistische Modellierung, Forschungsgruppe Cyber Security and Defence

Katharina Hofer-Schmitz, Branka Stojanovic, Ulrike Kleb

Electronics 2021, 10(6), 704, 3/2021


This paper investigates the influences of different statistical network traffic feature sets on detecting advanced persistent threats. The selection of suitable features for detecting targeted cyber attacks is crucial to achieving high performance and to address limited computational and storage costs. The evaluation was performed on a semi-synthetic dataset, which combined the CICIDS2017 dataset and the Contagio malware dataset. The CICIDS2017 dataset is a benchmark dataset in the intrusion detection field and the Contagio malware dataset contains real advanced persistent threat (APT) attack traces. Several different combinations of datasets were used to increase variety in background data and contribute to the quality of results. For the feature extraction, the CICflowmeter tool was used. For the selection of suitable features, a correlation analysis including an in-depth feature investigation by boxplots is provided. Based on that, several suitable features were allocated into different feature sets. The influences of these feature sets on the detection capabilities were investigated in detail with the local outlier factor method. The focus was especially on attacks detected with different feature sets and the influences of the background on the detection capabilities with respect to the local outlier factor method. Based on the results, we could determine a superior feature set, which detected most of the malicious flows.

Keywords: APT, local outlier detection, feature selection, statistical network traffic features