APT datasets and attack modeling for automated detection methods: A review

Publication from Digital

B. Stojanovic, K. Hofer-Schmitz, U. Kleb

Computers & Security, Volume 92, 2020, 1/2020


Automated detection methods for targeted cyber attacks are becoming increasingly prominent. Testing these methods properly requires a suitable dataset. This paper reviews the datasets used for APT detection in the literature and how they were created. A special focus is placed on feature engineering, including feature construction, feature selection and dimensionality reduction. Two use cases are distinguished based on the underlying infrastructure: large enterprise networks and Cyber Physical Systems. Cloud computing systems, financial technology networks and Internet of Things networks are additionally covered. These datasets are usually based on an attack model, so the different stages of such attacks, including their approaches and goals, are described. The main contribution is the description and analysis of existing feature extraction methodologies, together with a detailed overview of the datasets used in the APT detection literature. The review shows that in the large enterprise network use case, datasets covering quite short periods of time are used much more frequently. For Cyber Physical Systems, a realistic dataset is publicly available.