
In recent years, the complexity of web-based application traffic has increased dramatically, introducing new vulnerabilities and motivating the development of new security solutions. Despite this, few resources exist to evaluate the performance of web security solutions on complex application traffic. In this work, we present a dataset for benchmarking application-layer intrusion detection systems.

The labelled dataset contains 892,833 sanitized HTTP requests derived from the logs of “Domination”, a popular open-source application, and addresses several issues with previous web benchmarks:

a lack of complex payloads
unrealistically balanced training sets
a lack of multi-stage attacks
bot traffic

In addition to the dataset, we introduce two tasks: a single-stage attack detection task and a multi-stage attack detection task.

We present experimental results for several common application security approaches on these tasks. On the single-stage task, a linear SVM classifier trained on augmented TF-IDF embeddings had the highest performance, with an F1 score of 0.921. On the multi-stage task, a Random Forest classifier trained on the final hidden state of a TF-IDF + LSTM model had the highest performance, with an F1 score of 0.721.
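For readers less familiar with the metric, the following is a minimal sketch of how an F1 score is computed and why it is more informative than accuracy when attacks are rare. It is not the evaluation code behind the results above: the toy labels, the function names, and the choice of "attack" as the positive class are all assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data (assumption): 95 benign requests (0), 5 attacks (1).
y_true = [0] * 95 + [1] * 5

# A trivial baseline that labels every request as benign.
always_benign = [0] * 100

# A hypothetical detector: 2 false alarms on benign traffic,
# 4 of the 5 attacks caught, 1 attack missed.
detector = [0] * 93 + [1] * 2 + [1] * 4 + [0]

print(accuracy(y_true, always_benign), f1_score(y_true, always_benign))
print(accuracy(y_true, detector), f1_score(y_true, detector))
```

On this toy data the always-benign baseline reaches 0.95 accuracy but an F1 of 0.0, while the hypothetical detector reaches 0.97 accuracy and an F1 of 8/11 (about 0.727). The two are nearly indistinguishable by accuracy, which is one reason F1 is the more useful headline number on imbalanced traffic.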

To our knowledge, this is the first work to introduce a broad API security benchmark developed from modern application HTTP traffic.