@maybe-hello-world
Created June 22, 2023 21:28
Getting the NetML CIC-IDS-2017 subset dataset and splitting it into windows
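# pinned: ubuntu:latest resolved to 22.04 when this was written; newer releases
# reject the system-wide "pip3 install" used below (externally managed environment)
FROM ubuntu:22.04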
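# update and install in one layer so a cached, stale package index is never reused
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y g++ make wget curl libpcap-dev python3-pip parallel git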
# pcapml
RUN wget "https://github.com/nprint/pcapml/releases/download/v0.3.1/pcapml-0.3.1.tar.gz" -O pcapml.tar.gz
RUN tar -xvf pcapml.tar.gz
RUN cd pcapml-0.3.1 && ./configure && make && make install
# to download dataset from google drive
RUN pip3 install gdown
# install rust
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
# add rust to PATH
ENV PATH="/root/.cargo/bin:${PATH}"
# pcap window splitter
RUN git clone https://github.com/maybe-hello-world/pcap_window_splitter.git
RUN cd pcap_window_splitter && cargo build --release
RUN cp pcap_window_splitter/target/release/pcap_window_splitter /usr/local/bin/
# download NetML (CIC-IDS-2017)
# from Jordan Holland's datasets: https://drive.google.com/drive/folders/15Axxx-5d4HLHjPJb9dudyPQKGfRaoxQz
RUN gdown 1RTRIGbVlZtA8Zaj-aAAvr1wMv12BoKg1
# decompress (the download is expected to be saved as traffic.pcapng.gz)
RUN gzip -d traffic.pcapng.gz
# split into flows
RUN pcapml -M traffic.pcapng -O /data/
# split flows into windows of 0.1 seconds
RUN mkdir /data_windowed
RUN find /data -maxdepth 1 -name '*.pcap' -print0 | parallel -0 pcap_window_splitter {} /data_windowed 0.1
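# Usage sketch. The image tag "netml-windows" is a placeholder chosen for
# illustration; it does not come from the original gist.
#   docker build -t netml-windows .
#   docker run --rm netml-windows ls /data_windowed
# Every download and splitting step runs at build time, so the windowed
# pcaps should already be present in /data_windowed inside the built image.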