@hamelin
hamelin / modern-unix.yml
Created October 4, 2023 15:13
A Conda environment file that pulls in most tools described at https://github.com/ibraheemdev/modern-unix
name: modern-unix
channels:
- conda-forge
- dnachun
dependencies:
- bat
- bottom
- broot
- cheat
- choose-rust
@hamelin
hamelin / PytestEmulation.ipynb
Created August 14, 2023 18:54
Pytest emulation directly in a notebook
@hamelin
hamelin / Example Jupyter Proxy Server.ipynb
Last active April 12, 2023 16:00
Example of usage of Jupyter Proxy Server to host local content for an in-notebook GUI.

Labels for OpTC dataset host-based events

DARPA's OpTC dataset, released in the summer of 2020, stands to this day as the best dataset for supporting research in cyber threat detection from host-based telemetry. It consists of 5 days of telemetry capture during which only normal activity from 1000 hosts generates events, followed by 3 days of capture during which a subset of these hosts is subjected to cyber attacks. Its main drawback is that the normal activity comes from a simplistic script that makes Firefox instances connect at random to websites from a fixed list, on top of Windows' own housekeeping processes; in addition, the attacks are executed without any attempt at concealing the activity. Threat detection on this dataset is thus a much easier problem than in a real mission context, so results on it amount to little more than a sanity check of a detection scheme's reliability and robustness. Regardless, the dataset remains useful, if only for this purpose.

An import

@hamelin
hamelin / README.md
Created February 7, 2023 16:39
OpTC dataset labels

@hamelin
hamelin / README.md
Created February 7, 2023 16:38
OpTC dataset labels

@hamelin
hamelin / accessing-data-from-azure-blob-containers.ipynb
Last active April 26, 2022 21:13
Tutorial on using Rclone and fsspec to wrangle data stored on Azure blob containers.
@hamelin
hamelin / Building sparse matrices in Python.ipynb
Last active December 8, 2021 13:44
Explains how to build sparse (COO) matrices that one-hot encode square lists
@hamelin
hamelin / minimal-transport.ipynb
Last active October 27, 2021 14:52
A discussion of optimization with linear equality constraints applied to minimal transport problems
@hamelin
hamelin / jupyter-lab-config.py
Last active December 23, 2021 16:45
Configuration to get Jupyter Lab's editor to trim whitespace at the end of each line when it saves a file.
# 1. Run: jupyter lab --generate-config
# 2. Edit $HOME/.jupyter/jupyter_lab_config.py, add the following at the end.
#
# On JupyterHub, restart your single-user "server" for such config changes to take effect.
def strip_ws(t):
    return '\n'.join([i.rstrip() for i in t.split('\n')])
def scrub_output_pre_save(model=None, **kwargs):
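
The preview is cut off at `scrub_output_pre_save`, so the body of the hook is not shown here. As a standalone illustration of the whitespace-trimming step only, the sketch below repeats `strip_ws` and applies it to a contents-style model dictionary. The helper name `strip_model` and the assumed model layout (a `type` key, with `content` being either a string or a dict holding `cells` with string `source` fields) are assumptions for illustration, not the gist's actual hook.

```python
def strip_ws(text):
    """Remove trailing whitespace from every line, keeping the line breaks."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


def strip_model(model):
    """Hypothetical helper: trim trailing whitespace in a contents-style model.

    Assumes `model` is a dict with a "type" key; this layout is an assumption
    made for the sketch, not taken from the gist.
    """
    content = model.get("content")
    if model.get("type") == "file" and isinstance(content, str):
        # Plain text file: the content is a single string.
        model["content"] = strip_ws(content)
    elif model.get("type") == "notebook" and isinstance(content, dict):
        # Notebook: trim each cell whose source is a string.
        for cell in content.get("cells", []):
            if isinstance(cell.get("source"), str):
                cell["source"] = strip_ws(cell["source"])
    return model


example = {"type": "file", "content": "x = 1   \ny = 2\t\n"}
print(repr(strip_model(example)["content"]))
```

Models of any other type, or without string content, pass through unchanged, which keeps the helper safe to call on whatever is being saved.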