DARPA's OpTC dataset, released in the summer of 2020, remains to this day the best dataset for supporting research in cyber threat detection from host-based telemetry. It consists of 5 days of telemetry capture during which only normal activity from 1000 hosts generates events, followed by 3 days of capture during which a subset of these hosts is subjected to cyber attacks. Its main drawback is that the normal activity comes from a simplistic script that makes Firefox instances connect to websites drawn at random from a fixed list, on top of Windows' own housekeeping processes; in addition, the attacks are executed without any attempt at concealing the activity. Threat detection on this dataset is thus a much easier problem than in a real mission context, so good results on it amount to mere sanity checks of a detection scheme's reliability and robustness rather than evidence of them. Regardless, the dataset remains useful, if only for this purpose.
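Because the first 5 days contain only normal activity, one natural way to exploit this structure is to fit a model of normal behaviour on that window and evaluate detection on the remaining 3 days. The following is a minimal sketch of that split, assuming each event has already been parsed into a record carrying a host name and a timestamp. The `Event` type, the `split_by_phase` helper and the start date used in the example are hypothetical illustrations; they do not reflect OpTC's actual schema or capture dates.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical, minimal stand-in for one host telemetry record."""
    host: str
    timestamp: datetime


def split_by_phase(
    events: Iterable[Event],
    capture_start: datetime,
    benign_days: int = 5,
) -> Tuple[List[Event], List[Event]]:
    """Split events into the benign-only phase and the later phase.

    Events strictly before capture_start + benign_days are treated as
    attack-free training data. Events at or after that cutoff go to the
    evaluation set; note that this later phase still mixes normal activity
    (including from hosts that are never attacked) with attack activity.
    """
    cutoff = capture_start + timedelta(days=benign_days)
    benign: List[Event] = []
    evaluation: List[Event] = []
    for event in events:
        (benign if event.timestamp < cutoff else evaluation).append(event)
    return benign, evaluation


# Usage with a hypothetical start date (not OpTC's real capture dates):
# one event per day over an 8-day capture.
start = datetime(2020, 1, 1)
events = [Event("host-1", start + timedelta(days=d)) for d in range(8)]
benign, evaluation = split_by_phase(events, start)
print(len(benign), len(evaluation))  # 5 3
```

Splitting on a single time cutoff, rather than on host identity, keeps the training data free of attack traces while leaving the unattacked hosts' later activity available as realistic negatives during evaluation.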
An import