HELK is an interesting platform for carrying out endpoint threat hunting, useful both in production and for research and training. For research and training purposes, a key step is loading sample data so you can practice hunting queries.
Yes, this could probably be done in a better way, but the goal here was K.I.S.S.: quick and dirty.
Splunk provides sample data from its BOSS of the SOC CTF. Both v1 and v2 have been published as open source, more info here. The v1 data is available on GitHub here; unfortunately, it is formatted for ingestion into Splunk.
The goal is to import it into the HELK platform, which is based on an ELK stack (Elasticsearch, Logstash and Kibana). Thankfully, Sébastien Lehuédé has converted the data and done the work to ingest it into ELK. The conversion process, along with the converted data and configuration files, is published here under the label BOTES. Using this data, the challenge was to load it into HELK with the following criteria:
- Minimal effort
- Not modify the core docker configuration of HELK
This gist provides the basic steps to ingest the BOTES data into HELK.
Before proceeding, prepare your environment and have the following deployed on your instance:
- Deployed HELK, preferably using the standard install script and docker-compose
- Have enough space to copy the datasets and load them into the system (data sizes are discussed here)
The process presented in this gist is very basic: it loads the BOTES data as is, without mapping or transforming it to the HELK format, so it will not be correctly picked up by the HELK dashboards. That will be a future step.
The following process will ingest the BOTES data as is, via a file input performed in Logstash. That means the data files will be copied into one of the existing docker volumes configured in HELK.
In this ingest method, the decision was to place the files in a directory called `botes` under the helk-logstash subdirectory `./helk-logstash/enrichments/cti`.
The first step is to make sure you have everything ready: download all the data and prepare your environment.
- Download the data from https://botes.gitbook.io/botes-dataset/botes-elastic-bots-version; you will need the following dataset entries:
- winevent-application
- winevent-security
- winevent-system
- winregistry
- xmlwineventlog-sysmon
- Download the Elasticsearch Index Template from https://botes.gitbook.io/botes-dataset/botes-prerequisites; the file name is `template.json`
- Download the Logstash configuration files from https://botes.gitbook.io/botes-dataset/botes-prerequisites; you will need the following conf files:
- input-winevent-application.conf
- input-winevent-security.conf
- input-winevent-system.conf
- input-winregistry.conf
- input-winevent-sysmon.conf
- output.conf
Any commands in the steps below will be based on the following assumptions:
- The botes files have been downloaded into `~/botes/`
- The HELK directory is `/opt/HELK`
- Commands use aliases for the various docker containers, e.g. `helk-logstash`; see note 2 below
NOTE: Replace any paths in the commands with the location used in your system
NOTE 2: If you have not set up your /etc/hosts, replace the names with the IP address. The commands will use the format `<helk-instancename>` when referencing an instance network address; replace `<helk-instancename>` with either a DNS alias or the IP address. You can use `docker network inspect docker_helk` to find the instance IP addresses. Example of a command:
curl -XPUT 'http://<helk-elasticsearch>:9200/_template/botes'
## would become the following by replacing `<helk-elasticsearch>` with its IP address
curl -XPUT 'http://172.18.0.10:9200/_template/botes'
The first step is to load the index template into the Elasticsearch instance:
cd ~/botes/
curl -XPUT 'http://<helk-elasticsearch>:9200/_template/botes' \
-H 'Content-Type: application/json' \
-d@template.json
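You can optionally confirm that Elasticsearch accepted the template (same `<helk-instancename>` placeholder convention as above; the response should echo the template body back):

```
curl 'http://<helk-elasticsearch>:9200/_template/botes'
```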
Before adding the configuration files to HELK, you need to shut down the instances. Refer to the HELK guidelines on how to do this.
As the intention is not to modify the core HELK configuration, the Logstash configuration files assume the data is in `/usr/share/logstash/cti/botes`, which maps to the HELK repo subdirectory `./helk-logstash/enrichments/cti/botes`.
For each of the following files:
- input-winevent-application.conf
- input-winevent-security.conf
- input-winevent-system.conf
- input-winregistry.conf
- input-winevent-sysmon.conf
Edit the following section and change the `path` element:
Original input-XXX.conf file
input {
file {
path => ["/botes/data/winevent/botesv1.XmlWinEventLog-Microsoft-Windows-Sysmon-Operational.json"]
start_position => "beginning"
sincedb_path => "/dev/null"
codec => "json"
type => "WinEvent"
tags => ["winevent-sysmon"]
}
}
Changed input-XXX.conf file
input {
file {
path => ["/usr/share/logstash/cti/botes/botesv1.XmlWinEventLog-Microsoft-Windows-Sysmon-Operational.json"]
start_position => "beginning"
sincedb_path => "/dev/null"
codec => "json"
type => "WinEvent"
tags => ["winevent-sysmon"]
}
}
Repeat these changes for each of the input-*.conf
files.
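If editing five files by hand feels error-prone, the path swap can also be scripted with `sed`. The following is a sketch; it assumes every `input-*.conf` keeps its JSON file under a `/botes/data/<subdir>/` path like the example above, and it demonstrates the substitution on a throwaway copy (the real files live in `~/botes/`):

```shell
# Demonstrate the substitution on a throwaway copy of one input file.
mkdir -p /tmp/botes-input-demo && cd /tmp/botes-input-demo
cat > input-winevent-sysmon.conf <<'EOF'
    path => ["/botes/data/winevent/botesv1.XmlWinEventLog-Microsoft-Windows-Sysmon-Operational.json"]
EOF
# Swap the original /botes/data/<subdir>/ prefix for the Logstash volume path:
sed -i -E 's|/botes/data/[^/"]+/|/usr/share/logstash/cti/botes/|' input-*.conf
cat input-winevent-sysmon.conf
```

Against the real downloads, you would run the `sed` line from `~/botes/`, after checking what the original paths look like with `grep 'path =>' input-*.conf`.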
Edit the output configuration file, changing the URL so that it points to the `helk-elasticsearch` instance. The following changes need to be made:
Original output.conf file
output {
elasticsearch {
hosts => ["http://127.0.0.1:9200"]
index => "botes-glooper"
}
}
Changed output.conf file
output {
elasticsearch {
hosts => ["http://helk-elasticsearch:9200"]
index => "botes-glooper"
}
}
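This one-line change can likewise be scripted. A sketch, again demonstrated on a throwaway copy (run the `sed` line from the directory holding your real `output.conf`):

```shell
# Demonstrate the substitution on a throwaway copy of output.conf.
mkdir -p /tmp/botes-output-demo && cd /tmp/botes-output-demo
cat > output.conf <<'EOF'
output {
  elasticsearch {
    hosts => ["http://127.0.0.1:9200"]
    index => "botes-glooper"
  }
}
EOF
# Point the Elasticsearch output at the helk-elasticsearch container instead of localhost:
sed -i 's|http://127.0.0.1:9200|http://helk-elasticsearch:9200|' output.conf
grep 'hosts' output.conf
```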
The next step is to copy the configuration files into the Logstash pipeline volume.
cd ~/botes/
cp input-winevent-application.conf \
input-winevent-security.conf \
input-winevent-system.conf \
input-winregistry.conf \
input-winevent-sysmon.conf output.conf /opt/HELK/docker/helk-logstash/pipeline/
Next copy the data files into the HELK logstash volume.
cd ~/botes/
mkdir -p /opt/HELK/docker/helk-logstash/enrichments/cti/botes
gzip -d *.gz
cp *.json /opt/HELK/docker/helk-logstash/enrichments/cti/botes/
Refer to the HELK guidelines and restart your instance. You will need to wait a few minutes for the data to load.
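While Logstash is ingesting, you can watch the index grow (same `<helk-instancename>` placeholder convention as above; the `docs.count` column should increase between calls):

```
curl 'http://<helk-elasticsearch>:9200/_cat/indices/botes-*?v'
```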
The next step is to make the index visible in Kibana. The easiest way is through the UI:
- Log into your HELK instance with a web browser
- Navigate to the Management tab
- Under Kibana, select Index Patterns
- Click on Create Index Pattern
- In the Index pattern field, type `botes-*`; this should highlight `botes-glooper` as a match
- Click Next
- In the Time Filter field name, select either `@timestamp`, `event.created` or `event.start`, depending on your preference
- Click on the Create index pattern button
You should now be able to query the data under the Discover tab. Note that, depending on the timestamp field you chose, you may need to set your search date range to 2016.
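The same check can be made outside Kibana with a direct search against the index. A sketch, assuming the `@timestamp` field and the 2016 date range mentioned above (same placeholder convention as earlier):

```
curl -XGET 'http://<helk-elasticsearch>:9200/botes-glooper/_search' \
  -H 'Content-Type: application/json' \
  -d '{"size": 1, "query": {"range": {"@timestamp": {"gte": "2016-01-01", "lte": "2016-12-31"}}}}'
```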
The next step would be to make the data more compatible and in line with the HELK data model. The following items need to be worked on:
- Change the logstash configuration files to map the data to the same fields as HELK
- Write a script to modify the data timestamps
- Look at integrating BOTSv2 data