This gist documents the steps taken to pull the Splunk BOTS endpoint hunting data into a HELK instance, so that the data can be used for simulation and training purposes.

Adding BOTSv1 Data to HELK

HELK is an interesting platform for carrying out endpoint threat hunting, useful both in production and for research and training. For research and training purposes, a key requirement is sample data against which to practice hunting queries.

Yes, this could probably be done in a better way, but the goal here was K.I.S.S.: quick and dirty.

Goal

Splunk provides sample data from its BOSS of the SOC CTF. Both v1 and v2 have been published as open source, more info here. The v1 data is available on GitHub here; unfortunately, it is formatted for ingestion into Splunk.

The goal is to import it into the HELK platform, which is based on an ELK stack (Elasticsearch, Logstash and Kibana). Thankfully, Sébastien Lehuédé has converted the data and done the work to ingest it into ELK. The conversion process, the converted data and the associated configuration files are published here under the label BOTES. Using this data, the challenge was to load it into HELK with the following criteria:

  • Minimal effort
  • Not modify the core docker configuration of HELK

This gist provides the basic steps to ingest the BOTES data into HELK.

Requirements

Before proceeding, prepare your environment and have the following deployed on your instance:

  • A deployed HELK instance, preferably installed using the standard install script and docker-compose
  • Enough disk space to copy the datasets and load them into the system (data sizes are discussed here)

Current Limitations

The process presented in this gist is very basic: it loads the BOTES data as is, without mapping it to the HELK data model or transforming it so that it is correctly picked up by the HELK dashboards. That will be a future step.

Process to Ingest BOTES Data

The following process ingests the BOTES data as is. The data is ingested via a file input in Logstash, which means the data files must be copied into one of the existing Docker volumes configured in HELK.

For this ingest method, the decision was to place the files in a directory called botes under the helk-logstash subdirectory ./helk-logstash/enrichments/cti.

Prepare your Environment

The first step is to make sure you have everything downloaded and your environment prepared.

  1. Download the data from https://botes.gitbook.io/botes-dataset/botes-elastic-bots-version, you will need the following dataset entries:
    • winevent-application
    • winevent-security
    • winevent-system
    • winregistry
    • xmlwineventlog-sysmon
  2. Download the Elasticsearch Index Template from https://botes.gitbook.io/botes-dataset/botes-prerequisites, the file name is template.json
  3. Download the Logstash configuration files from https://botes.gitbook.io/botes-dataset/botes-prerequisites, you will need the following conf files:
    • input-winevent-application.conf
    • input-winevent-security.conf
    • input-winevent-system.conf
    • input-winregistry.conf
    • input-winevent-sysmon.conf
    • output.conf

Any commands in the steps below will be based on the following assumptions:

  • The botes files have been downloaded into the following path ~/botes/
  • HELK directory is in /opt/HELK
  • Commands use DNS aliases for the various docker containers, e.g. helk-logstash; see note 2 below

NOTE: Replace any paths in the commands with the location used in your system

NOTE 2: If you have not set up your /etc/hosts, replace the names with the IP address. Commands referencing the instance network address use the format <helk-instancename>; replace <helk-instancename> with either a DNS alias or the IP address. You can use docker network inspect docker_helk to find the instance IP addresses. Example of a command:

curl -XPUT 'http://<helk-elasticsearch>:9200/_template/botes' 

    ## would become the following by replacing `<helk-elasticsearch>` with its IP address

curl -XPUT 'http://172.18.0.10:9200/_template/botes' 
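
If you only need the container IP addresses, one quick way is to filter the docker network inspect output. A minimal sketch, assuming the network is named docker_helk as above and that jq is installed:

docker network inspect docker_helk \
  | jq -r '.[0].Containers[] | "\(.Name)\t\(.IPv4Address)"'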

Step 1: Configure HELK Elasticsearch

The first step is to load the index template into the Elasticsearch instance:

cd ~/botes/
curl -XPUT 'http://<helk-elasticsearch>:9200/_template/botes' \
  -H 'Content-Type: application/json' \
  -d@template.json
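
To confirm the template was accepted, you can read it back (a quick check against the same instance address):

curl -XGET 'http://<helk-elasticsearch>:9200/_template/botes?pretty'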

Step 2: Shutdown your HELK instance

Before adding the configuration files to HELK, you need to shut down the instance. Refer to the HELK guidelines on how to do this.
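
As a rough sketch, assuming a standard docker-compose based install under /opt/HELK/docker (the compose file name depends on the build you chose during install, helk-kibana-analysis-basic.yml is only an example):

cd /opt/HELK/docker
sudo docker-compose -f helk-kibana-analysis-basic.yml stop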

Step 3: Prepare the Logstash Configuration

As the intention is not to modify the core HELK configuration, the Logstash configuration files assume the data is in /usr/share/logstash/cti/botes, which maps to the HELK repo subdirectory ./helk-logstash/enrichments/cti/botes.

The first sub-step is to edit each input configuration file

For each of the following files:

  • input-winevent-application.conf
  • input-winevent-security.conf
  • input-winevent-system.conf
  • input-winregistry.conf
  • input-winevent-sysmon.conf

In each file, edit the following section, changing the path element:

Original input-XXX.conf file

input {
	file {
		path => ["/botes/data/winevent/botesv1.XmlWinEventLog-Microsoft-Windows-Sysmon-Operational.json"]
		start_position => "beginning"
		sincedb_path => "/dev/null"
		codec => "json"
		type => "WinEvent"
		tags => ["winevent-sysmon"]
	}
}

Changed input-XXX.conf file

input {
	file {
		path => ["/usr/share/logstash/cti/botes/botesv1.XmlWinEventLog-Microsoft-Windows-Sysmon-Operational.json"]
		start_position => "beginning"
		sincedb_path => "/dev/null"
		codec => "json"
		type => "WinEvent"
		tags => ["winevent-sysmon"]
	}
}

Repeat these changes for each of the input-*.conf files.
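
Alternatively, the path substitution can be scripted in one pass. A hedged one-liner, assuming GNU sed and that every input-*.conf references its data file under a path of the form /botes/data/<subdir>/ (a pattern inferred from the sysmon example above):

cd ~/botes/
sed -i -E 's|/botes/data/[^/"]+/|/usr/share/logstash/cti/botes/|g' input-*.conf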

The second sub-step is to edit the output configuration file

Edit the output configuration file, changing the URL so that it points to the helk-elasticsearch instance. The following changes need to be made:

Original output.conf file

output {
	elasticsearch {
		hosts => ["http://127.0.0.1:9200"]
		index => "botes-glooper"
	}
}

Changed output.conf file

output {
	elasticsearch {
		hosts => ["http://helk-elasticsearch:9200"]
		index => "botes-glooper"
	}
}
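
This change can also be scripted, assuming the original host entry is exactly http://127.0.0.1:9200:

sed -i 's|http://127.0.0.1:9200|http://helk-elasticsearch:9200|' output.conf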

Add the configuration files to the Logstash server

The next step is to copy the configuration files to the Logstash pipeline directory.

cd ~/botes/
cp input-winevent-application.conf \
  input-winevent-security.conf \
  input-winevent-system.conf \
  input-winregistry.conf \
  input-winevent-sysmon.conf \
  output.conf \
  /opt/HELK/docker/helk-logstash/pipeline/
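
A quick sanity check that the six files landed alongside HELK's existing pipeline configuration:

ls /opt/HELK/docker/helk-logstash/pipeline/ | grep -E 'input-win|output'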

Step 4: Add the Data Files

Next copy the data files into the HELK logstash volume.

cd ~/botes/
mkdir -p /opt/HELK/docker/helk-logstash/enrichments/cti/botes
gzip -d *.gz
cp *.json /opt/HELK/docker/helk-logstash/enrichments/cti/botes/
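
Optionally, verify that the uncompressed JSON files are in place and check their sizes:

ls -lh /opt/HELK/docker/helk-logstash/enrichments/cti/botes/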

Step 5: Restart HELK instance

Refer to the HELK guidelines and restart your instance. You will need to wait a few minutes for the data to load.
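
Once the containers are back up, you can follow the Logstash log to watch the files being picked up (assuming the container is named helk-logstash):

sudo docker logs --follow helk-logstash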

Step 6: Create an Index Pattern in Kibana

The next step is to create an index pattern for the new index in Kibana. The easiest way is to use the Kibana UI.

  1. Log in to your HELK instance with a web browser
  2. Navigate to the Management tab
  3. Under Kibana, select Index Patterns
  4. Click Create Index Pattern
  5. In the Index pattern field, type botes-*
  6. This should highlight botes-glooper as a match
  7. Click Next
  8. For the Time Filter field name, select @timestamp, event.created or event.start, depending on your preference
  9. Click the Create index pattern button

You should now be able to query the data under the Discover tab. Note that, depending on the timestamp field you chose, you may need to set your search date range back to 2016.
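
You can also confirm the ingest from the command line by counting documents in the index (the index name botes-glooper comes from output.conf above):

curl 'http://<helk-elasticsearch>:9200/botes-glooper/_count?pretty'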

Next Steps

The next steps would be to make the data more compatible and in line with the HELK data model. The following items need to be worked on:

  • Change the logstash configuration files to map the data to the same fields as HELK
  • Write a script to modify the data timestamps
  • Look at integrating BOTSv2 data