writeup describing how to create a plaso parser and deploy it with timesketch

Soup to Nuts: Creating a Plaso Parser and Deploying Timesketch to Docker

Acknowledgements, etc.

Thank you to the Log2timeline and Timesketch teams for putting out some solid work. I am thankful that I have the opportunity to create this write-up, which comes on the back of their hard work. While there may be holes in their documentation and descriptions, the code is well written and fairly easy to understand. Any criticism should not be interpreted as a reflection of the quality of their work.

Introduction to the Problem

Setting up Timesketch and Plaso seems pretty easy, but what happens when we need to create an instance dynamically that includes custom parsers? This document captures some of my experience and provides some insight into getting this up and running in a reasonable amount of time.

Scenario

A co-worker gave me an IP address and asked me to:

  1. Provide guidance on any additional information
  2. Investigate related hosts in the network

These are pretty standard tasks, so I fired up a VM and started visiting the IP address, which turned out to be a pretty simple task. I repeatedly visited a site and got redirected through a sketchy ad network.

Analysis Rig

To support the analysis, an Ubuntu Linux VM was created and a passive DNS service was set up to capture the domain resolutions.

  1. Ubuntu 18.04 in a VM
  2. PassiveDNS

Once the rig was set up, the IP address was plugged in, and the site was visited and redirected to a number of different sites. This results in several disparate log and information sources (e.g. passive DNS, Firefox web history, and cookies).

Additional tools were needed to combine the different information sources.

Enter Plaso and Timesketch

Plaso is a tool that helps simplify and automate the extraction of forensic information from files, logs, etc. Plaso can handle analyzing known one-off log files or disk images and help build timelines based on the events and information recovered. If the log format is not known to Plaso, a new parser and formatter will need to be written and added to the project. In this case, the passive DNS service's log is not supported, so additional development is required. However, the current documentation is lacking on how to write, test, and deploy the plugin, so this will be covered.

Timesketch is an application that helps with the analysis and visualization of timelines from Plaso or other sources of timeline data. The easiest way to get started with Timesketch is using Docker. However, following the current documentation as written does not produce a working instance of all the services.

Building a Plaso Parser and Formatter

In this example, a Plaso parser and formatter will be created. The parser and formatter will then be used to extract and export events from the log file using Timesketch, psort, etc.
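To make the target concrete, here is a minimal sketch of the kind of line the parser has to handle. It assumes the `||`-delimited field layout of the PassiveDNS log (timestamp, client, server, class, query, type, answer, TTL, count); the field names, the sample line, and the `split_passivedns_line` helper are illustrative assumptions and are not part of Plaso. The same field order is what a pyparsing grammar for the real parser would need to encode.

```python
from datetime import datetime, timezone

# Assumed field order for a PassiveDNS log line; adjust to match your log.
FIELDS = ("timestamp", "client", "server", "rr_class", "query",
          "query_type", "answer", "ttl", "count")


def split_passivedns_line(line):
    """Split one '||'-delimited line into a dict keyed by FIELDS."""
    values = line.strip().split("||")
    if len(values) != len(FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(FIELDS), len(values)))
    record = dict(zip(FIELDS, values))
    # The timestamp is assumed to be seconds since the epoch with a fraction.
    record["timestamp"] = datetime.fromtimestamp(
        float(record["timestamp"]), tz=timezone.utc)
    record["ttl"] = int(record["ttl"])
    record["count"] = int(record["count"])
    return record


# Illustrative sample line, not taken from a real capture.
SAMPLE = ("1322849924.408856||10.1.1.1||8.8.8.8||IN||example.com.||A||"
          "93.184.216.34||3600||5")
record = split_passivedns_line(SAMPLE)
print(record["query"], record["answer"], record["timestamp"].isoformat())
```

Running a quick check like this over the sample in test_data/ before writing the grammar can help catch lines with an unexpected number of fields.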

Overview of Tasks for Creating Parsers and Formatters

  1. Create the parser and event Python classes.

    • event class contains fields that detail the event

      • set the DATA_TYPE (used to map event to a formatter)
      • create the important fields for the event (not timestamp because it causes aliasing with event data)
    • parser takes a file object, data, or parsed log line. Information is extracted and put into an event.

      • set the NAME (used to map the parser to a filter)
      • set the DESCRIPTION
      • Parsing different file/data types
        • for binary or unpredictable content formats, derive the parser from interface.FileObjectParser
        • for textual and predictable content formats (e.g. logs), derive the parser from text_parser.PyparsingSingleLineTextParser, etc. (see parsers.text_parser for other types)
        • If creating a binary parser, derive the class from interface.FileObjectParser:
          • put the parsing logic in the parser's ParseFileObject method; this method gets called at run-time
        • If creating a text parser, derive the class from text_parser.PyparsingSingleLineTextParser:
          • put the parsing logic in the parser's ParseRecord method; at run-time this method gets called from FileObjectParser after reading a line
          • create the parsing grammar using the pyparsing Python module
          • map each grammar to a key in LINE_STRUCTURES (see the comments in the text_parser.PyparsingSingleLineTextParser class)
  2. Create the formatter Python class

    • import the event created in the newly created parser module
    • set the DATA_TYPE (used to map event to a formatter)
    • create the important fields for the event (not timestamp because it causes aliasing with event data)
  3. Update the imports in parsers.__init__.py and formatters.__init__.py

  4. Testing the parser

    • Create the unittest test case in tests/parsers/ based on the new parser module
    • Create the test runner
      • copy/create a sample of the expected log data in test_data/
      • copy the run_tests.py --> new_run_tests.py
      • for an idea on how to create the test, look in tests/parsers and tests/formatters for a representative example
      • change the runner to focus only on the tests/parsers directory and the specific parser module
        • test_suite = unittest.TestLoader().discover('tests/parsers/', pattern='CHANGEME_TO_PARSER_NAME.py')
      • change the runner to focus only on the tests/formatters directory and the specific formatter module
        • test_suite = unittest.TestLoader().discover('tests/formatters/', pattern='CHANGEME_TO_FORMATTER_NAME.py')
  5. Testing Plaso

    • test the parser on the specific file content
      • execute log2timeline.py --parsers `PARSER_NAME` --status_view none test.plaso EXAMPLE.log
    • look for potential warnings or errors and correct them in the code (see below for an example)
      • pinfo.py -v test.plaso
************************ Warnings generated per parser *************************
Parser (plugin) name : Number of warnings
--------------------------------------------------------------------------------
         <No parser> : 1
--------------------------------------------------------------------------------

************************* Pathspecs with most warnings *************************
Number of warnings : Pathspec
--------------------------------------------------------------------------------
                 1 : type: OS, location:
                     /home/you/EXAMPLE.log
--------------------------------------------------------------------------------

********************************** Warning: 0 **********************************
           Message : unable to process path specification with error: Expected
                     "||" (at char 82), (line:1, col:83)
      Parser chain : 
Path specification : type: OS, location:
                     /home/you/EXAMPLE.log
--------------------------------------------------------------------------------
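The focused test runner from step 4 can be sketched as follows. This is a minimal example rather than the project's own run_tests.py: `run_single_module` is a hypothetical helper, and the demo writes a throwaway test module into a temporary directory so the snippet runs on its own. In a Plaso checkout you would instead pass 'tests/parsers/' and the name of your parser test module.

```python
import os
import sys
import tempfile
import unittest


def run_single_module(start_dir, pattern):
    """Discover and run only the test modules in start_dir matching pattern."""
    test_suite = unittest.TestLoader().discover(start_dir, pattern=pattern)
    runner = unittest.TextTestRunner(stream=sys.stderr, verbosity=2)
    return runner.run(test_suite)


# Stand-in for tests/parsers/passivedns.py so the sketch is self-contained.
DEMO_TEST = '''
import unittest


class PassiveDNSLineTest(unittest.TestCase):

    def testFieldCount(self):
        line = ("1322849924.408856||10.1.1.1||8.8.8.8||IN||example.com.||A||"
                "93.184.216.34||3600||5")
        self.assertEqual(len(line.split("||")), 9)
'''

with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "passivedns.py"), "w") as handle:
        handle.write(DEMO_TEST)
    result = run_single_module(demo_dir, "passivedns.py")

print("ran %d test(s), success=%s" % (result.testsRun, result.wasSuccessful()))
```

Restricting discovery to one module keeps the edit-and-test loop short while the grammar is still changing; the full suite can be run once the focused test passes.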

Setting up and running Timesketch in Docker

Several modifications need to be made to the Timesketch Docker set-up to get it working, and the container's code also needs to be updated with a fresh build of Plaso that includes the new code.

Required administrative changes to Timesketch docker set-up

  1. Dockerfile updates

    • create configuration file for the container
    • update code in the Timesketch distribution (see item 4)
    • update and install Plaso from source with the new parsers
  2. Add a few explicit configurations to the docker-compose.yml

    • Set up passwords for neo4j and link it to timesketch
  3. Set host IPs and passwords in timesketch.conf manually to ensure services can connect. Where IPs are concerned, the address is set to the default docker interface (e.g. 172.17.0.1).

    • set neo4j IP and password
    • set sqlalchemy URI for postgres: IP, username, and password
    • set celery IP for URI
    • set elasticsearch IP
  4. Update tasks.py in Timesketch to properly handle Plaso uploads.
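As a rough illustration of item 3, the connection settings might look like the fragment below. timesketch.conf uses Python syntax, so this is written as Python. The key names are assumed to match the stock timesketch.conf of the release in use, the ports are the usual service defaults, and the credentials are placeholders; check all of them against your own copy and your docker-compose.yml.

```python
# Sketch of the connection-related settings in timesketch.conf.
# Key names are assumptions based on the stock config; confirm before use.
DOCKER_HOST_IP = "172.17.0.1"  # default docker interface address

# Placeholder credentials; replace with the values set in docker-compose.yml.
POSTGRES_USER = "timesketch"
POSTGRES_PASSWORD = "CHANGEME"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "CHANGEME"

# sqlalchemy URI for postgres: IP, username, and password
SQLALCHEMY_DATABASE_URI = "postgresql://%s:%s@%s/timesketch" % (
    POSTGRES_USER, POSTGRES_PASSWORD, DOCKER_HOST_IP)

# elasticsearch IP (9200 is the usual default port)
ELASTIC_HOST = DOCKER_HOST_IP
ELASTIC_PORT = 9200

# celery broker and result backend (redis on its usual default port)
CELERY_BROKER_URL = "redis://%s:6379" % DOCKER_HOST_IP
CELERY_RESULT_BACKEND = "redis://%s:6379" % DOCKER_HOST_IP

# neo4j IP and password
NEO4J_HOST = DOCKER_HOST_IP
NEO4J_PORT = 7474
```

Keeping the address in one variable makes it easier to switch all services at once if the docker interface uses a different address on your host.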

Side notes.

In the Dockerfile, the following lines were added to facilitate the install and update of Plaso:

# install util dependencies
RUN apt install -y sudo wget software-properties-common

# clone the plaso repo
RUN git clone https://github.com/log2timeline/plaso/ /tmp/plaso

# place the parser and formatter files into the right directory
ADD docker/parser_passivedns.py /tmp/plaso/plaso/parsers/passivedns.py
ADD docker/formatter_passivedns.py /tmp/plaso/plaso/formatters/passivedns.py

# register the new parser and formatter modules by appending the imports
# (printf is used because the default /bin/sh may not support echo -e)
RUN printf '\nfrom plaso.parsers import passivedns\n' >> /tmp/plaso/plaso/parsers/__init__.py
RUN printf '\nfrom plaso.formatters import passivedns\n' >> /tmp/plaso/plaso/formatters/__init__.py

# install dependencies
RUN bash /tmp/plaso/config/linux/gift_ppa_install_py3.sh
RUN pip3 install -r /tmp/plaso/requirements.txt
RUN pip3 install -r /tmp/plaso/test_requirements.txt
RUN pip3 install /tmp/plaso
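The registration step above simply appends an import line to each __init__.py. A minimal sketch of the same step in Python is shown below; unlike a plain append, it skips the write when the line is already present, so repeating the step does not duplicate the import. `register_module` is a hypothetical helper, and the demo runs against a throwaway file standing in for plaso/parsers/__init__.py.

```python
import os
import tempfile


def register_module(init_path, package, module):
    """Append 'from <package> import <module>' to init_path if it is missing."""
    import_line = "from %s import %s" % (package, module)
    with open(init_path, "r") as handle:
        existing = handle.read()
    if import_line in existing.splitlines():
        return False  # already registered, nothing to do
    with open(init_path, "a") as handle:
        # Make sure the new import starts on its own line.
        if existing and not existing.endswith("\n"):
            handle.write("\n")
        handle.write(import_line + "\n")
    return True


# Demo against a throwaway file standing in for plaso/parsers/__init__.py.
with tempfile.TemporaryDirectory() as demo_dir:
    init_path = os.path.join(demo_dir, "__init__.py")
    with open(init_path, "w") as handle:
        handle.write("from plaso.parsers import syslog")  # no trailing newline
    first = register_module(init_path, "plaso.parsers", "passivedns")
    second = register_module(init_path, "plaso.parsers", "passivedns")
    with open(init_path) as handle:
        final_lines = handle.read().splitlines()

print(first, second, final_lines)
```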

In the Dockerfile, the following lines were added to facilitate the update of Timesketch:

# Copy the Timesketch configuration files into /etc/timesketch and /etc
RUN mkdir /etc/timesketch
RUN cp -r /tmp/timesketch/config/* /etc/timesketch/
ADD docker/tasks.py /usr/local/lib/python3.6/dist-packages/timesketch/lib/tasks.py
ADD docker/timesketch.conf /etc

Configuration and commands

Here are the commands that can be executed to install, update, and run Timesketch:

git clone https://github.com/google/timesketch/
cd timesketch/docker

# assuming wget is installed

# Download required files for timesketch
wget -O Dockerfile https://gist.githubusercontent.com/deeso/7418cb71f37fd9b7e37861de77dd9fb8/raw/dec8cbd34a9e876626b562bd2922b3b6dfb9de66/Dockerfile
wget -O docker-compose.yaml https://gist.githubusercontent.com/deeso/5e3e07b515ee6d0a43e91c0dfadf3bdf/raw/5f258ba6257e81890444b6282dfb95f89537dea8/docker-compose.yml
wget -O tasks.py https://gist.githubusercontent.com/deeso/0def71c215509c87e15ce7f22e7441f6/raw/6208919d3fb854e6f142c219375151a8b2ee503a/tasks.py
wget -O timesketch.conf https://gist.githubusercontent.com/deeso/6533d515693247664fa423d6c669e67b/raw/c3be406fdda8d5285466bfcc6c5f47885f44db89/timesketch.conf

# Download required files for the plaso update and install
wget -O parser_passivedns.py https://gist.githubusercontent.com/deeso/513dfad9fbbc9638d57129acd96762d6/raw/d7a5d3af5520eab145067ec98ec70085ee6baa0c/parser_passivedns.py
wget -O formatter_passivedns.py https://gist.githubusercontent.com/deeso/d0f8eb304d9c6771fa9a9b085d40a382/raw/e271ba5d873ccdc0089d9f78c1a70c13a24693c8/formatter_passivedns.py

# build the images, then start the containers in the background
docker-compose build && docker-compose up --detach
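Once the containers are up, a quick reachability check can help confirm that the addresses in timesketch.conf are usable. The sketch below is an assumption-laden helper, not part of Timesketch: `port_is_open` and `check_services` are hypothetical names, and the port numbers are the usual defaults for each service, which may differ from what your docker-compose.yml publishes.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Assumed default ports; adjust to whatever docker-compose.yml publishes.
SERVICES = {
    "timesketch": 5000,
    "elasticsearch": 9200,
    "postgres": 5432,
    "redis": 6379,
    "neo4j": 7474,
}


def check_services(host, services=None):
    """Map each service name to whether its port accepts connections."""
    services = SERVICES if services is None else services
    return {name: port_is_open(host, port) for name, port in services.items()}
```

For example, `check_services("172.17.0.1")` returns a dictionary of service names to booleans; any `False` entry points at a service that did not start or is listening on a different address or port.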

Reference Timesketch Files

Reference Plaso Files
