Encase output into something a bit more useful?

This week I had to deal with EnCase tool output for some cases, and that's not likely to stop for a while. I did some of this manually today and need to script that part, and then there's the actually hard part I still need to think through.

The output I'm getting is the result of selecting some kinds of data and then exporting the results from EnCase 7, rather than taking a PDF or RTF, which are much worse if you actually need to use the data for anything. It's tab-separated columns of data, and it's not an unfriendly format really, but it does take some work to get it into a state useful for any analysis, due to two things, which align with the two problems here:

  1. The way the data is laid out in the output text file
  2. The way the data is split into multiple pieces for each data type

Problem #1 is just a text data munging problem of the sort any scripter, sysadmin, or data analyst has probably already wrestled with and won against to some degree or another. The different data types are in columns separated by tabs (like TSV), with useful notes and blank lines between the tables. The line before the column headers is a title, then comes the line of headers, and then the data lines, each of which has a line number. Most of the interesting data sets are in multiple tables, sometimes with more notes and blank lines between them, but there is always a header line followed by numbered data lines. Even when there's only one item this pattern seems to hold.
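To make that concrete, here's a made-up sketch of the layout. The titles, column names, and values are all invented, just to show the shape, not real EnCase output:

```
Some Table Title
Item	Name	Path
1	example.doc	C:\Evidence\...
2	example.xls	C:\Evidence\...

Another Table Title
Item	Hash
1	d41d8cd98f00b204e9800998ecf8427e
```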

So, we need to break up the output text file by blank lines, or maybe double blank lines, and look for the column headers and title lines to find where to tear it apart. This is pretty straightforward, and I did it manually with copy and paste on the first go today.
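A scripted version might look something like this. It's a minimal sketch, assuming the layout described above (a single-column title line, then a tab-separated header, then numbered data rows, with blank lines between tables); the file names and the numbered-row heuristic are my guesses, not anything EnCase documents:

```python
# Split an EnCase 7 text export into per-table TSV files.
import csv
import sys

def split_tables(path):
    """Yield (title, header, rows) for each table found in the export."""
    title, header, rows = None, None, []
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.rstrip("\n")
            if not line.strip():                 # blank line: table boundary
                if header and rows:
                    yield title, header, rows
                    title, header, rows = None, None, []
                continue
            fields = line.split("\t")
            if header is None:
                if len(fields) > 1:              # first multi-column line: header
                    header = fields
                else:
                    title = line.strip()         # single-column line: title/notes
            elif fields[0].strip().isdigit():    # numbered data row
                rows.append(fields)
    if header and rows:                          # flush the final table
        yield title, header, rows

if __name__ == "__main__":
    for n, (title, header, rows) in enumerate(split_tables(sys.argv[1])):
        out = f"table_{n:02d}.tsv"
        with open(out, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh, delimiter="\t")
            writer.writerow(header)
            writer.writerows(rows)
        print(f"{out}: {title or '(untitled)'}: {len(rows)} rows")
```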

Problem #2 comes into play with the complex data sets that span multiple tables. It's as if the database were normalized and the different tables come from breaking the data up into atoms, but we have line numbers for a join rather than explicit foreign key references. There are candidate data types to use for the desirable joins, and the complex data types lend themselves to that fairly well ... with the caveat that the full schema is a pretty sophisticated data model of the sort you might develop for an enterprise computer security tool. I really do try to avoid solo projects that require a cool entity-relationship diagram to explain.
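A sketch of that implicit join, assuming the per-table TSV files produced above: load two of the tables into SQLite and join them on the shared line-number column. The table and column names here ("files", "hashes", "Item", and so on) are hypothetical; the real EnCase schema will differ:

```python
import csv
import sqlite3

def load_tsv(con, table, path):
    """Create a table from a TSV file, columns as TEXT, header as column names.
    Assumes every data row has the same number of fields as the header."""
    with open(path, encoding="utf-8") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader)
        cols = ", ".join(f'"{c}" TEXT' for c in header)
        con.execute(f'CREATE TABLE "{table}" ({cols})')
        marks = ", ".join("?" * len(header))
        con.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)

con = sqlite3.connect(":memory:")
load_tsv(con, "files", "table_00.tsv")    # e.g. file names and paths
load_tsv(con, "hashes", "table_01.tsv")   # e.g. hash values

# The line number acts as the foreign key the export never declares.
for row in con.execute("""
    SELECT f."Name", h."Hash"
    FROM files f JOIN hashes h ON f."Item" = h."Item"
"""):
    print(row)
```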

It's equivalent to taking the output of your favourite ten Volatility plugins and trying to link it all up in SQLite so that you can actually query across hosts, process trees, and times (say) ... which is to say it sounds easy enough but gets complex quickly. Folks have done that, btw, for awesome things like Manta Ray, Evolve, and of course Rekall, so check them out if you haven't :)

It's almost an afterthought to scaffold a CRUD web UI on top of the SQLite once it's assembled. Rails or Django would both be fine, I'm sure, for my meager needs.
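Django can even generate models straight from an existing SQLite file with `python manage.py inspectdb`. A hand-written sketch for one hypothetical table (matching the "files" table loaded above; the names are illustrative, not a real schema) might look like:

```python
from django.db import models

class FileEntry(models.Model):
    item = models.IntegerField(primary_key=True)   # the export's line number
    name = models.TextField()
    path = models.TextField(blank=True, null=True)

    class Meta:
        db_table = "files"   # reuse the table loaded into SQLite earlier
        managed = False      # don't let Django create or drop it
```

Register that in the admin and you get the CRUD screens essentially for free.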
