@sethhall
Created August 23, 2018 14:35
Get some extra file names from http
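redef record HTTP::Info += {
	# Candidate file name taken from the last component of the request URI path.
	potential_fname: string &optional;
};

event http_request(c: connection, method: string, original_URI: string,
                   unescaped_URI: string, version: string) &priority=5
	{
	# Defensive check: do nothing if the HTTP state or URI is not set yet.
	if ( ! c?$http || ! c$http?$uri )
		return;

	# Get rid of URI arguments.
	local path = split_string(c$http$uri, /\?/)[0];
	local out = split_string(path, /\//);
	# Take the last component in the URI path.
	c$http$potential_fname = out[|out|-1];
	}

event http_header(c: connection, is_orig: bool, name: string, value: string) &priority=3
	{
	# Only look at response headers.
	if ( is_orig )
		return;

	# Only act on an ETag header whose value contains a double quote.
	if ( name == "ETAG" && /\"/ in value )
		{
		# Guard the optional fields before dereferencing them.
		if ( c$http?$current_entity &&
		     c$http?$potential_fname && c$http$potential_fname != "" )
			c$http$current_entity$filename = c$http$potential_fname;
		}
	}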
@duffy-ocraven

I also think this discussion is still helping us each articulate and see some subtle but important things. Thanks for hanging in there, if I ever express myself too obtusely.

You identified something there at the last: "we have a disconnect on the way we think about the logs". I am relatively new to Zeek (just a few months), and one of my top-of-mind thoughts was "I have got to code myself a customized less that columnarizes and word-wraps when viewing logs, at least as well as HTML tables and/or RTF do." I think the human reader of logs is a vital audience, because reading them activates the penchant for pattern detection that the human brain is, for better or for worse, so prone to.

less has saved my bacon innumerable times. It can still get you into a jam (for instance, don't jump to the end of a file that ends with thousands of \x00 bytes if you ever want your keyboard to respond again). But I had expected, and I guess I am trying here to encourage, Zeek to regard "verbatim representation" as less of a boon to Zeek programmers and users than to attackers. "Unambiguous representation", I absolutely concur, is something that Zeek programmers and users must have. But it can be expressed in a dialect constrained to consist only of benign characters.

@duffy-ocraven

duffy-ocraven commented Sep 15, 2020

Oh, and a small clarification, so that we don't digress over a canard. I realize Zeek logs aren't sequences of bytes where anything could end up in them, because the tab-separated and JSON output both escape non-printable stuff. But internally in Zeek, I worry if values of every datatype are just arbitrary sequences of bytes, which means they can technically have nulls or anything else in them. I would blanch if hash results could have nulls or anything of the sort in them. The point I am raising in this discussion is that programmers carry some semantic baggage as they read variable and type names. I blanch if a "filename" can contain a * or / or \. It needs to be termed a filepath if it is the '/'-delimited hierarchy. It needs to be a fullpath if it is the filepath and filename concatenated. It needs to be a pattern if it can contain * or ?.
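
To make the naming point concrete, here is a minimal sketch of a check that a value really is a bare "filename" in the sense described above. The helper name `is_plain_filename` is hypothetical (it is not part of Zeek or of the gist), and it only covers the characters named in this comment: path separators and wildcards. Control bytes such as nulls are out of scope for this sketch.

```zeek
# Hypothetical helper, not part of Zeek or of the gist above.
# Returns T only if the candidate looks like a bare file name:
# non-empty, with no path separators (/ or \) and no wildcards (* or ?).
function is_plain_filename(s: string): bool
	{
	if ( |s| == 0 )
		return F;

	# A path separator would make this a filepath or fullpath;
	# a wildcard would make it a pattern.
	if ( /[\/\\*?]/ in s )
		return F;

	return T;
	}

event zeek_init()
	{
	print is_plain_filename("report.pdf");       # bare name
	print is_plain_filename("docs/report.pdf");  # contains a path separator
	print is_plain_filename("*.pdf");            # contains a wildcard
	}
```

A script like the one in the gist could call such a check before assigning `potential_fname`, so that anything stored under a "filename" label would at least match what the name implies.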
