Get some extra file names from http
redef record HTTP::Info += {
    ## Candidate file name taken from the last component of the request URI path.
    potential_fname: string &optional;
};

event http_request(c: connection, method: string, original_URI: string,
                   unescaped_URI: string, version: string) &priority=5
    {
    # Get rid of URI arguments.
    local path = split_string(c$http$uri, /\?/)[0];
    local out = split_string(path, /\//);
    # Take the last component in the URI path.
    c$http$potential_fname = out[|out|-1];
    }

event http_header(c: connection, is_orig: bool, name: string, value: string) &priority=3
    {
    # Only response headers are of interest.
    if ( is_orig )
        return;

    # Only act on an ETag header whose value contains a double quote.
    if ( name == "ETAG" && /\"/ in value )
        {
        # Guard against a missing entity before assigning the file name.
        if ( c$http?$current_entity &&
             c$http?$potential_fname && c$http$potential_fname != "" )
            c$http$current_entity$filename = c$http$potential_fname;
        }
    }
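
To see which values the script above would pick as a candidate file name, here is a minimal sketch of the same path logic in Python rather than Zeek. The function name `potential_fname` simply mirrors the record field, and the sample URIs are made up for illustration. It shows why the script checks for an empty string: a path ending in `/` has an empty last component.

```python
def potential_fname(uri: str) -> str:
    """Return the last path component of a request URI, ignoring the query string."""
    path = uri.split("?")[0]    # drop URI arguments
    return path.split("/")[-1]  # last component; empty if the path ends in "/"


# Hypothetical sample URIs, for illustration only.
for uri in ("/downloads/report.pdf?session=42", "/images/", "/"):
    print(repr(uri), "->", repr(potential_fname(uri)))
```

Note that nothing here validates the component. It can still contain any character that is legal in a URI path segment.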

duffy-ocraven commented Sep 15, 2020

This thread has taken on a life of its own. Really, I only started out thinking "I'll suggest some validation," and then "hey, as a convenience the code can also unescape the benign instances while it validates," which led to the first post. I am glad to have taken the deeper dive into what distinguishes an id from a uri from a filename in the Zeek realm. I came into this discussion with a profound respect for the URI escape specification in https://tools.ietf.org/html/rfc3986. I think they got it right: pay with some inconvenience for the human reader and substitute all potentially dangerous characters (a surprisingly fair price, since those characters are usually not relevant to legitimate use cases). https://tools.ietf.org/html/rfc8089 builds on that for filenames specifically, and I think we will avoid trouble if we stay close to those principles.

sethhall (Author) commented Sep 15, 2020

I've been enjoying this discussion quite a bit!

The 8089 RFC actually points out...

Treating a non-local file URI as local, or otherwise attempting to
   perform local operations on a non-local URI, can result in security
   problems.

I would take it to mean that the 8089 RFC is actually not terribly relevant to anything we're doing with Zeek because basically everything we have comes from the network and would be classified as non-local.

My general perspective on Zeek logs is that they aren't necessarily "words" anyway. Internally in Zeek they're all just arbitrary sequences of bytes, which means they can technically have nulls or anything else in them. The moment we start trying to protect something like the filename field as safe to use for arbitrary purposes is the day we start getting ourselves into trouble. Basically none of the data coming out of Zeek is safe to do anything with. If a user decodes escaped fields (the tab-separated data and JSON both escape non-printable stuff), they have to adopt the notion that any data could end up in any field.
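
As a rough illustration of that point, here is a minimal Python sketch of an escape/decode round trip. The `\xNN` scheme and the names `escape_bytes` and `unescape_bytes` are assumptions made for this example and are not Zeek's actual log-writer rules. The escaped form is printable and unambiguous, but decoding it hands back the original arbitrary bytes, nulls included.

```python
def escape_bytes(raw: bytes) -> str:
    """Escape bytes into printable ASCII (illustrative scheme, not Zeek's exact rules)."""
    out = []
    for b in raw:
        if b == 0x5C:                 # backslash is doubled so decoding stays unambiguous
            out.append("\\\\")
        elif 0x20 <= b <= 0x7E:       # printable ASCII passes through unchanged
            out.append(chr(b))
        else:                         # everything else becomes \xNN
            out.append(f"\\x{b:02x}")
    return "".join(out)


def unescape_bytes(text: str) -> bytes:
    """Invert escape_bytes; only meant for strings produced by escape_bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\" and text[i + 1:i + 2] == "\\":
            out.append(0x5C)
            i += 2
        elif text[i] == "\\" and text[i + 1:i + 2] == "x":
            out.append(int(text[i + 2:i + 4], 16))
            i += 4
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


raw = b"evil\x00name\t.txt"
escaped = escape_bytes(raw)
print(escaped)                         # printable, safe to display
print(unescape_bytes(escaped) == raw)  # decoding restores the raw bytes
```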

Have you viewed Zeek logs as sequences of bytes where anything could end up in them or had you taken a different perspective on the logs? I'm sort of wondering if we just have a disconnect on the way we think about the logs.

duffy-ocraven commented Sep 15, 2020

I also think this discussion is still helping us each articulate and see some subtle but important things. Thanks for hanging in there, if I ever express myself too obtusely.

You identified something there at the last: "we have a disconnect on the way we think about the logs". I am relatively new to zeek (just a few months) and one of the top-of-mind thoughts I had was "I have got to code myself a customized LESS, that columnarizes and word-wraps when log viewing, at least as well as HTML tables and/or RTF does it." I think the human reader of logs is a vital audience, activating the penchant for pattern detection that the human brain is for-better-or-for-worse so prone to.

LESS has saved my bacon innumerable times. It can still get you into a jam (for instance, don't jump to the end of the file if the file ends with thousands of \x00, if you ever want your keyboard to respond again). But I had expected, and I guess I am trying here to encourage, Zeek to regard "verbatim representation" as less of a boon to the Zeek programmers and users than to the attackers. "Unambiguous representation" is, I absolutely concur, something that the Zeek programmers and users must have. But it can be expressed in a dialect that is constrained to benign characters only.

duffy-ocraven commented Sep 15, 2020

Oh, and a small clarification, so that we don't digress over a canard. I realize Zeek logs aren't sequences of bytes where anything could end up in them, because the tab-separated data and JSON both escape non-printable stuff. But internally in Zeek, I worry if values of every datatype are just arbitrary sequences of bytes, which means they can technically have nulls or anything else in them. I would blanch if hash results could have nulls or anything like that in them. The point I am raising in this discussion is that programmers carry some semantic baggage as they read variable and type names. I blanch if a "filename" can contain a * or / or \. It needs to be termed a filepath if it is the '/'-delimited hierarchy. It needs to be a fullpath if it is the filepath and filename concatenated. It needs to be a pattern if it can contain * or ?.
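
Those naming distinctions could be sketched as a small classifier. This is only an illustration in Python, with a hypothetical function name `classify_name`. It also assumes, purely for the example, that a trailing separator marks a filepath while a separator elsewhere marks a fullpath, since the two cannot otherwise be told apart from the string alone.

```python
def classify_name(name: str) -> str:
    """Classify a string as 'pattern', 'filepath', 'fullpath', or 'filename'."""
    if "*" in name or "?" in name:
        return "pattern"              # wildcards make it a pattern
    if "/" in name or "\\" in name:
        # Assumption for this sketch: a trailing separator means directory only.
        if name.endswith(("/", "\\")):
            return "filepath"
        return "fullpath"             # hierarchy plus a final file name
    return "filename"                 # no separators, no wildcards


for sample in ("report.pdf", "/var/log/", "/var/log/conn.log", "*.log"):
    print(sample, "->", classify_name(sample))
```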
