@teichopsia teichopsia/regex-trap
Last active Aug 29, 2015

[interview traps] how to properly scan a line of input for an ip address
If an interviewer asks this question and wants a regex, get up and walk out the door. These goonies probably still use tn3270, COBOL and FTP (cutting edge). They also have no idea what regexes are for, how to anchor them, or how to avoid excessive backreferencing, backtracking and other NFA pathologies. They probably also have production code you can regex-bomb or, worse, hit with LFI.
At best, the regex captures a candidate that then gets passed to a primitive designed to detaint and parse it. Go read up on the GHOST vulnerability (the glibc gethostbyname overflow) if you think these are easy to write securely.
The only correct answer for anybody using a dynamic language is to invoke your libc binding for the standard POSIX-ish primitives. Bypassing these basically means you're ignoring how modern resolution primitives like getaddrinfo work on dual stacks, and generally taking a piss on IPv6. Welcome to the 90s, kid: IPv6 is real. Your regex is already decades behind. I dare you to write a correct folding implementation of IPv6 in a regex. Welcome back. Not easy, eh?
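For contrast, here is a minimal sketch of letting libc do the folding, taking "folding" to mean the `::` zero compression of IPv6 text (that reading is an assumption). `canonical_v6` is a hypothetical helper name; `inet_pton` and `inet_ntop` are the real Socket bindings, and the exact canonical text you get back is up to your platform's libc.

```perl
use strict;
use warnings;
use Socket qw(AF_INET6 inet_pton inet_ntop);

# canonical_v6 is a hypothetical helper: parse the text with libc,
# then print it back in libc's own compressed form.
sub canonical_v6 {
    my ($text) = @_;
    my $packed = inet_pton(AF_INET6, $text);   # undef if libc rejects the text
    return undef unless defined $packed;
    return inet_ntop(AF_INET6, $packed);
}

for my $try ('0:0:0:0:0:0:0:1', '2001:DB8:0:0:0:0:0:1', '1::2::3', 'not-an-address') {
    my $folded = canonical_v6($try);
    printf "%-24s => %s\n", $try, defined $folded ? $folded : '(rejected)';
}
```

Two calls, no regex, and the same parser your connect path will use anyway.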
The correct answer is to stop reinventing the wheel when you have no idea how cars or roads work. Have respect for what came before you. It's nice to learn how things work, but consider the repercussions of your "new" implementation.
The entire point is that you end up with socket addresses you can pass directly to connect (to pass data) or get back from getpeername (to log the connection, which nobody seems to do these days either).
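A rough sketch of that round trip, under some assumptions: the throwaway loopback listener exists only so the example needs no network, and the variable names are made up for illustration. The address from getaddrinfo goes straight into connect, and the address from getpeername goes straight into getnameinfo for the log line.

```perl
use strict;
use warnings;
use Socket qw(:addrinfo AF_INET SOCK_STREAM SOMAXCONN INADDR_LOOPBACK
              pack_sockaddr_in unpack_sockaddr_in);

# Throwaway listener on loopback (illustration only); port 0 lets the OS pick one.
socket(my $listener, AF_INET, SOCK_STREAM, 0)         or die "socket: $!";
bind($listener, pack_sockaddr_in(0, INADDR_LOOPBACK)) or die "bind: $!";
listen($listener, SOMAXCONN)                          or die "listen: $!";
my ($port) = unpack_sockaddr_in(getsockname($listener));

# Parse the text form once. AI_NUMERICHOST keeps DNS out of it.
my ($err, @res) = getaddrinfo('127.0.0.1', $port,
    { flags => AI_NUMERICHOST, socktype => SOCK_STREAM });
die "getaddrinfo: $err" if $err;

# The result is handed to socket/connect as-is, no string handling.
my $ai = $res[0];
socket(my $client, $ai->{family}, $ai->{socktype}, $ai->{protocol})
    or die "socket: $!";
connect($client, $ai->{addr}) or die "connect: $!";

# Log who we are talking to, again without touching the text form.
my ($nerr, $peer_host, $peer_port) =
    getnameinfo(getpeername($client), NI_NUMERICHOST | NI_NUMERICSERV);
die "getnameinfo: $nerr" if $nerr;
print "connected to $peer_host port $peer_port\n";
```

Swap the literal for `::1` and an AF_INET6 listener and the client half does not change, which is the whole argument.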
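#!/usr/bin/perl
use 5.014;
use strict;
use warnings qw(all);
use Socket qw(:addrinfo SOCK_STREAM);
while (my $line = <DATA>) {
    # see? delimited by whitespace. could be a split+index and be O(n) and less cryptic too
    foreach my $try ($line =~ /\s+(\S+)\s+/) {
        # no service wanted, so pass undef (NIx_NOSERV is a getnameinfo xflag, not a service);
        # the socktype hint gives one result per address instead of one per socket type
        my ($err, @r) = getaddrinfo($try, undef,
            {flags => AI_NUMERICHOST|AI_CANONNAME, socktype => SOCK_STREAM});
        next if $err;
        foreach my $answer (@r) {
            # numeric form first, then whatever name the resolver reports
            my ($e1, $addr) = getnameinfo($answer->{addr}, NI_NUMERICHOST, NIx_NOSERV);
            my ($e2, $host) = getnameinfo($answer->{addr}, 0, NIx_NOSERV);
            print "$host $addr\n" if !$e1 and !$e2;
        }
    }
}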
__END__
testipv6 ::1 trailer
testipv4 127.0.0.1 trailer
@teichopsia

commented May 25, 2015

Compounding this, I still find vendors writing grok filters with hard-coded hostname regexes such as:

HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)

which assumes IDNA hostnames are Punycoded, an assumption I could agree with. However... what if my locale isn't LC_ALL=C? Oops.
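A quick sketch of what that pattern does and does not accept, assuming the dots in it are meant to be escaped (`\.`) and anchoring it with `\A`/`\z` for a whole-string test. `looks_like_hostname` is a hypothetical name, and the sample inputs are mine, not from any vendor filter.

```perl
use strict;
use warnings;
use utf8;                                  # the source contains a literal ü
binmode STDOUT, ':encoding(UTF-8)';

# The grok HOSTNAME pattern from above, anchored to the whole string.
my $label    = qr/[0-9A-Za-z][0-9A-Za-z-]{0,62}/;
my $hostname = qr/\A\b(?:$label)(?:\.(?:$label))*(?:\.?|\b)\z/;

# Hypothetical helper: 1 if the pattern accepts the whole string, else 0.
sub looks_like_hostname {
    my ($text) = @_;
    return $text =~ $hostname ? 1 : 0;
}

for my $try ('xn--bcher-kva.example', 'bücher.example', '999.999.999.999') {
    printf "%-24s %s\n", $try, looks_like_hostname($try) ? 'match' : 'no match';
}
```

The Punycode form matches, the Unicode form of the same name does not, and 999.999.999.999 sails through as a "hostname" even though it is neither a usable name nor a valid address.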
