@grepory
Last active March 21, 2023 15:56
I'll mark omitted syscalls with an ellipsis (...).
The first thing that happens after you press enter is that bash resolves
curl to /usr/bin/curl and then calls fork(). The strace output is from the
forked process.
First, execve() the /usr/bin/curl binary and pass an array of arguments to it.
execve("/usr/bin/curl", ["curl", "http://github.com"], [/* 8 vars */]) = 0
...
execve() does not spawn a new process; it replaces the forked process's image with
curl. The dynamic linker then loads curl's shared libraries, which a good bit of the
output is associated with, starting with these open() calls.
open("/etc/ld-musl-x86_64.path", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/lib/libcurl.so.4", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libcurl.so.4", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/libcurl.so.4", O_RDONLY|O_CLOEXEC) = 3
...
Now curl is trying to configure SSL for HTTPS connections, but the OpenSSL configuration
file doesn't exist, so that configuration step is skipped. curl itself keeps running.
open("/etc/ssl/openssl.cnf", O_RDONLY) = -1 ENOENT (No such file or directory)
brk(NULL) = 0x564432644000
...
Now it will look for the current user's curl startup file.
Since it doesn't exist, it skips that step.
open("/root/.curlrc", O_RDONLY) = -1 ENOENT (No such file or directory)
brk(0x564432683000) = 0x564432683000
...
Now curl is going to try to resolve a hostname.
It is useful to know that host resolution happens inside the curl process itself
rather than through a separate resolver service. It will attempt to use /etc/hosts
first, then get ready to use DNS (by reading the contents of /etc/resolv.conf to get
a list of name servers, search domains, and possibly some other options).
open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
readv(3, [{"", 0}, {"127.0.0.1\tlocalhost\n::1\tlocalhos"..., 1024}], 2) = 174
readv(3, [{"", 0}, {"127.0.0.1\tlocalhost\n::1\tlocalhos"..., 1024}], 2) = 0
close(3)= 0
open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
readv(3, [{"", 0}, {"search local\nnameserver 192.168."..., 248}], 2) = 61
readv(3, [{"", 0}, {"search local\nnameserver 192.168."..., 248}], 2) = 0
close(3) = 0
It won't find github.com in the hostname->address mappings in /etc/hosts, so
it will now create a socket and talk to the DNS servers to determine the
IP address associated with github.com. socket() is going to return a file descriptor
for the socket (a read/write endpoint over which to read from and write to the network).
In this case, curl is passing PF_INET and SOCK_DGRAM to socket() (IPPROTO_IP is 0, meaning
"the default protocol for this socket type"), which indicates that it is doing the DNS
query over UDP.
Curl will bind() the socket to a local address (0.0.0.0 port 0, i.e. any address and any
free port) and then sendto() on that socket in order to communicate with the DNS servers.
It's hard to read the string representation of the DNS traffic below, but the messages
basically read in English:
"Give me the A record for github.com"
(the queries containing \34 ask for the AAAA record instead, and each query is sent to
both name servers).
And the name server at 192.168.65.1 responds (this is just the partial output of `host github.com`):
github.com has address 192.30.253.112
socket(PF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
sendto(3, "\237\356\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\1\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.65.1")}, 16) = 28
sendto(3, "\237\356\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\1\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.65.3")}, 16) = 28
sendto(3, "\257\355\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\34\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.65.1")}, 16) = 28
sendto(3, "\257\355\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\34\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.65.3")}, 16) = 28
poll([{fd=3, events=POLLIN}], 1, 2500) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "\257\355\201\200\0\1\0\0\0\1\0\0\6github\3com\0\0\34\0\1\300\f\0\6"..., 512, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.65.1")}, [16]) = 93
recvfrom(3, "\237\356\201\200\0\1\0\1\0\0\0\0\6github\3com\0\0\1\0\1\300\f\0\1"..., 512, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.65.1")}, [16]) = 44
close(3) = 0
...
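To make the sendto() payloads above easier to read, here is a minimal Python sketch that rebuilds the same bytes from their parts. The function name build_dns_query is made up for illustration; the transaction IDs (0x9FEE and 0xAFED) and the two payloads are copied from the strace output, and the layout assumed is the standard DNS header plus one question.

```python
import struct


def build_dns_query(txid: int, hostname: str, qtype: int) -> bytes:
    """Build a minimal DNS query: a 12-byte header plus one question."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1,
    # then ANCOUNT, NSCOUNT and ARCOUNT all zero.
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    # QTYPE (1 = A, 28 = AAAA) followed by QCLASS (1 = IN).
    return header + qname + struct.pack(">HH", qtype, 1)


# Payloads of the first and third sendto() calls, copied from the trace.
TRACE_A = b"\237\356\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\1\0\1"
TRACE_AAAA = b"\257\355\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\34\0\1"

a_query = build_dns_query(0x9FEE, "github.com", 1)
aaaa_query = build_dns_query(0xAFED, "github.com", 28)
print(a_query == TRACE_A, aaaa_query == TRACE_AAAA, len(a_query))
```

The \6 and \3 in front of "github" and "com" are the label lengths, and the trailing \0\1\0\1 (or \0\34\0\1) is the record type followed by the class.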
Now curl is going to prepare to send an HTTP request to github.com by creating a socket
and setting some options on that socket (enabling TCP keepalive and making the socket
non-blocking). In this case the socket is TCP (selected by the SOCK_STREAM type and
the IPPROTO_TCP protocol passed to socket()).
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
setsockopt(3, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(3, SOL_TCP, TCP_KEEPIDLE, [60], 4) = 0
setsockopt(3, SOL_TCP, TCP_KEEPINTVL, [60], 4) = 0
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
Now it will use connect() to establish a connection with 192.30.253.113, an IP address
associated with github.com. Because the socket is non-blocking, connect() returns
immediately with EINPROGRESS while the TCP handshake continues in the background. It's up
to the caller to do a poll wait loop until the socket becomes writable (POLLOUT), which
signals that the connection attempt has finished. poll() will return 0 to indicate that
nothing is ready yet after a configurable timeout passed as the last argument (in this
case 0, which indicates to poll that it should check and return immediately). poll() will
eventually return 1, which indicates that file descriptor 3 (our socket) is ready for writing.
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.30.253.113")}, 16) = -1 EINPROGRESS (Operation in progress)
poll([{fd=3, events=POLLOUT|POLLWRNORM}], 1, 0) = 0 (Timeout)
rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER|SA_RESTART, 0x7fd4cc9afc4e}, NULL, 8) = 0
...
...
poll([{fd=3, events=POLLOUT|POLLWRNORM}], 1, 0) = 1 ([{fd=3, revents=POLLOUT|POLLWRNORM}])
...
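The connect()/poll() sequence above can be sketched in Python as follows. This is an illustration under assumptions, not what curl does internally: nonblocking_connect is a made-up helper, the 2000 ms timeout is arbitrary, and the demo connects to a throwaway listener on 127.0.0.1 instead of github.com so it runs without network access. It relies on select.poll, which is available on Linux.

```python
import errno
import select
import socket


def nonblocking_connect(address, timeout_ms=2000):
    """Start a non-blocking TCP connect and poll until it completes.

    Returns (sock, first_result), where first_result is what connect_ex()
    returned right away (usually errno.EINPROGRESS).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)  # same effect as fcntl(fd, F_SETFL, O_NONBLOCK)
    first_result = sock.connect_ex(address)
    if first_result not in (0, errno.EINPROGRESS):
        sock.close()
        raise OSError(first_result, "connect failed immediately")
    # Wait for the socket to become writable: that is how a finished
    # (successful or failed) connection attempt is reported.
    poller = select.poll()
    poller.register(sock, select.POLLOUT)
    if not poller.poll(timeout_ms):
        sock.close()
        raise TimeoutError("connect did not complete in time")
    # SO_ERROR distinguishes "connected" (0) from "attempt failed".
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err:
        sock.close()
        raise OSError(err, "connect failed")
    return sock, first_result


# Demo against a local listener (an assumption made for testability).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
client, first_result = nonblocking_connect(listener.getsockname())
peer = client.getpeername()
client.close()
listener.close()
```

Unlike the trace, which polls with a timeout of 0 inside a larger loop, this sketch lets poll() block for up to timeout_ms to keep the example short.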
Now that we're connected to github.com, curl is going to issue an HTTP request. This request
looks like (as shown in curl -v http://github.com):
GET / HTTP/1.1
Host: github.com
User-Agent: curl/7.50.1
Accept: */*
GET / HTTP/1.1
GET indicates that we are issuing a GET request (to get a resource from the web server).
/ is the path of that resource
HTTP/1.1 is the specific version of the HTTP protocol that we are using which will determine
which operations, headers, and modes of transmission the client/server will use/accept.
Host: github.com indicates that we are specifically requesting the github.com host. HTTP/1.1 allows
the multiplexing of different hostnames on a single IP address (i.e. by a single web server).
User-Agent: curl/7.50.1 indicates that we are using curl v7.50.1 as the program or "agent"
used to connect to the web server.
Accept: */* means that we'll take whatever the default output is for this resource.
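As a quick check of the request described above, this Python sketch assembles the same four lines and confirms they add up to the 74 bytes that sendto() reports in the trace. The helper name build_get_request is hypothetical; the header values are the ones shown by curl -v.

```python
def build_get_request(host: str, path: str = "/",
                      user_agent: str = "curl/7.50.1") -> bytes:
    """Assemble a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Accept: */*",
        "",  # the two empty strings produce the final \r\n\r\n,
        "",  # i.e. the blank line that ends the header section
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("github.com")
print(len(request))
```

Every line ends in \r\n, and the extra empty line tells the server that the headers are complete.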
sendto() returns as soon as the 74 bytes of the request are handed to the kernel; the response is what we will have to poll+wait for.
sendto(3, "GET / HTTP/1.1\r\nHost: github.com"..., 74, MSG_NOSIGNAL, NULL, 0) = 74
poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER|SA_RESTART, 0x7fd4cc9afc4e}, NULL, 8) = 0
...
poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=3, revents=POLLIN|POLLRDNORM}])
Now that poll has returned 1, we can read from the socket what data is
available.
recvfrom(3, "HTTP/1.1 301 Moved Permanently\r\nContent-length: 0\r\nLocation: https://github.com/\r\nConnection: close\r\n\r\n", 16384, 0, NULL, NULL) = 103
HTTP/1.1 301 Moved Permanently
Content-length: 0
Location: https://github.com/
Connection: close
It should be noted that the end of this response is \r\n\r\n (an empty line), which indicates
that there are no more header fields to be read.
HTTP/1.1 301 Moved Permanently indicates that the resource at / on the host github.com
(over HTTP) has been moved permanently to the location shown in the Location header.
Content-length: 0 indicates that there are 0 bytes of contents to read after the header.
Location: https://github.com/ indicates that the content has been relocated to https://github.com/
and, if we wanted to, we could redirect ourselves there to see the content.
Connection: close indicates that the connection should not be re-used after the response is
sent.
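The breakdown above can be reproduced with a few lines of Python. This is only a sketch for this one response, not a general HTTP parser: parse_response_head is a made-up name, and the bytes are copied from the recvfrom() call in the trace.

```python
# The 103 bytes returned by recvfrom() in the trace.
RESPONSE = (
    b"HTTP/1.1 301 Moved Permanently\r\n"
    b"Content-length: 0\r\n"
    b"Location: https://github.com/\r\n"
    b"Connection: close\r\n"
    b"\r\n"
)


def parse_response_head(raw: bytes):
    """Split a raw response into status line parts, headers and body."""
    # Everything before the first \r\n\r\n is the header section.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


version, status, reason, headers, body = parse_response_head(RESPONSE)
print(status, headers["location"], len(body))
```

The empty body matches Content-length: 0, and the Location value is where a client following redirects (curl -L) would go next.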
curl then starts to shut down by closing the socket we used to write/read to/from github.com.
close(3) = 0
...
And finally exits 0 indicating success.
exit_group(0) = ?
At this point, the forked process exits and bash resumes execution, returning
a new shell prompt.
# The following line (the shebang) tells the kernel which binary to use to interpret the
# commands located in this file.
#!/bin/bash
# If anything fails, exit the process and return an error.
set -e
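# If any command in a pipeline fails, the pipeline returns the exit status of the
# last (rightmost) command that exited non-zero, rather than only the status of the
# final command.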
set -o pipefail
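# Replace the current process with the following. The intent of using exec here is that none
# of the programs called have a shell to fall back to. So, for example, we would
# never be able to break here and return to a root shell. (Note that bash runs each stage
# of a pipeline in a subshell, so this exec only replaces that subshell.)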
# This first line executes ngrep, the options, explained:
# -P ' ' : replace unprintable characters with whitespace ' '.
# -l : make stdout line buffered (i.e. each line is written out as soon as it is
# complete, so the next command in the pipeline sees it right away).
# -W single : print all output on a single line, e.g:
# T 54.208.17.239:443 -> 192.168.199.221:57787 [AP] ....P..w...$J...ht..8......&4|.9h..cM.....Lm.w\$.q...'.L,...}...B.P.g.2..............
# -d bond0 : Capture packets on the bond0 device
# -q : be quiet, don't print stupid hash marks indicating that some stuff was found
# but not matched.
# 'SELECT' 'tcp and dst port 3306' first match on the string SELECT and then the second
# string there 'tcp and...' is a bpf (berkeley packet filter) expression that indicates
# we should only match tcp packets with a dst port of 3306 -- in this case we would
# see select queries made to MySQL (but not responses).
exec sudo ngrep -P ' ' -l -W single -d bond0 -q 'SELECT' 'tcp and dst port 3306' |\
# Now we're matching the output of the previous command against an extended regular
# expression (egrep allows support for extended regexes). This will match a substring
# beginning with [AP] (space), then any single character, then optional whitespace, and
# then the word SELECT and a space.
egrep "\[AP\] .\s*SELECT " |\
# Now we're going to strip everything before the SELECT and append a semicolon to the end.
sed -e 's/^T .*\[AP\?\] .\s*SELECT/SELECT/' -e 's/$/;/' |\
# Now we're going to replay those SELECT statements against the production mysql server.
# $1 is the argument we passed to the script when we executed it. If that argument was
# omitted, you would be presented with an inscrutable "host refused" error, so checking for
# it is a possible improvement here. anyway, looks like you've got passwordless sudo with access
# to a production mysql server (woof, but get on with your bad self) and then you're going to
# run 16 jobs in parallel. Your stdin pipe is split on newlines into jobs sent via stdin to
# the mysql command which will then connect to the github_production database (via the local
# mysql socket). The -f argument says to continue if there are sql errors, so if one select fails
# it'll continue on with the next select. -ss is -s (silent, which causes less output to be
# produced) given twice; mysql accepts it multiple times to produce less and less output.
# It is not --ssl, which says to connect via ssl.
ssh $1 -- 'sudo parallel --recend "\n" -j16 --spreadstdin mysql github_production -f -ss'
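To see what the egrep and sed stages do to a single ngrep line, here is a rough Python equivalent (Python rather than shell so it can be tried without ngrep or a MySQL server). The two sample lines are invented for illustration and are not real captures; the patterns are direct translations of the ones in the script, with sed's \? written as ?.

```python
import re

# Same pattern as the egrep stage: keep only lines carrying a SELECT.
MATCH = re.compile(r"\[AP\] .\s*SELECT ")
# Same substitution as the first sed expression: drop everything before SELECT.
STRIP = re.compile(r"^T .*\[AP?\] .\s*SELECT")


def to_statements(lines):
    """Turn ngrep output lines into replayable SELECT statements."""
    for line in lines:
        line = line.rstrip("\n")
        if MATCH.search(line):
            # The second sed expression, s/$/;/, appends the semicolon.
            yield STRIP.sub("SELECT", line) + ";"


# Hypothetical ngrep output lines (addresses and payloads are made up).
SAMPLE = [
    "T 10.0.0.1:51000 -> 10.0.0.2:3306 [AP] !    SELECT 1",
    "T 10.0.0.1:51000 -> 10.0.0.2:3306 [AP] %    UPDATE t SET x = 1",
]

statements = list(to_statements(SAMPLE))
print(statements)
```

Only the first sample line survives the filter, and it comes out as a bare statement ready to be piped into mysql.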
grepory commented Aug 12, 2016

Commenting the script in that way will break its execution, but I feel it made it more readable. Anyway, it's not meant to still be executable.
