Last active March 21, 2023 15:56
grepory/13621ee08bbbc0054a2aa38a755720d0
I'll be marking omitted syscalls with an ellipsis (...).
The first thing that happens after you press Enter is that bash resolves
curl to /usr/bin/curl and then calls fork(). The strace output is from the
forked (child) process.
First, the child execve()s the /usr/bin/curl binary, passing it an array of
arguments (argv) and its environment (8 variables here).
execve("/usr/bin/curl", ["curl", "http://github.com"], [/* 8 vars */]) = 0
...
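A minimal Python sketch of the fork-then-exec pattern described above, assuming a POSIX system: the parent fork()s, the child replaces itself with the target program via execve(), and the parent waits for the child's exit status (which is what lets bash print the next prompt once the program exits). The helper name `run` is hypothetical, and `/bin/sh` stands in for curl so the sketch runs without network access.

```python
import os


def run(path, argv):
    """Fork, execve() `path` in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the new program.
        # On success os.execve() never returns.
        try:
            os.execve(path, argv, os.environ)
        finally:
            # Only reached if execve() failed; 127 is what shells use
            # for "command not found".
            os._exit(127)
    # Parent: block until the child exits, as bash does for a foreground command.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1  # the child was killed by a signal instead of exiting


# Usage: roughly what bash does for `sh -c 'exit 3'`.
print(run("/bin/sh", ["sh", "-c", "exit 3"]))  # prints 3
```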
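execve() doesn't spawn a new process; it replaces the child's process image
with curl. Because curl is dynamically linked, the dynamic linker (musl's, on
this system) then has to find and map its shared libraries, and a good bit of
the output is associated with that, starting with these open() calls. The
linker first looks for a search-path file, /etc/ld-musl-x86_64.path; since that
doesn't exist, it falls back to its default directories /lib, /usr/local/lib
and /usr/lib, and finds libcurl.so.4 in the last one (opened as fd 3).
open("/etc/ld-musl-x86_64.path", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/lib/libcurl.so.4", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/local/lib/libcurl.so.4", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/libcurl.so.4", O_RDONLY|O_CLOEXEC) = 3
...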
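The lookup order in those open() calls can be modeled in a few lines of Python. This is a simplified sketch, not musl's actual loader: it assumes the path file lists directories separated by newlines or colons, and it ignores LD_LIBRARY_PATH, rpath, and everything the loader does after opening the file (mapping it, resolving symbols). The function name `find_library` is hypothetical.

```python
import os

# Directories musl's dynamic linker falls back to when the path file is
# missing, in the order the trace shows them being tried.
DEFAULT_DIRS = ("/lib", "/usr/local/lib", "/usr/lib")


def find_library(name, path_file="/etc/ld-musl-x86_64.path", default_dirs=DEFAULT_DIRS):
    """Return the first path at which `name` can be opened, or None."""
    try:
        with open(path_file) as f:
            # Assumption: directories are separated by newlines or colons.
            dirs = [d for d in f.read().replace("\n", ":").split(":") if d]
    except FileNotFoundError:
        dirs = list(default_dirs)  # -1 ENOENT on the path file: use the defaults
    for d in dirs:
        candidate = os.path.join(d, name)
        try:
            fd = os.open(candidate, os.O_RDONLY | os.O_CLOEXEC)
        except FileNotFoundError:
            continue  # -1 ENOENT: try the next directory
        os.close(fd)
        return candidate
    return None


# Usage: mimic the libcurl lookup (prints None if it isn't in those directories).
print(find_library("libcurl.so.4"))
```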
Next, libcurl initializes its TLS library (OpenSSL) so that it could handle
HTTPS connections. OpenSSL looks for its configuration file,
/etc/ssl/openssl.cnf, which doesn't exist here. That isn't fatal: OpenSSL
falls back to its built-in defaults and curl carries on (this request is plain
HTTP anyway). The brk() calls are the memory allocator querying and growing
the heap.
open("/etc/ssl/openssl.cnf", O_RDONLY) = -1 ENOENT (No such file or directory)
brk(NULL) = 0x564432644000
...
Now curl looks for the current user's curl startup file, ~/.curlrc (here
/root/.curlrc, since we're running as root). It doesn't exist, so curl skips
that step.
open("/root/.curlrc", O_RDONLY) = -1 ENOENT (No such file or directory)
brk(0x564432683000) = 0x564432683000
...
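Both misses above follow the same pattern: try to open an optional file, and treat ENOENT as "not configured" rather than as an error. A minimal Python sketch of that pattern; the helper name `read_optional_config` is hypothetical, and it doesn't model how curl decides which locations to check.

```python
import os


def read_optional_config(path):
    """Return the file's text, or None if it doesn't exist.

    A missing file (open() failing with ENOENT) just means "use the built-in
    defaults", which is how the trace treats openssl.cnf and .curlrc.
    Other errors, such as a permission problem, still propagate.
    """
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:  # errno ENOENT
        return None


# Usage: the per-user file curl looked for (/root/.curlrc in the trace).
settings = read_optional_config(os.path.expanduser("~/.curlrc"))
print("no ~/.curlrc, using defaults" if settings is None else settings)
```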
Now curl is going to resolve the hostname.
It's useful to know that, judging by this trace, curl isn't reading the files
below itself: it calls getaddrinfo(), and the stub resolver in musl libc does
the work (name resolution lives in the C library; the Linux kernel has no
resolver). It checks /etc/hosts first, then gets ready to use DNS by reading
/etc/resolv.conf for the list of name servers, search domains, and possibly
some other options. Each file is read until readv() returns 0 (end of file).
open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
readv(3, [{"", 0}, {"127.0.0.1\tlocalhost\n::1\tlocalhos"..., 1024}], 2) = 174
readv(3, [{"", 0}, {"127.0.0.1\tlocalhost\n::1\tlocalhos"..., 1024}], 2) = 0
close(3) = 0
open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
readv(3, [{"", 0}, {"search local\nnameserver 192.168."..., 248}], 2) = 61
readv(3, [{"", 0}, {"search local\nnameserver 192.168."..., 248}], 2) = 0
close(3) = 0
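A simplified Python sketch of what the resolver extracts from those two files. The sample contents are assumptions: HOSTS is modeled on the start of /etc/hosts visible in the readv() output, and RESOLV assumes the two name servers (192.168.65.1 and 192.168.65.3) that the trace queries next. The function names are hypothetical, and real resolvers handle more directives (domain, options) than this does.

```python
HOSTS = "127.0.0.1\tlocalhost\n::1\tlocalhost\n"
RESOLV = "search local\nnameserver 192.168.65.1\nnameserver 192.168.65.3\n"


def lookup_hosts(text, name):
    """Return the addresses mapped to `name` in /etc/hosts-formatted text."""
    addrs = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        # fields[0] is the address; the rest are the hostname and its aliases.
        if len(fields) >= 2 and name in fields[1:]:
            addrs.append(fields[0])
    return addrs


def parse_resolv_conf(text):
    """Collect the name servers and search domains from resolv.conf-formatted text."""
    conf = {"nameservers": [], "search": []}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue  # blank line or comment
        if fields[0] == "nameserver" and len(fields) > 1:
            conf["nameservers"].append(fields[1])
        elif fields[0] == "search":
            conf["search"] = fields[1:]
    return conf


print(lookup_hosts(HOSTS, "github.com"))         # [] -> fall through to DNS
print(parse_resolv_conf(RESOLV)["nameservers"])  # ['192.168.65.1', '192.168.65.3']
```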
github.com isn't among the hostname->address mappings in /etc/hosts, so the
resolver now creates a socket and uses it to ask the DNS servers for the IP
address associated with github.com. socket() returns a file descriptor (3
here) for a socket: an endpoint through which the process can read from and
write to the network. Passing SOCK_DGRAM with IPPROTO_IP (protocol 0, i.e. the
default protocol for that socket type) means the DNS queries go over UDP.
The resolver bind()s the socket to port 0, which lets the kernel pick an
ephemeral local port, and then sendto()s the queries to the name servers on
port 53. It's hard to read the string representation of the DNS traffic below,
but there are two questions, each sent to both name servers (192.168.65.1 and
192.168.65.3) at once. In English they basically read:
"Give me the A (IPv4) record for github.com" (query type \0\1)
"Give me the AAAA (IPv6) record for github.com" (query type \0\34, i.e. 28)
poll() then waits up to 2500 ms for a reply, and the two recvfrom() calls read
both answers from 192.168.65.1. The 93-byte AAAA answer has no answer records
(only an authority record), so github.com has no IPv6 address here; the 44-byte
A answer holds one address record. For reference, this is the partial output
of `host github.com`, run separately (so its address needn't match the one
curl connects to below):
github.com has address 192.30.253.112
socket(PF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
sendto(3, "\237\356\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\1\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.65.1")}, 16) = 28
sendto(3, "\237\356\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\1\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.65.3")}, 16) = 28
sendto(3, "\257\355\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\34\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.65.1")}, 16) = 28
sendto(3, "\257\355\1\0\0\1\0\0\0\0\0\0\6github\3com\0\0\34\0\1", 28, MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.65.3")}, 16) = 28
poll([{fd=3, events=POLLIN}], 1, 2500) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "\257\355\201\200\0\1\0\0\0\1\0\0\6github\3com\0\0\34\0\1\300\f\0\6"..., 512, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.65.1")}, [16]) = 93
recvfrom(3, "\237\356\201\200\0\1\0\1\0\0\0\0\6github\3com\0\0\1\0\1\300\f\0\1"..., 512, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.65.1")}, [16]) = 44
close(3) = 0
...
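To make the escaped bytes above less opaque, here is a Python sketch of the DNS wire format (RFC 1035) for these messages. Given the query ID 0x9FEE (the \237\356 at the start of the first payload) and github.com, `build_query` reproduces the 28-byte A query byte for byte; `parse_a_records` pulls IPv4 addresses out of a response shaped like the 44-byte one. Both function names are hypothetical, and the parser only handles the simple case seen here (no EDNS, no error checking).

```python
import struct

TYPE_A, TYPE_AAAA, CLASS_IN = 1, 28, 1


def build_query(qid, name, qtype):
    """Encode a DNS query like the sendto() payloads above (RFC 1035 format)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte,
    # e.g. \6github\3com\0 in the strace output.
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, CLASS_IN)


def _skip_name(msg, off):
    """Return the offset just past an encoded name starting at `off`."""
    while True:
        length = msg[off]
        if length == 0:
            return off + 1
        if length & 0xC0 == 0xC0:  # compression pointer, e.g. \300\f above
            return off + 2
        off += 1 + length


def parse_a_records(msg):
    """Return (query ID, [IPv4 addresses]) from a DNS response."""
    qid, _flags, qdcount, ancount, _nscount, _arcount = struct.unpack_from("!HHHHHH", msg, 0)
    off = 12
    for _ in range(qdcount):
        off = _skip_name(msg, off) + 4  # skip QTYPE and QCLASS
    addrs = []
    for _ in range(ancount):
        off = _skip_name(msg, off)
        rtype, _rclass, _ttl, rdlength = struct.unpack_from("!HHIH", msg, off)
        off += 10
        if rtype == TYPE_A and rdlength == 4:
            addrs.append(".".join(str(b) for b in msg[off:off + 4]))
        off += rdlength
    return qid, addrs


query = build_query(0x9FEE, "github.com", TYPE_A)
print(len(query))  # 28, the same length the first sendto() reports
```

Swapping in TYPE_AAAA (28) changes only the last four bytes to \0\34\0\1, which is the difference between the first and second pair of sendto() calls.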
Now curl prepares to send an HTTP request to github.com by creating a socket
and setting some options on it. This socket is TCP (selected by passing
SOCK_STREAM and IPPROTO_TCP to socket()). The setsockopt() calls enable TCP
keepalive probes (SO_KEEPALIVE) and set both the idle time before the first
probe (TCP_KEEPIDLE) and the interval between probes (TCP_KEEPINTVL) to 60
seconds. The two fcntl() calls read the file status flags (the socket is
already O_RDWR) and then add O_NONBLOCK, making the socket non-blocking.
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
setsockopt(3, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(3, SOL_TCP, TCP_KEEPIDLE, [60], 4) = 0
setsockopt(3, SOL_TCP, TCP_KEEPINTVL, [60], 4) = 0
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
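The same setup expressed with Python's socket module, which wraps these syscalls nearly one-to-one. A sketch assuming Linux (TCP_KEEPIDLE and TCP_KEEPINTVL are Linux socket options; SOL_TCP in the trace is the same level as IPPROTO_TCP); `make_http_socket` is a hypothetical name.

```python
import socket


def make_http_socket():
    """Create a TCP socket configured like curl's in the trace."""
    # socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)
    # setsockopt(3, SOL_SOCKET, SO_KEEPALIVE, [1], 4): enable keepalive probes
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Idle 60 s before the first probe, then probe every 60 s
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)
    # Same effect as the fcntl(F_GETFL) / fcntl(F_SETFL, ...|O_NONBLOCK) pair
    sock.setblocking(False)
    return sock


sock = make_http_socket()
print(sock.getblocking())  # False
sock.close()
```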
Now curl uses connect() to establish a connection with 192.30.253.113 on port
80, the IP address associated with github.com. Because the socket is
non-blocking, connect() doesn't wait for the TCP handshake to finish: it starts
it and immediately returns -1 with EINPROGRESS. It's up to the program to wait
until the connection completes, which curl does by poll()ing the socket for
writability (POLLOUT). poll() returns 0 if nothing is ready by the time the
timeout passed as its last argument expires; here the timeout is 0, which tells
poll() to check once and return immediately. The rt_sigaction() call sets
SIGPIPE to be ignored, so writing to a connection the peer has closed returns
an error instead of killing curl. poll() eventually returns 1 with POLLOUT set
in revents, meaning file descriptor 3 (our socket) is writable, i.e. the
connection is established.
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.30.253.113")}, 16) = -1 EINPROGRESS (Operation in progress)
poll([{fd=3, events=POLLOUT|POLLWRNORM}], 1, 0) = 0 (Timeout)
rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER|SA_RESTART, 0x7fd4cc9afc4e}, NULL, 8) = 0
...
...
poll([{fd=3, events=POLLOUT|POLLWRNORM}], 1, 0) = 1 ([{fd=3, revents=POLLOUT|POLLWRNORM}])
...
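A Python sketch of the non-blocking connect pattern, assuming a POSIX system with poll(). One simplification: curl polls with a 0 ms timeout from inside its own event loop, whereas this sketch blocks in a single poll() for up to `timeout_ms` (a hypothetical parameter, like the function name). After poll() reports the socket writable, it reads SO_ERROR to learn whether the handshake actually succeeded.

```python
import errno
import select
import socket


def connect_nonblocking(sock, addr, timeout_ms=5000):
    """Start a non-blocking connect() and wait for it with poll().

    Returns True once connected, False on timeout; raises OSError if the
    connection attempt failed.
    """
    err = sock.connect_ex(addr)  # returns an errno value instead of raising
    if err not in (0, errno.EINPROGRESS):
        raise OSError(err, "connect() failed")
    poller = select.poll()
    # Writable (POLLOUT) means the TCP handshake has finished, one way or another.
    poller.register(sock.fileno(), select.POLLOUT)
    if not poller.poll(timeout_ms):
        return False  # like "= 0 (Timeout)" in the trace
    # Whether the handshake succeeded is reported through SO_ERROR.
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err:
        raise OSError(err, "connect() failed")
    return True


# Usage against a throwaway local listener, so no network access is needed.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
server.listen(1)
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.setblocking(False)
print(connect_nonblocking(client, server.getsockname()))  # True
client.close()
server.close()
```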
Now that we're connected to github.com, curl issues an HTTP request. The
request looks like this (as shown by curl -v http://github.com); each line ends
with \r\n, and an empty line marks the end of the headers:
GET / HTTP/1.1
Host: github.com
User-Agent: curl/7.50.1
Accept: */*
GET / HTTP/1.1
GET indicates that we are issuing a GET request (to get a resource from the web server).
/ is the path of that resource.
HTTP/1.1 is the version of the HTTP protocol we are using, which determines
which methods, headers, and modes of transmission the client and server will use and accept.
Host: github.com indicates that we are specifically requesting the github.com host. HTTP/1.1
requires this header, which lets a single web server on a single IP address serve many
different hostnames (virtual hosting).
User-Agent: curl/7.50.1 indicates that curl v7.50.1 is the program, or "agent", used to
connect to the web server.
Accept: */* means that we'll accept any media type for this resource.
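Serializing that request takes only a few lines. In this Python sketch (the function name `build_request` is hypothetical), every line ends in CRLF and a bare CRLF ends the header block; with the defaults below the result is exactly 74 bytes, the same count the sendto() call reports in the trace.

```python
def build_request(host, path="/", user_agent="curl/7.50.1"):
    """Serialize an HTTP/1.1 GET request with the same headers curl sent."""
    lines = [
        f"GET {path} HTTP/1.1",       # request line: method, path, version
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Accept: */*",
    ]
    # CRLF after every line, plus one more CRLF (an empty line) to end the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request("github.com")
print(len(request))  # 74
```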
sendto() writes the request to the socket. It returns 74, meaning all 74 bytes
were handed to the kernel in one go, so there's nothing left to wait for on the
sending side. What curl does have to poll and wait for is the response: it
polls the socket for readability (POLLIN and related events), again with a 0
timeout.
sendto(3, "GET / HTTP/1.1\r\nHost: github.com"..., 74, MSG_NOSIGNAL, NULL, 0) = 74
poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER|SA_RESTART, 0x7fd4cc9afc4e}, NULL, 8) = 0
...
poll([{fd=3, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=3, revents=POLLIN|POLLRDNORM}])
Now that poll() has returned 1 with POLLIN set, curl can read whatever data is
available from the socket.
recvfrom(3, "HTTP/1.1 301 Moved Permanently\r\nContent-length: 0\r\nLocation: https://github.com/\r\nConnection: close\r\n\r\n", 16384, 0, NULL, NULL) = 103
Those 103 bytes are the whole response:
HTTP/1.1 301 Moved Permanently
Content-length: 0
Location: https://github.com/
Connection: close
Note that this response ends with \r\n\r\n, an empty line, which indicates that there
are no more header fields to be read.
HTTP/1.1 301 Moved Permanently indicates that the resource at / on the host github.com
(over HTTP) has been moved permanently to the location given in the Location header.
Content-length: 0 indicates that there are 0 bytes of body to read after the headers.
Location: https://github.com/ indicates that the content has been relocated to https://github.com/,
and, if we wanted to, we could follow the redirect there to see the content (curl -L does this).
Connection: close indicates that the server will close the connection after sending this
response, so it won't be reused.
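Parsing the response head is the mirror image. A minimal Python sketch (the names are hypothetical; it handles only the header block, not chunked or length-delimited bodies) applied to the exact 103 bytes that recvfrom() returned:

```python
RAW = (b"HTTP/1.1 301 Moved Permanently\r\nContent-length: 0\r\n"
       b"Location: https://github.com/\r\nConnection: close\r\n\r\n")


def parse_response_head(raw):
    """Split a response into (status code, reason phrase, headers, remaining bytes)."""
    head, _, rest = raw.partition(b"\r\n\r\n")  # the empty line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Field names are case-insensitive ("Content-length" == "Content-Length").
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, rest


code, reason, headers, body = parse_response_head(RAW)
print(code, headers["location"])  # 301 https://github.com/
```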
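curl then starts to shut down by closing the socket it used to write to and read from github.com.
close(3) = 0
...
And finally it exits with status 0, indicating success: curl received a complete HTTP
response, so from its point of view the transfer worked, even though it was a redirect.
exit_group(0) = ?
At this point, the forked process has exited, and bash, which was waiting for it,
resumes execution and prints a new shell prompt.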
#!/bin/bash
# The shebang line above tells the kernel which interpreter binary (/bin/bash)
# to run this file with. It only works as the very first line of the file.
# If any command fails (exits non-zero), exit the script with an error.
set -e
# Make a pipeline's exit status that of the last (rightmost) command that
# exited non-zero, or zero if all succeeded, instead of only the final
# command's status, so a failure anywhere in the pipeline is noticed.
set -o pipefail
# exec replaces the current process with the command that follows instead of
# forking a new one. Because this command is the first element of a pipeline,
# bash runs it in its own subshell, so it's that subshell (not the script
# itself) that gets replaced by sudo.
# This first command runs ngrep. The options, explained:
# -P ' ' : print non-printable characters as a space ' ' (the default is '.').
# -l : make stdout line-buffered, so each matching packet is written down the
#      pipe as soon as it's printed instead of waiting for a full buffer.
# -W single : print each packet on a single line, e.g.:
# T 54.208.17.239:443 -> 192.168.199.221:57787 [AP] ....P..w...$J...ht..8......&4|.9h..cM.....Lm.w\$.q...'.L,...}...B.P.g.2..............
# -d bond0 : capture packets on the bond0 device.
# -q : be quiet; don't print the stupid hash marks (#) that ngrep otherwise
#      emits for packets that were captured but didn't match.
# 'SELECT' 'tcp and dst port 3306' : the first argument is the pattern to match
# in packet payloads (SELECT); the second is a BPF (Berkeley Packet Filter)
# expression that restricts capture to TCP packets with destination port 3306.
# In this case we see SELECT queries sent to MySQL (but not the responses).
exec sudo ngrep -P ' ' -l -W single -d bond0 -q 'SELECT' 'tcp and dst port 3306' |
# Now we're matching the output of the previous command against an extended
# regular expression (egrep is equivalent to grep -E). This matches "[AP]", a
# space, any single character, zero or more whitespace characters, and then
# "SELECT" followed by a space.
egrep "\[AP\] .\s*SELECT " |
# Now we strip everything before the SELECT and append a semicolon to the end
# of each line.
sed -e 's/^T .*\[AP\?\] .\s*SELECT/SELECT/' -e 's/$/;/' |
# Now we replay those SELECT statements against the production MySQL server.
# $1 is the argument passed to the script (the host to ssh to). If it's omitted,
# the unquoted $1 expands to nothing and ssh ends up treating the quoted command
# string as the host name, producing an inscrutable error; guarding against that
# (e.g. with "${1:?usage: $0 host}") would be a possible improvement here.
# Anyway, it looks like you've got passwordless sudo on a production MySQL
# server (woof, but get on with your bad self), and you're going to run 16 jobs
# in parallel. With --spreadstdin, GNU parallel splits stdin into records ending
# in a newline (--recend "\n") and passes them in chunks, via stdin, to the mysql
# command, which connects to the github_production database (via the local
# MySQL socket). The -f (--force) argument says to continue if there are SQL
# errors, so if one SELECT fails it'll carry on with the next one. -ss is just
# -s (--silent) given twice: mysql lets you repeat -s to produce progressively
# less output. It has nothing to do with --ssl.
ssh $1 -- 'sudo parallel --recend "\n" -j16 --spreadstdin mysql github_production -f -ss'
I commented the script this way because I feel it makes it more readable. Anyway, it's not meant to still be executable.
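The filtering and rewriting stages of the pipeline can be approximated in Python, which makes them easy to try offline. This is a sketch, not part of the original script: the regexes are transliterations of the egrep and sed expressions (Python's backtracking engine may differ from sed in corner cases), `to_replayable_sql` is a hypothetical name, and the sample ngrep lines are made up. The ssh/parallel/mysql stage isn't modeled.

```python
import re

# The egrep stage: "[AP]", a space, any one character, optional whitespace,
# then "SELECT ".
FILTER = re.compile(r"\[AP\] .\s*SELECT ")
# The first sed expression: drop everything from the leading "T " up to SELECT.
# "AP?" mirrors sed's \[AP\?\], so "[A]" would match here too.
STRIP = re.compile(r"^T .*\[AP?\] .\s*SELECT")


def to_replayable_sql(lines):
    """Turn ngrep -W single lines into ';'-terminated SELECT statements."""
    for line in lines:
        if FILTER.search(line):
            # Second sed expression: append ';' at the end of the line.
            yield STRIP.sub("SELECT", line, count=1) + ";"


# Made-up ngrep lines; the '%' and the spaces before the SQL stand in for
# packet-header bytes.
sample = [
    "T 10.0.0.5:51234 -> 10.0.0.9:3306 [AP] %    SELECT id FROM users WHERE id = 42",
    "T 10.0.0.5:51234 -> 10.0.0.9:3306 [AP] %    UPDATE users SET x = 1",
]
print(list(to_replayable_sql(sample)))  # ['SELECT id FROM users WHERE id = 42;']
```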