@jvns
Created February 3, 2023 15:22
{
"cells": [
{
"cell_type": "markdown",
"id": "272b3ea3",
"metadata": {},
"source": [
"# Part 1: Build a DNS query\n",
"\n",
"How do we make a query asking for the IP address for `google.com`?\n",
"\n",
"Well, DNS queries have 2 parts: a **header** and a **question**. So we're going to \n",
"\n",
"1. create some Python classes for the header and the question\n",
"2. Write a `to_bytes` function to convert those objects into byte strings\n",
"3. Write a `build_query(domain_name, type)` function that creates a DNS query\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "1ea27c19",
"metadata": {},
"source": [
"## 1.1: Write the `DNSHeader` and `DNSQuestion` classes\n"
]
},
{
"cell_type": "markdown",
"id": "c09faa36",
"metadata": {},
"source": [
"First, our DNS **Header**. This has a query ID, some flags (which we'll mostly ignore), and 4 counts, telling you how many records to expect in each section of a DNS packet. Ignore the `to_bytes` method for now: we'll explain that in a second."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "154b8e28",
"metadata": {},
"outputs": [],
"source": [
"from dataclasses import dataclass\n",
"import dataclasses\n",
"import struct\n",
"\n",
"@dataclass\n",
"class DNSHeader:\n",
" id: int\n",
" flags: int\n",
" num_questions: int = 0\n",
" num_answers: int = 0\n",
" num_authorities: int = 0\n",
" num_additionals: int = 0\n",
"\n",
" def to_bytes(self):\n",
" fields = dataclasses.astuple(self)\n",
" return struct.pack(\"!HHHHHH\", *fields)"
]
},
{
"cell_type": "markdown",
"id": "7e0ae96f",
"metadata": {},
"source": [
"Next, a DNS **Question** just has 3 fields: a name (like `example.com`), a type (like `A`), and a class (which is always the same).\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "e0d2dcaf",
"metadata": {},
"outputs": [],
"source": [
"@dataclass\n",
"class DNSQuestion:\n",
" name: bytes\n",
" type: int \n",
" class_: int\n",
"\n",
" def to_bytes(self):\n",
" return self.name + struct.pack(\"!HH\", self.type, self.class_)"
]
},
{
"cell_type": "markdown",
"id": "4311f1bd",
"metadata": {},
"source": [
"Next, let's talk about those `to_bytes` methods that convert the objects into byte\n",
"strings. \n",
"\n",
"## meet `struct.pack`: how we create byte strings\n",
"\n",
"In the `to_bytes` function, we converted our Python objects into a byte string\n",
"using the `struct` module, which is built into Python. \n",
"\n",
"Let's see an example of how `struct` can convert Python variables into byte strings:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f5788594",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"b'\\x00\\x05\\x00\\x17'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"struct.pack('!HH', 5, 23)"
]
},
{
"cell_type": "markdown",
"id": "8b04e3bd",
"metadata": {},
"source": [
"`H` means \"2-byte integer\", so `!HH` is saying \"format the arguments as two\n",
"2-byte integers. `\\x00\\x05` is 5 and `\\x00\\x17` is 23. \n",
"\n",
"### `struct.pack` format strings\n",
"\n",
"In the format string `\"!HH\"`, there's an `H`, which we just said means \"2 byte integer\". Here are some more examples of things we'll be using later in our format strings:\n",
"\n",
"* `H`: 2 bytes (as an integer)\n",
"* `I`: 4 bytes (as an integer)\n",
"* `4s`: 4 bytes (as a byte string)"
]
},
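{
"cell_type": "markdown",
"id": "added-format-strings-note",
"metadata": {},
"source": [
"To make those format characters concrete, here's a small sketch (not part of the original notebook) that packs one value of each kind and then unpacks them again. The values `5`, `23`, and `b\"abcd\"` are arbitrary picks for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-format-strings-code",
"metadata": {},
"outputs": [],
"source": [
"import struct\n",
"\n",
"# H -> 2 bytes, I -> 4 bytes, 4s -> 4 raw bytes: 10 bytes in total\n",
"packed = struct.pack(\"!HI4s\", 5, 23, b\"abcd\")\n",
"print(packed)       # b'\\x00\\x05\\x00\\x00\\x00\\x17abcd'\n",
"print(len(packed))  # 10\n",
"\n",
"# struct.unpack reverses the conversion and returns a tuple\n",
"print(struct.unpack(\"!HI4s\", packed))  # (5, 23, b'abcd')"
]
},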
{
"cell_type": "markdown",
"id": "a42611fe",
"metadata": {},
"source": [
"Here's what an example DNS header looks like converted to bytes:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "03ce8642",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"b'\\t\\x19\\x00\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"DNSHeader(id=2329, flags=0, num_questions=1, num_additionals=0, num_authorities=0, num_answers=0).to_bytes()"
]
},
{
"cell_type": "markdown",
"id": "dd0c84e0",
"metadata": {},
"source": [
"### a note on byte order\n",
"\n",
"Why is there a `!` at the beginning of the format string `\"!HH\"`? That's\n",
"because anytime you convert an integer into a byte string, there are two\n",
"options for how to do it. Let's see the two ways to convert the integer\n",
"`0x01020304` (16909060) into a 4-byte string:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "fedae0e1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"b'\\x04\\x03\\x02\\x01'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int.to_bytes(0x01020304, length=4, byteorder='little')"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "9d79f397",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"b'\\x01\\x02\\x03\\x04'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int.to_bytes(0x01020304, length=4, byteorder='big')"
]
},
{
"cell_type": "markdown",
"id": "05707fcd",
"metadata": {},
"source": [
"These are the reversed versions of each other. `b'\\x01\\x02\\x03\\x04'` is the\n",
"\"little endian\" version and `b'\\x04\\x03\\x02\\x01'` is the \"big endian\" version. \n",
"\n",
"The names \"little-endian\" and \"big endian\" actually have a funny origin:\n",
"they're named after two satirical religious sects in Gulliver's Travels. One\n",
"sect liked to break eggs on the little end, and the other liked the big end.\n",
"They're named after this Gulliver's travels debate because people used to like\n",
"to argue a lot about which byte order was best but it didn't make a big\n",
"difference.\n",
"\n",
"In network packets, integers are always encoded in a big endian way (though\n",
"little endian is the default in most other situations). So `!` is telling\n",
"Python \"use the byte order for computer networking\".\n",
"\n"
]
},
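{
"cell_type": "markdown",
"id": "added-byte-order-note",
"metadata": {},
"source": [
"As a quick check, here's a sketch (added for illustration, using only the standard `struct` module) that packs the same integer `0x01020304` with three different byte-order prefixes. In `struct` format strings, `>` means big endian and `<` means little endian, and `!` should give the same bytes as `>`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-byte-order-code",
"metadata": {},
"outputs": [],
"source": [
"import struct\n",
"\n",
"n = 0x01020304\n",
"print(struct.pack(\"!I\", n))  # b'\\x01\\x02\\x03\\x04' (network byte order)\n",
"print(struct.pack(\">I\", n))  # b'\\x01\\x02\\x03\\x04' (big endian, same bytes)\n",
"print(struct.pack(\"<I\", n))  # b'\\x04\\x03\\x02\\x01' (little endian)"
]
},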
{
"cell_type": "markdown",
"id": "3072fb29",
"metadata": {},
"source": [
"## 1.2: encode the name\n"
]
},
{
"cell_type": "markdown",
"id": "e3ebd263",
"metadata": {},
"source": [
"Now we're ready to build our DNS query.\n",
"\n",
"First, we need to encode the domain name. We don't literally send \"google.com\",\n",
"instead it gets translated into `b\"\\x06google\\x03com\\x00\"`. Here's the code:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "b68bdbf7",
"metadata": {},
"outputs": [],
"source": [
"def encode_dns_name(domain_name):\n",
" encoded = b\"\"\n",
" for part in domain_name.encode(\"ascii\").split(b\".\"):\n",
" encoded += bytes([len(part)]) + part\n",
" return encoded + b\"\\x00\""
]
},
{
"cell_type": "markdown",
"id": "0f460e06",
"metadata": {},
"source": [
"Let's run it:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "ab4b4783",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"b'\\x06google\\x03com\\x00'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"encode_dns_name(\"google.com\")"
]
},
{
"cell_type": "markdown",
"id": "ed98ba8c",
"metadata": {},
"source": [
"The first byte of the output is `6` (the length of `\"google\"`):"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "e3d060af",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"encode_dns_name(\"google.com\")[0]"
]
},
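{
"cell_type": "markdown",
"id": "added-split-labels-note",
"metadata": {},
"source": [
"To see the length-prefix structure more clearly, here's a rough sketch that walks through the encoded bytes and pulls the parts back out. `split_labels` is a made-up helper name for this illustration, and it only handles names in the simple form that `encode_dns_name` produces, nothing fancier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-split-labels-code",
"metadata": {},
"outputs": [],
"source": [
"def split_labels(encoded):\n",
"    # hypothetical helper: read a length byte, then that many bytes,\n",
"    # and repeat until we hit the 0 byte that ends the name\n",
"    labels = []\n",
"    i = 0\n",
"    while encoded[i] != 0:\n",
"        length = encoded[i]\n",
"        labels.append(encoded[i + 1 : i + 1 + length])\n",
"        i += 1 + length\n",
"    return labels\n",
"\n",
"print(split_labels(b\"\\x06google\\x03com\\x00\"))  # [b'google', b'com']"
]
},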
{
"cell_type": "markdown",
"id": "e85aa41b",
"metadata": {},
"source": [
"## 1.3: build the query"
]
},
{
"cell_type": "markdown",
"id": "a33313ba",
"metadata": {},
"source": [
"Finally, let's write our `build_query` function! Our function takes a domain name (like\n",
"`google.com`) and the number of a DNS record type (like `A`). \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "b6d44aef",
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"TYPE_A = 1\n",
"CLASS_IN = 1\n",
"\n",
"def build_query(domain_name, record_type):\n",
" name = encode_dns_name(domain_name)\n",
" id = random.randint(0, 65535)\n",
" RECURSION_DESIRED = 1 << 8\n",
" header = DNSHeader(id=id, num_questions=1, flags=RECURSION_DESIRED)\n",
" question = DNSQuestion(name=name, type=record_type, class_=CLASS_IN)\n",
" return header.to_bytes() + question.to_bytes()"
]
},
{
"cell_type": "markdown",
"id": "852803f1",
"metadata": {},
"source": [
"This:\n",
"\n",
"1. Defines some constants (`TYPE_A = 1`, `CLASS_IN = 1`)\n",
"2. encodes the DNS name with `encode_dns_name`\n",
"3. picks a random ID for the query\n",
"4. sets the flags to \"recursion desired\" (which you need to set any time you're talking to a DNS resolver)\n",
"5. creates the question\n",
"6. concatenates the header and the question together\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "f65b6316",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"b'\\x81\\x18\\x01\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x07example\\x03com\\x00\\x00\\x01\\x00\\x01'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"build_query(\"example.com\", TYPE_A)"
]
},
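{
"cell_type": "markdown",
"id": "added-unpack-query-note",
"metadata": {},
"source": [
"We can sanity check those bytes by unpacking them again. This is just a sketch using the example output above (your query ID will differ, since it's random): the first 12 bytes are the header, and the last 4 bytes are the question's type and class."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-unpack-query-code",
"metadata": {},
"outputs": [],
"source": [
"import struct\n",
"\n",
"query = b\"\\x81\\x18\\x01\\x00\\x00\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x07example\\x03com\\x00\\x00\\x01\\x00\\x01\"\n",
"\n",
"# header: id, flags, then the 4 counts\n",
"print(struct.unpack(\"!HHHHHH\", query[:12]))  # (33048, 256, 1, 0, 0, 0)\n",
"# 256 is 1 << 8, the \"recursion desired\" flag\n",
"\n",
"# the question ends with the type and the class\n",
"print(struct.unpack(\"!HH\", query[-4:]))  # (1, 1)"
]
},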
{
"cell_type": "markdown",
"id": "f43f8e07",
"metadata": {},
"source": [
"## 1.4: Test our code\n",
"\n",
"Now let's test if our code works!"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "37bc5241",
"metadata": {},
"outputs": [],
"source": [
"import socket\n",
"\n",
"query = build_query(\"www.example.com\", 1)\n",
"sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)\n",
"sock.sendto(query, (\"8.8.8.8\", 53))\n",
"response, _ = sock.recvfrom(1024)"
]
},
{
"cell_type": "markdown",
"id": "a43c448c",
"metadata": {},
"source": [
"This sends a query to Google's DNS resolver asking where `www.example.com` is.\n",
"\n",
"But how can we know that this worked if we don't know how to parse the response\n",
"yet? Well can run `tcpdump` to see our program making its DNS query:\n",
"\n",
"```\n",
"$ sudo tcpdump -ni any port 53\n",
"08:31:19.676059 IP 192.168.1.173.62752 > 8.8.8.8.53: 45232+ A? www.example.com. (33)\n",
"08:31:19.694678 IP 8.8.8.8.53 > 192.168.1.173.62752: 45232 1/0/0 A 93.184.216.34 (49)\n",
"```\n",
"\n",
"It worked! You can see `8.8.8.8`'s answer at the end of tcpdump's output here, at the end of the second line. \n",
"\n",
"Asking Google's DNS resolver here is cheating, of course -- our final goal is\n",
"to **write** a DNS resolver that finds out where `example.com` is ourself,\n",
"instead of asking `8.8.8.8` to do the work for us. But this is a nice easy way\n",
"to check that our code for building a DNS query works."
]
},
{
"cell_type": "markdown",
"id": "f69c24d6",
"metadata": {},
"source": [
"## Success!\n",
"\n",
"In the next part, we'll see how to parse this DNS response we just got back:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "74c2faa9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"b'\\xc3c\\x81\\x80\\x00\\x01\\x00\\x01\\x00\\x00\\x00\\x00\\x03www\\x07example\\x03com\\x00\\x00\\x01\\x00\\x01\\xc0\\x0c\\x00\\x01\\x00\\x01\\x00\\x00L\\xfc\\x00\\x04]\\xb8\\xd8\"'"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"response"
]
}
],
"metadata": {
"jupytext": {
"formats": "ipynb,md:myst"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}