willwam845/uiuctf-2021-survey.md Secret

## uiuctf-2021-survey.md

      
UIUCTF 2021 - Feedback Survey

Intro

I didn't have much time to play CTFs this weekend because I was preparing for RaRCTF, but I did get a chance to check out two awesome ones, CryptoCTF and UIUCTF. In this writeup, I'll cover how I solved the "Feedback Survey" challenge, a 1 point "meta" challenge from UIUCTF, tagged as extreme. In the end, 167 teams managed to solve this challenge, with our team solving it 5th.
challenge


Fill out This feedback form! We would love to hear your input so we can continue to improve!

initial analysis

Similar to the "Feedback Survey" challenge from UIUCTF 2020, we are presented with a Google Form. However, there is no source code provided (sad!), so we will have to do some initial recon.
On looking at the source code for the page, we notice something very different from last year. Google have changed it so that all the data for the survey is now compressed down onto one line at the end of the survey. This makes extracting the data from it possible, but still very difficult.
failed ideas

A common technique to solve these survey challenges is to just do the survey itself. However, this tends to be slow and has to be done manually, which is time consuming. I want to find a solution that works, is somewhat fast, and does everything for me once implemented. Doing the survey by hand only satisfies the first condition.
A way to satisfy the second condition is to use our classic known plaintext survey blood attack. This technically should work, as we still have all of the public survey data in the link. However, since the data is all on one line, we still have to manually search through this one line to find the flag, and even then, all the KPBSA does at that point is figure out what line that is (which should be quite obvious from just looking at the source code itself).
To solve this problem, however, we can use something called regex. A regex (short for regular expression) specifies a search pattern to look for in a piece of data. In this case, we are looking for something that satisfies the flag format, which, in comparison to last year, has conveniently been provided for us (thanks organizers!).
Q: What is the standard flag format?
A: uiuctf{flag}

Perfect. So now we want to construct a regex to search for strings that look like this in our line of survey data. However, we'll need to figure out how to do this.
regex shenanigans

To help, I'm going to be using a useful tool called regexr. Regexr allows us to specify a regex, and then create test cases for each to see if they pass the regex or not.
For our case, we only need to create a simple regex, one that looks for uiuctf{, followed by some text, and then followed by a }.
Since we don't know what characters will be in the flag, we need to potentially search for every single character. We can do this by specifying a "character set", which basically tells the regex engine to match if the character is any one of the characters in that set. For example:
if my regex was grade: [ABCDEF]
grade: A would match because A is part of the character set
grade: D would match because D is part of the character set
grade: H would not match because H is not part of the character set
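The grade example above can be checked directly in Python. This is just a small sketch of the same three test cases; the helper name `matches_grade` is made up for illustration.

```python
import re

# Character set from the example above: exactly one of the letters A-F.
grade_pattern = re.compile(r"grade: [ABCDEF]")


def matches_grade(line: str) -> bool:
    """Return True if the whole line is 'grade: ' plus one allowed letter."""
    return grade_pattern.fullmatch(line) is not None


for line in ["grade: A", "grade: D", "grade: H"]:
    print(line, "->", matches_grade(line))
```

`fullmatch` is used here so that the entire line has to fit the pattern, which mirrors how the test cases are described.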
So, we have a way to match a character at any position. However, we don't know how many characters will be in the flag. We could handle this generically by pasting our character set multiple times and bruteforcing the number of characters in the flag, but that's inefficient, so we are going to use another regex feature: the * character.
If we look at the python regex documentation, we see this:

Repeating Things
Being able to match varying sets of characters is the first thing regular expressions can do that isn’t already possible with the methods available on strings. However, if that was the only additional capability of regexes, they wouldn’t be much of an advance. Another capability is that you can specify that portions of the RE must be repeated a certain number of times.
The first metacharacter for repeating things that we’ll look at is *. * doesn’t match the literal character '*'; instead, it specifies that the previous character can be matched zero or more times, instead of exactly once.
For example, ca*t will match 'ct' (0 'a' characters), 'cat' (1 'a'), 'caaat' (3 'a' characters), and so forth.
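
The `ca*t` example from the docs is easy to try out. A quick sketch, with `cbt` thrown in as an extra case of my own that should not match:

```python
import re

# 'a*' means zero or more 'a' characters between the 'c' and the 't'.
star_pattern = re.compile(r"ca*t")

candidates = ["ct", "cat", "caaat", "cbt"]
star_results = {word: bool(star_pattern.fullmatch(word)) for word in candidates}

print(star_results)
```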

So this allows us to repeat characters (and also character sets). Cool! That means we can combine the two to create a better regex, which won't need any bruteforcing of the flag length. Our regex will then be of the form uiuctf{[character-set]*}, which means we keep matching characters from the set until we hit one that isn't in it, and that next character has to be a } for the regex to match.
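Here is a rough sketch of that combination on some made-up data. Both the sample string and the flag in it are placeholders (not the real survey data), and the set `[a-z_]` is just an assumption for the demo.

```python
import re

# Hypothetical sample data; the flag is a stand-in, not the real one.
sample = 'junk,"uiuctf{example_flag}",more junk'

# 'uiuctf{', then any number of characters from the set, then '}'.
set_star_pattern = re.compile(r"uiuctf\{[a-z_]*\}")

found = set_star_pattern.findall(sample)
print(found)

# A character outside the set stops the match before the closing brace.
missed = set_star_pattern.findall("uiuctf{Example_Flag}")
print(missed)
```

The second search comes back empty because the uppercase letters are not in the set, which is exactly why the choice of set matters.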
But wait, we still haven't decided on our character set yet. Ideally, it should contain every character, as we don't know which characters will be in the flag.
First, we can start with a very generic approach, simply pasting all the characters into the square brackets, which looks something like: uiuctf{[abcdefghijklmnopqrstuvwxyzABCDE]}... you get the idea. This works, but it's time consuming, inefficient and annoying to do. We can do much better.
Looking at the python regex docs again:

The first metacharacters we’ll look at are [ and ]. They’re used for specifying a character class, which is a set of characters that you wish to match. Characters can be listed individually, or a range of characters can be indicated by giving two characters and separating them by a '-'. For example, [abc] will match any of the characters a, b, or c; this is the same as [a-c], which uses a range to express the same set of characters. If you wanted to match only lowercase letters, your RE would be [a-z].

So, we can use the - character to specify ranges. For example, you can have a character set like [0-9], which will only match the digits 0-9, without you having to write out every single digit.
Using this, we can then specify all bytes from \x00 to \xff to be searched for in our regex. Then, our regex looks like uiuctf{[\x00-\xff]*}
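To see what these ranges actually cover, here is a small sketch using Python's `re` on ordinary strings. The test characters are my own picks; note that on a `str` pattern the range `\x00-\xff` covers code points 0 to 255, so anything above that falls outside it.

```python
import re

digit = re.compile(r"[0-9]")
any_byte = re.compile(r"[\x00-\xff]")

digit_hits = [bool(digit.fullmatch(c)) for c in ["7", "x"]]

# Newline (0x0a) is inside the range; the euro sign (U+20AC) is above \xff.
byte_hits = [bool(any_byte.fullmatch(c)) for c in ["x", "}", "\n", "\u20ac"]]

print(digit_hits)
print(byte_hits)
```

One thing this shows is that the range also includes } itself, so the set no longer stops at the first closing brace on its own.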
Much better. Can we go even better though?

The final metacharacter in this section is .. It matches anything except a newline character, and there’s an alternate mode (re.DOTALL) where it will match even a newline. . is often used where you want to match “any character”.

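So the . character matches any character (except a newline, which shouldn't matter here since the survey data sits on one line). This is great, and is exactly what we need, because we want to match everything inside the braces! We can replace our entire character set with this character, and now our regex is....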
uiuctf{.*}
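One caveat worth knowing about: `.*` is greedy, so it runs to the last } it can find on the line. The sketch below uses made-up data (not the real form HTML) to compare it with two more conservative variants, `.*?` and `[^}]*`, which are my own additions rather than what this writeup uses.

```python
import re

# Hypothetical one-line blob with braces before and after a placeholder flag.
line = 'a{"x":1} uiuctf{example_flag} tail{"y":2}'

greedy = re.findall(r"uiuctf\{.*\}", line)       # runs to the LAST '}'
lazy = re.findall(r"uiuctf\{.*?\}", line)        # stops at the first '}'
no_brace = re.findall(r"uiuctf\{[^}]*\}", line)  # anything except '}'

print(greedy)
print(lazy)
print(no_brace)
```

If the greedy version ever returns a lot more than a flag, swapping in one of the other two is a cheap thing to try.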
Awesome. Let's try to get the flag!
We'll reuse a bit of our old code, then replace our URL to hopefully find a match in the html.
Implementation:
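import re  # regex library

import requests

# Short link to the survey from the challenge description
url = "https://forms.gle/pzwBXxdmRob885wG7"

# Fetch the form's HTML (requests follows redirects by default)
text = requests.get(url).text

# Raw string with escaped braces so { and } are always treated literally
results = re.findall(r"uiuctf\{.*\}", text)

print(results)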
Running this, we get a hit.
Output:

['uiuctf{your_input_is_important_to_us_\u003c3}']

Oh cool, that looks like a flag! But wait... CTFd isn't accepting it... what's going on here?
Admittedly, this is where I spent a lot of my time.
who's that character?

We seem to have some special character... but what character is it?
To work out what this character (or characters?) is, we need to understand the format. Google Forms escapes a lot of special characters (presumably to patch other vulnerabilities), so we need to figure out what the escaped character originally was.
If we search it up, we can see that the format \uXXXX is a Unicode escape sequence, where XXXX is four hex digits giving the character's code point. Cool, but wait, why do we have 5 hexadecimal digits? Well, that's because I'm silly and didn't realise that the 3 at the end was an actual 3. We can tell because the escape only ever takes 4 hexadecimal digits; anything after that is just normal characters.
So, now that we know the characters are \u003c and a "3", we can now look at what this unicode character is. Googling it, we get:

It's a unicode character. In this case \u003C and \u003E mean : U+003C < Less-than sign. U+003E > Greater-than sign. See a list here.

Cool, so we know it's just a unicode <. We can then put this back into our original flag we got to get:
Flag: uiuctf{your_input_is_important_to_us_<3}
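For what it's worth, the unescaping can also be done in code instead of by Googling. This is a minimal sketch, assuming the matched text uses JSON-style \uXXXX escapes and contains no unescaped double quotes (otherwise `json.loads` would reject it); the variable names are made up.

```python
import json

# The raw match from the page: backslash, 'u003c', then a literal '3'.
raw_match = r"uiuctf{your_input_is_important_to_us_\u003c3}"

# Wrap it in double quotes so it parses as a JSON string literal,
# which turns \u003c back into the character it stands for.
decoded = json.loads('"' + raw_match + '"')

print(decoded)
```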

Whew. What a challenge. Thanks to the admins for this creative challenge, and hopefully UIUCTF 2022 can be just as amazing!