@YangSeungWon
Last active November 21, 2020 08:17

UIUCTF 2020 - Bot Protection IV

tags: machine learning

[name=whysw@PLUS]

Attachments

Attachments are uploaded to the gist and to Google Drive.

Challenge

When on website: +1 spam resistance +10 user annoyance

Gotta be fast! 500 in 10 minutes!

https://captcha.chal.uiuc.tf

Author: tow_nater

As the comment in index.html reveals, a captchas.zip file is available at https://captcha.chal.uiuc.tf/captchas.zip.

<!--TODO: we don't need /captchas.zip anymore now that we dynamically create captchas. We should delete this file.-->

The archive contained 69696 PNG files, each with the correct answer to its captcha.


Additionally, the strange characters are the Minecraft enchantment table language. The TTF font file was at https://captcha.chal.uiuc.tf/static/mc.ttf.

It is just a one-to-one correspondence with the alphabet, so after solving captchas by hand for about an hour, I could recognize and type these characters in ~5 seconds. (That is not fast enough to get the flag!)

Solution

Machine Learning?

My teammates and I tried hard to find other web vulnerabilities, but failed. So we suspected that this challenge might be about machine learning (even though it is in the web category). In that case, captchas.zip must be the training dataset.

Each captcha contains 5 characters, so I searched GitHub for TensorFlow code that does OCR on images with multiple characters.

https://github.com/JackonYang/captcha-tensorflow

And here it is!


Adapt github code to this challenge

Change Variable

The original code on GitHub solves captchas of 4 digits. We are dealing with 5 alphabetic characters, so we changed the variables as shown below.

Previous:

H, W, C = 100, 120, 3
N_LABELS = 10
D = 4

Changed to:

H, W, C = 75, 250, 3
N_LABELS = 26
D = 5
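
With N_LABELS = 26 and D = 5, each answer corresponds to 5 class indices in the range 0 to 25. The sketch below shows one possible encoding. It assumes the answers are the lowercase letters a to z; `ALPHABET`, `encode_label`, and `decode_label` are hypothetical names for illustration and are not taken from the original repository.

```python
import string

N_LABELS = 26  # one class per letter
D = 5          # characters per captcha
ALPHABET = string.ascii_lowercase  # assumption: answers are the letters a-z

def encode_label(answer):
    """Map a D-letter answer to a list of D class indices in [0, N_LABELS)."""
    if len(answer) != D:
        raise ValueError("expected %d characters, got %r" % (D, answer))
    return [ALPHABET.index(ch) for ch in answer.lower()]

def decode_label(indices):
    """Inverse of encode_label: class indices back to a string."""
    return "".join(ALPHABET[i] for i in indices)

print(encode_label("hello"))             # [7, 4, 11, 11, 14]
print(decode_label([7, 4, 11, 11, 14]))  # hello
```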

Increase Accuracy

At first, we used exactly the same layer settings as that code, but the model failed at least once in every 10 trials.

input_layer = tf.keras.Input(shape=(H, W, C))
x = layers.Conv2D(32, 3, activation='relu')(input_layer)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)

x = layers.Flatten()(x)
x = layers.Dense(1024, activation='relu')(x)
# x = layers.Dropout(0.5)(x)

x = layers.Dense(D * N_LABELS, activation='softmax')(x)
x = layers.Reshape((D, N_LABELS))(x)

To improve it, we first removed one layer,

x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)

but that made the situation worse...

So instead we added one more layer to the original model!

input_layer = tf.keras.Input(shape=(H, W, C))
x = layers.Conv2D(32, 3, activation='relu')(input_layer)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)
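
To see what each variant feeds into the Flatten layer, the spatial sizes can be worked out by hand. The sketch below assumes the Keras defaults used in the listings (3x3 convolutions with 'valid' padding, 2x2 pooling with stride 2) and that the Flatten and Dense layers stay unchanged; `conv_pool_shape` is a hypothetical helper. It only shows the shapes and makes no claim about why the accuracy changed.

```python
def conv_pool_shape(height, width, blocks):
    """Spatial size after `blocks` rounds of Conv2D(3x3, 'valid') + MaxPooling2D(2x2)."""
    for _ in range(blocks):
        height, width = height - 2, width - 2    # 3x3 convolution, no padding
        height, width = height // 2, width // 2  # 2x2 max pooling, stride 2
    return height, width

H, W = 75, 250
FILTERS = 64  # channels produced by the last Conv2D layer in every variant

for blocks in (2, 3, 4):
    h, w = conv_pool_shape(H, W, blocks)
    print(blocks, (h, w), h * w * FILTERS)
# 2 (17, 61) 66368
# 3 (7, 29) 12992
# 4 (2, 13) 1664
```

The last number in each row is the length of the flattened vector that enters the Dense(1024) layer.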

The output was awesome. We rarely failed! But this is not the end.


Retrying + Human Learning

There is no penalty for a wrong answer, which means we can simply try again right after a failure. Because the model ends in a softmax, we could sort the candidate answers by probability.

    # Decode the base64-encoded PNG and scale pixel values to [0, 1].
    im = Image.open(BytesIO(base64.b64decode(data)))
    batch = np.array([np.array(im) / 255.0])
    y_pred = model.predict_on_batch(batch)

    # Keep the 3 most likely labels for each character position.
    top = tf.math.top_k(y_pred, k=3)
    prob = np.array(top.values[0])      # shape (D, 3)
    indices = np.array(top.indices[0])  # shape (D, 3)

    # Pair each candidate label with its probability, position by position.
    candidates = [list(zip(indices[i], prob[i])) for i in range(len(prob))]

    # Build every combination (itertools.product) and sort them by the
    # sum of the per-character probabilities, most probable first.
    res = sorted(product(*candidates),
                 key=lambda combo: sum(p for _, p in combo),
                 reverse=True)
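
The ranking step can be tried without TensorFlow. The following is a minimal sketch using only the standard library; `rank_candidates` and the toy probabilities are made up for illustration, and the score is the same sum of per-character probabilities used above.

```python
from itertools import product

def rank_candidates(probs, alphabet, k=3):
    """Return candidate strings, highest summed probability first.

    probs: one list of per-class probabilities per character position.
    """
    per_position = []
    for row in probs:
        # The k most likely (class index, probability) pairs for this position.
        best = sorted(enumerate(row), key=lambda pair: pair[1], reverse=True)[:k]
        per_position.append(best)
    combos = product(*per_position)  # k ** len(probs) combinations
    ranked = sorted(combos, key=lambda combo: sum(p for _, p in combo), reverse=True)
    return ["".join(alphabet[i] for i, _ in combo) for combo in ranked]

# Toy example: 2 positions over a 3-letter alphabet (made-up numbers).
toy = [[0.7, 0.2, 0.1],
       [0.1, 0.3, 0.6]]
print(rank_candidates(toy, "abc", k=2))  # ['ac', 'ab', 'bc', 'bb']
```

With k=3 and 5 positions this yields 3 ** 5 = 243 combinations per captcha.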

The challenge uses a session cookie to track the 15-minute time limit. This means we can open multiple windows that share the same cookie. So we opened another window with the same session and used it in emergencies.

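def send(res):
    # `s` is the HTTP session and `toCh` maps a label index to its
    # character (both are defined elsewhere in the script).
    # Submit up to 30 of the most probable candidates for this captcha.
    for arr in res[:30]:
        trial = "".join(toCh(pair[0]) for pair in arr)
        r = s.post("https://captcha.chal.uiuc.tf/", data={"captcha": trial})
        # The server reports the outcome in the page's <h2> element.
        ret = r.text.split('<h2>')[1].split('</h2>')[0]
        print(ret)
        if ret != "Invalid captcha":
            return True
    return False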

while True:
    im, res = solve_captcha(get_img())
    if not send(res):
        input("ALEEEEEEEEEEEEEEEEEEEEEERT!!!!!!!!!!!!!")

When all 30 tries fail, we type the answer manually in the other window and then press Enter in the Python prompt to continue.
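
The retry-then-ask-a-human flow can be sketched independently of the network code. In this minimal version, `submit` and `ask_human` are hypothetical callables standing in for the POST request and the `input()` prompt, so the logic can be run offline against a fake server.

```python
def try_candidates(candidates, submit, max_tries=30):
    """Submit candidates in order; return the first accepted one, or None.

    `submit` is a hypothetical callable returning True when the server
    accepts the answer (for example, a wrapper around the POST request).
    """
    for guess in candidates[:max_tries]:
        if submit(guess):
            return guess
    return None

def solve_one(candidates, submit, ask_human):
    """Try the model's guesses first, then hand over to a person."""
    accepted = try_candidates(candidates, submit)
    if accepted is None:
        # Block until the operator has solved this captcha by hand.
        ask_human()
        return "manual"
    return accepted

# Offline demo with a fake server whose correct answer is "bc".
fake_submit = lambda guess: guess == "bc"
print(solve_one(["ac", "ab", "bc"], fake_submit, ask_human=lambda: None))  # bc
print(solve_one(["ac", "ab"], fake_submit, ask_human=lambda: None))        # manual
```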

AND WE GOT...

output : uiuctf{i_knew_a_guy_in_highschool_that_could_read_this}

p.s. Now I can read this too! haha - whysw

<!doctype html>
<!--TODO: we don't need /captchas.zip anymore now that we dynamically create captchas. We should delete this file.-->
<html>
<title>UIUCTF</title>
<link rel="stylesheet" href="/static/style.css">
<body>
<div class="bg"/>
<div class=overlay>
<h2>Level 0 is not high enough</h2>
<div class="container">
<img class="enchant" src="/static/img/overlay.png"/>
<img class="captcha" src=""/>
</div>
<form action="?" method="post">
<input type="text" id="captcha" name="captcha" placeholder="Captcha">
<input type="submit" value="Enchant">
</form>
</div>
</body>
</html>