arget13/reee.md

## reee.md

      
    Raw
  

              reee.md
            
          
    PlaidCTF 2020 : reee

Reversing (150 pts)


Story
Tired from all of the craziness in the Inner Sanctum, you decide to venture out to the beach to relax. You doze off in the sand only to be awoken by the loud “reee” of an osprey. A shell falls out of its talons and lands right where your head was a moment ago. No rest for the weary, huh? It looks a little funny, so you pick it up and realize that it’s backwards. I guess you’ll have to reverse it.
Problem details
Hint: The flag format is pctf{$FLAG}. This constraint should resolve any ambiguities in solutions.

Act 1 — Reverse engineering

We are given a x64 stripped ELF:
reee: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=3ccec76609cd013bea7ee34dffc8441bfa1d7181, stripped

Opening it with IDA we can see that its code is quite simple.
unsigned char[551] = { /* a few numbers here */ };
int main(int argc, char** argv, char** envp)
{
    int result; // eax
    int j; // [rsp+18h] [rbp-28h]
    int i; // [rsp+1Ch] [rbp-24h]

    if(argc <= 1)
        puts("need a flag!");
    for(i = 0; i <= 31336; ++i)
    {
      for(j = 0; j <= 551; ++j)
          code[j] = decrypt(code[j]);
    }
    if(((long long (*)(void))code)())
        result = puts("Correct!");
    else
        result = puts("Wrong!");
    return result;
}

unsigned char bss1, bss2;
unsigned char data1[256] = { /* some more data */ }; // Size guessed by IDA
unsigned char decrypt(unsigned char c)
{
    bss2 += data1[++bss1];
    data1[bss1] ^= data1[bss2];
    data1[bss2] ^= data1[bss1];
    data1[bss1] ^= data1[bss2];
    return data1[(unsigned char)(data1[bss1] + data1[bss2])] + c;
}
It looks like it decrypts some data from 0x400526 and then executes it. I honestly couldn't care less about how this decipher works, I need flaaaaggggssss!
This code returns a value which, in case it is 0, main() will print a "Wrong!" string, otherwise a "Correct!" string.
Let's see... the call is made in
.text:00000000004006DB                 call    near ptr code

so
gef➤  b*0x4006db
Breakpoint 1 at 0x4006db

and then
gef➤  r flagplzzzzzzzzzz
Starting program: /home/arget/reee flagplzzzzzzzzzz

Breakpoint 1, 0x00000000004006db in ?? ()
gef➤  si
0x00000000004006e5 in ?? ()
gef➤  x/11i $pc
=> 0x4006e5:    push   rbp
   0x4006e6:    mov    rbp,rsp
   0x4006e9:    call   0x400702
   0x4006ee:    mov    cl,al
   0x4006f0:    mov    eax,0xae5fe432
   0x4006f5:    add    eax,0x51a01bce
   0x4006fa:    inc    eax
   0x4006fc:    jae    0x400706               # Ignore
   0x4006fe:    mov    al,cl
   0x400700:    leave  
   0x400701:    ret    

So here we have a C function, which basically calls another function, makes a nonsense operation, and returns the result of the function previously called. Nonsense operations like that are going to be something usual along all this code, none of those jumps is ever taken because of the flags left by the mentioned nonsense operations, they are there just to tease. [I usually prefer the static analysis, but in this case it is going to be so much easier and quicker the dynamic analysis].
Now let's step into that function
gef➤  x/10i $pc
=> 0x400702:    push   rbx
   0x400703:    push   rdi
   0x400704:    push   rbp
   0x400705:    mov    rbp,rsp
   0x400708:    mov    rdx,rax
   0x40070b:    dec    eax
   0x40070d:    jmp    0x40070e
   0x40070f:    ror    BYTE PTR [rax-0x75],0xfa
   0x400713:    xor    al,al
   0x400715:    xor    ecx,ecx
gef➤  

We can see another standard prologue, copies the content of rax  to rdx, decrements rax and jumps somewhere, and the last three instructions doesn't look like they are going to be executed ever... um btw,rax contains argv[1]. And, just so you know, from now on I'm going to snip the instructions after a jmp.
gef➤  x/s $rax
0x7fffffffe4b6: "flagplzzzzzzzzzz"
gef➤  si 7
0x000000000040070e in ?? ()
gef➤  x/10i $rip
=> 0x40070e:    inc    eax
   0x400710:    mov    rdi,rdx
   0x400713:    xor    al,al
   0x400715:    xor    ecx,ecx
   0x400717:    dec    rcx
   0x40071a:    repnz scas al,BYTE PTR es:[rdi]
   0x40071c:    not    ecx
   0x40071e:    dec    ecx
   0x400720:    mov    eax,0x92185987
   0x400725:    xor    eax,0x32963a85
gef➤  
   0x40072a:    jns    0x4006b7               # Ignore
   0x40072c:    mov    eax,0xc3bc42d9
   0x400731:    xor    eax,0x1ed2c907
   0x400736:    jns    0x400742               # Ignore
   0x400738:    push   0x50
   0x40073a:    pop    r9
   0x40073c:    xor    r8d,r8d
   0x40073f:    cmp    r8d,0x539
   0x400746:    jl     0x400763
   0x400748:    jmp    0x400774

Ok, here happens something. First, the dec eax performed earlier is undone (this is also very usual in this challenge, dec; jmp; inc). The next seven instructions correspond to an optimized call to strlen(argv[1]) —if you need further explaination here you go—, the result is hold in ecx. Then we have another nonsense set of instructions (the jns is never going to happen because that xor sets SF=1), after this the program moves with push;pop the value 0x50 to the register r9 and set to zero the lower dword of r8. Afterwards we have an if, or, more likely (because of the previous xor),  a for(r8d = 0; r8d < 1337; ++r8d). The jl would jump to the body of the alleged for.
gef➤  si 16+19
0x0000000000400763 in ?? ()
gef➤  x/10i $pc
=> 0x400763:    mov    eax,0xe72b5c52
   0x400768:    add    eax,0x18d4a3ae
   0x40076d:    jne    0x400747               # Ignore
   0x40076f:    xor    r11d,r11d
   0x400772:    jmp    0x4007ac

Just ignore the mov;add;jne.  Now it makes zero the lower dword of the r11 register. Another for is incoming? This xor must mean that r11 is another counter...
gef➤  si 5
0x00000000004007ac in ?? ()
gef➤  x/10i $pc
=> 0x4007ac:    dec    eax
   0x4007ae:    jmp    0x4007af
gef➤  si 2
0x00000000004007af in ?? ()
gef➤  x/10i $pc
=> 0x4007af:    inc    eax
   0x4007b1:    cmp    r11d,ecx
   0x4007b4:    jl     0x4007b8
   0x4007b6:    jmp    0x40074a

There! We have another (presumably) for loop, we only have to find the two jmp's that jump backwards to make both suspected loops. In this case we would have a for(r11d = 0; r11d < arglen; ++r11d) (remember we have in ecx the length of our argument).
gef➤  si 3
0x00000000004007b8 in ?? ()
gef➤  x/10i $pc
=> 0x4007b8:    mov    eax,0x2b71234a
   0x4007bd:    add    eax,0xd48edcb6
   0x4007c2:    inc    eax
   0x4007c4:    jae    0x400774
   0x4007c6:    movsxd r10,r11d
   0x4007c9:    mov    eax,0xef8ac85b
   0x4007ce:    xor    eax,0xe71ae312
   0x4007d3:    js     0x4007ff
   0x4007d5:    mov    rbx,rdx
   0x4007d8:    add    rbx,r10
gef➤  
   0x4007db:    mov    eax,0x46d3a79d
   0x4007e0:    add    eax,0xb92c5863
   0x4007e5:    jne    0x4007ee
   0x4007e7:    movsxd r10,r11d
   0x4007ea:    mov    eax,0xf45102a7
   0x4007ef:    xor    eax,0xae9accbc
   0x4007f4:    js     0x40085f
   0x4007f6:    mov    rax,rdx
   0x4007f9:    add    rax,r10
   0x4007fc:    mov    al,BYTE PTR [rax]
gef➤  
   0x4007fe:    xor    al,r9b
   0x400801:    mov    BYTE PTR [rbx],al
   0x400803:    movsxd r10,r11d
   0x400806:    mov    rax,rdx
   0x400809:    add    rax,r10
   0x40080c:    mov    al,BYTE PTR [rax]
   0x40080e:    xor    r9b,al
   0x400811:    mov    eax,0x91ecf78
   0x400816:    add    eax,0xf6e13088
   0x40081b:    inc    eax
gef➤  
   0x40081d:    jae    0x4007be
   0x40081f:    add    r11d,0x1
   0x400823:    mov    eax,0x15690edd
   0x400828:    xor    eax,0xd68c1903
   0x40082d:    jns    0x400888
   0x40082f:    jmp    0x4007ac

I will now comment the code ignoring the (not so) teasing instructions. The final jmp jumps to the dec eax we have seen in the previous to the last snippet of asm, completing that way the loop we were looking for.
   0x4007c6:    movsxd r10,r11d
   0x4007d5:    mov    rbx,rdx
   0x4007d8:    add    rbx,r10

rbx = &argv[1][r11d], where r11d is our counter (did you remember that rdx had argv[1]?).
   0x4007e7:    movsxd r10,r11d
   0x4007f6:    mov    rax,rdx
   0x4007f9:    add    rax,r10
   0x4007fc:    mov    al,BYTE PTR [rax]

Almost same procedure, but this time the pointer gets dereferenced, performing al = argv[1][r11d].
   0x4007fe:    xor    al,r9b
   0x400801:    mov    BYTE PTR [rbx],al

Thereupon, xor that value with r9 (which cointained 0x50) and save it back to &argv[1][r11d]
   0x400803:    movsxd r10,r11d
   0x400806:    mov    rax,rdx
   0x400809:    add    rax,r10
   0x40080c:    mov    al,BYTE PTR [rax]
   0x40080e:    xor    r9b,al

r9b ^= argv[1][r11d],  now r9 has the original value (before the xor) of argv[1][r11d].
   0x40081f:    add    r11d,0x1
   0x40082f:    jmp    0x4007ac

Increment the counter and loop back.
Now, how is the path followed when the program finishes this loop? We are here again, but now we will follow the other way.
gef➤  x/2i 0x4007ac
   0x4007ac:    dec    eax
   0x4007ae:    jmp    0x4007af
gef➤  x/4i 0x4007af
   0x4007af:    inc    eax
   0x4007b1:    cmp    r11d,ecx
   0x4007b4:    jl     0x4007b8
   0x4007b6:    jmp    0x40074a

If the counter has reached the length of our string the jl doesn't jump and the jmp 0x40074a is taken instead.
gef➤  tb *0x4007b6
Temporary breakpoint 3 at 0x4007b6
gef➤  c
Continuing.

Temporary breakpoint 3, 0x00000000004007b6 in ?? ()
gef➤  i r ecx r11d
ecx            0x10                0x10
r11d           0x10                0x10
gef➤  x/i $pc
=> 0x4007b6:    jmp    0x40074a
gef➤  si
0x000000000040074a in ?? ()
gef➤  x/10i $pc
=> 0x40074a:    dec    eax
   0x40074c:    jmp    0x40074d
gef➤  si 2
0x000000000040074d in ?? ()
gef➤  x/10i $pc
=> 0x40074d:    inc    eax
   0x40074f:    add    r8d,0x1
   0x400753:    mov    eax,0x669ec28e
   0x400758:    add    eax,0x99613d72
   0x40075d:    inc    eax
   0x40075f:    jae    0x400724               # Ignore
   0x400761:    jmp    0x40073f

r8d incremented and umm 0x40073f, that address, we have seen it before. Yes, we have now the backward jump for the outer for. Now we have checked that this is a for too, and r8d is its counter.
gef➤  si 7
0x000000000040073f in ?? ()
gef➤  x/10i $pc
=> 0x40073f:    cmp    r8d,0x539
   0x400746:    jl     0x400763
   0x400748:    jmp    0x400774


The counter r8d is incremented and compared against 1337, when this value is reached the execution will flow to 0x400774, outside —finally— of these almost endless loops. The C code for this loop would be something like this
r9d = 0x50;
for(r8d = 0; r8d < 1337; ++r8d)
{
    for(r11d = 0; r11d < arglen; ++r11d)
    {
        argv[1][r11d] ^= r9d;
        r9d ^= argv[1][r11d];
    }
}
Now with the things a lot clearer we can proceed with the rest of the code.
gef➤  tb*0x400748
Temporary breakpoint 4 at 0x400748
gef➤  c
Continuing.

Temporary breakpoint 4, 0x0000000000400748 in ?? ()
gef➤  i r r8
r8             0x539               0x539
gef➤  si
0x0000000000400774 in ?? ()
gef➤  x/10i $pc
=> 0x400774:    mov    eax,0xea3e6566
   0x400779:    xor    eax,0x5f69faeb
   0x40077e:    jns    0x4007b6               # Ignore
   0x400780:    lea    rax,[rip+0x164]        # 0x4008eb
   0x400787:    lea    r8,[rax]
   0x40078a:    mov    eax,0xb5de3358
   0x40078f:    xor    eax,0x459f236e
   0x400794:    jns    0x400797               # Ignore
   0x400796:    dec    eax
   0x400798:    jmp    0x400799

The program gets in r8 the address of some data in .data (addr 0x4008eb)
gef➤  si 10
0x0000000000400799 in ?? ()
gef➤  x/10i $pc
=> 0x400799:    inc    eax
   0x40079b:    push   0x1
   0x40079d:    pop    r10
   0x40079f:    dec    eax
   0x4007a1:    jmp    0x4007a2

Set r10 = 1
gef➤  si 5
0x00000000004007a2 in ?? ()
gef➤  x/10i $pc
=> 0x4007a2:    inc    eax
   0x4007a4:    xor    r9d,r9d
   0x4007a7:    jmp    0x400834

I can see another loop, using r9d as counter, coming.
=> 0x400834:    mov    eax,0x4759bfd0
   0x400839:    add    eax,0xb8a64030
   0x40083e:    inc    eax
   0x400840:    jae    0x400847               # Ignore
   0x400842:    cmp    r9d,ecx
   0x400845:    jl     0x400849
   0x400847:    jmp    0x400850

We can see an if(r9d < arglen), which will in all likelihood be, from a loop for(r9d = 0; r9d < arglen; ++r9d).
gef➤  si 6
0x0000000000400849 in ?? ()
gef➤  x/10i $pc
=> 0x400849:    test   r10b,r10b
   0x40084c:    jne    0x400857
   0x40084e:    jmp    0x40089f

An if(r10b)? Well, we are going straight into it because r10b == 1 != 0
gef➤  si 2
0x000000000040085a in ?? ()
gef➤  x/10i 0x400857
=> 0x400857:    movsxd r11,r9d
   0x40085a:    mov    eax,0x5f9c75ce
   0x40085f:    xor    eax,0x5d98bd96
   0x400864:    js     0x4008a0               # Ignore
   0x400866:    mov    r10,rdx
   0x400869:    add    r10,r11
   0x40086c:    dec    eax
   0x40086e:    jmp    0x40086f
ef➤  si 8
0x000000000040086f in ?? ()
gef➤  x/10i $pc
=> 0x40086f:    inc    eax
   0x400871:    mov    r11b,BYTE PTR [r10]
   0x400874:    mov    eax,0xf8eca7d8
   0x400879:    add    eax,0x7135828
   0x40087e:    jne    0x40081c               # Ignore
   0x400880:    movsxd rax,r9d
   0x400883:    mov    r10,r8
   0x400886:    add    r10,rax
   0x400889:    mov    eax,0x3b4ad836
   0x40088e:    xor    eax,0xf1d7dbea
gef➤  
   0x400893:    jns    0x400818               # Ignore
   0x400895:    mov    al,BYTE PTR [r10]
   0x400898:    cmp    r11b,al
   0x40089b:    je     0x4008c3
   0x40089d:    jmp    0x4008da

Ok, I will remove the incoherences and conditional jumps to see it clearer
   0x400857:    movsxd r11,r9d
   0x400866:    mov    r10,rdx
   0x400869:    add    r10,r11
   0x400871:    mov    r11b,BYTE PTR [r10]
   0x400880:    movsxd rax,r9d
   0x400883:    mov    r10,r8
   0x400886:    add    r10,rax
   0x400895:    mov    al,BYTE PTR [r10]
   0x400898:    cmp    r11b,al
   0x40089b:    je     0x4008c3
   0x40089d:    jmp    0x4008da

We can see it obtains argv[1][r9d], and then *(r8 + r9d) (r8 contains the address of some place in .data, the program LEA'd that address earlier, as you can see scrolling up), and compares them. If they are equal jumps to 0x4008c3, if not, to 0x4008da.
gef➤  x/10 0x4008c3
   0x4008c3:    mov    eax,0x451c2342
   0x4008c8:    add    eax,0xbae3dcbe
   0x4008cd:    jne    0x400901               # Ignore
   0x4008cf:    push   0x1
   0x4008d1:    pop    r10
   0x4008d3:    dec    eax
   0x4008d5:    jmp    0x4008d6
gef➤  x/10 0x4008d6
   0x4008d6:    inc    eax
   0x4008d8:    jmp    0x4008dd
gef➤  x/10 0x4008dd
   0x4008dd:    mov    eax,0x5e82d693
   0x4008e2:    add    eax,0xa17d296d
   0x4008e7:    jne    0x40090a               # Ignore
   0x4008e9:    jmp    0x4008a2
gef➤  x/10 0x4008a2
   0x4008a2:    mov    eax,0x2e4ef210
   0x4008a7:    add    eax,0xd1b10df0
   0x4008ac:    jne    0x40087e               # Ignore
   0x4008ae:    mov    eax,0x1bff10d0
   0x4008b3:    xor    eax,0x9dd00315
   0x4008b8:    jns    0x40091d               # Ignore
   0x4008ba:    add    r9d,0x1
   0x4008be:    jmp    0x400834

This is executed in case both values argv[1][r9d] and *(r8 + r9d) are the same. Let me clean it up for you [so much rubbish...].
   0x4008cf:    push   0x1
   0x4008d1:    pop    r10
   0x4008ba:    add    r9d,0x1
   0x4008be:    jmp    0x400834

And this is just r10 = 1; r9d++;, and that jmp goes back to the condition checking of what I predicted a for.
Now the other path:
gef➤  x/10i 0x4008da
   0x4008da:    xor    r10b,r10b
   0x4008dd:    mov    eax,0x5e82d693
   0x4008e2:    add    eax,0xa17d296d
   0x4008e7:    jne    0x40090a               # Ignore
   0x4008e9:    jmp    0x4008a2
gef➤  x/10i 0x4008a2
   0x4008a2:    mov    eax,0x2e4ef210
   0x4008a7:    add    eax,0xd1b10df0
   0x4008ac:    jne    0x40087e               # Ignore
   0x4008ae:    mov    eax,0x1bff10d0
   0x4008b3:    xor    eax,0x9dd00315
   0x4008b8:    jns    0x40091d               # Ignore
   0x4008ba:    add    r9d,0x1
   0x4008be:    jmp    0x400834

Which translates into
gef➤  x/10i 0x4008da
   0x4008da:    xor    r10b,r10b
   0x4008ba:    add    r9d,0x1
   0x4008be:    jmp    0x400834

This is executed in case they are not the same value. Let me point out that both paths get to 0x4008ba, only changes that the bad path (with the if false) makes 0 the r10b register, while the other makes it 1.
So for this for —haha— we have this impression right now.
r8 = 0x4008eb;
r10b = 1;
for(r9d = 0; r9d < arglen; ++r9d)
{
    if(r10b)
    {
        if(argv[1][r9d] == *(r8 + r9d))
            r10b = 1;
        else
            r10b = 0;
    }
    else
    {
        /* Something, maybe nothing, who knows? Fate knows. */
    }
}
Nothing too weird. Now we have to explore the else path. But, first. a throwback.
gef➤  x/2i $pc
=> 0x40086f:    inc    eax
   0x400871:    mov    r11b,BYTE PTR [r10]

We are still here, in the first few steps inside the if(r10b), all the previous analysis since almost if(r10b) was indeed static. Now, taking into account that we haven't used as argument the flag, the modified argv[1] won't match the data pointed by r8, and therefore we will execute the xor r10b, r10b at some iteration (almost certainly sure the first one) and the next iteration if(r10b) will be false and the program flow will go through the else, so I am setting a breakpoint there
gef➤  tb*0x40084e
Temporary breakpoint 6 at 0x40084e
gef➤  c
Continuing.

Temporary breakpoint 6, 0x000000000040084e in ?? ()
gef➤  si
0x000000000040089f in ?? ()
gef➤  x/10i $pc
=> 0x40089f:    xor    r10b,r10b
   0x4008a2:    mov    eax,0x2e4ef210
   0x4008a7:    add    eax,0xd1b10df0
   0x4008ac:    jne    0x40087e               # Ignore
   0x4008ae:    mov    eax,0x1bff10d0
   0x4008b3:    xor    eax,0x9dd00315
   0x4008b8:    jns    0x40091d               # Ignore
   0x4008ba:    add    r9d,0x1
   0x4008be:    jmp    0x400834

Which pretty much does nothing xd. Sets to 0 the r10b (although if we are here is, in theory, because r10b is already 0), increments the counter and goes back to the top of the for. We could preferably write the for this way
r8 = 0x4008eb;
r10b = 1;
for(r9d = 0; r9d < arglen; ++r9d)
{
    if(!r10b) continue;
    if(argv[1][r9d] == *(r8 + r9d))
        r10b = 1;
    else
        r10b = 0;
}
Let's finish this loop.
   0x400847:    jmp    0x400850
gef➤  tb *0x400847
Temporary breakpoint 7 at 0x400847
gef➤  c
Continuing.

Temporary breakpoint 7, 0x0000000000400847 in ?? ()
gef➤  si
0x0000000000400850 in ?? ()
gef➤  x/10i $pc
=> 0x400850:    mov    al,r10b
   0x400853:    leave  
   0x400854:    pop    rdi
   0x400855:    pop    rbx
   0x400856:    ret

Ahhh we can see the light at the end of the tunnel. This is just a return r10b;, which turns out to be 1 if the modified argv[1] is equal to the bytestring in the binary, or 0 otherwise.
gef➤  si 5
0x00000000004006ee in ?? ()
gef➤  x/10 $pc
=> 0x4006ee:    mov    cl,al
   0x4006f0:    mov    eax,0xae5fe432
   0x4006f5:    add    eax,0x51a01bce
   0x4006fa:    inc    eax
   0x4006fc:    jae    0x400706               # Ignore
   0x4006fe:    mov    al,cl
   0x400700:    leave  
   0x400701:    ret

And back to the first function of the decrypted code. A return cl; where cl holds the value returned from the other function. After the ret, main() prints one string or another depending on the value that this function returned.
Act 2 — Z3 The Theorem Prover

Now that we know how the program behaves, we need to know the result that our string must have after all the alterations that the program makes on it in order to get a desired "Correct!". The aforementioned result is stored in 0x4008eb in the data segment.
gef➤  x/40bx 0x4008eb
0x4008eb:   0x48    0x5f    0x36    0x35    0x35    0x25    0x14    0x2c
0x4008f3:   0x1d    0x01    0x03    0x2d    0x0c    0x6f    0x35    0x61
0x4008fb:   0x7e    0x34    0x0a    0x44    0x24    0x2c    0x4a    0x46
0x400903:   0x19    0x59    0x5b    0x0e    0x78    0x74    0x29    0x13
0x40090b:   0x2c    0x00    0x48    0x89    0xc2    0x48    0x89    0x55

The problem is that we don't know how long our chain should be. Here we see a 0x00, but it could be just a coincidence ¯\_(ツ)_/¯. Well, what do we have after this data? If there is other data stored after this maybe we could make a guess.
.text:0000000000400907                 db 0E5h
.text:0000000000400908                 db 8Dh
.text:0000000000400909                 db 0DDh
.text:000000000040090A                 db  57h ; W
.text:000000000040090B                 db 0D1h
.text:000000000040090C                 db 0F2h
.text:000000000040090D ; ---------------------------------------------------------------------------
.text:000000000040090D
.text:000000000040090D loc_40090D:                             ; CODE XREF: main+92↑j
.text:000000000040090D                 mov     rdx, rax
.text:0000000000400910                 mov     [rbp-18h], rdx
.text:0000000000400914                 cmp     qword ptr [rbp-18h], 0
.text:0000000000400919                 jz      short loc_400927
.text:000000000040091B                 mov     edi, offset aCorrect ; "Correct!"
.text:0000000000400920                 call    _puts
.text:0000000000400925                 jmp     short loc_400931

Ok, so we have, not so much farder, a chunk of main() (ironically the one we want to reach). Therefore we can say that the length of the bytestring can't be greater than 0x40090d − 0x4008eb = 0x22 = 34. Besides
gef➤  x/34bx 0x4008eb
0x4008eb:   0x48    0x5f    0x36    0x35    0x35    0x25    0x14    0x2c
0x4008f3:   0x1d    0x01    0x03    0x2d    0x0c    0x6f    0x35    0x61
0x4008fb:   0x7e    0x34    0x0a    0x44    0x24    0x2c    0x4a    0x46
0x400903:   0x19    0x59    0x5b    0x0e    0x78    0x74    0x29    0x13
0x40090b:   0x2c    0x00

it has the nullbyte right at the end!! The length of the bytestring must be, with all certainty, 34 - 1 = 33.
Now we just need someone who can solve problems this hard. Someone who... who... umm I don't know, I just broke the epic moment... It's Z3, just Z3, The Theorem Prover.
from z3 import *

# How the result must be after all the modifications
flag = [0x48, 0x5f, 0x36, 0x35, 0x35, 0x25, 0x14, 0x2c, \
        0x1d, 0x01, 0x03, 0x2d, 0x0c, 0x6f, 0x35, 0x61, \
        0x7e, 0x34, 0x0a, 0x44, 0x24, 0x2c, 0x4a, 0x46, \
        0x19, 0x59, 0x5b, 0x0e, 0x78, 0x74, 0x29, 0x13, 0x2c]

# This array holds all the variables that z3 must deal with
asd = []
for c in "pctf{": # Making use of the hint to set constraints
    asd.append(BitVecVal(ord(c), 8))
for i in range(len(flag) - 6):
    asd.append(BitVec("%d" % i, 8))
asd.append(BitVecVal(ord('}'), 8))

# Perform the same operations as the binary
r9 = 0x50
for i in range(1337):
    for j in range(len(asd)):
        asd[j] ^= r9
        r9 ^= asd[j]

# Impose conditions
zolv = Solver()
for i in range(len(asd)):
    zolv.add(asd[i] == flag[i])

# Get the solution
assert zolv.check() == sat, "Can't find a solution"
m = zolv.model()
# Figure out how to print the model in order was the hardest part
l = [(d, m[d]) for d in m]
l.sort(key=lambda x:int(str(x[0])))
print("pctf{", end="")
for i in range(len(flag) - 6):
    print(chr(int(str(l[i][1]))), end="")
print("}")
$ time python reee.py 
pctf{ok_nothing_too_fancy_there!}

real    1m32,830s
user    1m31,872s
sys     0m0,475s

$ ./reee pctf{ok_nothing_too_fancy_there!}
Correct!

If the exclamation sign gives you problems in bash, just turn off the history substitution: set +H