Skip to content

Instantly share code, notes, and snippets.

@icecr4ck
Last active April 17, 2024 14:22
Show Gist options
  • Star 30 You must be signed in to star a gist
  • Fork 8 You must be signed in to fork a gist
  • Save icecr4ck/6c744d489efbb07a32bb22e8a3c748e3 to your computer and use it in GitHub Desktop.
Save icecr4ck/6c744d489efbb07a32bb22e8a3c748e3 to your computer and use it in GitHub Desktop.
Some notes about the IDA Microcode (intermediate language).

IDA Microcode notes

References

Implementation

Each Microcode instruction is implemented as a microinstruction (minsn_t class) and its operands as microoperands (mop_t class). A microinstruction contains 3 operands: left (l), right (r) and a destination (d). The effect of a microinstruction is defined by its attribute opcode (instance of mcode_t, see enum at the end). An operand can hold different informations depending on its type (number, register, stack variable etc...).

Example of microinstruction: add eax.4, #8.4, ecx.4 (add constant 8 to EAX and place the result into ECX).

Similarly to the CTREE, each instance of minsn_t can be composed of different mop_t and minsn_t (nested) depending on its complexity.

A Microcode basic block is a double linked list of microinstructions -> instance of mblock_t.

A function Microcode object is a double linked list of microblocks -> instance of mbl_array_t (commonly called mba). It keeps also some info about the decompiled code (local variables in mba->vars for example).

A last class mlist_t is used to keep list of locations (heavily used to represent Use-Def lists)

Different maturity levels

There are 8 different maturity levels (from the lowest to the highest):

  • MMAT_GENERATED: immediately after generation
  • MMAT_PREOPTIMIZED
  • MMAT_LOCOPT: after local optimizations
  • MMAT_CALLS: after analysis of function calls
  • MMAT_GLBOPT1
  • MMAT_GLBOPT2
  • MMAT_GLBOPT3
  • MMAT_LVARS

MMAT_GENERATED

First one is generated from the assembly listing, the syntax is pretty similar to assembly language. There are mov (operands reversed), push/pop for stack operations, und for unimplemented instructions, call and so long... All in all there are 72 microinstructions (to confirm).

The syntax for the operands is a bit different, a suffix corresponding to the size (in bytes) of the operand is added (ex: eax.4, #0xC5E2B1C.4).

Assembly blocks are split when a call instruction is met -> Microcode basic blocks != assembly basic blocks.

A Use-Def pass is also applied on the assembly, per-instruction and per-block.

Example: here is the first block of a function using some obfuscations techniques.

1. 0 mov    esp.4, ebp.4            ; 10018361 u=esp.4      d=ebp.4
1. 1 mov    #0.1, cf.1              ; 10018371 u=           d=cf.1
1. 2 mov    #0.1, zf.1              ; 10018371 u=           d=zf.1
1. 3 mov    #0.1, of.1              ; 10018371 u=           d=of.1
1. 4 und    sf.1                    ; 10018371 u=           d=sf.1
1. 5 und    pf.1                    ; 10018371 u=           d=pf.1
1. 6 mov    #0.1, cf.1              ; 1001837C u=           d=cf.1
1. 7 mov    #0.1, of.1              ; 1001837C u=           d=of.1
1. 8 und    zf.1                    ; 1001837C u=           d=zf.1
1. 9 und    sf.1                    ; 1001837C u=           d=sf.1
1.10 und    pf.1                    ; 1001837C u=           d=pf.1
1.11 mov    ecx.4, esi.4            ; 1001838E u=ecx.4      d=esi.4
1.12 nop                            ; 10018390 u=
1.13 push   esi.4                   ; 10018390 u=esp.4,esi.4 d=esp.4,sp+C.4
1.14 mov    #0x41B21DBA.4, et1.4    ; 10018391 u=           d=et1.4
1.15 mov    et1.4, esi.4            ; 10018391 u=et1.4      d=esi.4
1.16 mov    #0x15CF5AD2.4, et1.4    ; 10018396 u=           d=et1.4
1.17 mov    et1.4, esi.4            ; 10018396 u=et1.4      d=esi.4
1.18 mov    #0x2E960483.4, et1.4    ; 1001839B u=           d=et1.4
1.19 mov    et1.4, esi.4            ; 1001839B u=et1.4      d=esi.4
1.20 mov    #0x1CBD43F8.4, et1.4    ; 100183A0 u=           d=et1.4
1.21 mov    et1.4, esi.4            ; 100183A0 u=et1.4      d=esi.4
1.22 nop                            ; 100183A5 u=
1.23 pop    esi.4                   ; 100183A5 u=esp.4,sp+C.4 d=esp.4,esi.4
1.24 mov    #0x100183AB.4, ett.4    ; 100183A6 u=           d=ett.4
1.25 nop                            ; 100183A6 u=
1.26 nop                            ; 100183A6 u=
1.27 mov    cs.2, seg.2             ; 100183A6 u=cs.2       d=seg.2
1.28 mov    #0x102287B2.4, eoff.4   ; 100183A6 u=           d=eoff.4
1.29 call   $_rand                  ; 100183A6 u=(rax.8,rcx.8,ebp.4,rdi.8,st0.8,st1.8...)

MMAT_PREOPTIMIZED

Use-Def information is used to:

  • delete dead code (if the instruction result is not used, the instruction is deleted)
  • propagate operands and instructions across blocks boundaries
  • generate assertions for future optimizations -> add new microinstructions to represent assertions (assert in comment)

Example: same block as before.

1. 0 mov    esp.4, ebp.4            ; 10018361 u=esp.4      d=ebp.4
1. 1 mov    #0.1, cf.1              ; 1001837C u=           d=cf.1
1. 2 mov    #0.1, of.1              ; 1001837C u=           d=of.1
1. 3 und    zf.1                    ; 1001837C u=           d=zf.1
1. 4 und    sf.1                    ; 1001837C u=           d=sf.1
1. 5 und    pf.1                    ; 1001837C u=           d=pf.1
1. 6 mov    ecx.4, esi.4            ; 1001838E u=ecx.4      d=esi.4
1. 7 call   $_rand                  ; 100183A6 u=(rax.8,rcx.8,ebp.4,rdi.8,st0.8,st1.8...)

2nd example: second block

2. 0 ldx    ds.2, (esi.4+#0x24.4), eax.4 ; 100184B3 u=esi.4,ds.2,(GLBLOW,LVARS,ARGS,GLBHIGH) d=eax.4
2. 1 add    eax.4, #0xF.4, eax.4    ; 100184B6 u=eax.4      d=eax.4
2. 2 mul    #3.4, eax.4, ecx.4      ; 100184B9 u=eax.4      d=ecx.4
2. 3 mov    #0x2AAAAAAB.4, eax.4    ; 100184BC u=           d=eax.4
2. 4 mul    #2.4, ecx.4, ecx.4      ; 100184C1 u=ecx.4      d=ecx.4
2. 5 mul    xds.8(ecx.4), xds.8(eax.4), rt2.8 ; 100184C3 u=eax.4,ecx.4 d=rt2.8
2. 6 mov    rt2^4.4, edx.4          ; 100184C3 u=rt2^4.4    d=edx.4
2. 7 ldx    ds.2, (esi.4+#0x24.4), eax.4 ; 100184C5 u=esi.4,ds.2,(GLBLOW,LVARS,ARGS,GLBHIGH) d=eax.4
2. 8 mov    edx.4, ecx.4            ; 100184C8 u=edx.4      d=ecx.4
2. 9 shr    ecx.4, #0x1F.1, ecx.4   ; 100184CA u=ecx.4      d=ecx.4
2.10 add    edx.4, ecx.4, ecx.4     ; 100184CD u=edx.4,ecx.4 d=ecx.4
2.11 sub    ecx.4, eax.4, ecx.4     ; 100184CF u=eax.4,ecx.4 d=ecx.4
2.12 setb   ecx.4, #0xF.4, cf.1     ; 100184D1 u=ecx.4      d=cf.1
2.13 seto   ecx.4, #0xF.4, of.1     ; 100184D1 u=ecx.4      d=of.1
2.14 setz   ecx.4, #0xF.4, zf.1     ; 100184D1 u=ecx.4      d=zf.1
2.15 setp   ecx.4, #0xF.4, pf.1     ; 100184D1 u=ecx.4      d=pf.1
2.16 sets   (ecx.4-#0xF.4), sf.1    ; 100184D1 u=ecx.4      d=sf.1
2.17 jcnd   ((sf.1 ^ of.1) | zf.1), $loc_100187A8 ; 100184D4 u=zf.1,sf.1,of.1

MMAT_LOCOPT

Propagation pass again with the new Use-Def listing. Define local variables names for stack pointer offsets.

Example: 2nd example from before (1st one is unchanged)

2. 0 high   (#0x2AAAAAAB.8*xds.8((#6.4*([ds.2:(esi.4+#0x24.4)].4+#0xF.4)))), edx.4 ; 100184C3 u=esi.4,ds.2,(GLBLOW,LVARS,ARGS,GLBHIGH) d=edx.4
2. 1 ldx    ds.2, (esi.4+#0x24.4), eax.4 ; 100184C5 u=esi.4,ds.2,(GLBLOW,LVARS,ARGS,GLBHIGH) d=eax.4
2. 2 sub    ((#6.4*([ds.2:(esi.4+#0x24.4)].4+#0xF.4)) /s #6.4), eax.4, ecx.4 ; 100184CF u=eax.4,esi.4,ds.2,(GLBLOW,LVARS,ARGS,GLBHIGH) d=ecx.4
2. 3 setb   ecx.4, #0xF.4, cf.1     ; 100184D1 u=ecx.4      d=cf.1
2. 4 seto   ecx.4, #0xF.4, of.1     ; 100184D1 u=ecx.4      d=of.1
2. 5 setz   ecx.4, #0xF.4, zf.1     ; 100184D1 u=ecx.4      d=zf.1
2. 6 setp   ecx.4, #0xF.4, pf.1     ; 100184D1 u=ecx.4      d=pf.1
2. 7 sets   (ecx.4-#0xF.4), sf.1    ; 100184D1 u=ecx.4      d=sf.1
2. 8 jle    ecx.4, #0xF.4, @6       ; 100184D4 u=ecx.4

It propagates all the instructions from 0 to 6 in 1 instruction, and same for 8 to 11.

MMAT_CALLS

Propagations pass again + ABI is retrieved for calls so arguments can be identified and new assertions can be made.

Example: first block

1. 0 mov    &(%" s").4, ebp.4       ; 10018361 u=           d=ebp.4
1. 1 mov    ecx.4{1}, esi.4{1}      ; 1001838E u=ecx.4      d=esi.4
1. 2 call   $_rand <cdecl:>.0       ; 100183A6 u=(GLBLOW,sp+2E8..,GLBHIGH) d=(cf.1,zf.1...)

Example: second block

jle    (((#6.4*([ds.2{3}:(esi.4{1}+#0x24.4){4}].4{2}+#0xF.4)) /s #6.4)-[ds.2{3}:(esi.4{1}+#0x24.4){4}].4{2}), #0xF.4, @6 

Instruction at 0 was removed because not used anywhere, 1 and 2 have been propagated into 8 instruction. Flags are not used so they were removed as well.

MMAT_GLBOPT1

Blocks are merged (if split because of calls) and function prologue/epilogue are removed.

Example: 2 first blocks

1. 0 mov    ecx.4{1}, esi.4{1}      ; 1001838E u=ecx.4      d=esi.4
1. 1 call   $_rand <cdecl:>.0       ; 100183A6 u=(GLBLOW,GLBHIGH) d=(cf.1,zf.1...)
1. 2 jle    (((#6.4*([ds.2{3}:(esi.4{1}+#0x24.4){4}].4{2}+#0xF.4)) /s #6.4)-[ds.2{3}:(esi.4{1}+#0x24.4){4}].4{2}), #0xF.4, @3

MMAT_GLBOPT2

No differences observed.

MMAT_GLBOPT3

No differences observed.

MMAT_LVARS

SSA pass is applied and registers considered as local variables.

Example: 2 first blocks

1. 0 mov    this.4{1}, esi1.4{1}    ; 1001838E u=ecx.4      d=esi.4
1. 1 call   $_rand <cdecl:>.0       ; 100183A6 u=(GLBLOW,GLBHIGH) d=(cf.1,zf.1...)
1. 2 jle    (((#6.4*([ds.2{3}:(esi1.4{1}+#0x24.4){4}].4{2}+#0xF.4)) /s #6.4)-[ds.2{3}:(esi1.4{1}+#0x24.4){4}].4{2}), #0xF.4, @3

SSA can be observed with suffix number on registers (they will be later renamed with variables names, ECX is renamed to this for C++).

List of Microcode opcodes (from Hex-Rays SDK)

enum mcode_t
{
  m_nop    = 0x00, // nop                       // no operation
  m_stx    = 0x01, // stx  l,    {r=sel, d=off} // store register to memory     *F
  m_ldx    = 0x02, // ldx  {l=sel,r=off}, d     // load register from memory    *F
  m_ldc    = 0x03, // ldc  l=const,     d       // load constant
  m_mov    = 0x04, // mov  l,           d       // move                         *F
  m_neg    = 0x05, // neg  l,           d       // negate
  m_lnot   = 0x06, // lnot l,           d       // logical not
  m_bnot   = 0x07, // bnot l,           d       // bitwise not
  m_xds    = 0x08, // xds  l,           d       // extend (signed)
  m_xdu    = 0x09, // xdu  l,           d       // extend (unsigned)
  m_low    = 0x0A, // low  l,           d       // take low part
  m_high   = 0x0B, // high l,           d       // take high part
  m_add    = 0x0C, // add  l,   r,      d       // l + r -> dst
  m_sub    = 0x0D, // sub  l,   r,      d       // l - r -> dst
  m_mul    = 0x0E, // mul  l,   r,      d       // l * r -> dst
  m_udiv   = 0x0F, // udiv l,   r,      d       // l / r -> dst
  m_sdiv   = 0x10, // sdiv l,   r,      d       // l / r -> dst
  m_umod   = 0x11, // umod l,   r,      d       // l % r -> dst
  m_smod   = 0x12, // smod l,   r,      d       // l % r -> dst
  m_or     = 0x13, // or   l,   r,      d       // bitwise or
  m_and    = 0x14, // and  l,   r,      d       // bitwise and
  m_xor    = 0x15, // xor  l,   r,      d       // bitwise xor
  m_shl    = 0x16, // shl  l,   r,      d       // shift logical left
  m_shr    = 0x17, // shr  l,   r,      d       // shift logical right
  m_sar    = 0x18, // sar  l,   r,      d       // shift arithmetic right
  m_cfadd  = 0x19, // cfadd l,  r,    d=carry   // calculate carry    bit of (l+r)
  m_ofadd  = 0x1A, // ofadd l,  r,    d=overf   // calculate overflow bit of (l+r)
  m_cfshl  = 0x1B, // cfshl l,  r,    d=carry   // calculate carry    bit of (l<<r)
  m_cfshr  = 0x1C, // cfshr l,  r,    d=carry   // calculate carry    bit of (l>>r)
  m_sets   = 0x1D, // sets  l,          d=byte  SF=1          Sign
  m_seto   = 0x1E, // seto  l,  r,      d=byte  OF=1          Overflow of (l-r)
  m_setp   = 0x1F, // setp  l,  r,      d=byte  PF=1          Unordered/Parity  *F
  m_setnz  = 0x20, // setnz l,  r,      d=byte  ZF=0          Not Equal         *F
  m_setz   = 0x21, // setz  l,  r,      d=byte  ZF=1          Equal             *F
  m_setae  = 0x22, // setae l,  r,      d=byte  CF=0          Above or Equal    *F
  m_setb   = 0x23, // setb  l,  r,      d=byte  CF=1          Below             *F
  m_seta   = 0x24, // seta  l,  r,      d=byte  CF=0 & ZF=0   Above             *F
  m_setbe  = 0x25, // setbe l,  r,      d=byte  CF=1 | ZF=1   Below or Equal    *F
  m_setg   = 0x26, // setg  l,  r,      d=byte  SF=OF & ZF=0  Greater
  m_setge  = 0x27, // setge l,  r,      d=byte  SF=OF         Greater or Equal
  m_setl   = 0x28, // setl  l,  r,      d=byte  SF!=OF        Less
  m_setle  = 0x29, // setle l,  r,      d=byte  SF!=OF | ZF=1 Less or Equal
  m_jcnd   = 0x2A, // jcnd   l,         d       // d is mop_v or mop_b
  m_jnz    = 0x2B, // jnz    l, r,      d       // ZF=0          Not Equal      *F
  m_jz     = 0x2C, // jz     l, r,      d       // ZF=1          Equal          *F
  m_jae    = 0x2D, // jae    l, r,      d       // CF=0          Above or Equal *F
  m_jb     = 0x2E, // jb     l, r,      d       // CF=1          Below          *F
  m_ja     = 0x2F, // ja     l, r,      d       // CF=0 & ZF=0   Above          *F
  m_jbe    = 0x30, // jbe    l, r,      d       // CF=1 | ZF=1   Below or Equal *F
  m_jg     = 0x31, // jg     l, r,      d       // SF=OF & ZF=0  Greater
  m_jge    = 0x32, // jge    l, r,      d       // SF=OF         Greater or Equal
  m_jl     = 0x33, // jl     l, r,      d       // SF!=OF        Less
  m_jle    = 0x34, // jle    l, r,      d       // SF!=OF | ZF=1 Less or Equal
  m_jtbl   = 0x35, // jtbl   l, r=mcases        // Table jump
  m_ijmp   = 0x36, // ijmp       {r=sel, d=off} // indirect unconditional jump
  m_goto   = 0x37, // goto   l                  // l is mop_v or mop_b
  m_call   = 0x38, // call   l          d       // l is mop_v or mop_b or mop_h
  m_icall  = 0x39, // icall  {l=sel, r=off} d   // indirect call
  m_ret    = 0x3A, // ret
  m_push   = 0x3B, // push   l
  m_pop    = 0x3C, // pop               d
  m_und    = 0x3D, // und               d       // undefine
  m_ext    = 0x3E, // ext  in1, in2,  out1      // external insn, not microcode *F
  m_f2i    = 0x3F, // f2i     l,    d      int(l) => d; convert fp -> integer   +F
  m_f2u    = 0x40, // f2u     l,    d      uint(l)=> d; convert fp -> uinteger  +F
  m_i2f    = 0x41, // i2f     l,    d      fp(l)  => d; convert integer -> fp e +F
  m_u2f    = 0x42, // i2f     l,    d      fp(l)  => d; convert uinteger -> fp  +F
  m_f2f    = 0x43, // f2f     l,    d      l      => d; change fp precision     +F
  m_fneg   = 0x44, // fneg    l,    d      -l     => d; change sign             +F
  m_fadd   = 0x45, // fadd    l, r, d      l + r  => d; add                     +F
  m_fsub   = 0x46, // fsub    l, r, d      l - r  => d; subtract                +F
  m_fmul   = 0x47, // fmul    l, r, d      l * r  => d; multiply                +F
  m_fdiv   = 0x48, // fdiv    l, r, d      l / r  => d; divide                  +F
#define m_max 0x49 // first unused opcode
};
@Ruturaj4
Copy link

so helpful! thank you!

@Taardisaa
Copy link

Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment