chinmaydd/segmem.md Secret

## segmem.md

      
    Raw
  

              segmem.md
            
          
    SegMem

Idea
In rune, we initially only had 64 bit opertion support for symbolic operations. One of the core point in my proposal was to overcome this deficiency so that we can overcome a broader range of problems. Once we moved Memory into a new trait, it became convenient to add different implementations. This also meant that users could experiment with the overall architecture and use work on their own ideas.
Now, the initial idea was to memory blocks of size 1 byte and create a new symbolic variable for each. This allows us to track operations on the memory byte-by-byte. Once you have such fine grained control over the memory, memory access operations become fairly intuitive to understand and implement.
A possible shortcoming of this approach which I noticed was that we were not leveraging the access patterns in the binary. For example if we have a read/write of exactly 4 bytes at address 0x9000 throughout the course of execution we are not using this pseudo type information. I decided to experiment with it.
Implementation
We define what is known as a Memory Range as a structure which holds the start and end address of a symbolic memory segment. Now, a Memory Block is a structure which contains a Memory Range and it's corresponding solver representation (here, a NodeIndex which references a node in the constraint tree). These MemBlocks are arranged in a BTreeMap with their start address as the comparison element for ordering them in the tree. I will try and explain the process flow with a diagram (assume an empty memory instance):
1.  Input: Read for 2 bytes at memory 0x9000
    Memory:
    -------------------
    | 0x9000 - 0x9002 |
    ------------------- 
    
2.  Input: Read for 2 bytes at memory adress 0x9003
    Memory:
    -------------------    -------------------
    | 0x9000 - 0x9002 |    | 0x9003 - 0x9005 |
    -------------------    -------------------   
    
3.  Input: Access for 4 bytes at memory address 0x90001
    Memory:
    -------------------           -------------------
    | 0x9000 - 0x9002 |           | 0x9003 - 0x9005 |
    -------------------           -------------------   
              ^       ^ ^       ^ ^       ^
              |       | |       | |       |
              -- bv1 -- -- bv2 -- -- bv3 --
             
    Here,
    bv1: Result of extract operation on bitvector representing memory at 0x9000.
    bv2: Newly created bitvector representing memory of length 1 byte at 0x9002.
    bv3: Result of extract operation on bitvector representing memory at 0x9003.
    
    * A result bitvector of size equal to the access size is created 
      and is initialized to zero.
    
    * bv1, bv2 and bv3 are zero extended to match the access size 
      (i.e 4 bytes or 32 bits).
    
    * Depending on the read offset, 
      these bitvectors are then left shifted appropriately.
    
    * The result is then basically the logical OR of all the 4 quantities:
      (We could also use XOR since there are no overlaps)
      result = result ^ bv1 ^ bv2 ^ bv3

Specifics


In the BTreeMap, the key is the MemoryRange and the value is the Memory Block. Since we order based on the start address of the range, the search is always O(log n) in time. This helps query time and hence the result is returned faster.


We also make sure to return the already existing node if the read matches exact size of the so as to avoid unncessary extract and OR operations.


This implementation is very new right now and there could definitely be better ways to handle this. I'm all ears for modifications and improvements! You can find the relevant file here. Since there are no papers as such which describe how to model memory, I took inspiration from existing open source engines.