@icyveins7
Last active November 12, 2023 15:11
Intel Performance Primitives Mallocs, Aligned memory and Missing Heap Overflows

IPP and N-byte aligned memory

If you use Intel Performance Primitives (or possibly any other library that allocates memory for you), you may have run into this issue before: heap overflows seem to be caught only at random. More precisely, only very 'large' out-of-bounds accesses seem to crash your program and/or trigger the corresponding errors in checkers like Valgrind/AddressSanitizer.

This, it seems, is entirely because IPP's documentation explicitly states that the memory allocated by its functions (ippMalloc, ippsMalloc, ippsMalloc_L and others) is aligned to a 64-byte boundary.

As one may find after a bit of googling, getting memory aligned to a certain byte boundary requires you to allocate more memory than was requested; specifically, one over-allocates by the alignment itself. For example, if we require 100 bytes aligned to a 64-byte boundary, then we ask the system for (64 + 100) bytes in total.

NOTE: Technically, you only need 63 extra bytes; the worst case scenario is that the system allocates memory starting 1 byte after a 64-byte boundary, so you have to skip 63 bytes to reach the next one. But IPP allocates 64 bytes extra, so I will use that in the following discussion.

            You use 100 bytes starting from here
             ^
0 1 2 3      64  
| | | | .... | | | ......
  ^
  System allocated memory block (100+63 bytes) starting from here

When freeing the memory, one must free the pointer at byte 1, i.e. the start of the 163-byte block allocated by the system, and not the aligned pointer at byte 64.
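A minimal sketch of this over-allocate-and-round-up approach, using plain `std::malloc` rather than any IPP call. The names `AlignedBlock`, `align_up`, `alloc_aligned_64` and `free_aligned_64` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper type: keeps both pointers, because only `raw` may be freed.
struct AlignedBlock {
    void* raw;      // pointer returned by the system allocator
    void* aligned;  // first 64-byte aligned address inside the block
};

// Round an address up to the next multiple of `alignment` (a power of two).
inline std::uintptr_t align_up(std::uintptr_t addr, std::uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

// Over-allocate by 63 bytes so that a 64-byte aligned start always fits.
inline AlignedBlock alloc_aligned_64(std::size_t requested) {
    void* raw = std::malloc(requested + 63);
    if (raw == nullptr) return {nullptr, nullptr};
    std::uintptr_t start = align_up(reinterpret_cast<std::uintptr_t>(raw), 64);
    return {raw, reinterpret_cast<void*>(start)};
}

// The caller must hand back the raw pointer, not the aligned one.
inline void free_aligned_64(AlignedBlock block) {
    std::free(block.raw);
}
```

Here the caller keeps the raw pointer around, which is exactly what a library that returns only the data pointer cannot rely on.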

Library Functions Must Allocate One More Thing

This is fine if you perform the allocation yourself, as you have access to the original pointer returned by the system.

But if you are a library like IPP, then you can only return the data pointer to the user.

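// N is the number of bytes requested
// data already points to the start of the 64-byte aligned memory
Ipp8u *data = ippsMalloc_8u_L(N);

// ippsFree receives only the aligned data pointer, not the system's original pointer
ippsFree(data);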

In a scenario like this, you need the library's free function, ippsFree in this case, to know how to go back and free the total memory block.

It turns out that for IPP, this means writing the address of the start of the total memory block into an 8-byte value just before the data pointer. This, in turn, requires an extra 8 bytes to be allocated, so the total size of the block is actually requested + 64 + 8. This is a simple way, but obviously not the only way, of recovering the starting address of the system's allocated block.
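The bookkeeping described above can be sketched as follows. This is not IPP's source code: `my_aligned_malloc` and `my_aligned_free` are hypothetical names, and the layout (raw pointer stored in the `sizeof(void*)` bytes immediately before the data pointer) is assumed from the description in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

constexpr std::size_t kAlignment = 64;

// Hypothetical stand-in for a library allocator: returns a 64-byte aligned
// data pointer and hides the raw block address just before it.
inline void* my_aligned_malloc(std::size_t requested) {
    // requested + 64 + 8 on a platform with 8-byte pointers
    void* raw = std::malloc(requested + kAlignment + sizeof(void*));
    if (raw == nullptr) return nullptr;

    // Leave room for the header, then round up to the next 64-byte boundary.
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    addr = (addr + kAlignment - 1) & ~(static_cast<std::uintptr_t>(kAlignment) - 1);
    assert(addr % kAlignment == 0);
    void* data = reinterpret_cast<void*>(addr);

    // Store the raw address in the bytes immediately before the data pointer.
    std::memcpy(static_cast<unsigned char*>(data) - sizeof(void*), &raw, sizeof(void*));
    return data;
}

// Reads the hidden header back and frees the whole system block.
inline void my_aligned_free(void* data) {
    if (data == nullptr) return;
    void* raw = nullptr;
    std::memcpy(&raw, static_cast<unsigned char*>(data) - sizeof(void*), sizeof(void*));
    std::free(raw);
}
```

The user only ever sees the aligned pointer; the free function recovers everything else from the hidden header.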

Are you sure we need that 8 bytes?

You might be wondering whether we actually need to allocate the 8 bytes, since we already allocated 64 bytes extra. The answer is yes, in a select few cases.

Suppose the user has requested just 1 byte, and we ignore the extra 8 bytes, so we ask for only 64+1=65 bytes. Remember that the system can allocate starting at any byte, so the following is possible:

            Only available 64-byte aligned pointer
            ^
    62  63  64  65  66          126 127 128 129 130
....|   |   |   |   |   ......  |   |   |   |   |
    ^                           ^
    ^                           System allocation ends here (inclusive)
    System allocation starts from here

Clearly, in the above scenario, there is not enough space to store the 8-byte address before the 64-byte aligned data pointer. Thus, in general, the extra 8 bytes are required.
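The scenario in the diagram can be checked with a little address arithmetic. The sketch below treats addresses as plain integers; the function names are hypothetical and only illustrate the reasoning above.

```cpp
#include <cassert>
#include <cstdint>

// Round an address up to the next multiple of `alignment` (a power of two).
inline std::uint64_t align_up(std::uint64_t addr, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

// Without reserving header space: is there room for an 8-byte header between
// the start of the block and the first 64-byte aligned address in it?
inline bool header_fits_without_reserve(std::uint64_t block_start) {
    return align_up(block_start, 64) - block_start >= 8;
}

// With header space reserved first: offset of the data pointer from the
// start of the block. Always between 8 and 71 bytes.
inline std::uint64_t data_offset_with_reserve(std::uint64_t block_start) {
    return align_up(block_start + 8, 64) - block_start;
}
```

For a block starting at byte 62, `header_fits_without_reserve(62)` is false because only 2 bytes precede the aligned address. With the 8 bytes reserved, the data pointer moves to byte 128 (offset 66), which still fits inside a block of requested + 72 bytes.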

How This Affects Heap Overflows

The grand total allocated by IPP is hence requested + 72 bytes. Since the 8 bytes sit strictly before the data pointer, in the extreme case (the data pointer lands exactly 8 bytes after the start of the block) there are 64 spare bytes after the end of our requested memory.

This is a lot of bytes! For 64-bit or 8-byte values like double (Ipp64f), up to 8 extra elements fit in that space, which means that no overflow of the underlying block occurs, and none is detected, until you access more than 8 elements out of bounds.

For 8-bit or 1-byte values, nothing is detected until you access more than 64 elements out of bounds.
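The amount of hidden slack depends on where the system block happens to start. A rough sketch of that calculation, assuming the layout described earlier (8-byte header, then the data pointer on the next 64-byte boundary, in a block of requested + 72 bytes); `tail_slack_bytes` and `hidden_elements` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Spare bytes after the requested region for a block of requested + 72 bytes
// starting at block_start (addresses treated as plain integers).
inline std::uint64_t tail_slack_bytes(std::uint64_t block_start, std::uint64_t requested) {
    std::uint64_t data = (block_start + 8 + 63) & ~std::uint64_t{63};
    std::uint64_t block_end = block_start + requested + 72;  // one past the end
    assert(data + requested <= block_end);
    return block_end - (data + requested);
}

// Whole out-of-bounds elements of type T that still land inside the block.
template <typename T>
inline std::uint64_t hidden_elements(std::uint64_t block_start, std::uint64_t requested) {
    return tail_slack_bytes(block_start, requested) / sizeof(T);
}
```

Under these assumptions the slack ranges from 1 byte (block starts 7 bytes before a boundary) to 64 bytes (block starts 8 bytes before a boundary), so the number of undetected out-of-bounds elements varies from run to run.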

Evidence

MSVC, Address Sanitizer

See https://github.com/icyveins7/ipp_ext/blob/master/examples/heap_overflow_example.cpp for the source code. Essentially, we request only 1 byte (1x Ipp8u).

Output:

Size of vector: 1
v[0] = 255
v[1] = 190
=================================================================
==16120==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x127e04aa0089 at pc 0x7ff6230d12ee bp 0x0094962ff7d0 sp 0x0094962ff7d8
READ of size 1 at 0x127e04aa0089 thread T0
    #0 0x7ff6230d12ed in main E:\gitrepos\ipp_ext\examples\heap_overflow_example.cpp:12
    #1 0x7ff6230dc398 in invoke_main D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
    #2 0x7ff6230dc2ed in __scrt_common_main_seh D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #3 0x7ff6230dc1ad in __scrt_common_main D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:330
    #4 0x7ff6230dc40d in mainCRTStartup D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp:16
    #5 0x7ff874787343  (C:\Windows\System32\KERNEL32.DLL+0x180017343)
    #6 0x7ff875a826b0  (C:\Windows\SYSTEM32\ntdll.dll+0x1800526b0)

0x127e04aa0089 is located 32 bytes to the right of 73-byte region [0x127e04aa0020,0x127e04aa0069)
allocated by thread T0 here:
    #0 0x7ff81133ed19 in __asan_wrap_RtlAllocateHeap D:\a\_work\1\s\src\vctools\asan\llvm\compiler-rt\lib\asan\asan_malloc_win.cpp:1573
    #1 0x7ff84a7e6863 in ippGetLibVersion (E:\Intel\oneAPI\ipp\latest\redist\intel64\ippcore.dll+0x180006863)
    #2 0x7ff84a7e2340 in ippMalloc_L (E:\Intel\oneAPI\ipp\latest\redist\intel64\ippcore.dll+0x180002340)
    #3 0x7ff6230da557 in ippe::vector<unsigned char>::reserve(unsigned __int64) E:\gitrepos\ipp_ext\include\ipp_ext_vec.h:328
    #4 0x7ff6230d54ca in ippe::vector<unsigned char>::vector<unsigned char>(unsigned __int64) E:\gitrepos\ipp_ext\include\ipp_ext_vec.h:52
    #5 0x7ff6230d1100 in main E:\gitrepos\ipp_ext\examples\heap_overflow_example.cpp:5
    #6 0x7ff6230dc398 in invoke_main D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
    #7 0x7ff6230dc2ed in __scrt_common_main_seh D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #8 0x7ff6230dc1ad in __scrt_common_main D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:330
    #9 0x7ff6230dc40d in mainCRTStartup D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp:16
    #10 0x7ff874787343  (C:\Windows\System32\KERNEL32.DLL+0x180017343)
    #11 0x7ff875a826b0  (C:\Windows\SYSTEM32\ntdll.dll+0x1800526b0)

SUMMARY: AddressSanitizer: heap-buffer-overflow E:\gitrepos\ipp_ext\examples\heap_overflow_example.cpp:12 in main
Shadow bytes around the buggy address:
  0x04bfc53d3fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x04bfc53d3fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x04bfc53d3fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x04bfc53d3ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x04bfc53d4000: fa fa fa fa 00 00 00 00 00 00 00 00 00 01 fa fa
=>0x04bfc53d4010: fa[fa]fd fd fd fd fd fd fd fd fd fa fa fa fa fa
  0x04bfc53d4020: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fd fd
  0x04bfc53d4030: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
  0x04bfc53d4040: fd fd fd fd fd fa fa fa fa fa fd fd fd fd fd fd
  0x04bfc53d4050: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
  0x04bfc53d4060: fd fa fa fa fa fa fd fd fd fd fd fd fd fd fd fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==16120==ABORTING

Key notes:

  1. 73 bytes are allocated for a request of 1 byte (1 Ipp8u element), i.e. 1 + 64 + 8.
  2. The data pointer must be at 0x127e04aa0040 (the only 64-byte aligned address in the region).
  3. Hence the access to v[1] stays inside the allocated block and is not reported as a heap overflow.
  4. In this case, accessing v[41] would already have overflowed (it would have landed at 0x...69, the first byte past the region).
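The arithmetic behind these notes can be reproduced from the addresses in the report. This is only a sketch: it assumes the data pointer is the single 64-byte aligned address inside the region and that elements are 1 byte wide, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Addresses copied from the AddressSanitizer report above.
constexpr std::uint64_t kRegionBegin = 0x127e04aa0020ULL;  // inclusive
constexpr std::uint64_t kRegionEnd   = 0x127e04aa0069ULL;  // exclusive
constexpr std::uint64_t kFaultAddr   = 0x127e04aa0089ULL;

// First multiple of 64 at or after the region start, assumed to be the data pointer.
inline std::uint64_t data_pointer() {
    std::uint64_t p = (kRegionBegin + 63) & ~std::uint64_t{63};
    assert(p < kRegionEnd);
    return p;
}

// Smallest index of a 1-byte element that falls outside the region.
inline std::uint64_t first_out_of_block_index() {
    return kRegionEnd - data_pointer();
}

// Index that the faulting 1-byte read corresponds to.
inline std::uint64_t faulting_index() {
    return kFaultAddr - data_pointer();
}
```

With these addresses the first index outside the block is 41, and the reported read at 0x...89 sits 73 bytes past the data pointer, which is 32 bytes beyond the end of the region as the report states.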