@ChunMinChang
Last active December 31, 2018 19:52
A mistake when using a Rust vector as a buffer to get the data by a C API

Size matters!

The size of vector and array.

In brief

  • The array is fixed-size. Its length is part of its type and known at compile time. It works like the array in C/C++.
  • The vector is growable. Its length is not known at compile time. A Vec<T> is a small handle holding a pointer to a heap-allocated sequence of elements, plus the capacity and the length of that sequence. It works like std::vector in C++.
  • The slice is a view into a contiguous sequence whose length is not known at compile time. It can refer to all or part of an array or a vector.
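
The three bullets above can be checked with mem::size_of_val. The following is a minimal sketch; the helper names (array_bytes, vec_handle_bytes, slice_bytes) are made up for illustration and are not part of the gist. It relies on the standard library's documented guarantee that a Vec is a (pointer, capacity, length) triplet.

```rust
use std::mem;

/// Size in bytes of the array value itself: all N elements, stored inline.
fn array_bytes<const N: usize>(a: &[u32; N]) -> usize {
    mem::size_of_val(a)
}

/// Size in bytes of the `Vec` handle only (pointer, capacity, length),
/// not of the heap data it points to.
fn vec_handle_bytes(v: &Vec<u32>) -> usize {
    mem::size_of_val(v)
}

/// Size in bytes of the elements that a slice refers to.
fn slice_bytes(s: &[u32]) -> usize {
    mem::size_of_val(s)
}

fn main() {
    let array = [3_u32, 1, 4];
    let vector = vec![3_u32, 1, 4, 1, 5, 9, 2, 6, 5, 3];

    println!("array:         {} bytes", array_bytes(&array));
    println!("vector handle: {} bytes", vec_handle_bytes(&vector));
    println!("vector data:   {} bytes", slice_bytes(&vector));
    println!("sub-slice:     {} bytes", slice_bytes(&array[..2]));
}
```

The array and slice sizes follow the number of elements, while the vector handle stays at three machine words regardless of how many elements the vector holds.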

Quick Test

#include "query.h"  // get_array_size, get_array
#include <assert.h> // assert
#include <stdio.h>  // printf

int main() {
  size_t size = get_array_size();
  uint32_t* data = (uint32_t*) calloc(1, size);
  assert(data);
  get_array(size, data);
  size_t elements = size / sizeof(uint32_t);
  for (size_t i = 0 ; i < elements ; ++i) {
    printf("data[%zu] = %u\n", i, (unsigned) data[i]);
  }
  free(data);
  return 0;
}

#include "query.h"  // get_array_size, get_array
#include <iostream> // std::cout, std::endl
#include <memory>   // std::unique_ptr

int main() {
  size_t size = get_array_size();
  std::unique_ptr<uint8_t[]> data_storage(new uint8_t[size]);
  uint32_t* data = reinterpret_cast<uint32_t*>(data_storage.get());
  get_array(size, data);
  size_t elements = size / sizeof(uint32_t);
  for (size_t i = 0 ; i < elements ; ++i) {
    std::cout << "data[" << i << "] = " << data[i] << std::endl;
  }
  return 0;
}

Can you see the problem when the above C or C++ programs are rewritten into the following Rust code?

mod sys {
    #[link(name = "query")]
    extern "C" {
        pub fn get_array_size() -> usize;

        // `get_array` returns void in query.h, so no return type is declared.
        pub fn get_array(size: usize, data: *mut u32);
    }
}

mod utils {
    use std::mem;

    pub fn allocate_array_by_size<T>(size: usize) -> Vec<T> {
        let elements = size / mem::size_of::<T>();
        allocate_array_by_elements::<T>(elements)
    }

    pub fn allocate_array_by_elements<T>(elements: usize) -> Vec<T> {
        let mut array = Vec::<T>::with_capacity(elements);
        unsafe {
            array.set_len(elements);
        }
        array
    }
}

fn main() {
    use std::mem;
    use utils::*;
    use sys::*;

    let size = unsafe {
        get_array_size()
    };
    let mut data = allocate_array_by_size(size);
    unsafe {
        get_array(mem::size_of_val(&data), data.as_mut_ptr());
    }
    for (i, item) in data.iter().enumerate()  {
        println!("data[{}] = {}", i, item);
    }
}

Answer

The answer is that get_array(mem::size_of_val(&data), data.as_mut_ptr()) is the wrong call, and it may corrupt memory!

No matter what size is, mem::size_of_val(&data) is a fixed value, because it measures the Vec structure itself rather than the data the vector owns. That structure holds a pointer to the sequence of data, the length of that sequence, and its capacity. See the source code of Vec for details.

get_array(...) copies mem::size_of_val(&data) bytes of the underlying data into the buffer at data.as_mut_ptr(). On a 64-bit target the Vec structure is 24 bytes, so mem::size_of_val(&data) always returns 24. If the underlying data is 24 bytes or larger, the call does not overflow: the buffer we allocate is as large as the underlying data, and the call copies only the first 24 bytes of it, so any elements beyond that are never filled. However, if the underlying data is smaller than 24 bytes, we allocate a buffer smaller than 24 bytes but still copy 24 bytes into it. That is, we (over)write memory that we don't own and shouldn't touch!

let size = unsafe {
    get_array_size()
};
let mut data = allocate_array_by_size(size);
unsafe {
    // This is wrong!
    // No matter what size is, mem::size_of_val(&data) is a fixed value
    // (24 on a 64-bit target), since it is the size of the Vec structure
    // itself: ptr, len, etc. (See the source code of Vec).
    //
    // This call will corrupt memory if the underlying data is smaller
    // than 24 bytes!
    // Suppose the underlying data is 1 byte, then we will dynamically
    // allocate a 1-byte buffer. The following call then writes 24
    // bytes starting at data.as_mut_ptr(). That is, we write to
    // memory we don't own! We only allocated 1 byte but we write 24
    // bytes!
    get_array(mem::size_of_val(&data), data.as_mut_ptr());
}
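
To make the two cases concrete, here is a small sketch that compares what the buggy call passes with what the buffer actually holds, for a few vector lengths. The helper names (wrong_size, buffer_size, classify) are hypothetical and exist only for this illustration; no C code is called.

```rust
use std::mem;

/// What the buggy call passes: the size of the `Vec` handle itself.
fn wrong_size(data: &Vec<u32>) -> usize {
    mem::size_of_val(data)
}

/// What `get_array` needs: the number of bytes in the allocated buffer.
fn buffer_size(data: &Vec<u32>) -> usize {
    data.len() * mem::size_of::<u32>()
}

/// Describe what passing `wrong_size` would do to a buffer of this length.
fn classify(data: &Vec<u32>) -> &'static str {
    let (wrong, right) = (wrong_size(data), buffer_size(data));
    if wrong > right {
        "overflow"
    } else if wrong < right {
        "partial fill"
    } else {
        "same size by coincidence"
    }
}

fn main() {
    for elements in [1_usize, 3, 6, 10] {
        let data = vec![0_u32; elements];
        println!(
            "{:>2} elements: buffer = {:>2} bytes, size_of_val(&data) = {} bytes -> {}",
            elements,
            buffer_size(&data),
            wrong_size(&data),
            classify(&data)
        );
    }
}
```

The value returned by wrong_size never changes with the vector's length, which is why the bug only shows up for some sizes of the underlying data.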

Solution

To solve the above problem, the easiest way is to replace get_array(mem::size_of_val(&data), data.as_mut_ptr()) with get_array(size, data.as_mut_ptr()), where size is the value returned by get_array_size().
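
One way to keep the size and the buffer in sync is to hide both behind a small wrapper. The sketch below is an assumption-laden illustration, not the gist's code: the sys module is a pure-Rust stand-in for libquery.a so that the example runs without linking, and query_array is a hypothetical helper. It also uses vec![0; n] so that every element is initialized before the C side writes into the buffer.

```rust
use std::mem;

// Hypothetical stand-in for the C library so this sketch runs without
// linking libquery.a. In the real program these are `extern "C"` functions.
mod sys {
    static DATA: [u32; 3] = [3, 1, 4];

    pub unsafe fn get_array_size() -> usize {
        std::mem::size_of_val(&DATA)
    }

    pub unsafe fn get_array(size: usize, data: *mut u32) {
        // Mirrors the memcpy in query.c: copy `size` bytes into `data`.
        unsafe {
            std::ptr::copy_nonoverlapping(DATA.as_ptr() as *const u8, data as *mut u8, size);
        }
    }
}

/// Query the data into a zero-initialized vector, passing the byte length
/// of the buffer rather than the size of the `Vec` handle.
fn query_array() -> Vec<u32> {
    let size = unsafe { sys::get_array_size() };
    let elements = size / mem::size_of::<u32>();
    // `vec![0; n]` initializes every element before the buffer is handed out.
    let mut data = vec![0_u32; elements];
    // Byte length of the buffer we actually own.
    let bytes = data.len() * mem::size_of::<u32>();
    unsafe { sys::get_array(bytes, data.as_mut_ptr()) };
    data
}

fn main() {
    for (i, item) in query_array().iter().enumerate() {
        println!("data[{}] = {}", i, item);
    }
}
```

Computing the byte count from data.len() means the value passed to get_array can never exceed the buffer, even if the queried size is not a multiple of the element size.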

If we only need to take the first NUM elements, we can use a fixed-size array rather than a flexible-size vector to store the data.

const NUM: usize = 2;
// `size` is in bytes, so compare it with the byte size of NUM elements.
if size < NUM * mem::size_of::<u32>() {
    return;
}

let mut data = [0_u32; NUM];
unsafe {
    get_array(mem::size_of_val(&data), data.as_mut_ptr());
}

TODO

  • Write some C/C++ and Rust code to show the different sizes of vector and array

Makefile

all:
	# Build a static library
	gcc -c -o query.o query.c
	ar rcs libquery.a query.o
	# Build and run a C sample
	gcc sample.c libquery.a -o sample-c
	./sample-c
	# Build and run a C++ sample
	g++ sample.cpp libquery.a -o sample-cpp
	./sample-cpp
	# Build and run Rust samples
	rustc sample-solution.rs -L.
	RUST_BACKTRACE=1 LD_LIBRARY_PATH=. ./sample-solution
	rustc sample-alternative.rs -L.
	RUST_BACKTRACE=1 LD_LIBRARY_PATH=. ./sample-alternative
	# Build and run a wrong Rust sample
	rustc sample-problem.rs -L.
	RUST_BACKTRACE=1 LD_LIBRARY_PATH=. ./sample-problem

clean:
	rm query.o
	rm libquery.a
	rm sample-c
	rm sample-cpp
	rm sample-problem
	rm sample-solution
	rm sample-alternative

query.c

#include "query.h"
#include <string.h> // memcpy

// const uint32_t DATA[10] = { 3, 1, 4, 1, 5, 9, 2, 6, 5, 3 };
const uint32_t DATA[3] = { 3, 1, 4 };

size_t get_array_size() {
  return sizeof(DATA);
}

void get_array(size_t size, uint32_t* data) {
  memcpy(data, &DATA, size);
}

query.h

#ifndef QUERY_H
#define QUERY_H

#include <stdbool.h> // bool
#include <stdint.h>  // uint32_t
#include <stdlib.h>  // size_t

#ifdef __cplusplus
extern "C"
{
#endif

// Return the size of the underlying data.
size_t get_array_size();

// Fill the underlying data to the provided buffer.
void get_array(size_t size, uint32_t* data);

#ifdef __cplusplus
}
#endif

#endif // QUERY_H

sample-alternative.rs

mod sys {
    #[link(name = "query")]
    extern "C" {
        pub fn get_array_size() -> usize;
        // `get_array` returns void in query.h, so no return type is declared.
        pub fn get_array(size: usize, data: *mut u32);
    }
}

fn main() {
    use std::mem;
    use sys::*;

    let size = unsafe {
        get_array_size()
    };

    // If we only need to take the first `NUM` elements,
    // we can use a fixed-size array rather than a flexible-size vector.
    const NUM: usize = 2;
    // `size` is in bytes, so compare it with the byte size of `NUM` elements.
    if size < NUM * mem::size_of::<u32>() {
        return;
    }

    let mut data = [0_u32; NUM];
    unsafe {
        assert_eq!(mem::size_of_val(&data), mem::size_of::<u32>() * NUM);
        get_array(mem::size_of_val(&data), data.as_mut_ptr());
    }
    for (i, item) in data.iter().enumerate() {
        println!("data[{}] = {}", i, item);
    }
}

sample-solution.rs

mod sys {
    #[link(name = "query")]
    extern "C" {
        pub fn get_array_size() -> usize;
        // `get_array` returns void in query.h, so no return type is declared.
        pub fn get_array(size: usize, data: *mut u32);
    }
}

mod utils {
    use std::mem;

    pub fn allocate_array_by_size<T>(size: usize) -> Vec<T> {
        let elements = size / mem::size_of::<T>();
        allocate_array_by_elements::<T>(elements)
    }

    pub fn allocate_array_by_elements<T>(elements: usize) -> Vec<T> {
        let mut array = Vec::<T>::with_capacity(elements);
        unsafe {
            array.set_len(elements);
        }
        array
    }
}

fn main() {
    use std::mem;
    use utils::*;
    use sys::*;

    let size = unsafe {
        get_array_size()
    };
    let mut data = allocate_array_by_size(size);
    unsafe {
        // The mem::size_of_val(&data) is a fixed value no matter what size is
        // (24 on a 64-bit target).
        // Use `size` to query the data to make sure it always works!
        assert_eq!(mem::size_of_val(&data), 24);
        assert_ne!(size, mem::size_of_val(&data));
        get_array(size, data.as_mut_ptr());
    }
    for (i, item) in data.iter().enumerate() {
        println!("data[{}] = {}", i, item);
    }
}