The size of vector and array.
In brief
- An array is fixed-size. Its length is part of its type and known at compile time. It works like an array in C/C++.
- A vector is flexible-size. Its length can change at run time and is not known at compile time. A `Vec<T>` works much like a growable `Box<_>`: a pointer to a heap-allocated sequence of elements, plus the length of that sequence and its capacity. It works like `std::vector` in C++.
- A slice (`[T]`) is a contiguous sequence whose length is not known at compile time. It is usually used through a reference such as `&[T]`, which can point to all or part of an array or a vector.
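A minimal sketch of how the three relate, using `u32` elements for illustration: both an array and a vector coerce to the slice type `&[u32]`, so a single function (the hypothetical `sum` below) accepts either of them, or a sub-range of either.

```rust
/// Sums a run of u32 values. Arrays, vectors, and parts of either all
/// coerce to the slice type `&[u32]`, so this one function accepts them all.
fn sum(values: &[u32]) -> u32 {
    values.iter().sum()
}

fn main() {
    let array: [u32; 4] = [1, 2, 3, 4]; // length 4 is part of the type
    let vector: Vec<u32> = vec![10, 20, 30]; // length is tracked at run time

    println!("{}", sum(&array)); // whole array as a slice
    println!("{}", sum(&vector)); // whole vector as a slice
    println!("{}", sum(&array[1..3])); // part of an array
    println!("{}", sum(&vector[..2])); // part of a vector
}
```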
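Consider a C library named `query` that reports the size, in bytes, of an array with `get_array_size()` and copies the array into a caller-provided buffer with `get_array()`. A C program that reads and prints the array might look like this:

```c
#include "query.h"    // get_array_size, get_array
#include <assert.h>   // assert
#include <inttypes.h> // uint32_t, PRIu32
#include <stdio.h>    // printf
#include <stdlib.h>   // calloc, free

int main(void) {
    size_t size = get_array_size(); // size of the array in bytes
    uint32_t* data = (uint32_t*) calloc(1, size);
    assert(data);
    get_array(size, data); // copy `size` bytes into data
    size_t elements = size / sizeof(uint32_t);
    for (size_t i = 0; i < elements; ++i) {
        printf("data[%zu] = %" PRIu32 "\n", i, data[i]);
    }
    free(data);
    return 0;
}
```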
The same program in C++:

```cpp
#include "query.h"  // get_array_size, get_array
#include <cstddef>  // size_t
#include <cstdint>  // uint8_t, uint32_t
#include <iostream> // std::cout, std::endl
#include <memory>   // std::unique_ptr

int main() {
    size_t size = get_array_size(); // size of the array in bytes
    // Allocate `size` raw bytes and view them as uint32_t values.
    std::unique_ptr<uint8_t[]> data_storage(new uint8_t[size]);
    uint32_t* data = reinterpret_cast<uint32_t*>(data_storage.get());
    get_array(size, data);
    size_t elements = size / sizeof(uint32_t);
    for (size_t i = 0; i < elements; ++i) {
        std::cout << "data[" << i << "] = " << data[i] << std::endl;
    }
    return 0;
}
```
Can you see the problem when the above C or C++ programs are rewritten into the following Rust code?
```rust
mod sys {
    #[link(name = "query")]
    extern "C" {
        pub fn get_array_size() -> usize;
        pub fn get_array(size: usize, data: *mut u32) -> bool;
    }
}

mod utils {
    use std::mem;

    pub fn allocate_array_by_size<T>(size: usize) -> Vec<T> {
        let elements = size / mem::size_of::<T>();
        allocate_array_by_elements::<T>(elements)
    }

    pub fn allocate_array_by_elements<T>(elements: usize) -> Vec<T> {
        let mut array = Vec::<T>::with_capacity(elements);
        // Note: the new elements are left uninitialized, which set_len's
        // contract does not allow; this relies on every element being
        // written (e.g. by get_array) before any of them is read.
        unsafe {
            array.set_len(elements);
        }
        array
    }
}

fn main() {
    use std::mem;
    use utils::*;
    use sys::*;

    let size = unsafe {
        get_array_size()
    };
    let mut data = allocate_array_by_size(size);
    unsafe {
        get_array(mem::size_of_val(&data), data.as_mut_ptr());
    }
    for (i, item) in data.iter().enumerate() {
        println!("data[{}] = {}", i, item);
    }
}
```
The answer is that `get_array(mem::size_of_val(&data), data.as_mut_ptr())` is a wrong call, and it may corrupt memory!
No matter what `size` is, the size of the vector (`data` here) is fixed, since the structure of `Vec` is fixed: it holds a pointer to the sequence of elements, the length of that sequence, and the other information it needs, such as its capacity. You can check this in the source code of `Vec` in the Rust standard library.
Calling it is dangerous because `get_array(...)` copies `mem::size_of_val(&data)` bytes of the underlying data into the buffer starting at `data.as_mut_ptr()`.
The size of a `Vec` is a fixed value, 24 bytes on a 64-bit target (three pointer-sized fields), so `mem::size_of_val(&data)` always returns `24` there.
If the underlying data is 24 bytes or larger, the buffer we allocate (`size` bytes, rounded down to a whole number of `u32` values) is at least 24 bytes, so copying 24 bytes into it stays in bounds. The call does not corrupt memory, but it copies only the first 24 bytes; any elements after those are never written and keep whatever uninitialized bytes the allocation happened to contain.
However, if the underlying data is smaller than 24 bytes, we allocate a buffer smaller than 24 bytes but still copy 24 bytes into it.
That is, we (over)write memory that we don't own and shouldn't touch!
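To see the mismatch without the C library, the sketch below assumes (hypothetically) that the library reports 8 bytes, i.e. two `u32` values, and compares `mem::size_of_val(&data)`, which measures the `Vec` value itself, with `mem::size_of_val(data.as_slice())`, which measures the elements it owns. The exact header size is an implementation detail of `Vec`; in current standard libraries it is three machine words.

```rust
use std::mem;

/// Bytes occupied by the `Vec` value itself (pointer, capacity, length).
/// This is what `mem::size_of_val(&data)` measures.
fn vec_header_bytes<T>(v: &Vec<T>) -> usize {
    mem::size_of_val(v)
}

/// Bytes occupied by the elements the `Vec` owns on the heap.
fn vec_element_bytes<T>(v: &Vec<T>) -> usize {
    mem::size_of_val(v.as_slice())
}

fn main() {
    // Hypothetical: the library reports 8 bytes, i.e. two u32 values.
    let size: usize = 8;
    let data: Vec<u32> = vec![0; size / mem::size_of::<u32>()];

    println!("size_of_val(&data)           = {}", vec_header_bytes(&data));
    println!("size_of_val(data.as_slice()) = {}", vec_element_bytes(&data));

    // The header size does not depend on how many elements there are.
    let big: Vec<u32> = vec![0; 1000];
    assert_eq!(vec_header_bytes(&data), vec_header_bytes(&big));
    assert_eq!(vec_element_bytes(&data), size);
}
```

Passing the first number to `get_array` while the buffer only holds the second is exactly the overflow described above.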
```rust
let size = unsafe {
    get_array_size()
};
let mut data = allocate_array_by_size(size);
unsafe {
    // This is wrong!
    // No matter what size is, the size of the vector (data here) is fixed:
    // 24 bytes on a 64-bit target, since the structure of Vec is fixed
    // (ptr, capacity, len; see the source code of Vec).
    //
    // This call will corrupt memory if the underlying data is smaller
    // than 24 bytes!
    // Suppose the underlying data is 8 bytes (two u32 values). Then we
    // allocate an 8-byte buffer, but the following call writes 24 bytes
    // starting at data.as_mut_ptr(). That is, we write to memory we don't
    // own: we only allocated 8 bytes, but we write 24!
    get_array(mem::size_of_val(&data), data.as_mut_ptr());
}
```
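To solve the above problem, the easiest way is to replace `get_array(mem::size_of_val(&data), data.as_mut_ptr())` with `get_array(size, data.as_mut_ptr())`, where `size` is the byte count returned by `let size = unsafe { get_array_size() };`.
This assumes `size` is a multiple of `mem::size_of::<u32>()`. Otherwise `allocate_array_by_size` rounds the element count down, the vector holds fewer than `size` bytes, and passing `mem::size_of_val(&data[..])` (the byte length of the elements) is the safe choice.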
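A self-contained sketch of the corrected call. The C library is replaced by hypothetical Rust stand-ins (`get_array_size` and `get_array` below, backed by a made-up `STORED` array); they are assumptions for illustration, not the real `query` API. The buffer is zero-initialized with `vec!` instead of `set_len`, and the byte length passed is that of the elements, which equals `size` whenever `size` is a multiple of 4.

```rust
use std::mem;

// Hypothetical stand-in data for the C library: three u32 values (12 bytes).
static STORED: [u32; 3] = [10, 20, 30];

/// Hypothetical stand-in for `get_array_size`: size of the data in bytes.
fn get_array_size() -> usize {
    mem::size_of_val(&STORED)
}

/// Hypothetical stand-in for `get_array`: copies up to `size` bytes into
/// `data`. Like the real function, it trusts `size` completely.
unsafe fn get_array(size: usize, data: *mut u32) -> bool {
    let count = (size / mem::size_of::<u32>()).min(STORED.len());
    unsafe { std::ptr::copy_nonoverlapping(STORED.as_ptr(), data, count) };
    true
}

fn read_all() -> Vec<u32> {
    let size = get_array_size();
    // Zero-initialized, so no uninitialized element is ever exposed.
    let mut data = vec![0u32; size / mem::size_of::<u32>()];
    // Pass the byte length of the elements, NOT mem::size_of_val(&data).
    let ok = unsafe { get_array(mem::size_of_val(data.as_slice()), data.as_mut_ptr()) };
    assert!(ok);
    data
}

fn main() {
    for (i, item) in read_all().iter().enumerate() {
        println!("data[{}] = {}", i, item);
    }
}
```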
If we only need the first `NUM` elements, we can store them in a fixed-size array rather than a flexible-size vector:
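```rust
const NUM: usize = 2;
// `size` is in bytes, so compare it with the byte size of NUM elements.
if size < NUM * mem::size_of::<u32>() {
    return;
}
let mut data = [0_u32; NUM];
unsafe {
    // For an array, size_of_val is the size of its elements: NUM * 4 bytes.
    get_array(mem::size_of_val(&data), data.as_mut_ptr());
}
```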
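The same idea can be made generic over the number of elements with const generics. This is a sketch under the same assumptions as before: `get_array_size`, `get_array` and `STORED` are hypothetical Rust stand-ins for the C library, and `read_first` is an illustrative helper name. Because the array's length is part of its type, `mem::size_of_val(&data)` is exactly `N * 4` bytes here, unlike for a `Vec`.

```rust
use std::mem;

// Hypothetical stand-in data for the C library: three u32 values (12 bytes).
static STORED: [u32; 3] = [10, 20, 30];

fn get_array_size() -> usize {
    mem::size_of_val(&STORED)
}

/// Hypothetical stand-in for `get_array`: copies up to `size` bytes.
unsafe fn get_array(size: usize, data: *mut u32) -> bool {
    let count = (size / mem::size_of::<u32>()).min(STORED.len());
    unsafe { std::ptr::copy_nonoverlapping(STORED.as_ptr(), data, count) };
    true
}

/// Reads the first N values into a stack array; N is fixed at compile time.
/// Returns None if the library holds fewer than N values.
fn read_first<const N: usize>() -> Option<[u32; N]> {
    let size = get_array_size();
    let mut data = [0_u32; N];
    // `size` is in bytes, so compare it with the array's byte size, not N.
    if size < mem::size_of_val(&data) {
        return None;
    }
    let ok = unsafe { get_array(mem::size_of_val(&data), data.as_mut_ptr()) };
    if ok { Some(data) } else { None }
}

fn main() {
    println!("{:?}", read_first::<2>());
    println!("{:?}", read_first::<4>());
}
```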
- Write some C/C++ and Rust code to show the different sizes of `vector` and `array`.
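A minimal sketch of the Rust side of that comparison, printing sizes with `std::mem::size_of`. On a typical 64-bit target it reports 8, 400, 24, 16 and 16 bytes; the `Vec` figure reflects the current standard-library layout rather than a language guarantee.

```rust
use std::mem::size_of;

fn main() {
    // Arrays store their elements inline, so their size grows with N.
    println!("[u32; 2]   : {} bytes", size_of::<[u32; 2]>());
    println!("[u32; 100] : {} bytes", size_of::<[u32; 100]>());
    // A Vec is only a fixed-size header (pointer, capacity, length);
    // its elements live on the heap and are not counted here.
    println!("Vec<u32>   : {} bytes", size_of::<Vec<u32>>());
    // Slice references and boxed slices are "fat" pointers: pointer + length.
    println!("&[u32]     : {} bytes", size_of::<&[u32]>());
    println!("Box<[u32]> : {} bytes", size_of::<Box<[u32]>>());
}
```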