Skip to content

Instantly share code, notes, and snippets.

@pythonesque
Last active August 29, 2015 14:07
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save pythonesque/35bd8277ae079d81ece6 to your computer and use it in GitHub Desktop.
Save pythonesque/35bd8277ae079d81ece6 to your computer and use it in GitHub Desktop.
#![feature(unsafe_destructor)]
/*
For safety, here are some guarantees we need fulfilled for allocated types for
an allocator of size class <M, N, O>.
* <T>::Size < M
e.g. consider 1-sized struct aligned at 4096.
And suppose we're at the very end of the page here (but it's aligned to 4096):
[ 4096 bytes][4096 bytes]
[1 byte]_____[4096 bytes]
* assume alignment is a power of 2.
* Size is integer multiple of alignment.
MPROT |4096 |4096 |
PAGE |4096 |4096 |4096 |
GUARD NO |YES |?
ALLOC |2048|2048|4096 |
WRITE |1|__|1|__|_______|1|
PAGE |4096 |4096 |4096 |
GUARD NO |YES |?
ALLOC |2048|4096 |4096 |
WRITE |1|__|1|_______|_______|1|
PAGE |4096|4096 |4096 |
GUARD NO |YES |?
ALLOC |1|4096|4096 |
WRITE |1|1|__|_____|1|
* worst case: allocation completely clears the stack guard, can access memory
anywhere.
* worst case if allocations must be smaller than GUARD_SIZE and an allocation
must always have at least one byte accessed on initialization:
|1 |GUARD_SIZE - 1|1|GUARD_SIZE - 2|1 |
STACK |GUARD |?
|ALLOC |ALLOC |
|WRITE| |WRITE|
* The easy fix: set MAX_ALLOC to 1. Then GUARD_SIZE - 2 in this worst-case
would be negative, which is impossible :) This is not very practical, though.
* The slightly less stupid but still easy fix: rewrite the scenario:
|1 |ALLOC_SIZE - 1|GUARD_SIZE + 1 - ALLOC_SIZE|2 * (ALLOC_SIZE - 1) - GUARD_SIZE|1 |
STACK |GUARD |?
|ALLOC |ALLOC |
|WRITE| |WRITE|
Then we require: ALLOC_SIZE - 1 <= GUARD_SIZE <= 2 * (ALLOC_SIZE - 1)
So if ALLOC_SIZE = 0, GUARD_SIZE + 1 < ALLOC_SIZE, or 2 * (ALLOC_SIZE - 1) < GUARD_SIZE
this can't happen. Since we can construct other bad scenarios if GUARD_SIZE + 1 < ALLOC_SIZE,
and ALLOC_SIZE = 0 is useless, the logical requirement is 2 * (ALLOC_SIZE - 1) < GUARD_SIZE.
i.e. GUARD_SIZE = 2 * ALLOC - 1.
By taking alignment into account there may be / certainly are more optimizations than this that
can be made, but this is a good starting point. And the optimizations may not be that important.
Also note: practically speaking we'll want this to be a power of 2... force ALLOC to be a
power-of-two and we can do GUARD_SIZE = ALLOC << 1.
* The more complex fix: figure out how (and if) one can restrict alignment so
this can never happen. Also: consider preferential alignment, reordering allocations, coalescing
lookups (so we only have to bump the pointer once if you want n allocations), etc. Is there a way
to expose this information to LLVM for optimization? That would be sweet.
|1|2|4|8|16|32|1| |
|1| | |
Effectively, can think of all allocations as being of either (align size) or
(max alloc size) and writing to either (byte 0) or (byte max alloc size - 1)
//and all alignments as being (relative to the stack guard) either
// (STACK_GUARD -
Guard page MUST be page aligned and allocations MUST be aligned at min-align,
and all alignments MUST be powers of 2, so we have
BASE + (REAL_SIZE) = GUARD_0
BASE is (k-aligned):
If k < PAGE_SIZE, then
so we have
//
0 G
ALLOC |4096|
Key insight: with power of 2 alignments,
You may not allocate a structure larger than the guard page size.
What if the struct is aligned to 1?
[4096 bytes][4096 bytes]
____[1 byte][4096 bytes]
Still a matter of not allocating a structure larger than the guard page size.
What if alignment is 8?
[4096 bytes][4096 bytes]
[8 bytes]
____[1 byte][4096 bytes]
But you
Next location is:
(foo - 1) + bar
4095 + 4096 = 8191
e.g.
So your guard pages must be at least
* <T>::Alignment == N
* <T>::
*
* The alignment must be the maximum of:
(Align<T>, Size<T>)
2
1968
aligned at 4
MOTIVATION:
Rust is great at managing regions! And Rust is also great at letting you
stack-allocate memory! But when we put it on the heap, we lose a lot of the
advantages of the stack:
* Must allocate / deallocate (slowwwww).
* Different sized types => one thing at a time, have to deal with size classes
and runtime metadata.
* Grab more than one of a type (in a vector) to avoid this cost =>
must bounds check for safety.
But when we put it on the stack, we lose a lot of the advantages of the heap:
* Can't return references to things allocated locally.
* Can use a TypedArena (similar to this), but then we can't box things since
we can't destroy them, leading to overly long-lived regions.
* TypedArenas are also kinda inefficient... can't use them for things in different
size classes. And if you do it's WAY slower (think allocator, not stack),
especially if you start to fragment.
* No way to perform ~safe~ dynamically sized allocations.
* If you don't use a stack guard, you have to pay for checking the SP in every
function call.
It would really be nice to have the convenience (from a lifetimes perspective) of allocating
on the heap, but all the performance of allocating on the stack.
INTRODUCING SECONDARY STACKS
Secondary stacks try to address the above problems by creating a "second" stack that works
almost exactly like a normal stack, with a few important differences:
* NOT guaranteed to live in registers, etc. The stack is managed out of band. Similarly,
LLVM probably won't be able to optimize it very well, so you'll likely have to do that
yourself.
* Frames are NOT automatically created and dropped at entry and exit points; this must be
managed manually. The upside of this is that you can "dynamically allocate" your frame
sizes if you so choose.
* You can box things!
* Frame entry and exit do not run destructors. The region system is relied upon to force
the user to do this.
* There are maximum limits on individual allocations (based on predefined user parameters)
and these are enforced at compile time. This is to prevent jumping the stack guard (see
above).
* Can't reference allocated things as a fixed offset from the stack pointer (maybe you could
if we added restrictions on dynamic allocations... maybe worth offering an alternative that
works like that). So you still use the "real stack" most of the time.
What we don't do: allocation or deallocation (except when the stack is created), require
same-sized types, or bounds checks. Or make it impossible to return an allocated value.
Or pay for dynamic runtime size classes. Or prevent you from destroying values early. Or
prevent you from doing dynamic size allocations. And did I mention, it's safe? :)
HOW DOES IT WORK?
* mmap, mprotect according to calculations above.
* There's basically a single SP at all times.
* Utilizes Rust's region system to ensure safety. Allocating an element returns a Box with the
lifetime of the region and bumps the pointer. No checks needed :)
* (Potentially?) investigate a saner implementation of segmented stacks on SEGV. "Saner" in this
context: perhaps the stack switches directions and starts growing up, instead of down, and the
old stack still sticks arond until you're done. Kind of wasteful of memory but not slow. Or
maybe just do the usual inefficient segmented stacks implementation and treat it as a last
resort. Anyway, it's safe to just abort.
* Likely uses macros: would require a very powerful type system that allowed me to reason on nmbers
to make it safe otherwise, since we have to know that e.g. min alignment is the same for all items.
So it's not QUITE stack-like in that sense (this requirement could probably be avoided somehow though,
I'll have to think about it. Are stack frames aligned anyway? I'd guess they at least might be).
HOPEFULLY this will provide speed and ergonomic improvements for the cases where I'm using TypedArenas
right now.
*/
/*pub struct Region {
}
pub struct ObStack<'a> {
stack: [u8],
}*/
pub struct
fn foo<T, U>(x: T) -> U {
unsafe { return ::std::mem::transmute::<T, U>(x) }
}
fn main() {
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment