pythonesque/second_stack.rs

## second_stack.rs
#![feature(unsafe_destructor)]

/*
For safety, here are some guarantees we need fulfilled for allocated types for
an allocator of size class <M, N, O>.

* <T>::Size < M

e.g. consider 1-sized struct aligned at 4096.

And suppose we're at the very end of the page here (but it's aligned to 4096):

[ 4096 bytes][4096 bytes]
[1 byte]_____[4096 bytes]

* assume alignment is a power of 2.

* Size is integer multiple of alignment.

MPROT |4096     |4096     |

PAGE  |4096     |4096     |4096  |
GUARD  NO       |YES      |?
ALLOC |2048|2048|4096     |
WRITE |1|__|1|__|_______|1|

PAGE  |4096     |4096     |4096  |
GUARD  NO       |YES      |?
ALLOC |2048|4096     |4096     |
WRITE |1|__|1|_______|_______|1|

PAGE  |4096|4096  |4096  |
GUARD  NO  |YES   |?
ALLOC |1|4096|4096   |
WRITE |1|1|__|_____|1|

* worst case: allocation completely clears the stack guard, can access memory
anywhere.

* worst case if allocations must be smaller than GUARD_SIZE and an allocation
must always have at least one byte accessed on initialization:

|1    |GUARD_SIZE - 1|1|GUARD_SIZE - 2|1    |
STACK |GUARD           |?
|ALLOC               |ALLOC                 |
|WRITE|                               |WRITE|

* The easy fix: set MAX_ALLOC to 1.  Then GUARD_SIZE - 2 in this worst-case
  would be negative, which is impossible :)  This is not very practical, though.

* The slightly less stupid but still easy fix: rewrite the scenario:

|1    |ALLOC_SIZE - 1|GUARD_SIZE + 1 - ALLOC_SIZE|2 * (ALLOC_SIZE - 1) - GUARD_SIZE|1    |
STACK |GUARD                                     |?
|ALLOC               |ALLOC                                                              |
|WRITE|                                                                            |WRITE|

Then we require: ALLOC_SIZE - 1 <= GUARD_SIZE <= 2 * (ALLOC_SIZE - 1)

So if ALLOC_SIZE = 0, GUARD_SIZE + 1 < ALLOC_SIZE, or 2 * (ALLOC_SIZE - 1) < GUARD_SIZE
this can't happen.  Since we can construct other bad scenarios if GUARD_SIZE + 1 < ALLOC_SIZE,
and ALLOC_SIZE = 0 is useless, the logical requirement is 2 * (ALLOC_SIZE - 1) < GUARD_SIZE.
i.e. GUARD_SIZE = 2 * ALLOC - 1.

By taking alignment into account there may be / certainly are more optimizations than this that
can be made, but this is a good starting point.  And the optimizations may not be that important.

Also note: practically speaking we'll want this to be a power of 2... force ALLOC to be a
power-of-two and we can do GUARD_SIZE = ALLOC << 1.

* The more complex fix: figure out how (and if) one can restrict alignment so
  this can never happen.  Also: consider preferential alignment, reordering allocations, coalescing
  lookups (so we only have to bump the pointer once if you want n allocations), etc.  Is there a way
  to expose this information to LLVM for optimization?  That would be sweet.
|1|2|4|8|16|32|1|           |
|1|           |             |

Effectively, can think of all allocations as being of either (align size) or
(max alloc size) and writing to either (byte 0) or (byte max alloc size - 1)
//and all alignments as being (relative to the stack guard) either
// (STACK_GUARD -

Guard page MUST be page aligned and allocations MUST be aligned at min-align,
and all alignments MUST be powers of 2, so we have

BASE + (REAL_SIZE) = GUARD_0
BASE is (k-aligned):

If k < PAGE_SIZE, then

so we have
//

0             G


ALLOC |4096|

Key insight: with power of 2 alignments,

You may not allocate a structure larger than the guard page size.

What if the struct is aligned to 1?

[4096 bytes][4096 bytes]
____[1 byte][4096 bytes]

Still a matter of not allocating a structure larger than the guard page size.

What if alignment is 8?

[4096 bytes][4096 bytes]
   [8 bytes]
____[1 byte][4096 bytes]


But you

Next location is:
(foo - 1) + bar
4095 + 4096 = 8191
e.g.
So your guard pages must be at least
* <T>::Alignment == N
* <T>::
*

* The alignment must be the maximum of:
    (Align<T>, Size<T>)

2
1968

aligned at 4


MOTIVATION:

Rust is great at managing regions!  And Rust is also great at letting you
stack-allocate memory!  But when we put it on the heap, we lose a lot of the
advantages of the stack:
  * Must allocate / deallocate (slowwwww).
  * Different sized types => one thing at a time, have to deal with size classes
    and runtime metadata.
  * Grab more than one of a type (in a vector) to avoid this cost =>
    must bounds check for safety.

But when we put it on the stack, we lose a lot of the advantages of the heap:
    * Can't return references to things allocated locally.
    * Can use a TypedArena (similar to this), but then we can't box things since
      we can't destroy them, leading to overly long-lived regions.
    * TypedArenas are also kinda inefficient... can't use them for things in different
      size classes.  And if you do it's WAY slower (think allocator, not stack),
      especially if you start to fragment.
    * No way to perform ~safe~ dynamically sized allocations.
    * If you don't use a stack guard, you have to pay for checking the SP in every
      function call.

It would really be nice to have the convenience (from a lifetimes perspective) of allocating
on the heap, but all the performance of allocating on the stack.

INTRODUCING SECONDARY STACKS

Secondary stacks try to address the above problems by creating a "second" stack that works
almost exactly like a normal stack, with a few important differences:
* NOT guaranteed to live in registers, etc.  The stack is managed out of band.  Similarly,
  LLVM probably won't be able to optimize it very well, so you'll likely have to do that
  yourself.
* Frames are NOT automatically created and dropped at entry and exit points; this must be
  managed manually.  The upside of this is that you can "dynamically allocate" your frame
  sizes if you so choose.
* You can box things!
* Frame entry and exit do not run destructors.  The region system is relied upon to force
  the user to do this.
* There are maximum limits on individual allocations (based on predefined user parameters)
  and these are enforced at compile time.  This is to prevent jumping the stack guard (see
  above).
* Can't reference allocated things as a fixed offset from the stack pointer (maybe you could
  if we added restrictions on dynamic allocations... maybe worth offering an alternative that
  works like that).  So you still use the "real stack" most of the time.

What we don't do: allocation or deallocation (except when the stack is created), require
same-sized types, or bounds checks.  Or make it impossible to return an allocated value.
Or pay for dynamic runtime size classes.  Or prevent you from destroying values early.  Or
prevent you from doing dynamic size allocations.  And did I mention, it's safe? :)

HOW DOES IT WORK?

* mmap, mprotect according to calculations above.
* There's basically a single SP at all times.
* Utilizes Rust's region system to ensure safety.  Allocating an element returns a Box with the
  lifetime of the region and bumps the pointer.  No checks needed :)
* (Potentially?) investigate a saner implementation of segmented stacks on SEGV.  "Saner" in this
  context: perhaps the stack switches directions and starts growing up, instead of down, and the
  old stack still sticks arond until you're done.  Kind of wasteful of memory but not slow.  Or
  maybe just do the usual inefficient segmented stacks implementation and treat it as a last
  resort.  Anyway, it's safe to just abort.
* Likely uses macros: would require a very powerful type system that allowed me to reason on nmbers
  to make it safe otherwise, since we have to know that e.g. min alignment is the same for all items.
  So it's not QUITE stack-like in that sense (this requirement could probably be avoided somehow though,
  I'll have to think about it.  Are stack frames aligned anyway?  I'd guess they at least might be).

HOPEFULLY this will provide speed and ergonomic improvements for the cases where I'm using TypedArenas
right now.


*/

/*pub struct Region {

}

pub struct ObStack<'a> {
    stack: [u8],
}*/

pub struct

fn foo<T, U>(x: T) -> U {
    unsafe { return ::std::mem::transmute::<T, U>(x) }
}

fn main() {
}
	#![feature(unsafe_destructor)]

	/*
	For safety, here are some guarantees we need fulfilled for allocated types for
	an allocator of size class <M, N, O>.

	* <T>::Size < M

	e.g. consider 1-sized struct aligned at 4096.

	And suppose we're at the very end of the page here (but it's aligned to 4096):

	[ 4096 bytes][4096 bytes]
	[1 byte]_____[4096 bytes]

	* assume alignment is a power of 2.

	* Size is integer multiple of alignment.

	MPROT \|4096 \|4096 \|

	PAGE \|4096 \|4096 \|4096 \|
	GUARD NO \|YES \|?
	ALLOC \|2048\|2048\|4096 \|
	WRITE \|1\|__\|1\|__\|_______\|1\|

	PAGE \|4096 \|4096 \|4096 \|
	GUARD NO \|YES \|?
	ALLOC \|2048\|4096 \|4096 \|
	WRITE \|1\|__\|1\|_______\|_______\|1\|

	PAGE \|4096\|4096 \|4096 \|
	GUARD NO \|YES \|?
	ALLOC \|1\|4096\|4096 \|
	WRITE \|1\|1\|__\|_____\|1\|

	* worst case: allocation completely clears the stack guard, can access memory
	anywhere.

	* worst case if allocations must be smaller than GUARD_SIZE and an allocation
	must always have at least one byte accessed on initialization:

	\|1 \|GUARD_SIZE - 1\|1\|GUARD_SIZE - 2\|1 \|
	STACK \|GUARD \|?
	\|ALLOC \|ALLOC \|
	\|WRITE\| \|WRITE\|

	* The easy fix: set MAX_ALLOC to 1. Then GUARD_SIZE - 2 in this worst-case
	would be negative, which is impossible :) This is not very practical, though.

	* The slightly less stupid but still easy fix: rewrite the scenario:

	\|1 \|ALLOC_SIZE - 1\|GUARD_SIZE + 1 - ALLOC_SIZE\|2 * (ALLOC_SIZE - 1) - GUARD_SIZE\|1 \|
	STACK \|GUARD \|?
	\|ALLOC \|ALLOC \|
	\|WRITE\| \|WRITE\|

	Then we require: ALLOC_SIZE - 1 <= GUARD_SIZE <= 2 * (ALLOC_SIZE - 1)

	So if ALLOC_SIZE = 0, GUARD_SIZE + 1 < ALLOC_SIZE, or 2 * (ALLOC_SIZE - 1) < GUARD_SIZE
	this can't happen. Since we can construct other bad scenarios if GUARD_SIZE + 1 < ALLOC_SIZE,
	and ALLOC_SIZE = 0 is useless, the logical requirement is 2 * (ALLOC_SIZE - 1) < GUARD_SIZE.
	i.e. GUARD_SIZE = 2 * ALLOC - 1.

	By taking alignment into account there may be / certainly are more optimizations than this that
	can be made, but this is a good starting point. And the optimizations may not be that important.

	Also note: practically speaking we'll want this to be a power of 2... force ALLOC to be a
	power-of-two and we can do GUARD_SIZE = ALLOC << 1.

	* The more complex fix: figure out how (and if) one can restrict alignment so
	this can never happen. Also: consider preferential alignment, reordering allocations, coalescing
	lookups (so we only have to bump the pointer once if you want n allocations), etc. Is there a way
	to expose this information to LLVM for optimization? That would be sweet.
	\|1\|2\|4\|8\|16\|32\|1\| \|
	\|1\| \| \|

	Effectively, can think of all allocations as being of either (align size) or
	(max alloc size) and writing to either (byte 0) or (byte max alloc size - 1)
	//and all alignments as being (relative to the stack guard) either
	// (STACK_GUARD -

	Guard page MUST be page aligned and allocations MUST be aligned at min-align,
	and all alignments MUST be powers of 2, so we have

	BASE + (REAL_SIZE) = GUARD_0
	BASE is (k-aligned):

	If k < PAGE_SIZE, then

	so we have
	//

	0 G


	ALLOC \|4096\|

	Key insight: with power of 2 alignments,

	You may not allocate a structure larger than the guard page size.

	What if the struct is aligned to 1?

	[4096 bytes][4096 bytes]
	____[1 byte][4096 bytes]

	Still a matter of not allocating a structure larger than the guard page size.

	What if alignment is 8?

	[4096 bytes][4096 bytes]
	[8 bytes]
	____[1 byte][4096 bytes]


	But you

	Next location is:
	(foo - 1) + bar
	4095 + 4096 = 8191
	e.g.
	So your guard pages must be at least
	* <T>::Alignment == N
	* <T>::
	*

	* The alignment must be the maximum of:
	(Align<T>, Size<T>)

	2
	1968

	aligned at 4


	MOTIVATION:

	Rust is great at managing regions! And Rust is also great at letting you
	stack-allocate memory! But when we put it on the heap, we lose a lot of the
	advantages of the stack:
	* Must allocate / deallocate (slowwwww).
	* Different sized types => one thing at a time, have to deal with size classes
	and runtime metadata.
	* Grab more than one of a type (in a vector) to avoid this cost =>
	must bounds check for safety.

	But when we put it on the stack, we lose a lot of the advantages of the heap:
	* Can't return references to things allocated locally.
	* Can use a TypedArena (similar to this), but then we can't box things since
	we can't destroy them, leading to overly long-lived regions.
	* TypedArenas are also kinda inefficient... can't use them for things in different
	size classes. And if you do it's WAY slower (think allocator, not stack),
	especially if you start to fragment.
	* No way to perform ~safe~ dynamically sized allocations.
	* If you don't use a stack guard, you have to pay for checking the SP in every
	function call.

	It would really be nice to have the convenience (from a lifetimes perspective) of allocating
	on the heap, but all the performance of allocating on the stack.

	INTRODUCING SECONDARY STACKS

	Secondary stacks try to address the above problems by creating a "second" stack that works
	almost exactly like a normal stack, with a few important differences:
	* NOT guaranteed to live in registers, etc. The stack is managed out of band. Similarly,
	LLVM probably won't be able to optimize it very well, so you'll likely have to do that
	yourself.
	* Frames are NOT automatically created and dropped at entry and exit points; this must be
	managed manually. The upside of this is that you can "dynamically allocate" your frame
	sizes if you so choose.
	* You can box things!
	* Frame entry and exit do not run destructors. The region system is relied upon to force
	the user to do this.
	* There are maximum limits on individual allocations (based on predefined user parameters)
	and these are enforced at compile time. This is to prevent jumping the stack guard (see
	above).
	* Can't reference allocated things as a fixed offset from the stack pointer (maybe you could
	if we added restrictions on dynamic allocations... maybe worth offering an alternative that
	works like that). So you still use the "real stack" most of the time.

	What we don't do: allocation or deallocation (except when the stack is created), require
	same-sized types, or bounds checks. Or make it impossible to return an allocated value.
	Or pay for dynamic runtime size classes. Or prevent you from destroying values early. Or
	prevent you from doing dynamic size allocations. And did I mention, it's safe? :)

	HOW DOES IT WORK?

	* mmap, mprotect according to calculations above.
	* There's basically a single SP at all times.
	* Utilizes Rust's region system to ensure safety. Allocating an element returns a Box with the
	lifetime of the region and bumps the pointer. No checks needed :)
	* (Potentially?) investigate a saner implementation of segmented stacks on SEGV. "Saner" in this
	context: perhaps the stack switches directions and starts growing up, instead of down, and the
	old stack still sticks arond until you're done. Kind of wasteful of memory but not slow. Or
	maybe just do the usual inefficient segmented stacks implementation and treat it as a last
	resort. Anyway, it's safe to just abort.
	* Likely uses macros: would require a very powerful type system that allowed me to reason on nmbers
	to make it safe otherwise, since we have to know that e.g. min alignment is the same for all items.
	So it's not QUITE stack-like in that sense (this requirement could probably be avoided somehow though,
	I'll have to think about it. Are stack frames aligned anyway? I'd guess they at least might be).

	HOPEFULLY this will provide speed and ergonomic improvements for the cases where I'm using TypedArenas
	right now.


	*/

	/*pub struct Region {

	}

	pub struct ObStack<'a> {
	stack: [u8],
	}*/

	pub struct

	fn foo<T, U>(x: T) -> U {
	unsafe { return ::std::mem::transmute::<T, U>(x) }
	}

	fn main() {
	}