Custom Smartpointer Sugar (v2)

Reading the meeting notes at https://github.com/mozilla/rust/wiki/Meeting-2013-06-07, it seems that there is general agreement with the idea to move @T and @mut T from built-in constructs to library types, provided it's possible to keep the same semantics with the help of lang items and traits.

This proposal is about the '@ vs Gc<>' issue mentioned at the end of those notes, to which I say:

"Why not both?"

To elaborate, I propose that the current 'hardcoded-to-gc-@'-ptr syntax disappears, and that the @-sigil gets re-purposed as special sugar for (custom) library types:

let g: @Gc int  = @Gc 5;            // Immutable tracing-collected box
let r: @Rc char = @Rc '5';          // Immutable reference-counted box
let m: @GcMut T = @GcMut foo();     // Mutable tracing-collected box

let mut ar      = Arena::new(1024); // Create a memory pool
let t: @Ar T    = @(ar)Ar bar();    // Immutable Arena allocation in `ar`

Compared to how it could be written today:

let g: @int = @5;       
let r: Rc<char> = Rc::new(|| '5'); // I'll talk about why that should take a closure below
let m: @mut T = @mut foo(); 

let mut ar = Arena::new(1024);        
let t: Ar<T> = ar.allocate(|| bar());      

Compared to just going with ad-hoc P<T> forms it would offer many benefits:

  • All custom Smartpointers become first-class citizens in usage.
  • With higher kinded types, generics could become generic over allocations: fn foo<T, A: Allocation>(@A T){ ... }
  • Syntactically similar type, construction and pattern-matching forms, like for the built-in structural types.
  • It's a syntactically distinct construct to hang semantic specialities like auto-borrowing, dynamic freezing, etc. from.
  • It visually separates types with memory management and memory allocation semantics from other types.
  • Lightweight syntax without type-param nesting hell (especially for the expression form).
  • Because a @T has an associated sigil, it 'looks like a memory-thing' (in the sense that & and ~ are also sigils associated with specific memory semantics).
  • But more importantly: Consistency with the type and construction syntax for & and ~.
  • The syntax for the built-in Gc Smartpointers becomes slightly heavier (@ -> @Gc), thus discouraging its use unless necessary.
  • But if necessary, the syntax is still easier to decipher and use than Gc<T>, especially if T is some generic type with a few layers of nesting.
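The "generic over allocations" bullet can be approximated in present-day Rust without the proposed sugar. The following is only a sketch under assumptions: `PointerFamily`, `BoxFamily`, `RcFamily`, `alloc_pair` and `sum` are hypothetical names, and a generic associated type stands in for the higher-kinded `A` in `fn foo<T, A: Allocation>(@A T)`.

```rust
use std::ops::Deref;
use std::rc::Rc;

// Hypothetical "pointer family": one marker type per kind of allocation.
// The generic associated type plays the role of the higher-kinded `A`.
trait PointerFamily {
    type Ptr<T>: Deref<Target = T>;
    fn alloc<T>(value: T) -> Self::Ptr<T>;
}

struct BoxFamily; // uniquely owned heap box
struct RcFamily; // reference-counted box

impl PointerFamily for BoxFamily {
    type Ptr<T> = Box<T>;
    fn alloc<T>(value: T) -> Box<T> {
        Box::new(value)
    }
}

impl PointerFamily for RcFamily {
    type Ptr<T> = Rc<T>;
    fn alloc<T>(value: T) -> Rc<T> {
        Rc::new(value)
    }
}

// Generic over the allocation kind; the caller picks `A`.
fn alloc_pair<A: PointerFamily>(a: i32, b: i32) -> A::Ptr<(i32, i32)> {
    A::alloc((a, b))
}

// Allocates through `A`, then reads the pair back through `Deref`.
fn sum<A: PointerFamily>(a: i32, b: i32) -> i32 {
    let p = alloc_pair::<A>(a, b);
    let pair: &(i32, i32) = &*p;
    pair.0 + pair.1
}
```

Calling `sum::<BoxFamily>(2, 3)` and `sum::<RcFamily>(2, 3)` runs the same body against two different pointer types, which is the kind of reuse the bullet is after.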

Problems

There are a few syntactic ambiguities to be careful about:

First of all, some trickery is required to make paths work for @P T. Just having P and T expand to a path leads to the problem of deciding where one path ends and the other starts:

@::std::Gc ::foo::bar::MyType
@::std::Gc::foo ::bar::MyType
@::std::Gc::foo::bar ::MyType

This could be resolved by making the @-sigil not a prefix of the path, but of the type at the end of it, like this:

::std::@Gc ::foo::bar::MyType 

Here the @-sigil acts as an anchor for the end of the first path.

Another problem exists with in-object allocation: There needs to be a way to cram an optional second expression in the expression form for accessing the object-to-allocate-in. I first considered

@<ident>(<expr>) <expr>  -  Example: @Foo(bar) baz()

But this (I think?) has the problem that (<expr>) is itself a valid expression, so there is ambiguity after the <ident>.

What I think would work is this:

@(<expr>)<ident> <expr>  -  Example: @(bar)Foo baz()

which would be unambiguous because '(' is not a valid start of an identifier. However, I'm unhappy with that form because it breaks the unit of @Ptr. Maybe another form could be found that's just as concise? Here are a few possible candidates:

@(<ident>, <expr>) <expr>   - @(Ar, ar) foo()   // Only jams a '(' between type and sigil.
@<ident> <expr> in <expr>   - @Ar foo() in ar   // Needs a keyword
@<ident> from <expr> <expr> - @Ar from ar foo() // Needs a keyword
<expr>.@<ident> <expr>      - ar.@Ar foo()      // Allocation as method call on the object
@<ident>:<expr> <expr>      - @Ar:ar foo()      // ':' is usually a type-value separator in rust

Implementation

Why a closure for initialisation?

In my examples above I used Foo::new(|| bar() ) for initializing an allocation with the result of an expression, rather than Foo::new( bar() ). The reason for that is that the latter form would always do a shallow copy of the expression's result, which is inefficient for large structs. However, with a cheap stack closure the result of the expression can be optimised to write directly to the target memory address, turning f = || foo(); ... *ptr = f(); into a 'guaranteed-write'-version of f = |ptr| {*ptr = foo()}; ... f(ptr);.

According to graydon, this optimisation can be guaranteed to happen with a bit of work.
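To make the two shapes concrete, here is a minimal sketch in present-day Rust. The names `alloc_with` and `init_in_place` are hypothetical, and the sketch only shows the calling conventions: `Box::new(init())` may still build the value on the stack and move it, so the guaranteed in-place write discussed above is an assumption about the compiler, not something this code provides.

```rust
// Closure-returning form, `f = || foo(); ... *ptr = f();`.
// The allocator decides when (and, ideally, where) the value is built.
fn alloc_with<T>(init: impl FnOnce() -> T) -> Box<T> {
    Box::new(init())
}

// Out-pointer form, `f = |ptr| { *ptr = foo() }; ... f(ptr);`.
// The caller supplies the destination slot and the result is written into it.
fn init_in_place<T>(slot: &mut T, init: impl FnOnce() -> T) {
    *slot = init();
}

// Small driver showing both forms producing the same value.
fn demo() -> (u64, u64) {
    let boxed = alloc_with(|| 40 + 2);
    let mut slot = 0u64;
    init_in_place(&mut slot, || 40 + 2);
    (*boxed, slot)
}
```

Because the constructor receives a closure instead of a finished value, it is free to allocate first and evaluate the expression afterwards, which is what makes the direct write possible in principle.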

Traits

The sugar would probably be implemented with the help of a few traits:

  • An Allocation trait that represents a handle to some allocated memory: basically today's @T, Rc<T>, ... smartpointers. Types implementing it could then be augmented with more trait impls for enabling things like borrows etc.
  • Allocator traits for performing the allocation. This would need two different versions: one with a static allocation function, and one where the allocation is a method on &mut self.

Example definitions:

#[lang="allocator"]
trait Allocator<T, U: Allocation<T>> {
    fn alloc(v: &once fn() -> T) -> U; 
}

#[lang="object_allocator"]
trait ObjectAllocator<T, U: Allocation<T>> {
    fn alloc(&mut self, v: &once fn() -> T) -> U; 
}

#[lang="allocation"]
trait Allocation<T> { }

#[lang="borrowable_allocation"]
trait BorrowableAllocation<T>: Allocation<T> {
    fn borrow<U>(&self, f: &once fn(&T) -> U) -> U; 
}

#[lang="mut_borrowable_allocation"]
trait MutableBorrowableAllocation<T>: Allocation<T> {
    fn borrow_mut<U>(&self, f: &once fn(&mut T) -> U) -> U; 
}

Types like Gc<T> and RcMut<T> would implement both Allocation and Allocator, while something like an Arena would separate them out into two types: the memory container (Allocator) and the handles to it (Allocation).
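The split between "pointer that allocates for itself" and "container plus handle" can be sketched in present-day Rust as follows. This is an illustration under assumptions, not the proposed lang items: `&once fn() -> T` is approximated with `impl FnOnce() -> T`, `Gc` is a hypothetical wrapper around `Rc` (no tracing collector is involved), and `Arena`/`Ar` share a reference-counted `Vec` purely so the sketch stays safe and runnable.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Marker for "a handle to some allocated memory".
trait Allocation<T> {}

/// Static form: no allocator object is needed.
trait Allocator<T, U: Allocation<T>> {
    fn alloc(v: impl FnOnce() -> T) -> U;
}

/// Object form: allocation is a method on the memory container.
trait ObjectAllocator<T, U: Allocation<T>> {
    fn alloc(&mut self, v: impl FnOnce() -> T) -> U;
}

/// Stand-in for a collected box; it is both Allocation and Allocator.
struct Gc<T>(Rc<T>);

impl<T> Allocation<T> for Gc<T> {}
impl<T> Allocator<T, Gc<T>> for Gc<T> {
    fn alloc(v: impl FnOnce() -> T) -> Gc<T> {
        Gc(Rc::new(v()))
    }
}

/// The memory container (Allocator side).
struct Arena<T> {
    slots: Rc<RefCell<Vec<T>>>,
}

/// A handle into the container (Allocation side).
struct Ar<T> {
    slots: Rc<RefCell<Vec<T>>>,
    index: usize,
}

impl<T> Allocation<T> for Ar<T> {}
impl<T> ObjectAllocator<T, Ar<T>> for Arena<T> {
    fn alloc(&mut self, v: impl FnOnce() -> T) -> Ar<T> {
        let value = v(); // evaluate before touching the storage
        let mut slots = self.slots.borrow_mut();
        slots.push(value);
        Ar { slots: Rc::clone(&self.slots), index: slots.len() - 1 }
    }
}

fn demo() -> (i32, i32) {
    // Roughly what `@Gc 8` would expand to.
    let g = <Gc<i32> as Allocator<i32, Gc<i32>>>::alloc(|| 8);
    // Roughly what `@(ar)Ar 5` would expand to.
    let mut ar = Arena { slots: Rc::new(RefCell::new(Vec::new())) };
    let t: Ar<i32> = ar.alloc(|| 5);
    let from_arena = t.slots.borrow()[t.index];
    (*g.0, from_arena)
}
```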

(The question is whether borrow() and borrow_mut() should take &self or &mut self. Some types might mutate self for borrowing, some might not. Always making it &mut self would mean a smartpointer would need to be mutable for borrowing. For comparison, the current Rc and RcMut use &self and mutate a box behind an unsafe ptr for setting flags.)

If you then use custom smartpointers in your code like for example this:

let foo = @Gc 8;
let b: &int = foo;
print(fmt!("%?", b)); 

// Lifetime of b ends here

let foo = @GcMut (*foo).clone();
bar(&*foo);
baz(&mut *foo);

// ...

rustc desugars and rewrites the code to work something like this:

let foo = Allocator::alloc::<int, Gc<int>>( || 8 );

do foo.borrow |r| {
    let b: &int = r;
    print(fmt!("%?", b));
    // Lifetime of b ends here
}

let _tmp = do foo.borrow |r| {
    (*r).clone()
};

let foo = Allocator::alloc::<int, GcMut<int>>( || _tmp );

do foo.borrow |r| {
    bar(r);
}

do foo.borrow_mut |r| {
    baz(r);
}

// ...

(no guarantees for correctness and feasibility here, it's just intended as an example)
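For readers more familiar with current Rust, the closure-based borrows in the desugared form above can be sketched like this. It is an approximation under assumptions: `GcMut` is a hypothetical wrapper over `Rc<RefCell<T>>` rather than a collected box, the `Allocation` supertrait is left out for brevity, and both methods take `&self`, which is one of the two options discussed earlier.

```rust
use std::cell::RefCell;
use std::rc::Rc;

trait BorrowableAllocation<T> {
    fn borrow<U>(&self, f: impl FnOnce(&T) -> U) -> U;
}

trait MutableBorrowableAllocation<T>: BorrowableAllocation<T> {
    fn borrow_mut<U>(&self, f: impl FnOnce(&mut T) -> U) -> U;
}

/// Hypothetical mutable shared box; `RefCell` does the dynamic checking.
struct GcMut<T>(Rc<RefCell<T>>);

impl<T> BorrowableAllocation<T> for GcMut<T> {
    fn borrow<U>(&self, f: impl FnOnce(&T) -> U) -> U {
        // The guard lives exactly as long as the closure call.
        let guard = RefCell::borrow(&self.0);
        f(&*guard)
    }
}

impl<T> MutableBorrowableAllocation<T> for GcMut<T> {
    fn borrow_mut<U>(&self, f: impl FnOnce(&mut T) -> U) -> U {
        let mut guard = RefCell::borrow_mut(&self.0);
        f(&mut *guard)
    }
}

fn demo() -> (i32, i32) {
    let foo = GcMut(Rc::new(RefCell::new(8)));
    // `bar(&*foo)` becomes a call inside `borrow`.
    let seen = foo.borrow(|r| *r);
    // `baz(&mut *foo)` becomes a call inside `borrow_mut`.
    foo.borrow_mut(|r| *r += 1);
    let after = foo.borrow(|r| *r);
    (seen, after)
}
```

The closure form makes the extent of each borrow explicit: the reference cannot escape the call, so the flag set by the smartpointer can be cleared as soon as the closure returns.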

Omissions

Dynamic sized types

One thing this proposal doesn't talk about is how to support dynamically sized types like [T]. It would require additional allocation functions that take a size parameter, and a safe way to use them. I haven't yet found a satisfying solution for that, but if I find one I will include it in this document.

Borrowed references

In-object-allocations might need additional lifetime parameters to chain their lifetimes to their allocator. This could be done with a general lifetime parameter on Allocation: Unbounded allocations like Gc would get the static lifetime, bounded ones the lifetime of their allocator.
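A minimal sketch of chaining a handle's lifetime to its allocator, written in present-day Rust. `Arena` and `Ar` here are hypothetical types, not the proposed lang items. One assumption deviates from the `ObjectAllocator` definition above: `alloc` takes `&self` with interior mutability, because a handle that keeps the arena borrowed would otherwise rule out a second `&mut self` allocation.

```rust
use std::cell::RefCell;

/// Memory container; handles borrow it and therefore cannot outlive it.
struct Arena<T> {
    slots: RefCell<Vec<T>>,
}

/// Handle carrying the lifetime `'a` of the arena it points into.
struct Ar<'a, T> {
    arena: &'a Arena<T>,
    index: usize,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: RefCell::new(Vec::new()) }
    }

    fn alloc<'a>(&'a self, init: impl FnOnce() -> T) -> Ar<'a, T> {
        // Run the initialiser first so it may itself allocate in the arena.
        let value = init();
        let mut slots = self.slots.borrow_mut();
        slots.push(value);
        Ar { arena: self, index: slots.len() - 1 }
    }
}

impl<'a, T> Ar<'a, T> {
    fn borrow<U>(&self, f: impl FnOnce(&T) -> U) -> U {
        let slots = self.arena.slots.borrow();
        f(&slots[self.index])
    }
}

fn demo() -> (i32, i32) {
    let arena = Arena::new();
    let a = arena.alloc(|| 1);
    let b = arena.alloc(|| 2);
    // Returning `a` or `b` from here would be rejected by the compiler:
    // both are bounded by the lifetime of `arena`.
    (a.borrow(|v| *v), b.borrow(|v| *v))
}
```

An unbounded allocation in this scheme would simply instantiate the lifetime parameter with `'static`.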

Transition

Transitioning to this syntax from the current one could be done in one of at least two ways:

A)

  1. Deprecate @T, introduce Gc<T>.
  2. Remove @T completely.
  3. Introduce @S T as optional syntax.
  4. Make @S T the only allowed form for Smartpointers.

B)

  1. In one operation, change the parser from @T to @S T, ignoring the S token, and replace all @T with @Gc T.
  2. Slowly build up the general support for the new syntax from behind the scenes.

I agree that we need a pointer trait for overloading the borrowing/dereferencing syntax for custom pointer-like types, but I think they should just be defined as follows:

/// Implemented for unique pointers and immutable shared pointers
#[lang = "pointer"]
trait Pointer<T> {
    fn borrow<'r>(&'r self) -> &'r T;
}

/// Implemented for mutable unique pointers
#[lang = "mutable_pointer"]
trait MutablePointer<T> {
    fn borrow_mut<'r>(&'r mut self) -> &'r mut T;
}
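As a sketch of how these two traits would be used, here they are in present-day Rust with lifetime elision, implemented for `Box` and `Rc` as stand-ins for unique and immutable shared pointers. The helper functions `read` and `bump` are hypothetical. Today's `std::ops::Deref` and `std::ops::DerefMut` have essentially this shape.

```rust
use std::rc::Rc;

/// Implemented for unique pointers and immutable shared pointers.
trait Pointer<T> {
    fn borrow(&self) -> &T;
}

/// Implemented for mutable unique pointers.
trait MutablePointer<T> {
    fn borrow_mut(&mut self) -> &mut T;
}

impl<T> Pointer<T> for Box<T> {
    fn borrow(&self) -> &T {
        &**self
    }
}

impl<T> MutablePointer<T> for Box<T> {
    fn borrow_mut(&mut self) -> &mut T {
        &mut **self
    }
}

// A shared pointer only hands out immutable borrows.
impl<T> Pointer<T> for Rc<T> {
    fn borrow(&self) -> &T {
        &**self
    }
}

// Works for any pointer type that can be borrowed immutably.
fn read<P: Pointer<i32>>(p: &P) -> i32 {
    *p.borrow()
}

// Requires a pointer type that supports mutable borrows.
fn bump<P: MutablePointer<i32>>(p: &mut P) {
    *p.borrow_mut() += 1;
}
```

With these impls, `bump` accepts a `Box<i32>` but not an `Rc<i32>`, so the mutability restriction is enforced statically rather than by a runtime flag.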

I don't think dynamic freezing should be included as syntactic sugar in the language. The failure cases are usually unpredictable and hard to reason about even after an error is thrown, and they've proven to be a major source of confusion for people learning the language. An @mut pointer can just be passed around by-value, so I don't think there's any need to make the fragile borrows pretty. It should just be exposed in a standard library module with methods taking closures, to make it a very explicit opt-in (#7140).
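The kind of runtime failure meant here can be shown with a dynamically checked cell. This sketch uses present-day `std::cell::RefCell` as a stand-in for an @mut box, which is an assumption made for illustration; the function name is hypothetical.

```rust
use std::cell::RefCell;

/// Returns true if a mutable borrow is refused while a shared borrow is
/// live, and granted again once that shared borrow has ended.
fn conflicting_borrow_is_detected() -> bool {
    let cell = RefCell::new(vec![1, 2, 3]);

    let reader = cell.borrow(); // the value is now "frozen"
    // Nothing in the types says this will fail; it is only known at runtime.
    let refused = cell.try_borrow_mut().is_err();
    drop(reader); // the freeze ends here

    let granted = cell.try_borrow_mut().is_ok();
    refused && granted
}
```

Using `borrow_mut()` instead of `try_borrow_mut()` at the conflicting point would panic, and whether that happens depends on which borrows happen to be live on a given execution path.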

Reading and writing to fields in an @mut box isn't treated as a borrow, so that does need to be made generic because it's currently unavailable for RcMut.

I'm against adding special syntax for smart pointer types because making the type grammar more complex isn't going to make it easier to understand. The @ syntax might feel familiar to the existing Rust community, but it would be less familiar to newcomers from other languages - especially C++.

C++ has an allocator concept but it's often regarded as one of the mistakes in the language. It has essentially no use case, because a process should really only have one general purpose allocator. An allocator like jemalloc already uses thread-local arenas with size classes tightly packed together.

Specialized allocators and arenas occasionally make sense, but the key point is that they're specialized and don't fulfil the requirements of a general purpose allocator trait. If they weren't, they would be no better than the general purpose allocator. Mixing allocators is an opt-in to memory fragmentation, so it's not something that should be used lightly.

Another issue is that making reference counting a drop-in replacement for garbage collection means removing the implicit copying from garbage collected types. The performance cost from not having move semantics for reference counted types is huge and it's the biggest issue with the current implementation of @T.
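The difference between moving and copying a reference-counted pointer is observable through the count itself. A small sketch using present-day `std::rc::Rc`; `consume` and `counts` are hypothetical helpers.

```rust
use std::rc::Rc;

// Takes the pointer by value: the caller's handle is moved in, so no
// reference-count update is needed to pass it.
fn consume(p: Rc<String>) -> usize {
    Rc::strong_count(&p)
}

fn counts() -> (usize, usize, usize, usize) {
    let a = Rc::new(String::from("shared"));
    let before = Rc::strong_count(&a);

    // An explicit copy is the only operation that increments the count.
    let b = Rc::clone(&a);
    let after_clone = Rc::strong_count(&a);

    // Moving `b` into the callee leaves the count untouched; it only drops
    // back when the callee's handle is destroyed at the end of the call.
    let seen_by_callee = consume(b);
    let after_move = Rc::strong_count(&a);

    (before, after_clone, seen_by_callee, after_move)
}
```

If every pass-by-value were an implicit copy, each call like `consume(b)` would pay an increment and a decrement instead of nothing.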
