Skip to content

Instantly share code, notes, and snippets.

@joliss
Last active March 27, 2017 17:56
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save joliss/e89226b4c37dedb8a15676fbeb016932 to your computer and use it in GitHub Desktop.
Save joliss/e89226b4c37dedb8a15676fbeb016932 to your computer and use it in GitHub Desktop.

How about having "parent objects" instead of "lifetimes"?

Language design question: I wonder if we can replace or supplant the notion of "lifetimes" in languages like Rust with a notion of "parent objects" (or "context objects").

In Rust (and most languages), iterators into vectors are (ptr, end) pairs. Having a notion of "parent" objects allows us to make them plain ptr values, and additionally allows for safely appending to vectors while iterating.

Some background first. In Rust, iterators into vectors (technically, into slices) are internally represented as (ptr, end) pairs. Thus if we create an iterator,

let mut i = vec.iter();

the compiler copies vec.end into i.end. This is a strange duplication of data -- in a hand-written loop in C, we would just use a single pointer as our iterator, and we wouldn't need to copy vec.end.

But storing i.end is technically necessary, because we could reassign

if cond { i = vec2.iter(); }

and now i needs to remember which vector it came from -- we don't know at compile time if iteration ends at vec.end or vec2.end.

Of course, if we don't have any reassignment like this, the compiler might elide the i.end pointer opportunistically. But annoyingly, this isn't guaranteed.

Rust typing logic seems to be roughly: if i's lifetime is 'a, we can assign an iterator from any object whose lifetime 'b outlives 'a. But in practice, when we pass iterators around we usually know precisely what vector they came from. So if we were to design a language from scratch, perhaps instead of saying "i is an iterator with lifetime 'a", we'd want to say "i is an iterator derived from vec [which had better outlive i]". This would imply that we can reassign i to be any iterator into vec, but we can no longer reassign it to be an iterator into vec2.

In other words, right now i's type is Iter<'a, T: 'a>, but perhaps we could make a language where i's type contains a reference to a specific object, like Iter<vec: Vec> (with T=vec::Item). Then we can change the implementation for Iter::is_empty() from

self.ptr == self.end // use .end field on instance

to

// statically get vec from our type, and pull .end from it
self.ptr == self::vec.end

Three things motivate this (beyond saving a few bytes for the iter.end pointer):

  1. Lifetime error messages are sometimes hard to read. I suspect that this is because lifetimes are not usually first-class objects in our minds. Saying "i is a child of vec" might be easier to understand than making an equivalent statement about lifetimes.

  2. I see an antipattern (happening in most languages) of data structures containing redundant parent references that need to be kept consistent. E.g. I might write

     struct Blog { posts: Vec<Post> }
     struct Post {
       blog: &Blog // parent reference
       // ... fields ...
     }
    

    just so I can pass around &Post references and still get at the containing Blog. Ideally the compiler could handle this for us: When the parent blog is known at compile time, we don't need the parent reference. When it isn't known at compile time, we might (perhaps implicitly) represent &Post references as not a single pointer, but as a pair of pointers to Blog and Post, thereby making the &blog parent reference external to the Post structure.

  3. The idea of "parent" and "child" objects also opens the door to allowing pushing into vectors while iterating.

    To recap the problem, say you want to write:

     for &item in vec {
       if (cond) {
         vec.push(...);
       }
     }
    

    Recall that vectors are (ptr, length, capacity) tuples, and that vec.push() might reallocate the vector when there is no remaining capacity. When this happens, the for loop's iterator into the vec becomes invalid. Rust will therefore reject the code above, while C++ will compile but segfault.

    But perhaps we can make it work!

    In the code example above, an iterator can get its parent vector. Let's imagine that conversely, a vector might be able to get its children iterators. Then push() can update all outstanding iterators whenever it reallocates. So here's how we might define vec.push():

     #[inline]
     fn push(&mut self, value: T) {
       if out_of_capacity {
         // ... reallocate for more capacity ...
    
         for iter in &mut self::outstanding_iterators {
           // Update each iter to point into the new memory block!
           iter.ptr += (new_ptr - old_ptr)
         }
       }
       // ... append value...
     }
    

    We might want to ensure that self::outstanding_iterators is known at compile time, so the loop can always be elided or unrolled.

So what are people's thoughts on this? Has this kind of "parent" thing been done? Is it doable with any existing language? It seems like it heavily overlaps with lifetime logic, so perhaps people who have designed lifetime-based languages like Rust have some thoughts about the tradeoffs here?

@joliss
Copy link
Author

joliss commented Mar 22, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment