I've been getting a lot of questions about the status of "Non-lexical lifetimes" (NLL) -- or, as I prefer to call it these days, the MIR-based borrow checker -- so I wanted to post a status update.

The single most important fact is that the MIR-based borrow check is feature complete and available on nightly. What this means is that the behavior of #![feature(nll)] is roughly what we intend to ship for "version 1", except that (a) the performance needs work and (b) we are still improving the diagnostics. (More on those points later.)

The MIR-based borrow check as currently implemented represents a huge step forward from the existing borrow checker, for two reasons. First, it eliminates a ton of borrow check errors, resulting in a much smoother compilation experience. Second, it has far fewer bugs. More on this point later, too.

You may be wondering how this all relates to the "alias-based borrow check" that I outlined in [my previous post][pp], which we have since dubbed Polonius. We have implemented that analysis and solved the performance hurdles that it used to have, but it will still take some effort to get it fully ready to ship. The plan is to defer that work and ultimately ship Polonius as a second step: it will basically be a "MIR-based borrow check 2.0", offering even fewer errors.

[pp]: {{ site.baseurl }}/blog/2018/04/27/an-alias-based-formulation-of-the-borrow-checker/

Would you like to help?

If you'd like to be involved, we'd love to have you! The NLL working group hangs out on the #wg-nll stream in Zulip. We have weekly meetings on Tuesdays (3:30pm Eastern time) where we discuss the priorities for the week and try to dole out tasks. If that time doesn't work for you, you can of course pop in any time and communicate asynchronously. You can also always go look for work to do amongst [the list of GitHub issues].

Transition period

As I mentioned earlier, the MIR-based borrow checker fixes a lot of bugs -- this is largely a side effect of making the check operate over the MIR. This is great, but it does mean that we can't just "flip the switch" and enable the MIR-based borrow checker by default, since that would break existing crates (I don't really know how many yet). The plan therefore is to have a transition period.

During the transition period, we will issue warnings if your program used to compile with the old borrow checker but is rejected by the new one (because we fixed a bug in the borrow check). The way we do this is to run both the old and the new borrow checker. If the new checker would report an error, we first check whether the old checker would also report an error. If so, we issue the error as normal. If not, we issue only a warning, since that represents a case that used to compile but would now be rejected.
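
In pseudocode, the decision boils down to something like this (a minimal sketch of the logic just described, not the actual compiler code; all names are made up):

enum Diagnostic { Error, Warning, None }

fn diagnostic_for(new_checker_rejects: bool, old_checker_rejects: bool) -> Diagnostic {
  match (new_checker_rejects, old_checker_rejects) {
    // Both checkers reject: issue the error as normal.
    (true, true) => Diagnostic::Error,
    // Only the new checker rejects: this code used to compile,
    // so downgrade to a warning during the transition period.
    (true, false) => Diagnostic::Warning,
    // The new checker accepts: nothing to report, even if the
    // old checker would have complained.
    (false, _) => Diagnostic::None,
  }
}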

The good news is that while the MIR-based checker fixes a lot of bugs, it also accepts a lot more code. This lessens the overall impact. That is, there is a lot of code which ought to have gotten errors from the old borrow check (but never did), but most of that code won't get any errors at all under the new check. No harm, no foul. =)

Performance

One of the main things we are working on is the performance of the MIR-based checker, since enabling the MIR-based borrow checker currently implies significant overhead during compilation. Take a look at this chart, which plots rustc build times for the clap crate:

clap-rs performance

The black line ("clean") represents the "from scratch" build time with rustc today. The orange line ("nll") represents "from scratch" build times when NLL is enabled. (The other lines represent incremental build times in various combinations.) You can see we've come a long way, but there is still plenty of work to do.

The biggest problem at this point is that we effectively have to "re-run" the type check a second time on the MIR, in order to compute all the lifetimes. This means we are doing two type-checks, and that is expensive. However, this second type check can be significantly simpler than the original: most of the "heavy lifting" has been done. Moreover, there are lots of opportunities to cache work between them so that it only has to be done once. So I'm confident we'll make big strides here. (For example, I've got a PR up right now that adds some simple memoization for a 20% win, and I'm working on follow-ups that add much more aggressive memoization.)

(There is an interesting corollary to this: after the transition period, the first type check will have no need to consider lifetimes at all, which I think means we should be able to make it run quite a bit faster as well. That should mean a shorter "time till first error" and also help things like computing autocompletion information for the RLS.)

Diagnostics

It's not enough to point out problems in the code; we also have to explain the error in an understandable way. We've put a lot of effort into our existing borrow checker's error messages. In some cases, the MIR-based borrow checker actually does better here. It has access to more information, which means it can be more specific than the older checker. As an example1, consider this error that the old borrow checker gives:

error[E0597]: `json` does not live long enough
  --> src\main.rs:38:17
   |
38 |         let v = json["data"]["search"]["edges"].as_array();
   |                 ^^^^ borrowed value does not live long enough
...
52 |     }
   |     - `json` dropped here while still borrowed
...
90 | }
   | - borrowed value needs to live until here

The error isn't bad, but you'll note that while it says "borrowed value needs to live until here" it doesn't tell you why the borrowed value needs to live that long -- only that it does. Compare that to the new error you get from the same code:

error[E0597]: `json` does not live long enough
  --> src\main.rs:39:17
   |
39 |         let v = json["data"]["search"]["edges"].as_array();
   |                 ^^^^ borrowed value does not live long enough
...
53 |     }
   |     - borrowed value only lives until here
...
70 |             ", last_cursor))
   |                ----------- borrow later used here

The new error doesn't tell you "how long" the borrow must last; instead, it points to a concrete use. That's great.

Other times, though, the errors from the new checker are not as good. This is particularly true when it comes to suggestions and tips for how to fix things. We've gone through all of our internal diagnostic tests and drawn up a list of about 37 issues, documenting each point where the new checker's message is not as good as the old one's, and we're now working through this list.

Polonius

In my [previous blog post][pp], I described a new version of the borrow check, which we have since dubbed Polonius. That analysis further improves on the MIR-based borrow check that is in Nightly now. The most significant improvement that Polonius brings has to do with "conditional returns". Consider this example:

fn foo<T>(vec: &mut Vec<T>) -> &T {
  let r = &vec[0];
  if some_condition(r) {
    return r;
  }
  
  // Question: can we mutate `vec` here? On Nightly,
  // you get an error, because a reference that is returned (like `r`)
  // is considered to be in scope until the end of the function,
  // even if that return only happens conditionally. Polonius can
  // accept this code.
  vec.push(...);
}

In this example, vec is borrowed to produce r, and r is then returned -- but only sometimes. In the MIR borrowck on nightly, this will give an error -- when r is returned, the borrow is forced to last until the end of foo, no matter what path we take. The Polonius analysis is more precise, and understands that, outside of the if, vec is no longer referenced by any live references.

We originally intended for NLL to accept examples like this: in the RFC, this was called Problem Case #3. However, we had to remove that support because it was simply killing compilation times, and there were also cases where it wasn't as precise as we wanted. Of course, some of you may recall that in my [previous post about Polonius][pp] I wrote:

...the performance has a long way to go ([Polonius] is currently slower than existing analysis).

I'm happy to report that this problem is basically solved. Despite the increased precision, the Polonius analysis is now easily as fast as the existing Nightly analysis, thanks to a smarter encoding of the rules as well as the move to datafrog. We haven't done detailed comparisons yet, but the performance gap is essentially gone.

If you'd like, you can try Polonius today using the -Zpolonius switch on Nightly. Keep in mind, though, that it is in a 'pre-alpha' state: there are still known bugs that we have not yet prioritized fixing.

Conclusion

The key take-aways here:

  • NLL is in a "feature complete" state on Nightly.
  • We are doing a focused push on diagnostics and performance, primarily.
  • Even once it ships, we can expect further improvements in the future, as we bring in the Polonius analysis.

Footnotes

  1. Hat tip to steveklabnik for providing this example!

Now that the final Rust 2018 Release Candidate has shipped, I thought it would be a good idea to do another update on the state of the MIR-based borrow check (aka NLL). The [last update][] was in June, when we were still hard at work on getting things to work.

[last update]: {{ site.baseurl }}/blog/2018/06/15/mir-based-borrow-check-nll-status-update/

Rust 2018 will use NLL now, Rust 2015 support coming soon

Let's get the highlights out of the way. Most importantly, Rust 2018 crates will use NLL by default. Once the Rust 2018 release candidate becomes stable, I hope to switch Rust 2015 crates to use NLL as well, but we're holding off until we have some more experience with people using it in the wild.

NLL is awesome

I've been using NLL in practice for quite some time now, and I can't imagine going back. Recently I've been working in my spare time on the salsa crate1, which uses Rust 2018, and I've really noticed how NLL makes a lot of "complex" borrowing interactions work out quite smoothly. These are all instances of the [problem cases #1 and #2] I highlighted way back when2, but they interact in interesting ways I did not fully anticipate.

Let me give you a hypothetical example. Imagine I am writing some bit of code that routes messages, which look like this:

enum Message {
    Letter { recipient: String, data: String },
    // ... other cases here ...
}

When I receive a letter, I want to inspect its recipient. If that matches my name, I will process the data using process:

fn process(data: &str) { .. }

but otherwise I'll forward it along to the next person in the chain. Using NLL, I can write this code like so (playground):

use std::sync::mpsc::{Receiver, Sender};

fn router(me: &str, rx: Receiver<Message>, tx: Sender<Message>) {
  for message in rx {
    match &message {
      Message::Letter { recipient, data } => {
        if recipient != me {
          tx.send(message).unwrap();
        } else {
          process(data);
        }
      }
    }
  }
}

What's interesting about this code is how uninteresting it is -- it basically just does what you expect, and didn't require any special action to please the borrow checker3. But before NLL, this would have required some significant contortions to achieve (try it yourself if you like -- that's a link to the same code, but with Rust 2015 edition set).
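
To give a flavor of those contortions (this is my own sketch, not the exact code behind the playground link above): without NLL you have to make the decision while the borrow of message is active, and only move or re-borrow message after that borrow has lexically ended.

fn router(me: &str, rx: Receiver<Message>, tx: Sender<Message>) {
  for message in rx {
    // First, inspect the message and record the decision...
    let forward = match &message {
      Message::Letter { recipient, .. } => recipient != me,
    };
    // ...then act on `message` once that borrow has expired.
    if forward {
      tx.send(message).unwrap();
    } else {
      match &message {
        Message::Letter { data, .. } => process(data),
      }
    }
  }
}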

Diagnostics, migration, and performance

We've also put a lot of effort into NLL diagnostics and I think that by and large they are even better than the old borrow checker's (which were already quite good). This is particularly true for the 'lifetime error messages'. Unfortunately, you won't see all of those improvements yet on Rust 2018 -- the reason has to do with migration.

What is this migration you ask? Well, it's our way of dealing with the fact that the new MIR-based borrow checker has fixed a ton of soundness bugs from the old checker. Unfortunately, in practice, that means that some existing code will not compile anymore (because it never should have compiled in the first place!). To give people time to make that transition, we are running the NLL code in "migration mode", which means that if you have code that used to compile, but no longer does, we issue warnings instead of errors. This migration mode will eventually change to issue hard errors instead (probably in a few releases, but that depends a bit on what we find in the wild).

One downside of migration mode is that it requires keeping around the older code. In some cases, this older code can produce errors that wind up masking the newer, nicer errors that are produced by the MIR-based checker. The good news is that once we finish the migration, this means that errors will just get better.

Finally, those of you who read the previous posts may remember that the performance of the NLL checker was a big stumbling block. I'm happy to report that the performance issues were largely addressed: there remains some slight overhead to using NLL, but it is largely not noticeable in practice, and I expect we'll continue to improve it over time.

What next?

So, now that NLL is shipping, what is next for ownership and borrowing in Rust? That's a big question, and it has a few different answers, depending on the "scale" of time we are looking at. The immediate answer is that we've still got some bugs to nail down (small ones) and of course we expect that once more people start banging on the new code, they'll encounter new problems that have to be fixed. In addition, we've got to put some energy into writing up documentation for how the new checker works and similar things (we wound up deviating from the RFC analysis in various ways, and it'd be nice to document those).

In the medium term, the plan is to push more on the Polonius formulation of NLL that [I described here][alias-based]. In addition to offering a crisp formalization of our analysis, Polonius promises to fix the [Problem Case #3][pc3-link] that I identified in the original NLL introduction, along with some other cases where the current analysis falls short.

In the longer term, well, that's an open question, and one where I would like to hear from you, dear reader. Over the next week or so, I am planning to write up a series of blog posts. Each will describe what I consider to be a common "tricky scenario" where people hit problems with the borrow checker, none of which are solved by NLL. I'll also describe the current fixes required. Then I hope to do a survey, trying to get a picture of which of these challenges cause the most problems for folks, so that we can try to decide how to prioritize future improvements to Rust.

[alias-based]: {{ site.baseurl }}/blog/2018/04/27/an-alias-based-formulation-of-the-borrow-checker/
[pc3-link]: {{ site.baseurl }}/blog/2016/04/27/non-lexical-lifetimes-introduction/#problem-case-3-conditional-control-flow-across-functions

Footnotes

  1. Did you see how smoothly I worked in that plug for salsa? I'll write a post about it soon, I promise.

  2. Note that the current NLL implementation does not solve Problem Case #3. See [the "What Next?" section][wn] for more. [wn]: #what-next

  3. Interestingly, I remember an example almost exactly like this being shown to me by a Servo intern -- I forget which one -- many years ago. At the time, it didn't seem like a big deal to do the workarounds, but I realize now I was wrong about that. Ah well.

In my previous post on the status of NLL, I promised to talk about "What is next?" for ownership and borrowing in Rust. I want to lay out the various limitations of Rust's ownership and borrowing system that I see, as well as -- where applicable -- current workarounds. I'm curious to get feedback on which problems affect folks the most.

The first limitation I wanted to focus on is interprocedural conflicts. In fact, I've covered a special case of this before -- where a closure conflicts with its creator function -- in my post on [Precise Closure Capture Clauses][cc]. But the problem is more general.

[cc]: {{ site.baseurl }}/blog/2018/04/24/rust-pattern-precise-closure-capture-clauses/

The problem

Oftentimes, it happens that we have a big struct that contains a number of fields, not all of which are used by all the methods. Consider a struct like this:

use std::sync::mpsc::Sender;

struct MyStruct {
  widgets: Vec<MyWidget>,
  counter: usize,
  listener: Sender<()>,
}

struct MyWidget { .. }

Perhaps we have a method signal_event, which increments the counter each time some sort of event occurs. It also fires off a message to some listener to let them know.

impl MyStruct {
  fn signal_event(&mut self) {
    self.counter += 1;
    self.listener.send(()).unwrap();
  }
}

The problem arises when we try to invoke this method while we are simultaneously using some of the other fields of MyStruct. Suppose we are "checking" our widgets, and this process might generate the events we are counting; that might look like so:

impl MyStruct {
  fn check_widgets(&mut self) {
    for widget in &self.widgets {
      if widget.check() {
        self.signal_event();
      }
    }
  }
}  

Unfortunately, this code is going to yield a compilation error. The error I get presently is:

error[E0502]: cannot borrow `*self` as mutable because it is also borrowed as immutable
  --> src/main.rs:26:17
     |
  24 |         for widget in &self.widgets {
     |                       -------------
     |                       |
     |                       immutable borrow occurs here
     |                       immutable borrow used here, in later iteration of loop
  25 |             if widget.check() {
  26 |                 self.signal_event();
     |                 ^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here

What this message is trying to tell you1 is that:

  • During the loop, you are holding a borrow of self.widgets.
  • You are then giving away access to self in order to call signal_event.
    • The danger here is that signal_event may mutate self.widgets, which you are currently iterating over (the hypothetical sketch below makes this concrete).
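
To see why the compiler worries about this, imagine a hypothetical variant of signal_event (my own illustration, not part of the example) that really did touch the widgets:

impl MyStruct {
  fn signal_event(&mut self) {
    self.counter += 1;
    self.listener.send(()).unwrap();
    // Hypothetical extra line: this clears the very vector that
    // check_widgets is iterating over, dropping elements the loop
    // may still hold references to.
    self.widgets.clear();
  }
}

Nothing in the signature of signal_event rules this variant out, which is why the call inside the loop is rejected.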

Now, you and I know that signal_event is not going to touch the self.widgets field, so there should be no problem here. But the compiler doesn't know that, because it only examines one function at a time.

Inlining as a possible fix

The simplest way to fix this problem is to modify check_widgets to inline the body of signal_event:

impl MyStruct {
  fn check_widgets(&mut self) {
    for widget in &self.widgets {
      if widget.check() {
        // Inline `self.signal_event()`:
        self.counter += 1;
        self.listener.send(()).unwrap(); 
      }
    }
  }
}  

Now the compiler can clearly see that distinct fields of self are being used, so everything is hunky dory. Of course, now we've created a "DRY"-failure -- we have two bits of code that know how to signal an event, and they could easily fall out of sync.

Factoring as a possible fix

One way to address the DRY failure is to factor our types better. For example, perhaps we can extract an EventSignal type and move the signal_event method there:

struct EventSignal {
  counter: usize,
  listener: Sender<()>,
}

impl EventSignal {
  fn signal_event(&mut self) {
    self.counter += 1;
    self.listener.send(()).unwrap();
  }
}

Now we can modify the MyStruct type to embed an EventSignal:

struct MyStruct {
  widgets: Vec<MyWidget>,
  signal: EventSignal,
}

Finally, instead of writing self.signal_event(), we will write self.signal.signal_event():

impl MyStruct {
  fn check_widgets(&mut self) {
    for widget in &self.widgets {
      if widget.check() {
        self.signal.signal_event(); // <-- Changed
      }
    }
  }
}  

This code compiles fine, since the compiler now sees access to two distinct fields: widgets and signal. Moreover, we can invoke self.signal.signal_event() from as many places as we want without duplication.

Truth be told, factoring sometimes makes for cleaner code: e.g., in this case, there was a kind of "mini type" hiding within MyStruct, and it's nice that we can extract it. But definitely not always. It can be more verbose, and I sometimes find that it makes things more opaque, simply because there are now more structs running around that I have to look at. Some things are so simple that the complexity of having a separate struct outweighs the win of isolating a distinct bit of functionality.

The other problem with factoring is that it doesn't always work: sometimes we have methods that each use a specific set of fields, but those fields don't factor nicely. For example, if we return to our original MyStruct (where everything was inlined), perhaps we might have a method that used both self.counter and self.widgets but not self.listener -- the factoring we did can't help us identify a function that uses counter but not listener.
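
For example, sticking with the original MyStruct, we might want a (hypothetical) method like this, which needs counter and widgets but not listener:

impl MyStruct {
  fn events_per_widget(&self) -> usize {
    // Uses `counter` and `widgets`, but not `listener` -- a set of
    // fields that doesn't line up with the `EventSignal` grouping.
    self.counter / self.widgets.len().max(1)
  }
}

No grouping of counter and listener into a sub-struct serves both this method and signal_event; their field sets overlap without either containing the other.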

Free functions as a general, but extreme solution

One very general way to sidestep our problem is to move things out of method form and into a "free function". The idea is that instead of &mut self, you will take a separate &mut parameter for each field that you use. So signal_event might look like:

fn signal_event(counter: &mut usize, listener: &Sender<()>) {
  *counter += 1;
  listener.send(()).unwrap();
}

Then we would replace self.signal_event() with:

signal_event(&mut self.counter, &self.listener)

Obviously, this is a significant ergonomic regression. However, it is very effective at exposing the set of fields that will be accessed to our caller.
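
Plugged back into check_widgets, the call looks like this (a sketch; the compiler now sees borrows of three distinct fields, just as in the inlined version, so it compiles):

impl MyStruct {
  fn check_widgets(&mut self) {
    for widget in &self.widgets {
      if widget.check() {
        // Borrow exactly the fields we need; the loop's borrow of
        // `self.widgets` is left untouched.
        signal_event(&mut self.counter, &self.listener);
      }
    }
  }
}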

Moving to a free function also gives us some extra flexibility. You may have noted, for example, that the signal_event function takes a &Sender<()> and not a &mut Sender<()>. This is because the send method on Sender only requires &self, so a shared borrow is all we need. This means that we could invoke signal_event in some location where we needed another shared borrow of self.listener (perhaps another method or function).

View structs as a general, but extreme solution

I find moving to a free function to be OK in a pinch, but it's pretty annoying if you have a lot of fields, or if the method you are converting calls other methods (in which case you need to identify the transitive set of fields). There is another technique I have used from time to time, though it's fairly heavyweight. The idea is to define a "view struct" which has all the same fields as the original, but uses references to indicate whether each field is used in a "shared" (immutable) or "mutable" way.

For example, we might define a CheckWidgetsView:

struct CheckWidgetsView<'me> {
  widgets: &'me Vec<MyWidget>,
  counter: &'me mut usize,
  listener: &'me mut Sender<()>,
}

Now we can define methods on the view without a problem:

impl<'me> CheckWidgetsView<'me>  {
  fn signal_event(&mut self) {
    *self.counter += 1;
    self.listener.send(()).unwrap();
  }

  fn check_widgets(&mut self) {
    for widget in self.widgets {
      if widget.check() {
        self.signal_event();
      }
    }
  }
}  

You might wonder why this solved the problem. After all, the check_widgets method here looks basically the same as before -- on the surface, it still seems to involve two overlapping borrows:

  • a borrow of self.widgets, for the duration of the for loop
  • a mutable borrow of self, when invoking signal_event

The difference lies in the type of self.widgets: because it is a &Vec<MyWidget>, the for loop only needs to copy that reference out of self -- the iterator then borrows the vector's data, which lives outside of self, rather than borrowing self itself. And since signal_event only ever sees the widgets through a shared reference, it has no way to mutate the vector we are iterating over. So there is nothing that signal_event could do to mess up our iteration.

(Note that if we needed to mutate the widgets as we iterated, this "view struct" trick would not work here, and we'd be back where we started -- or rather, we'd need a new view struct just for signal_event.)
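
For completeness, here is roughly how MyStruct itself might construct the view and hand off to it -- this is my own sketch of the plumbing, assuming the original three fields. Each field is borrowed once, with exactly the access it needs, so the borrows are disjoint and this compiles:

impl MyStruct {
  fn check_widgets(&mut self) {
    let mut view = CheckWidgetsView {
      widgets: &self.widgets,
      counter: &mut self.counter,
      listener: &mut self.listener,
    };
    view.check_widgets();
  }
}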

One nice thing about view structs is that we can have more than one, and each can borrow a different set of fields in different ways. So, for example, one sometimes has "double buffering"-like algorithms that use one field for input and another for output, where which field plays which role alternates depending on the phase (and other fields are perhaps used in a shared capacity). View structs can handle this quite elegantly.

Relation to closures

As I mentioned, one common place where this problem arises is actually with closures. This occurs because closures always capture entire local variables; so if a closure only uses some particular field of a local, it can create an unnecessary conflict. For example:

fn check_widgets(&mut self) {
  // Make a closure that uses `self.counter`
  // and `self.listener`; but it will actually
  // capture all of `self`.
  let mut signal_event = || {
    self.counter += 1;
    self.listener.send(()).unwrap(); 
  };
  
  for widget in &self.widgets {
    if widget.check() {
      signal_event();
    }
  }
}

Even though it's an instance of the same general problem, it's worth calling out specially, because it can be solved in different ways. In fact, we've accepted RFC #2229, which proposes to change the closure desugaring. In this case, the closure would only capture self.counter and self.listener, avoiding the problem.
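
In the meantime, a common workaround -- sketched here against the same hypothetical struct as above -- is to borrow the needed fields into locals first, so the closure captures only those references rather than all of self:

fn check_widgets(&mut self) {
  // Borrow just the fields the closure needs...
  let counter = &mut self.counter;
  let listener = &self.listener;
  let mut signal_event = || {
    *counter += 1;
    listener.send(()).unwrap();
  };

  // ...leaving `self.widgets` free for the loop to borrow.
  for widget in &self.widgets {
    if widget.check() {
      signal_event();
    }
  }
}

This is essentially the manual capture pattern from the [Precise Closure Capture Clauses][cc] post linked earlier; RFC #2229 would have the compiler infer it for you.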

Extending the language to solve this problem

There has been discussion on and off about how to solve this problem. Clearly, there is a need to permit methods to expose information about which fields they access and how they access them, but it's not yet clear what the best way to do that is. There are a number of tradeoffs at play:

  • Adding more concepts to the surface language.
  • Core complexity; this probably involves extending the base borrow checker rules.
  • Annotation burden.
  • Semver considerations (see below).

There is some discussion of the view idea in this internals thread; I've also tinkered with the idea of merging views and traits, as described in this internals post. I've also toyed with the idea of trying to infer some of this information for private functions (or perhaps even crate-private functions), but I think it'd be best to start with some form of explicit syntax.

Semver considerations. One of the things you'll notice about all of the solutions to the problem is that they are all ways of exposing information to the compiler about which fields will be used in signal_event or (in the case of view structs) how they will be used. This has semver implications: imagine you have a public function fn get(&self) -> &Foo that returns a reference to something in self. If we now permit your clients to invoke other methods while that borrow is live (because we know somehow that they won't interfere), that is a semver commitment. The current version, where your struct is considered an atomic unit, gives you maximal freedom to change your implementation in the future, because it is maximally conservative with respect to what your clients can do.

Conclusion

The general problem here I think is being able to identify which fields are used by a method (or set of methods) and how. I've shown a number of workarounds you can use today. I'm interested to hear, however, how often this problem affects you, and which (if any) of the workarounds might have helped you. (As noted, I would break out closures into their own subcategory of this problem.)

Footnotes

  1. This message is actually a bit confusing.
