
@nrc
Last active January 2, 2022 02:03
Borrowing in Rust

When converting from a smart pointer type to a borrowed reference, you need to 'cross-borrow' the data. This requires writing &*expr or &**expr, etc. This is usually just annoying: it helps neither reading nor writing the code. When writing, you play 'type Tetris', inserting *s until the compiler is happy. When reading, it is just line noise. At best, if you are passing a value to a function, it tells you that you are passing a borrowed reference rather than owned data, or, put another way, that you are passing by reference rather than by value.
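As an illustration, here is a small sketch of the cross-borrowing described above. It is written against today's Rust rather than the compiler measured in this note, and the function `count_words` and the strings are made up for the example. Each call site has to spell out exactly how many derefs get from the smart pointer down to `str`.

```rust
use std::rc::Rc;

// Hypothetical helper: takes a borrowed string slice and does not care
// how the caller stores the data.
fn count_words(s: &str) -> usize {
    s.split_whitespace().count()
}

fn main() {
    let owned: String = String::from("type tetris is no fun");
    let boxed: Box<String> = Box::new(String::from("line noise"));
    let shared: Rc<String> = Rc::new(String::from("one two three"));

    // &*owned: deref String -> str, then re-borrow as &str.
    println!("{}", count_words(&*owned)); // 5
    // &**boxed: deref Box<String> -> String -> str, then re-borrow.
    println!("{}", count_words(&**boxed)); // 2
    // The same two-level dance for Rc<String>.
    println!("{}", count_words(&**shared)); // 3
}
```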

So we would like to spare the programmer this burden. The question is how. There have been a few proposals (rust-lang/rfcs#226, rust-lang/rfcs#241, rust-lang/rfcs#248). A central question when evaluating them is whether the programmer should care first about the borrowed-ness of data or about the amount of indirection visible to the user. I am starting to lean towards the position of giving ownership/borrowing priority here. It certainly has the advantage that it respects the substitutability principle with respect to the dynamic semantics that borrowing introduces. However, I am still somewhat sympathetic to the 'treat all pointers alike' principle, since it also respects substitutability with respect to an abstract dynamic semantics, although not with respect to the borrow checker.

Anyway, that is a hard question, so I went hunting for some data to answer some easier questions. I analysed the Rust repo (compiler and libs) and Servo. You can find the raw data here (https://dl.dropboxusercontent.com/u/74741329/borrow-data.ods). First some trivia:

|            | Rust compiler | std + core | other libs | Servo |
|------------|---------------|------------|------------|-------|
| & operator | 20811         | 1845       | 11339      | 11685 |
| * operator | 15294         | 4022       | 4327       | 10720 |
| . operator | 58074         | 8227       | 16528      | 36570 |

These really just give us a sense of scale.

How about uses of &* and &**? These are the kind of line noise operators we hope to forget about with the proposals:

|     | Rust compiler | std + core | other libs | Servo |
|-----|---------------|------------|------------|-------|
| &*  | 2046          | 234        | 630        | 1536  |
| &** | 1116          | 14         | 23         | 28    |

There are roughly twice as many uses of &* as of &** in the compiler, and only negligible uses of &** elsewhere. RFC PR #226 would change &*expr to expr and &**expr to *expr. RFC PR #241 would change most instances of &**expr to expr.
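For comparison, Rust as it ships today has deref coercions, which grew out of #241. A minimal sketch, assuming a current compiler, shows both rewrites: `&*expr` becoming `&expr`, and `&**expr` becoming plain `expr` when `expr` is already a reference to a smart pointer (a common shape in compiler code). The `Expr` struct and the helper names are hypothetical stand-ins.

```rust
// A stand-in for an AST node; purely illustrative.
struct Expr {
    value: i32,
}

// Takes a plain borrowed reference to the node.
fn eval(e: &Expr) -> i32 {
    e.value
}

// Explicit cross-borrowing, the style the counts above measure.
fn explicit(owned: &Box<Expr>, local: Box<Expr>) -> i32 {
    // &**expr: &Box<Expr> -> Box<Expr> -> Expr, then &Expr.
    let a = eval(&**owned);
    // &*expr: Box<Expr> -> Expr, then &Expr.
    let b = eval(&*local);
    a + b
}

// The same calls relying on deref coercion.
fn coerced(owned: &Box<Expr>, local: Box<Expr>) -> i32 {
    // &Box<Expr> coerces to &Expr, so `&**expr` becomes `expr`.
    let a = eval(owned);
    // &Box<Expr> coerces to &Expr, so `&*expr` becomes `&expr`.
    let b = eval(&local);
    a + b
}

fn main() {
    let node = Box::new(Expr { value: 40 });
    let e = explicit(&node, Box::new(Expr { value: 2 }));
    let c = coerced(&node, Box::new(Expr { value: 2 }));
    assert_eq!(e, c);
    println!("{}", c); // 42
}
```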

To get more fine-grained, let's look at how many of these operations could be dereferenced further, that is, where we found &*expr but could have taken &**expr, and so forth. Let's also look at address-of without a deref:

|     | Rust compiler | std + core | other libs | Servo |
|-----|---------------|------------|------------|-------|
| &   | 737           | 82         | 109        | 209   |
| &*  | 9             | 0          | 0          | 26    |
| &** | 0             | 3          | 0          | 0     |

Looking at &* and &**, these counts are negligible. RFC PR #241 would turn these &*expr into expr (probably; I don't actually have data on which of these are borrowed references versus other deref-able types). Other &*exprs would become &expr. From this and the above, we see that #226 would have a much bigger effect on the look and feel of Rust programs (for better or worse). However, the fact that it still requires some annotation where we currently write &** is significant, at least for the compiler.

Although the numbers for & look large, they must be taken in the context of the total number of address-of operations: they are only about 1-4.4% of them. This data is interesting in the context of the borrow operator proposal (RFC PR #248): these are the uses that would require addr() instead of the proposed & operator.
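The percentages can be recomputed straight from the tables. This sketch divides the & row of the third table by the & operator row of the first; that pairing is my reading of the text, and the constant names are made up for the example.

```rust
// Counts copied from the tables above.
const CORPORA: [&str; 4] = ["Rust compiler", "std + core", "other libs", "Servo"];
// Address-of operations whose operand could have been dereferenced further.
const FURTHER_DEREFABLE: [u32; 4] = [737, 82, 109, 209];
// All uses of the & operator.
const ALL_ADDRESS_OF: [u32; 4] = [20811, 1845, 11339, 11685];

// `part` as a percentage of `whole`.
fn percent(part: u32, whole: u32) -> f64 {
    100.0 * part as f64 / whole as f64
}

fn main() {
    let rows = CORPORA.iter().zip(FURTHER_DEREFABLE).zip(ALL_ADDRESS_OF);
    for ((name, part), whole) in rows {
        println!("{:<14} {:.1}%", name, percent(part, whole));
    }
}
```

This prints 3.5% for the compiler, 4.4% for std + core, 1.0% for other libs and 1.8% for Servo.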

I also counted the number of function/method arguments whose type is a reference to a type that could be dereferenced further. Since we have a known type (the formal parameter type) and the sub-expression can be dereferenced further, these are situations where coercions (#241) are better than the borrow operator (#248). These numbers differ slightly from the first row of the table above, since they look at reference types, not just the address-of operator.

|                               | Rust compiler | std + core | other libs | Servo |
|-------------------------------|---------------|------------|------------|-------|
| Derefable reference arguments | 880           | 91         | 183        | 392   |

One thought I had was that we could simplify the coercion in #241 by always dereferencing as much as possible, rather than checking whether we have the right type after each deref. These numbers represent situations where that strategy would not work (assuming we would coerce in these situations). I suspect they are large enough to show that the strategy in the RFC is better, but I'm not 100% sure how to interpret them.
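To make the two strategies concrete, here is a toy model, not how rustc represents types: each type is a string, and `deref_target` is a hypothetical table standing in for `Deref` impls. `stepwise` checks for the expected type after every deref, roughly as the text describes #241; `maximal` derefs as far as possible and only then compares. A parameter of type &Vec<i32> given a &Box<Vec<i32>> is the kind of case counted above: stepwise stops at Vec<i32>, while maximal overshoots to [i32].

```rust
// Hypothetical deref table: what each toy type derefs to, if anything.
fn deref_target(ty: &str) -> Option<&'static str> {
    match ty {
        "Box<Vec<i32>>" => Some("Vec<i32>"),
        "Vec<i32>" => Some("[i32]"),
        "Rc<String>" => Some("String"),
        "String" => Some("str"),
        _ => None,
    }
}

// Check for the target after each deref; return how many derefs were needed.
fn stepwise(source: &'static str, target: &str) -> Option<usize> {
    let mut ty = source;
    let mut derefs = 0;
    loop {
        if ty == target {
            return Some(derefs);
        }
        // Give up once the type cannot be dereferenced any further.
        ty = deref_target(ty)?;
        derefs += 1;
    }
}

// Deref as far as possible first, then compare only the final type.
fn maximal(source: &'static str, target: &str) -> Option<usize> {
    let mut ty = source;
    let mut derefs = 0;
    while let Some(next) = deref_target(ty) {
        ty = next;
        derefs += 1;
    }
    if ty == target { Some(derefs) } else { None }
}

fn main() {
    // A &Vec<i32> parameter given a &Box<Vec<i32>> argument.
    println!("{:?}", stepwise("Box<Vec<i32>>", "Vec<i32>")); // Some(1)
    println!("{:?}", maximal("Box<Vec<i32>>", "Vec<i32>")); // None
}
```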

Finally, I looked at the number of method calls that could be made either without coercion or by doing a maximal deref and a single address-of (i.e., the strategy of #248's borrow operator, rather than today's more flexible coercions). I found that 25-25% of calls did not fall into this simple pattern, so we probably want to stick with the current scheme (unless there is significant use of box self, which I don't think there is).
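For method calls, a small sketch in current Rust (the helper names are hypothetical) shows three receivers: one that fits the 'maximal deref plus a single address-of' pattern, and two where lookup has to stop earlier and so fall outside it.

```rust
use std::rc::Rc;

// Fits the simple pattern: &Box<String> -> Box<String> -> String -> str,
// then one autoref to &str, because `trim` is defined on str.
fn trimmed(b: &Box<String>) -> &str {
    b.trim()
}

// Stops early at String: `capacity` is a String method, so dereferencing
// all the way to str (which has no `capacity`) would miss it.
fn capacity_of(b: &Box<String>) -> usize {
    b.capacity()
}

// Stops at the smart pointer itself: Rc<String> implements Clone, so this
// clones the Rc (bumping its count) rather than the String inside it.
fn share(rc: &Rc<String>) -> Rc<String> {
    rc.clone()
}

fn main() {
    let b = Box::new(String::from("  hello  "));
    println!("{:?}", trimmed(&b)); // "hello"
    println!("{}", capacity_of(&b) >= 9); // true
    let rc = Rc::new(String::from("shared"));
    let rc2 = share(&rc);
    println!("{}", Rc::strong_count(&rc2)); // 2
}
```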

Conclusions

I don't think these data help us choose between the two coercion proposals; the difference is more a philosophical question. They do indicate how much difference each proposal would make, but that needs to be weighed against whether it is the right difference to make.

I think the numbers do show that some of my ideas for more predictable coercions, for receivers and in general (in the context of #241), won't fly.

The numbers in the third table do show that changing & to a borrow operator is not insane. However, the fourth table shows there is a small but significant number of places where a coercion is better than a borrow operator.
