@davidhewitt
Last active October 25, 2023 13:08
Dreaming of arbitrary self types for PyO3
//! The following is a simplified form of a possible PyO3 API which shows
//! cases where arbitrary self types would help resolve papercuts.
use std::ops::Deref;
use std::ptr::NonNull;
// ----------------------------------------------------------------------------------
//
// Case 1 - PyO3's object hierarchy. We have a smart pointer type Py<T> and want to
// use it as a receiver for Python method calls.
//
//
/// Python's C API is wrapped by the `pyo3-ffi` crate, also exported as the
/// `pyo3::ffi` submodule.
mod ffi {
    extern "C" {
        /// A Python object. For this model we don't care about its contents, so we
        /// just use unstable "extern type" syntax to name it.
        type PyObject;
    }
}
/// A smart pointer to a Python object, which is reference counted. A good enough
/// description is that it is approximately an `Arc<T>` where the memory is
/// stored on the Python heap and reference counting is synchronized by the
/// Python GIL (Global Interpreter Lock).
///
/// Here in this model we ignore the existence of the Python GIL as it is just a
/// distraction. In PyO3's real API we have a lifetime `'py` on several types to
/// model this.
struct Py<T>(NonNull<ffi::PyObject>, std::marker::PhantomData<T>);
// -- Some zero-sized types to describe Python's object hierarchy. --
/// Any Python object.
struct PyAny(());
/// A concrete subtype, a Python list.
struct PyList(());
// -- Implementations of methods on these types --
// In practice these methods return results, we'll ignore that here.
impl PyAny {
    /// Get an attribute on this object. In Python syntax this is `self.name`.
    ///
    /// Receiver is &Py<PyAny> - arbitrary self type!
    fn getattr(self: &Py<PyAny>, name: &str) -> Py<PyAny> { /* ... */ }
}
impl PyList {
    /// Get an element from this list. In Python syntax this is `self[idx]`.
    ///
    /// Receiver is &Py<PyList> - arbitrary self type!
    fn get_item(self: &Py<PyList>, idx: usize) -> Py<PyAny> { /* ... */ }
}
// In addition, we want to call `getattr` with a `Py<PyList>`, because this is
// a valid operation too. The cleanest way to do this is with `Deref`:
impl Deref for Py<PyList> {
    type Target = Py<PyAny>;
    fn deref(&self) -> &Py<PyAny> { /* ... */ }
}
// ... but if arbitrary self types are tied to `Deref`, we instead have to write:
impl Deref for Py<PyList> {
    type Target = PyList;
    fn deref(&self) -> &PyList { /* ... */ }
}
// We could find other ways to make Py<PyList> have a getattr method without
// `Deref`, e.g. by moving all of `PyAny` methods onto a trait and implementing
// it for `Py<PyAny>`, `Py<PyList>` and so on. This leads to a lot of repetition;
// N trait implementations for N concrete types PyAny, PyList, etc.
// Also the `&PyList` reference on its own is useless, so `Deref<Target = PyList>`
// is a little weird.
// ----------------------------------------------------------------------------------
//
// Case 2 - PyO3's "refcell" container synchronized by the GIL. This has a close
// cousin in `std::cell::RefCell`.
//
//
/// PyO3 has a `#[pyclass]` macro which generates a Python type for a Rust
/// struct.
/// - `Foo` continues to be the plain old Rust struct.
/// - `Py<Foo>` is a smart pointer to a Python object which contains a `Foo`.
#[pyclass]
struct Foo { /* ... */ }
/// To implement methods on the Python type PyO3 has a `#[pymethods]` macro.
///
/// Users can use `&self` and `&mut self` receivers. To make this possible,
/// `Py<Foo>` is like `RefCell<Foo>` but uses the Python GIL for synchronization.
/// `PyRef<'_, Foo>` and `PyRefMut<'_, Foo>` are the guards to `Py<Foo>`.
impl Foo {
    /// Receive by `&self`, read only the Rust data. Possible today.
    fn a(&self) { /* ... */ }
    /// Receive by `&mut self`, read or write only the Rust data. Possible today.
    fn b(&mut self) { /* ... */ }
    /// Receive by `Py<Foo>`. `Py<Foo>` implements `Deref<Target = Py<PyAny>>`
    /// so that all Python operations are accessible.
    ///
    /// This is an arbitrary self type.
    ///
    /// Current users of PyO3 have to use `slf: Py<Foo>` which is awkward
    /// and also loses method call syntax.
    fn c(self: Py<Foo>) { /* ... */ }
    /// Receive by `PyRef<'_, Foo>`. `PyRef<'_, Foo>` is a pointer to the Python
    /// data. It implements `Deref<Target = Foo>` to give read access to the Rust
    /// data.
    ///
    /// This is an arbitrary self type.
    ///
    /// Same workarounds for current users of PyO3 apply.
    fn d(self: PyRef<'_, Foo>) { /* ... */ }
    /// Receive by `PyRefMut<'_, Foo>`. `PyRefMut<'_, Foo>` is a pointer to the Python
    /// data. It implements `DerefMut<Target = Foo>` to give read and write access to
    /// the Rust data.
    ///
    /// This is an arbitrary self type.
    ///
    /// Same workarounds for current users of PyO3 apply.
    fn e(self: PyRefMut<'_, Foo>) { /* ... */ }
}
// Note that in the above, `PyRef<'_, Foo>` and `PyRefMut<'_, Foo>` both implement
// `Deref<Target = Foo>` so would fit fine with deref-based arbitrary self types.
//
// However `Py<Foo>` cannot implement `Deref<Target = Foo>`, just like how `RefCell<T>`
// cannot implement `Deref<Target = T>`.
//
// To make `Py<Foo>` be able to implement `Deref`, we must give up its refcell-like
// feature. This removes `PyRef<'_, Foo>` and `PyRefMut<'_, Foo>`, and it also
// removes the ability to have `&mut self` as a receiver. The mutable access
// needs the runtime refcell protection due to Python code being incompatible with
// the borrow checker.
//
// One could argue that removing `&mut self` and the refcell feature would be
// a good thing, but that feature is also _extremely_ ergonomic for users. We could have
// a long conversation about whether PyO3 made the wrong API choice here. There is
// `#[pyclass(frozen)]` which opts in to this restriction, so by flipping the default
// and then removing the option we could evolve PyO3's API over time if we think
// deref-based arbitrary self types are the correct formulation of arbitrary self types.
//
// If you feel like a long distraction, we can discuss how Python might
// be removing the GIL, and how that means that PyO3 might be forced to change
// anyway.
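
The refcell analogy in Case 2 can be tried on stable Rust with only the standard library. The sketch below is an illustration, not PyO3's API: `Rc<RefCell<Foo>>` stands in for `Py<Foo>`, the `Ref`/`RefMut` guards stand in for `PyRef`/`PyRefMut`, and the `value` field and the `demo` function are hypothetical names.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-in for a `#[pyclass]` struct.
struct Foo {
    value: i32,
}

impl Foo {
    fn a(&self) -> i32 {
        self.value
    }
    fn b(&mut self) {
        self.value += 1;
    }
}

// `Rc<RefCell<Foo>>` plays the role of `Py<Foo>` in this model.
fn demo() -> (i32, bool) {
    let cell = Rc::new(RefCell::new(Foo { value: 1 }));

    // `cell.a()` would not compile: `RefCell<Foo>` has no `Deref<Target = Foo>`,
    // because handing out `&Foo` must first go through a runtime borrow check.
    cell.borrow_mut().b(); // the `RefMut` guard derefs mutably to `Foo`

    let guard = cell.borrow(); // the `Ref` guard derefs to `Foo`
    let value = guard.a();

    // While the shared guard is alive, a mutable borrow is refused at runtime.
    let conflict = cell.try_borrow_mut().is_err();
    (value, conflict)
}
```

Only the guards implement `Deref`, which is why the container itself cannot do so without giving up the runtime borrow check.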
@adetaylor

adetaylor commented Oct 15, 2023

Thanks for taking the time to document this so thoroughly! It's extremely clear and I think the first use-case is a good reason why we should not have a blanket implementation of Receiver for Deref. We hadn't previously had any such use-cases, so the blanket implementation was aimed as an overall confusion-reduction measure, to ensure that chains of derefs would never head off in a different direction from chains of receivers. But, now we have actual use-cases, we should accept the potential higher amount of user confusion.

(Also blanket impls are not really the Rust way so it was always going to be a hard sell to the community! I'm somewhat relieved there are now good use-cases to avoid it)

(I don't think the second use-case is a reason against a blanket impl: nothing is stopping you implementing only Receiver for Py<T>, even if there's a blanket implementation of Receiver for Deref)

@davidhewitt
Author

You're welcome, glad it's helpful! If you'd like any follow-up information, please just ping! And thank you for working on arbitrary self types, I'm optimistic for a more powerful Rust in the future which can easily express PyO3's objects 🙏

@adetaylor

@davidhewitt Hello. Supposing we have separate Deref and Receiver traits as you'd like us to have.

I'm still a little concerned it doesn't quite meet your needs.

let list: Py<PyList> = ...;
list.get_item(3); // so far so good.

// But how are we going to call this?
// list.getattr("foo");  // won't work directly because Py<T> implements Receiver<Target=T>

list.deref().getattr("foo"); // works but yuck
(*list).getattr("foo"); // hmm

I suspect what you want is to be able to call list.getattr("foo") directly. But, this would require the method dispatch logic to explore two paths, following both the Deref trait and Receiver trait. That could potentially lead to O(n^2) candidates, so I don't feel this is likely to be acceptable to the Rust community. I think ultimately you'd be asked to use traits.

WDYT?

@davidhewitt
Author

I was speaking with @Urhengulas at EuroRust (I think I've got the right GH handle?), based on memory there are some ideas to explore:

  • What is the definition of the Receiver trait path? Can multiple Receiver definitions chain? (Is this useful in practice?) Maybe the set is restricted to just the deref chain plus all Receiver implementations for each type. (Which is just one implementation per type, unless you adjust the trait as per the bullet below.)

  • I think in your RFC draft you were going for

    trait Receiver { type Target; } 
    
    impl<T: Deref> Receiver for T { type Target = <Self as Deref>::Target; }

    What happens if instead you did

    trait Receiver<Target> { }
    
    impl<Target, T: Deref<Target = Target>> Receiver<Target> for T { }

    I think with the generic parameter you can still get the same blanket impl ensuring that Deref always makes a Receiver, however I think I would then be able to have impl Receiver<T> for Py<T> and also Deref<Target = Py<PyAny>> for Py<T>.

@adetaylor

Maybe the set is restricted to just the deref chain plus all Receiver implementations for each type

I think that was one of the permutations we explored a long time ago but moved away from - I'll try to remember why.

I'm not sure we've explored the second of those permutations though. I'm still nervous about anything resulting in anything beyond a linear number of candidates to explore.

Would you address the questions in my comment though? To be a bit more concise:

  1. Am I right in saying you want to be able to call list.getattr("foo")?
  2. Am I right that our proposal for a new trait Receiver { type Target; }, even without any blanket implementations, therefore still doesn't meet your needs?

@davidhewitt
Author

  1. Yes, the current PyO3 API allows list.getattr("foo") and it seems correct to keep that. We could explore other options at the cost of breaking our users over some deprecation/release cycles, but I would prefer not to.

  2. I think without the blanket implementation then we could have

    // so that `list.get_item(index)` works
    impl<T> Receiver for Py<T> {
        type Target = T;
    }

    // so that `list.getattr("foo")` works
    impl<T> Deref for Py<T> {
        type Target = Py<PyAny>;
        fn deref(&self) -> &Py<PyAny> { /* ... */ }
    }

    ... and things might just work?

Running with the thought of going without a blanket impl and allowing separate Receiver and Deref implementations, I think the search space is not so bad? Suppose before adding any Receiver implementation we have a Deref chain which is N types long, I think that the full set of types to explore for method resolution becomes 2N in length if all N types have a Receiver implementation also?


One thing which I realise I haven't understood about your proposal is how will this work with pass-by-reference? E.g. with the impl Receiver for Py<T>, am I restricted to Py<T> by move or can I also accept by reference?

impl<T> Receiver for Py<T> {
    type Target = T;
}

// pseudocode: stands for an inherent impl on some concrete type `T`
impl T {
    // this seems relatively clear it should work
    fn by_move(self: Py<T>) {  }

    // these would also have use cases e.g. in PyO3's example above, it is by ref (we wouldn't need mut ref)
    fn by_ref(self: &Py<T>) {  }
    fn by_mut_ref(self: &mut Py<T>) {  }
}

@madsmtm

madsmtm commented Oct 24, 2023

Unsure if I should discuss this here or in the pull request to change the RFC, but I'll do it here since that is where the context is.


It feels to me like arbitrary self types is the wrong way to go about solving the problem that PyO3 is facing here. Fundamentally, PyO3 wants to seamlessly go from Py<PyList> to Py<PyAny>, or more generally from any smart pointer P<T> to P<U>, where T is a subclass of U.

While this problem very prominently arises in receivers, it is by far not unique to that situation, and I fear that by designing arbitrary self types around it we will fundamentally limit future extensions that may provide exactly this kind of support. As an example, I can easily imagine a method fn push(self: &Py<PyList>, value: Py<PyAny>), which should ideally also be callable with a plain Py<PyList> as the value argument, without the user having to do anything to convert the type.

This kind of automatic conversion is a very object-oriented pattern where Rust has usually favoured traits, but even Rust actually already has a mechanism for doing it; it's called unsized coercion!

So maybe the ideal solution would actually be to also push that kind of work forwards, so that we can in the end have something like the following?

impl Unsize<PyAny> for PyList {}

impl<T: Unsize<U>, U> CoerceUnsized<Py<U>> for Py<T> {}

// Provides `Receiver<Target = T>`, as the RFC currently proposes
impl<T> Deref for Py<T> {
    type Target = T;
    fn deref(&self) -> &T { unimplemented!() }
}

impl PyList {
    // Calls Python's `append` method, renamed for Rustiness
    fn push(self: &Py<PyList>, value: Py<PyAny>) { unimplemented!() }
}

// Things that should just work for the user
fn main() {
    let list: Py<PyList> = unimplemented!();
    list.getattr("foo"); // `list` is coerced to `Py<PyAny>`, and then the `Receiver` impl on `Py<PyAny>` allows calling `getattr`
    let another_list: Py<PyList> = unimplemented!();
    list.push(another_list); // `another_list` is coerced to `Py<PyAny>`
}
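
For comparison, the unsized coercion mentioned above already applies to standard-library smart pointers on stable Rust, including at function-argument position. The sketch below only illustrates that existing mechanism with `Rc`; it does not show a PyO3 API, and `describe`, `total` and `demo` are hypothetical names.

```rust
use std::fmt::Debug;
use std::rc::Rc;

// Accepts any `Rc<T>` where `T: Debug`, via coercion to a trait object.
fn describe(value: Rc<dyn Debug>) -> String {
    format!("{:?}", value)
}

// Accepts `Rc<[i32; N]>` for any `N`, via coercion to a slice.
fn total(values: Rc<[i32]>) -> i32 {
    values.iter().sum()
}

fn demo() -> (String, i32) {
    let number: Rc<i32> = Rc::new(7);
    let array: Rc<[i32; 3]> = Rc::new([1, 2, 3]);
    // No explicit conversion at the call sites: the compiler coerces
    // `Rc<i32>` to `Rc<dyn Debug>` and `Rc<[i32; 3]>` to `Rc<[i32]>`.
    (describe(number), total(array))
}
```

Implementing the underlying `Unsize` and `CoerceUnsized` traits for user-defined types is still unstable, which is why the proposal above depends on further language work.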

@davidhewitt
Author

davidhewitt commented Oct 24, 2023

@madsmtm interesting point! Note that PyO3 doesn't really want to have Deref<Target = T> for Py<T>, because:

  • if T is PyList or other Python types then these are just opaque, and there's no real use of &PyList
  • if T is some user-defined type Foo, then Py<T> cannot implement Deref because it is a refcell-like structure (see final comment in the gist)

So I think regardless of switching to use CoerceUnsized the requirement to have Receiver and Deref exactly aligned is still a problem for PyO3.

Using CoerceUnsized instead of Deref for Py<T> -> Py<PyAny> is definitely an interesting proposal though. My gut reaction is that using "unsized" coercions for this is slightly out, as PyAny isn't a dynamically-sized type, however the coercion mechanism of subtyping is definitely what PyO3 is aiming for.

@madsmtm

madsmtm commented Oct 24, 2023

Note that PyO3 doesn't really want to have Deref<Target = T> for Py<T>, because [snip]. So I think regardless of switching to use CoerceUnsized the requirement to have Receiver and Deref exactly aligned is still a problem for PyO3.

I think my point here would be that PyO3 would then implement Receiver<Target = T> for Py<T> instead, and not have a Deref implementation at all (which is possible under the original RFC where Receiver has a blanket impl for T: Deref).

My gut reaction is that using "unsized" coercions for this is slightly out

I agree that the naming of that Rust feature does not really reflect what we want, in reality we want a more generic Coercion trait of some kind.


Ideally, if we could do breaking changes to Rust, I think the prettiest design would've been trait Receiver { type Target; } and trait Deref: Receiver { fn deref(&self) -> &Self::Target }, but that ship has sailed, so while I agree that Rust should favour explicit implementations, I think there is a lot of value in keeping the blanket impl.

Just imagine how many libraries out there are using Deref/DerefMut, which would now have to bump their MSRV and be updated to also have a Receiver implementation that exactly matches their Deref implementation, just to be as nicely usable as Box<T>.

@davidhewitt
Author

I see your point that the blanket makes a lot of libraries just work. Strictly speaking they wouldn't have to bump MSRV, they can add a build script to do feature detection and conditionally implement Receiver. But that's still a bit of work across multiple points in the ecosystem.

Also true that I can still implement Receiver if I don't implement Deref; I keep overlooking this because I keep wanting to have the Deref impl 😅. Having reflected on this I think I can make PyO3 work without either CoerceUnsized or Deref. I can have a trait PyAnyMethods and a blanket impl<T> PyAnyMethods for Py<T>.

So, maybe your existing RFC draft is already fine for what PyO3 needs, and you shouldn't zap the blanket impl based on what I've said here? Certainly it's been great to discuss all these cases and I'm hopeful to see the RFC accepted!
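
A minimal sketch of the trait-based approach described above, runnable on stable Rust. It is a toy model under stated assumptions rather than PyO3's implementation: the `repr` string field stands in for the real object pointer, and `new_list` is a hypothetical helper. Only the names `Py`, `PyAny`, `PyList`, `PyAnyMethods`, `getattr` and `get_item` come from the discussion.

```rust
use std::marker::PhantomData;

// Zero-sized markers for the Python type hierarchy.
struct PyAny;
struct PyList;

// Toy smart pointer: `repr` is a hypothetical stand-in for the object pointer.
struct Py<T> {
    repr: String,
    _ty: PhantomData<T>,
}

// Methods available on every Python object live on a trait ...
trait PyAnyMethods {
    fn getattr(&self, name: &str) -> Py<PyAny>;
}

// ... with a single blanket impl covering `Py<PyAny>`, `Py<PyList>`, and so on.
impl<T> PyAnyMethods for Py<T> {
    fn getattr(&self, name: &str) -> Py<PyAny> {
        Py { repr: format!("{}.{}", self.repr, name), _ty: PhantomData }
    }
}

// List-specific methods stay on the concrete pointer type.
impl Py<PyList> {
    fn get_item(&self, idx: usize) -> Py<PyAny> {
        Py { repr: format!("{}[{}]", self.repr, idx), _ty: PhantomData }
    }
}

// Hypothetical constructor used only for this illustration.
fn new_list() -> Py<PyList> {
    Py { repr: "list".to_string(), _ty: PhantomData }
}
```

With this shape, `list.getattr("foo")` and `list.get_item(3)` both resolve on a `Py<PyList>` without any `Deref` impl, at the cost of callers needing `PyAnyMethods` in scope.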

@adetaylor

Thanks @madsmtm and @davidhewitt for all the discussion.

I think I agree with your last comment David - I think you can make everything work with just a Receiver impl (without Deref) and then some traits.

That said,

  • I am pretty sure that the blanket impl of Receiver for Deref will be seen as Not The Rust Way as soon as I raise the PR. It is certainly unusual. I am worried we will sink a ton of time discussing this without really clear arguments on either side. If you can think of a way to avoid this, I'm all ears!
  • It's not your fault that you overlook the possibility of Receiver without Deref. I think the RFC is insufficiently clear about this, and I'll work on it.
  • Your PyO3 example made me realize an assumption underlying our blanket impl: we're assuming that folks are using Deref because their type is a smart pointer containing something (has-a relationships). You want(ed) to use Deref for a completely different purpose, to express is-a relationships, along the lines of coercion. In this case, people might validly want their Deref resolution and their Receiver resolution to point in different directions. I think we need to be more explicit in the RFC that we're choosing not to be compatible with such use-cases, and they should be achieved using traits (or some future Coercion trait) instead.

So this has been a most useful discussion, thanks!
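
The is-a use of `Deref` discussed in the last bullet can be illustrated on stable Rust. This is a generic sketch of the pattern, not PyO3 code: `Any`, `List`, their fields and `demo` are hypothetical names, and the "subclass" simply embeds its base value.

```rust
use std::ops::Deref;

// Hypothetical base type with a method every "subclass" should inherit.
struct Any {
    name: String,
}

impl Any {
    fn getattr(&self, attr: &str) -> String {
        format!("{}.{}", self.name, attr)
    }
}

// Hypothetical subtype: it embeds its base and adds its own methods.
struct List {
    base: Any,
    items: Vec<i32>,
}

impl List {
    fn get_item(&self, idx: usize) -> Option<i32> {
        self.items.get(idx).copied()
    }
}

// `Deref` used to express "a List is-an Any" rather than "a List holds a T".
impl Deref for List {
    type Target = Any;
    fn deref(&self) -> &Any {
        &self.base
    }
}

fn demo() -> (String, Option<i32>) {
    let list = List {
        base: Any { name: "list".to_string() },
        items: vec![10, 20],
    };
    // `getattr` is found through the deref chain, `get_item` on `List` itself.
    (list.getattr("foo"), list.get_item(1))
}
```

Here the deref chain points at the base type, whereas a `Receiver` chain for a smart pointer would point at the pointee; that is the divergence the bullet describes.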
