davidhewitt/pyo3-arbitrary-self-types.rs

## pyo3-arbitrary-self-types.rs
//! The following is a simplified form of a possible PyO3 API which shows
//! cases where arbitrary self types would help resolve papercuts.

use std::ops::Deref;
use std::ptr::NonNull;

// ----------------------------------------------------------------------------------
//
// Case 1 - PyO3's object hierarchy. We have a smart pointer type Py<T> and want to
// use it as a receiver for Python method calls.
//
//

/// Python's C API is wrapped by the `pyo3-ffi` crate, which is also exported as
/// the `pyo3::ffi` submodule.
mod ffi {
    extern {
        /// A Python object. For this model we don't care about its contents, so we
        /// just use unstable "extern type" syntax to name it.
        type PyObject;
    }
}


/// A smart pointer to a Python object, which is reference counted. A good enough
/// description is that it is approximately an `Arc<T>` where the memory is
/// stored on the Python heap and reference counting is synchronized by the
/// Python GIL (Global Interpreter Lock).
///
/// Here in this model we ignore the existence of the Python GIL as it is just a
/// distraction. In PyO3's real API we have a lifetime `'py` on several types to
/// model this.
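// `PhantomData<T>` marks `T` as used; without it the type parameter is rejected.
struct Py<T>(NonNull<ffi::PyObject>, std::marker::PhantomData<T>);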

// -- Some zero-sized types to describe Python's object hierarchy. --

/// Any Python object.
struct PyAny(());

/// A concrete subtype, a Python list.
struct PyList(());

// -- Implementations of methods on these types --

// In practice these methods return results; we'll ignore that here.

impl PyAny {
    /// Get an attribute on this object. In Python syntax this is `self.name`.
    ///
    /// Receiver is &Py<PyAny> - arbitrary self type!
    fn getattr(self: &Py<PyAny>, name: &str) -> Py<PyAny> { /* ... */ }
}

impl PyList {
    /// Get an element from this list. In Python syntax this is `self[idx]`.
    ///
    /// Receiver is &Py<PyList> - arbitrary self type!
    fn get_item(self: &Py<PyList>, idx: usize) -> Py<PyAny> { /* ... */ }
}

// In addition, we want to call `getattr` with a `Py<PyList>`, because this is
// a valid operation too. The cleanest way to do this is with `Deref`:

impl Deref for Py<PyList> {
    type Target = Py<PyAny>;

    fn deref(&self) -> &Py<PyAny> { /* ... */ }
}

// ... but if arbitrary self types are tied to `Deref`, we instead have to write:

impl Deref for Py<PyList> {
    type Target = PyList;

    fn deref(&self) -> &PyList { /* ... */ }
}

// We could find other ways to make Py<PyList> have a getattr method without
// `Deref`, e.g. by moving all of `PyAny` methods onto a trait and implementing
// it for `Py<PyAny>`, `Py<PyList>` and so on. This leads to a lot of repetition;
// N trait implementations for N concrete types PyAny, PyList, etc.
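
// A minimal sketch of the trait-based workaround described above. It is a model
// only: the `name` field, the `PyDict` type, the `PyAnyMethods` trait and its
// `get_type_name` method are assumptions made for illustration, not PyO3's API.
// The point is the repetition: one impl per concrete type.

```rust
use std::marker::PhantomData;

/// Model smart pointer. A real one would hold a pointer to a Python object;
/// here it just carries a type name (hypothetical field).
struct Py<T> {
    name: &'static str,
    _ty: PhantomData<T>,
}

impl<T> Py<T> {
    fn new(name: &'static str) -> Self {
        Py { name, _ty: PhantomData }
    }
}

struct PyAny(());
struct PyList(());
struct PyDict(());

/// Hypothetical trait holding the methods that every Python object supports.
trait PyAnyMethods {
    fn get_type_name(&self) -> &'static str;
}

// One implementation per concrete type: N impls for N types.
impl PyAnyMethods for Py<PyAny> {
    fn get_type_name(&self) -> &'static str {
        self.name
    }
}

impl PyAnyMethods for Py<PyList> {
    fn get_type_name(&self) -> &'static str {
        self.name
    }
}

impl PyAnyMethods for Py<PyDict> {
    fn get_type_name(&self) -> &'static str {
        self.name
    }
}

/// Calls the shared method through each pointer type.
fn trait_demo() -> [&'static str; 3] {
    [
        Py::<PyAny>::new("object").get_type_name(),
        Py::<PyList>::new("list").get_type_name(),
        Py::<PyDict>::new("dict").get_type_name(),
    ]
}
```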

// Also the `&PyList` reference on its own is useless, so `Deref<Target = PyList>`
// is a little weird.
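
// For comparison, a sketch of the `Deref<Target = Py<PyAny>>` design from Case 1.
// To avoid needing arbitrary self types, the methods are moved into inherent impls
// on `Py<PyAny>` and `Py<PyList>` rather than on `PyAny` and `PyList`. The `Object`
// struct, its fields and the `type_name` method are assumptions made for this
// model and are not part of PyO3.

```rust
use std::marker::PhantomData;
use std::ops::Deref;
use std::rc::Rc;

/// Stand-in for a Python object (hypothetical): a type name plus list items.
struct Object {
    type_name: &'static str,
    items: Vec<i64>,
}

/// Model smart pointer. `repr(transparent)` gives every `Py<T>` the same
/// layout as `Rc<Object>`, which the pointer cast in `deref` relies on.
#[repr(transparent)]
struct Py<T> {
    raw: Rc<Object>,
    _ty: PhantomData<T>,
}

struct PyAny(());
struct PyList(());

impl Py<PyAny> {
    /// A method available on every object.
    fn type_name(&self) -> &'static str {
        self.raw.type_name
    }
}

impl Py<PyList> {
    fn new(items: Vec<i64>) -> Self {
        Py {
            raw: Rc::new(Object { type_name: "list", items }),
            _ty: PhantomData,
        }
    }

    /// A list-only method.
    fn get_item(&self, idx: usize) -> i64 {
        self.raw.items[idx]
    }
}

impl Deref for Py<PyList> {
    type Target = Py<PyAny>;

    fn deref(&self) -> &Py<PyAny> {
        // SAFETY: `Py<PyList>` and `Py<PyAny>` are both `repr(transparent)`
        // wrappers around `Rc<Object>`, so they have identical layout.
        unsafe { &*(self as *const Py<PyList> as *const Py<PyAny>) }
    }
}

/// `type_name` is found through `Deref`; `get_item` is found directly.
fn deref_demo() -> (&'static str, i64) {
    let list = Py::<PyList>::new(vec![10, 20, 30]);
    (list.type_name(), list.get_item(1))
}
```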

// ----------------------------------------------------------------------------------
//
// Case 2 - PyO3's "refcell" container synchronized by the GIL. This has a close
// cousin in `std::cell::RefCell`.
//
//

/// PyO3 has a `#[pyclass]` macro which generates a Python type for a Rust
/// struct.
/// - `Foo` continues to be the plain old Rust struct.
/// - `Py<Foo>` is a smart pointer to a Python object which contains a `Foo`.
#[pyclass]
struct Foo { /* ... */ }

/// To implement methods on the Python type PyO3 has a `#[pymethods]` macro.
///
/// Users can use `&self` and `&mut self` receivers. To make this possible,
/// `Py<Foo>` is like `RefCell<Foo>` but uses the Python GIL for synchronization.
/// `PyRef<'_, Foo>` and `PyRefMut<'_, Foo>` are the guards to `Py<Foo>`.
impl Foo {
    /// Receive by `&self`, read only the Rust data. Possible today.
    fn a(&self) { /* ... */ }

    /// Receive by `&mut self`, read or write only the Rust data. Possible today.
    fn b(&mut self) { /* ... */ }

    /// Receive by `Py<Foo>`. `Py<Foo>` implements `Deref<Target = Py<PyAny>>`
    /// so that all Python operations are accessible.
    ///
    /// This is an arbitrary self type.
    ///
    /// Current users of PyO3 have to use `slf: Py<Foo>` which is awkward
    /// and also loses method call syntax.
    fn c(self: Py<Foo>) { /* ... */ }

    /// Receive by `PyRef<'_, Foo>`. `PyRef<'_, Foo>` is a pointer to the Python
    /// data. It implements `Deref<Target = Foo>` to give read access to the Rust
    /// data.
    ///
    /// This is an arbitrary self type.
    ///
    /// The same workarounds apply for current users of PyO3.
    fn d(self: PyRef<'_, Foo>) { /* ... */ }

    /// Receive by `PyRefMut<'_, Foo>`. `PyRefMut<'_, Foo>` is a pointer to the Python
    /// data. It implements `DerefMut<Target = Foo>` to give read and write access to
    /// the Rust data.
    ///
    /// This is an arbitrary self type.
    ///
    /// The same workarounds apply for current users of PyO3.
    fn e(self: PyRefMut<'_, Foo>) { /* ... */ }
}
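
// A small sketch of the `slf: Py<Foo>` workaround mentioned for `c`, compiled
// without the `arbitrary_self_types` feature. `MyPtr` is a hypothetical stand-in
// for `Py<T>`, and `value`, `via_rc` and `via_my_ptr` are names invented for this
// example. A standard-library pointer such as `Rc<Self>` can be a receiver, while
// the user-defined pointer has to be passed as an ordinary argument.

```rust
use std::rc::Rc;

/// Hypothetical user-defined smart pointer, standing in for `Py<T>`.
struct MyPtr<T>(Rc<T>);

struct Foo {
    value: i32,
}

impl Foo {
    /// `Rc<Self>` is accepted as a receiver, so method call syntax works.
    fn via_rc(self: Rc<Self>) -> i32 {
        self.value
    }

    /// `MyPtr<Foo>` is not accepted as a receiver here, so it becomes a
    /// named argument and this is an associated function, not a method.
    fn via_my_ptr(slf: MyPtr<Foo>) -> i32 {
        slf.0.value
    }
}

fn receiver_demo() -> (i32, i32) {
    let a = Rc::new(Foo { value: 1 });
    let b = MyPtr(Rc::new(Foo { value: 2 }));
    // `a.via_rc()` uses method call syntax; `b.via_my_ptr()` would not compile.
    (a.via_rc(), Foo::via_my_ptr(b))
}
```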

// Note that in the above, `PyRef<'_, Foo>` and `PyRefMut<'_, Foo>` both implement
// `Deref<Target = Foo>` so would fit fine with deref-based arbitrary self types.
//
// However, `Py<Foo>` cannot implement `Deref<Target = Foo>`, just as `RefCell<T>`
// cannot implement `Deref<Target = T>`.
//
// For `Py<Foo>` to implement `Deref<Target = Foo>`, it must give up its
// refcell-like behaviour. This removes `PyRef<'_, Foo>` and `PyRefMut<'_, Foo>`,
// and it also removes the ability to have `&mut self` as a receiver. Mutable
// access needs the runtime refcell protection because Python code does not
// follow the borrow checker's rules.
//
// It can be argued that removing `&mut self` and the refcell feature would be a
// good thing, but that feature is also _extremely_ ergonomic for users. We could
// have a long conversation about whether PyO3 made the wrong API choice here.
// `#[pyclass(frozen)]` already opts in to this restriction, so by flipping the
// default and then removing the option we could evolve PyO3's API over time, if we
// think deref-based arbitrary self types are the correct formulation of the feature.
//
// If you feel like a long distraction, we can discuss how Python might
// be removing the GIL, and how that means that PyO3 might be forced to change
// anyway.
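
// The `RefCell` analogy in Case 2 can be sketched with the standard library
// alone. This is a model, not PyO3 code: `count`, `read`, `bump` and
// `refcell_demo` are names invented here. `Ref` and `RefMut` play the role of
// `PyRef` and `PyRefMut`, and the cell itself cannot hand out a plain `&Foo`
// because it has to check the borrow state at runtime.

```rust
use std::cell::RefCell;

struct Foo {
    count: u32,
}

impl Foo {
    fn read(&self) -> u32 {
        self.count
    }

    fn bump(&mut self) {
        self.count += 1;
    }
}

fn refcell_demo() -> (u32, bool) {
    let cell = RefCell::new(Foo { count: 0 });

    // `RefMut<Foo>` implements `DerefMut<Target = Foo>`, so a `&mut self`
    // method can be called through the guard. The guard is dropped at the
    // end of this statement.
    cell.borrow_mut().bump();

    // `Ref<Foo>` implements `Deref<Target = Foo>`.
    let shared = cell.borrow();
    let count = shared.read();

    // While `shared` is alive, a mutable borrow is refused at runtime.
    let refused = cell.try_borrow_mut().is_err();

    (count, refused)
}
```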