FHell/Thoughts-on-Protocols.md

## Thoughts-on-Protocols.md

      
    Raw
  

              Thoughts-on-Protocols.md
            
          
    Disclaimer: I am a physicist, not a computer scientist. I have no experience with creating programming languages. I am leading a research group that is all in on Julia. I have been following developments in many computer languages for many years, and evaluated many options reasonably deeply before deciding to move us to Julia. This are my thoughts on some of the most important pain points right now from the perspective of a user of the language, advanced libraries and occasional contributor and writer of libraries.
The problem statement

Currently Julia lacks a good way to express how things should behave. In many languages types are used to express this information. In Julia, in order to make maximum use of the languages dynamicism, we are encouraging to write code as generically as possible, and defer specifying types to the users of libraries as much as possible. This allows the injection of unanticipated behaviours deep inside our code.
 # Do
function norm(x, y)
  (x - y) ^ 2
end

# Don't
function norm_f64(x::Float64, y::Float64)::Float64
  (x - y) ^ 2
end

While function signatures are generic, our users types are hopefully concretely specified, and this type information is passed along inside the generic code. Ideally the internal data structures we need are parametrized in terms of the user provided types. Together with dispatching on function types, this allows the users of libraries to inject specific behaviours deep into generic functions. This leads to the ability to create powerful and unanticipated composite libraries.
The flip side is that there is no place that the assumptions I make in the construction of the library are specified or documented. Where in other languages type signatures would serve this prupose, in Julia I don't really have them (e.g.  map vs map). I can not easily encode invariants obeyed by my code, or assumptions I make on what is passed to me.
Also crucial: Error messages can be arbitrarily complex and convoluted, as code will fail not when passed to the library that contains the assumption, but deep inside my own algorithms when the assumption is violated, often with completely unreadable type signatures. (This is something familiar from C++ Template metaprogramming which, I believe, shares some similarities with how we write functions in Julia).
For documentation, there is also no natural place to put docstrings. If I expect the argument of my library function to have an empty function that does something specific, its quite possible that I never implement an empty function in the library itself. If I require a function with a specific signature to be passed, I can't encode that anywhere. There is no natural place to document these assumptions (and violating them probably leads to horrible error messages). A different documentation issue is that generally there is no place to say what a function (as opposed to its methods) should do. Docstrings are always attached to the concrete methods. And while it is maybe clear to the reader that the method that takes ::Any is in some sense the prototype, this isn't really codified anywhere. Conversely for types, there is no way to quickly determine (e.g. in the REPL) what the most pertinent things are I can do with this type.
A pattern that is empahsized by some people in Julia is to work with accessor functions (fooable(t::T) rather than t.fooable). This has obvious advantages for composability but greatly exacerbates all the above issues. I believe this is why we still end up with complex composite types all over the place rather than having this pattern as default.
Finally all of this also makes it difficult to reason about the behaviour of interacting code. In a language that emphasizes composition, this is a serious issue to me.
Protocols for Julia

All this leads me to posit that Julia needs a way to specify behaviours programmatically. This should be complementary to the genericism and dynamicism of Julia code.
There are a varitey of language constructs in other languages that try to solve these and similar issues:

Concepts (C++ 20)
Traits (Rust)
(Base) Class (Python)

In my mind the cleanest example to learn from, and the most pertinent to Julia, are Rust-style traits. Rust is a language that is composition based and not based on OOP.
(Consider how prominently Traits feature in Rusts generally excellent Error messages: https://blog.rust-lang.org/2020/05/15/five-years-of-rust.html#error-diagnostics )
(Note that C++ concepts work differently to Rust Traits in that they don't require the fact that T implements a particular protocol to be explicitly declared, it is infered. I think it is a good idea to require explicitness here.)
Relationship to Julia Traits

Julia has an ongoing lively discussion about Traits going on. From what I have read, these are mostly intended to serve to generalize the dispatch based control flow, or work around the restriction of having only one AbstractType for every concrete type. In contrast, the protocol idea I am considering here is supposed to help encode invariants.
While a critical question that I have no good answer for is how this should interact in detail with dispatch, I would emphasize that this is not the core question here (and I personally think Holy style traits offer a satsfactory solution to this). In the sketch below I build directly on Holy style Traits. To avoid the confusion with that discussion I have called this sketch Protocols.
A sketch

A simple macro based implementation that buiilds on top oh Holy traits for dispatch (each protocol comes with a Holy style trait) could look like this:
@Prot begin

Protocol Iterable

  # Specify fields the Iterable must have.
  start::Int64

  # Docstrings in Protocol definitions could serve as documentation for functions
  # rather than methods.
  @prot_doc """
  Provide the next element of an Iterable.
  """
  function next(t :~ Iterable) # no default implementation

  # protocol functions could have default implementations
  function 2ndnext(t :~ Iterable) # default implementation
    next(t)
    next(t)
  end

end

# We now define a type...
struct T
  start::Int
  c
end

# and implement the Protocol for it
impl Iterable for T

  # Here we also check the function signature is correct  according to the Protocol
  function next(t::T)
    t.c
  end

end


# We can then define functions that take Iterables and only Iterables of
# arbitrary type as input.
function get_third(b :~ Iterable)
  2ndnext(b)
  next(b)
end

# we get an object of type T
t = T(1, 5)

# and call get_third on it:
get_third(t)

############################
###### Error Messages ######
############################

# Calling a function expecting a Protocol on something that doesn't implement it:
get_third(1)
# raises "Type of argument arr does not implement protocol Iterable"

# Incomplete implementations can also be easily caught and meaningful errors can be given:
struct U
  start::Int
end

impl Iterable for U
end #This errors with "Incomplete implementation of Iterable for U: next(t::U) not given."

# Or take another type V
struct V
  s::Int
end

impl Iterable for V
  next(v::V) = 1
end #This errors with "Can not implement Iterable for V. V has no field V.start"


end #@Prot
(A sketch for what this macro could expand to can be found at https://github.com/FHell/protocols.jl/)
Obviously the design space is huge here, and many options have subtle issues when you start considering interacting with other language features. This is also the reason to build on the proven Holy style traits. I think it would be desirable to experiment with these designs a bit. I think most of the above can be implemented as a Macro in a fairly straightforward way, but I don't have enough experience to just hack something up in an afternoon.
For documentation simply having a page generated with the docstrings of expected and default implementations would often be an excellent starting point for documentation of packages. As it is, this information often comes late in the process of documenting packages (and is out of date whenever internal structures shift, as it is not backed by anything automated). In the REPL typing ? t :~ and then tab should cycle through protocols that t implements, and ? t :~ Iterable or ? Iterable could list the functions that are available as part of the protocol only.
A use case: Serializing

A central issue for some of us in the Julia scientific ecosystem is to save data that doesn't come in the form of arrays to disc without losing the analysis functions provided by libraries. Using the JLD2/BSON.jl serializer often is extremely fragile (updating anything might mean that your data doesn't load anymore). Efforts to build generic de/serializer code for julia is ongoing but also highly challenging.
It is interesting to note how this works in Rust in Serde. Serde provides a Serialize and Deserialize trait, and a derive macro that automatically implement these for many types (requires the user to call it). However, if the types become to complex, then it is up to the library to provide a de/serializer. This decision is up to the writer of the type.
If you define a simple container type that you intend to serialize, a macro could be provided for creating a default implementation of serializable:
@serializable
struct Bla
    foo1::Int
    foo2::Float
    foo3::Array{Float64, 1}
end
A use case: AbstractArray

AbstractArray is very much used like a protocol already. However, this already means I can not combine different behaviours. Whereas other parts of Julia emphasize composition, here I am stuck. It also means that if I want something that is an AbstractArray, I can not easily use my own abstract type hierarchies to encode assumptions/invariants/information about my code. I can't have an ODEFunction
Further, there is no programmatic way to check whether my custom implementation is implementing everything I need for an AbstractArray.
More?

I am sure there are  many aspects and questions I am overlooking right now. I would like to try to build this protocol Macro and trial it on some of our own packages. Before doing so I was hoping for some input from the community.
An open question we need to answer when fleshing out the sketch is to look at edge cases of multiple Protocols interacting. Further it would be cool if parametrized Protocols, that take type parameters, could be considered.
Something that is beyond the scope of this proposal but would of course be highly desirable is if the protocol/trait system would become a real part of the Julia type system and standard library. This might mean Types like Array{T :~ Dualizable}, or that functions were automatically Callable{T_Args, T_Return}. Possibly providing us with signatures like map(f :~ Callable{T, U}, arr :~Iterable{T}) :~Iterable{U} where {T, U}.
Another question is if it's possible to automatically generate Protocol annotations for Packages from an analysis tool. I am imagining adding tracing to a type to find every method that is called on it when it is used with a package...