Skip to content

Instantly share code, notes, and snippets.

@bgschiller
Created August 23, 2018 00:09
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save bgschiller/ba580e8980a7833abef1a0ffeaef8798 to your computer and use it in GitHub Desktop.
Save bgschiller/ba580e8980a7833abef1a0ffeaef8798 to your computer and use it in GitHub Desktop.

What's an interface?

When we talk about interfaces or protocols, there are a couple different ideas that often get lumped together. It might help to explain why some languages use "protocol" and some use "interface"—they have slightly different connotations.

Both concepts, at their best, are a way of saying "all I need is a type that supports..." along with a list of what it needs. This is what people mean when they say "Depend on abstractions instead of on concrete classes".

The main difference between Interfaces and Protocols is that Interfaces have to be explicitly implemented, or they don't compile. Protocol usually implies that, as long as the type signatures match, we're happy.

Interfaces

Interfaces are useful for statically (meaning: "at compile-time rather than at run-time") verifying that function calls are compatible between where they're defined and all the places they're used. This static verification is why people say that static languages (like C#, Java, Kotlin, etc) are safer than dynamic languages (like python, ruby, javascript). With static typing, you will never run into a "TypeError: obj.method is not a function".

However, that static verification can sometimes make it difficult to write certain types of code. Interfaces are a way of opening-up the signatures of functions to say "Okay, I guess I don't need a true List<int>in order to compute the sum. I'd be happy with anything collection-like object that contains ints: an IEnumerable<int>".

If we wrote

int sum(List<int> nums) {
  int total = 0;
  foreach(int num : nums) {
    total += num;
  }
  return total;
}

Then we would have to give explicitly a List<int>. A HashSet<int> wouldn't cut it, even though the function would actually work just fine with one. The type signature says it requires a List<int> and passing anything else will be a compile error.

Using an interface is a way of relaxing those constraints. "I need something that can be iterated over" is less stringent. Why require a type that:

  • is laid out contiguously in memory
  • has a known length
  • supports random-access ("Give me the item at position 412")
  • can be iterated over forwards
  • can be iterated over backwards
  • can be added to
  • can be removed from
  • ...

when all we really need is something that can be iterated over.

Protocols

There's also the idea of a protocol. Usually this is just referred to as "duck-typing": if it looks like a duck, quacks like a duck, then we can treat it like a duck.

So, you don't need to explictly say "I AM CREATING THIS CLASS AND IT WILL IMPLEMENT THE PIZZA INTERFACE". Instead, if it has compatible methods, then you can use it just like a pizza, even if it's really a flatbread. Go and Haskell both are more like this (though Haskell calls them "type classes", and Go calls them "interfaces" 🤦)

Protocols are (in my opinion) better in every way than interfaces, except for one. Interfaces, because they're explicit, are easier to statically check for correctness. It's more difficult to check "Yeah, this is a class that can be iterated over, because I can see it supports these functions..." than it is to check "Yes, this implements IEnumerable".

Well, in python...

Python actually has both Interfaces (aka Nominal Subtyping, aka abstract-base-classes), and Protocols (aka Structural Subtyping, aka duck typing, aka type classes). So it's a good language to use to draw the distinction between the two ideas that are conflated when we talk about interfaces.

Interfaces

In python, the abc module, like @Jonah Jolley said, can be used to make interfaces. That is, it enforces that any subclasses must implement methods with the same signature. That's more how C#, Java think of interfaces.

Interfaces are rarely used in python because there is no way to do static analysis. Since protocols/duck-typing is more convenient, most people just use those. (Note: python now supports static verification via mypy and a couple other projects).

Protocols in python

There are some protocols that are so useful they have kinda "official" support in Python. The example @Kyle Misencik referred to is the Context Manager protocol. That says "Sometimes there's code that needs to happen surrounding some context. Some examples:

  • If I want to read or write to a file, I need to open it first, and I need to close it after. Regardless of what I'm doing with the file, that needs to happen, and it needs to happen even if something goes wrong and an exception is raised.
  • Same with a database cursor-- regardless of what I'm doing with the cursor (inserts, updates, selects, a whole bunch of things in a row), I'll need to open it first and close it afterwards.
  • Many other resources are this way, eg concurrency locks. You have to acquire them before you can start doing work, and you have to release them or bad things happen.
  • You might have some code that you want to time. You need to start a timer when you enter that portion of code, and stop it when you exit that portion of code.

The Context Manager protocol is a way of saying "It's common to need to do something before a block and after a block". We don't want to try and anticipate all the ways this might be used. Instead, we're going to provide first-class language support for the pattern

value = setup() # setup and value are implementation defined: different depending on which context manager
try:
  # Do some stuff here. This is _usage_ defined.
  # In js, we would use an anonymous function
  # In Obj-c I think you use a continuation block (is that right?)
finally:
  cleanup() # like setup, cleanup is different for each context manager.

Because it's common to need a structure like that, we can define an object that adheres to the "context manager" protocol. As long as it does, we can use syntax like:

with setup_on_enter_and_cleanup_on_exit() as value:
  # do some stuff here.
  # Even if an exception is raised, we'll still clean up.

So, with files we can

with open('/usr/share/dict/words') as f:
  words_with_no_vowels = []
  for line in f:
    if not any(vowel in line for vowel in 'aeiou'):
      words_with_no_vowels.append(line)
# File is automatically closed at this point, even if the code in this block raises an exception for some reason.

Or with locks we can say

with pg_advisory_lock(org_id): # the `setup()` part of this will pause until the lock is available, then claim it.
  # do some things that shouldn't happen concurrently with another process.

# lock is automatically released (even if exception)

Any class can be used like this, even user-defined ones, as long as they implement:

  • a __enter__ method, that takes no arguments (other than self, which is python's name for this)
  • an __exit__ method, that takes three arguments: exc_type, exc_value, traceback, which are only filled out if an exception was raised.

So let's make one of these, to demonstrate what the protocol looks like:

class timed:
  def __enter__(self):
    """
    Here is where we put the setup code.
    """
    self.start = datetime.datetime.now()

  def __exit__(self, exc_type, exc_value, traceback):
    end = datetime.datetime.now()
    print(f"that code ran for {(end - start).total_seconds()} seconds")

    if exc_type:
      print(f"also, it raised an exception: {exc_type}({exc_value})")

## usable like

with timed:
  do_some_computation()

We've spent a lot of time on the context manager protocol, because I think it's a really fun one. But it's not even the most common protocol. That honor goes to the iterator protocol. An iterator is anything that can be iterated over at least once (maybe more times, like a list, but maybe only once if the iterator represents something like db cursor, or a file that we're streaming through)

Any time you use a for-loop in python you're using the iterator protocol:

for x in numbers_between(3, 30):
  print(x)

It's possible that numbers_between() just returns a list:

def numbers_between(start, stop):
  nums = []
  curr = start
  while curr < stop:
    nums.append(curr)
    curr += 1
  return nums

But it's also possible that it generates the numbers on demand:

class numbers_between:
  def __init__(start, stop):
    self.curr = start
    self.stop = stop
  
  def __iter__(self):
    return self
  
  def __next__(self):
    if self.curr == self.stop:
      raise StopIteration()
    self.curr += 1
    return self.curr

The iterator protocol consists of kinda one, kinda two functions, depending on how you count:

  • __iter__ should produce an object with a __next__ method. It's common to just return the same object.
  • __next__ should be called in sequence and return whatever the next thing in the for-loop should be. It should raise StopIteration() if there are no more items.

If you're willing to create these two functions, then anything that accepts an iterator can also accept your function.

set(numbers_between(5,10))
filter(is_even, numbers_between(5, 10))
for num in numbers_between(5, 10):
  ...

A note on __conventions__

For the built-in protocols, python nearly1 always uses the double-underscore for the names of functions required by the protocol. You can learn more about these at http://www.diveintopython3.net/special-method-names.html

1: Python has the concept of a file-like object using read() and write(), which don't have underscores.

Further reading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment