

@i0bs
Last active April 20, 2023 04:59
Building a Discord Gateway client.

Building a Gateway client

The goal of this gist is to explain to developers, section by section, how to write a working Gateway client.

What the #$)* is a Gateway?

Discord has two kinds of APIs:

  • a RESTful API, which is entirely HTTP-based and made up of endpoints and request conditions.
  • a Gateway API, which is WebSocket-based and used for real-time communication of events happening within Discord.

When a user or bot logs into Discord, they have to connect to what's referred to as the "Gateway." This server is Discord's form of real-time communication, responsible for making sure events are dispatched to the right place at the right time.

Design

I've already created a couple of tutorials explaining the design of a Gateway client:

Data Layer

Payloads

Payloads are responsible for standardising how data is structured and sent in the Gateway. These can be used in one of two scenarios:

  • You want to send data to Discord's Gateway, for example to properly connect to it. These are known as send events.
  • You want to handle data received after a successful connection. These are known as receive events.

The following table shows how payloads are structured:

| Field | Type | Description |
| --- | --- | --- |
| op | integer | Denotes the operation code of the payload, i.e. action vs. event. |
| d | ?mixed (any JSON value) | Represents the payload's data, which can be any JSON value. |
| s | ?integer * | Represents the sequence number of an event, used for resuming sessions and for heartbeating. |
| t | ?string * | Denotes the event name, most often used for dealing with dispatching. |

* s and t are null when the opcode is not 0 (Dispatch).
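To make the null rule concrete, here's a minimal sketch using only Python's standard json module. The packets are hand-written illustrations rather than captured Gateway traffic, and describe_payload is a hypothetical helper, not part of any library.

```python
import json

# Hand-written example packets (illustrative, not captured from Discord).
HELLO_PACKET = '{"op": 10, "d": {"heartbeat_interval": 41250}, "s": null, "t": null}'
DISPATCH_PACKET = '{"op": 0, "d": {"content": "hi"}, "s": 42, "t": "MESSAGE_CREATE"}'


def describe_payload(packet: str) -> str:
    """Hypothetical helper: decode a packet and summarise its payload fields."""
    payload = json.loads(packet)
    op = payload["op"]
    # .get() tolerates send-style payloads that omit s and t entirely.
    seq = payload.get("s")
    name = payload.get("t")

    if op == 0:
        # Dispatch payloads carry both a sequence number and an event name.
        return f"dispatch {name} (seq={seq})"
    # For every other opcode, s and t are null (None once decoded).
    assert seq is None and name is None
    return f"opcode {op}"


print(describe_payload(HELLO_PACKET))     # opcode 10
print(describe_payload(DISPATCH_PACKET))  # dispatch MESSAGE_CREATE (seq=42)
```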

Operation Codes

Operation codes, also known as "opcodes," identify what kind of payload is being sent or received, and therefore what its data field contains. Much like payloads, opcodes can be split between send and receive events.

The following table lists the most important Gateway opcodes:

| Code | Name | Action | Description |
| --- | --- | --- | --- |
| 0 | Dispatch * | Receive | An event has occurred for one of the selected intents. |
| 1 | Heartbeat | Send/Receive | Fired periodically by a client to keep the Gateway connection alive. |
| 2 | Identify | Send | Starts a new session within the connection by making a handshake for identification. |
| 6 | Resume | Send | Resumes, in a new connection, a previous session whose connection was closed. |
| 7 | Reconnect | Receive | The Gateway is asking the client to reconnect and resume its session. |
| 9 | Invalid Session | Receive | The session has been invalidated by the Gateway; the client should reconnect and then resume or identify, depending on the payload's data. |
| 10 | Hello | Receive | Sent upon starting a new connection to the Gateway; contains the heartbeat interval. |
| 11 | Heartbeat ACK | Receive | A heartbeat sent by the client has been acknowledged by the Gateway. |

* Represents a large majority of receive events.

Transport Layer

Discord's transport layer works by sending packets over a WebSocket connection. The Gateway uses packets alongside status codes to regulate how data passes through it.

Packets

Each packet is a WebSocket message containing a Gateway payload, serialised as a JSON string when the connection uses JSON encoding.

Status Codes

The WebSocket protocol (RFC 6455) lets either side attach a numeric status code when closing a connection, and reserves the 4000-4999 range for applications. Discord uses this range for custom status codes in its Gateway that indicate what caused a closing state (closure).

Some status codes allow for a reconnect. Depending on the code, the client can then either resume the same session or must start a new one.

The following table lists all valid closures:

| Code | Name | Explanation | Reconnect? |
| --- | --- | --- | --- |
| 4000 | Unknown error | We're not sure what went wrong. Try reconnecting? | true |
| 4001 | Unknown opcode | You sent an invalid Gateway opcode or an invalid payload for an opcode. Don't do that! | true |
| 4002 | Decode error | You sent an invalid payload to Discord. Don't do that! | true |
| 4003 | Not authenticated | You sent us a payload prior to identifying. | true |
| 4004 | Authentication failed | The account token sent with your identify payload is incorrect. | false |
| 4005 | Already authenticated | You sent more than one identify payload. Don't do that! | true |
| 4007 | Invalid seq | The sequence sent when resuming the session was invalid. Reconnect and start a new session. | true |
| 4008 | Rate limited | Woah nelly! You're sending payloads to us too quickly. Slow it down! You will be disconnected on receiving this. | true |
| 4009 | Session timed out | Your session timed out. Reconnect and start a new one. | true |
| 4010 | Invalid shard | You sent us an invalid shard when identifying. | false |
| 4011 | Sharding required | The session would have handled too many guilds - you are required to shard your connection in order to connect. | false |
| 4012 | Invalid API version | You sent an invalid version for the gateway. | false |
| 4013 | Invalid intent(s) | You sent an invalid intent for a Gateway Intent. You may have incorrectly calculated the bitwise value. | false |
| 4014 | Disallowed intent(s) | You sent a disallowed intent for a Gateway Intent. You may have tried to specify an intent that you have not enabled or are not approved for. | false |
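A client typically turns this table into a lookup that decides what to do after a closure. A minimal sketch: plan_after_close and its return strings are hypothetical, and treating every code not listed in the two sets as resumable is a simplifying assumption of this example.

```python
# Codes marked "false" in the Reconnect? column: reconnecting would fail again
# (bad token, invalid shard, invalid API version, invalid intents, ...).
NON_RECONNECTABLE = {4004, 4010, 4011, 4012, 4013, 4014}

# Reconnectable codes whose explanation says to start a new session.
NEW_SESSION_REQUIRED = {4007, 4009}


def plan_after_close(code: int) -> str:
    """Decide what to do after the Gateway closes the connection with `code`."""
    if code in NON_RECONNECTABLE:
        return "stop"      # fix the configuration instead of retrying
    if code in NEW_SESSION_REQUIRED:
        return "identify"  # reconnect, then send IDENTIFY for a fresh session
    return "resume"        # reconnect, then try RESUME with the stored session


for code in (4000, 4004, 4009):
    print(code, plan_after_close(code))
# 4000 resume
# 4004 stop
# 4009 identify
```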

Events

Gateway events are Gateway payloads whose meaning is determined by their operation code. When the opcode is 0 (Dispatch), the data field represents information about an event that happened, for example within a server or in the application's DMs.

An event is identified by the payload's event name field (t). Event names are written in upper snake case, e.g. MESSAGE_CREATE.

Abstract

The Big Picture

The Gateway itself is an entangled, complicated subject. For a quick rundown of its structure, here's a diagram visualising how data flows between payloads, the client and the server:

---
title: Gateway Structure
---
classDiagram
  direction RL
  note "This diagram is a speculative interpretation of the Gateway\nlifecycle found in the Developer Documentation."
  
  class Packet {
    JSON serialised string
  }
  class Payload {
    +int op
    +typing.Any d
    +int s
    +str t
  }
  note for Payload "op - Operation code\nd - Payload data\ns - Sequence number\nt - Payload event name/type"
  
  class OpCode {
    Operation codes denote an action or payload.
  }
  
  Payload *-- OpCode
  
  Packet --> Payload
  Packet "1" --> "*" Payload
  Payload "*" --> "many" Packet : Inherits
  
  class State {
    +int version 
    +str encoding
    +str? compress
    +float? heartbeat_interval
    +str? resume_gateway_url
    +str? session_id
    +int? seq
  }
  note for State "version - Version of the Gateway/WebSocket API\nencoding - Format data is transported in\ncompress - Optional transport compression\nheartbeat_interval - Periodic time to send heartbeats\nresume_gateway_url - URL for resuming connections\nsession_id - Unique ID of the connection session\nseq - Sequence number of the last dispatched payload"
  
  class Protocol {
    <<interface>>
  }
  
  class Client {
    +str token
    +State state
    +connect()
    +reconnect()
    +_error()
    +_track()
    +_dispatch()
    +_identify()
    +_resume()
    +_heartbeat()
  }
  note for Client "token - Bot token\nstate - Connection state data\nconnect() - Starts a new connection\n_error() - Handles error returns\n_track() - Tracks and handles operation codes"
  note for Client "_error()\nErrors are thrown as exceptions, but will be caught\nand met with a reconnection attempt. If this fails,\na new connection is formed."
  note for Client "_track()\nOperation codes have different procedures, which will\nchange the way we have to handle data. Most\noperation codes will be for send events."
  note for Client "_dispatch()\nDispatching is when we have a receive event intended\nto represent an action within the app. An\n example of this is message creation."
  note for Client "_heartbeat()\nHeartbeats are what is needed to keep a connection\nalive. Heartbeat intervals are usually around\n41-45 seconds."
  
  State *-- Client
  Client ..> Protocol
  Protocol --> "many" Client : Inherits
  
  class HEARTBEAT {
    +int heartbeat_interval
  }
  HEARTBEAT o-- Client
  HEARTBEAT o-- Protocol
  HEARTBEAT o-- State
  
  note for HEARTBEAT "A concurrently timed payload sent to keep a connection alive."
  
  class HELLO {
    +int heartbeat_interval
    HELLO helps determine the interval for heartbeats to be sent.
  }
  HEARTBEAT ..> HELLO
  HEARTBEAT .. State
  
  class IDENTIFY {
    +str token
    +dict properties
  }
  note for IDENTIFY "Used to authenticate with the Gateway who you are."
  
  class RESUME {
    +str session_id
    +int seq
  }
  note for RESUME "Used to resume a dead connection.\nIf INVALID_SESSION comes after this, a new connection\nis required."
  IDENTIFY ..|> RESUME
  
  class INVALID_SESSION {
    +bool d
  }
  note for INVALID_SESSION "The current session has been invalidated.\nThe inner boolean data field says whether the session\nmay be resumed. If true, send RESUME.\nIf not, start a new session with IDENTIFY."
  
  class RECONNECT {
    The Gateway asks us to reconnect and resume.
  }
  RECONNECT --> INVALID_SESSION
  RECONNECT .. RESUME
  RECONNECT .. IDENTIFY

Implementation

For the purposes of making things easy for developers to understand, I will be implementing the Gateway client with the following in mind:

  • We will be using Python as our solution due to its simplicity and how easy it is to understand.
  • Some less popular dependencies will be used to greatly simplify operations:
    • trio - A concurrent and I/O friendly asynchronous runtime for Python
    • trio_websocket - A wsproto implementation of WebSockets for trio.
    • attrs - Dataclasses with more granularity and finer control.
    • cattrs - A de/serialiser suite package for mappings to attrs-styled objects, and vice versa.
  • Only the most important qualities of the Gateway client will be added.
    • There will be no code for send events such as VOICE_STATE_UPDATE and REQUEST_GUILD_MEMBERS.
    • The implementation will roughly follow the documentation with some changes that aren't fully covered by it.

So, where do we begin?

Setting up enumerators

First, we need to establish opcodes and intents. The full list of intents is quite long, so we'll only keep a few to make things easier to understand.

import enum

class OpCode(enum.IntEnum):
  DISPATCH = 0
  HEARTBEAT = 1
  IDENTIFY = 2
  RESUME = 6
  RECONNECT = 7
  INVALID_SESSION = 9
  HELLO = 10
  HEARTBEAT_ACK = 11

Our intents will be integer bit flags, which the client combines with bitwise OR into a single integer that tells Discord which events the application wants access to. We'll explain more about why we need these later on.

class Intents(enum.IntFlag):
  GUILDS = 1 << 0
  GUILD_MESSAGES = 1 << 9
  GUILD_MESSAGE_TYPING = 1 << 11
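As a worked example of that bitwise calculation: combining the three intents above with | gives 1 + 512 + 2048 = 2561, the integer a client would send in its IDENTIFY payload. The enum is repeated so the snippet runs on its own; wanted and everything are illustrative names.

```python
import enum


class Intents(enum.IntFlag):
    GUILDS = 1 << 0                  # 1
    GUILD_MESSAGES = 1 << 9          # 512
    GUILD_MESSAGE_TYPING = 1 << 11   # 2048


# Combine intents with bitwise OR; the result is still an Intents value.
wanted = Intents.GUILDS | Intents.GUILD_MESSAGES
print(int(wanted))  # 513

# Membership checks test whether a flag's bit is set.
print(Intents.GUILD_MESSAGES in wanted)        # True
print(Intents.GUILD_MESSAGE_TYPING in wanted)  # False

# IDENTIFY expects the plain integer.
everything = Intents.GUILDS | Intents.GUILD_MESSAGES | Intents.GUILD_MESSAGE_TYPING
print(int(everything))  # 2561
```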

Establishing payloads

With our enumerators made, we can now move forward with creating the payload model. We'll be using attrs for this, which will make our life easier with mapping to a dataclass:

import attrs
import typing

@attrs.define(kw_only=True)  # Keyword-only __init__; the defaults cover fields that are absent or null.
class Payload:
  op: OpCode
  d: typing.Any | None = None
  s: int | None = None
  t: str | None = None

  # We'll create a few property methods so it's easier to understand these abbreviations.

  @property
  def data(self) -> typing.Any | None:
    return self.d

  @property
  def sequence(self) -> int | None:
    return self.s

  @property
  def name(self) -> str | None:
    return self.t

Defining a basic protocol

Python has no native "interface" construct, so instead we're going to use the next best thing: protocols. These are part of the standard library's typing module and offer static duck typing (structural subtyping). Consider the following, where we want to define how methods will act within our solution:

import typing

class GatewayProto(typing.Protocol):
  # These are class attributes we're defining.
  _token: str
  intents: Intents
  _state: "State"  # State is defined further down, so we use a string forward reference.
    
  def __init__(self: typing.Self, token: str, intents: Intents) -> None:
    ...

  async def __aenter__(self: typing.Self) -> typing.Self:
    ...

  async def __aexit__(self: typing.Self, *exc: object) -> bool | None:
    ...

When we make a new class that will be our Gateway client and subclass GatewayProto, static type checkers such as mypy or pyright will expect these methods to be written with the same argument signature, return type, call convention and naming. A mismatch is reported as an error by the type checker. Python itself won't raise an exception at runtime, since protocols are enforced statically.
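Protocols don't even require subclassing: any class with matching members satisfies one structurally. A minimal sketch with hypothetical names (Connectable, FakeClient). The typing.runtime_checkable decorator additionally allows isinstance() checks, although at runtime those only verify that the members exist, not their signatures.

```python
import typing


@typing.runtime_checkable
class Connectable(typing.Protocol):
    async def connect(self) -> None: ...


class FakeClient:
    # Note: FakeClient does not subclass Connectable.
    async def connect(self) -> None:
        pass


# Structural check: FakeClient matches because it has a connect() method.
print(isinstance(FakeClient(), Connectable))  # True
print(isinstance(object(), Connectable))      # False
```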

Building our client

We have the majority of the important pieces done, so now we can build our Gateway client. To make things even easier to follow, we're first going to create a data class called State that stores metadata about the Gateway connection, gathered before and after connecting:

@attrs.define(slots=False)  # attrs.define classes aren't frozen, so we can write to these fields later on.
class State:
  version: int = 10  # Version of the Gateway/WebSocket API
  encoding: str = "json"  # The format data is transported in.
  compress: str | None = None  # Optional transport compression (e.g. "zlib-stream"); not needed here.
  heartbeat_interval: float | None = None
  resume_gateway_url: str | None = None
  session_id: str | None = None
  seq: int | None = None

We will explain the other attributes in the State class while we slowly build the client. For now, let's go ahead and start with a very basic implementation of connecting.

import trio
import trio_websocket

class Gateway(GatewayProto):
  _tasks: typing.Any = None  # The async context manager returned by trio.open_nursery().

  def __init__(self: typing.Self, token: str, intents: Intents) -> None:
    self._token = token
    self.intents = intents
    self._state = State()

  # This is how we define an asynchronous context manager.
  # The calling convention for this will be "async with," where our whole event loop will be
  # handled asynchronously. If an exception is raised inside the block, __aexit__ still runs,
  # so the nursery can cancel its tasks before the exception propagates.
  async def __aenter__(self: typing.Self) -> typing.Self:
    # A "nursery" is like a group of scheduled tasks in asyncio. Trio believes in
    # explicit definition of tasks - there is no hidden control flow happening in the background,
    # everything is defined and handled at the time of creation.
    self._tasks = trio.open_nursery()
    nursery = await self._tasks.__aenter__()  # we have to asynchronously enter our nursery.
    return self

  async def __aexit__(self: typing.Self, *exc: object) -> bool | None:
    return await self._tasks.__aexit__(*exc)

This is not a lot of code, but it can appear a little bit complex for some people. Let's break this down method-by-method:

  1. We have __init__ to initialise our class and write some values to some attributes.
    • We want to store the token (imagine it like the private password of a bot) and intents needed to connect.
    • We initialise a State within our initialiser so we can easily reference and write to it later.
    • We define _tasks as our own class attribute, separate from the protocol, to hold the nursery manager that runs our background tasks.
  2. We define __aenter__, the entry half of an asynchronous context manager, so the connection can run concurrently in the background while our own code runs inside the async with block.
    • open_nursery() returns an async context manager that produces a Nursery, similar to asyncio's grouped tasks.
  3. We also define __aexit__, the exit half of the async context manager, which closes the nursery: it waits for the remaining tasks to finish, or cancels them if an exception was raised.

Creating the connection loop

With the context manager now in place, we can write the actual connect method. We'll have to adjust a few areas, of course, to make this work.

First, let's start with writing the protocol for how the method should act.

class GatewayProto(typing.Protocol):
  ...
  
  async def connect(self: typing.Self) -> None:
    ...

Next, let's add a new class attribute, _conn, which will hold a trio_websocket.WebSocketConnection. The connection is opened by its own async context manager inside connect(), which runs as a task in the trio.Nursery we open when asynchronously entering the Gateway client:

class Gateway(GatewayProto):
  _conn: trio_websocket.WebSocketConnection = None
  
  async def connect(self: typing.Self) -> None:
    # We need to "resolve" which URL we're using. The Gateway can provide us two in different cases:
    # - If we already had a connection, it'll supply us a region-based URL path to make connections quicker and
    #   more sane.
    # - If we're starting a new connection, we'll need to use the entry point URL.
    resolved_url: str = self._state.resume_gateway_url or "wss://gateway.discord.gg"
    
    # This has a lot of string formatting to apply query-string parameters into the URL path.
    # It's recommended for Gateway clients to specify which version of the API they'll be using.
    # For the sake of making this implementation simple, we'll be encoding through JSON.
    async with trio_websocket.open_websocket_url(
      f"{resolved_url}/?v={self._state.version}&encoding={self._state.encoding}"
      f"{'&compress=' + self._state.compress if self._state.compress else ''}"
    ) as self._conn:
      while not self._conn.closed:
        print(await self._conn.get_message())

Finally, we'll need to start a new task in our Nursery so it's called upon asynchronously entering the client:

class Gateway(GatewayProto):
  async def __aenter__(self: typing.Self) -> typing.Self:
    ...
    nursery.start_soon(self.connect)
    return self

After this, give it a run. If everything is set up correctly, we should get a very brief connection, and for the short period it's alive we should receive a JSON-serialised string resembling a HELLO payload. Formatted, it looks something like this (heartbeat_interval is in milliseconds):

{
  "t": null,
  "s": null,
  "op": 10,
  "d": {
    "heartbeat_interval": 41250
  }
}
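Discord sends heartbeat_interval in milliseconds, while trio and asyncio sleep in seconds, so a conversion is needed. The Developer Documentation also asks clients to delay the first heartbeat by a random fraction ("jitter") of the interval. A minimal sketch of both steps, using a hand-written HELLO packet:

```python
import json
import random

packet = '{"t": null, "s": null, "op": 10, "d": {"heartbeat_interval": 41250}}'
payload = json.loads(packet)

# Convert milliseconds to seconds.
interval = payload["d"]["heartbeat_interval"] / 1000
print(interval)  # 41.25

# Wait a random fraction of the interval before the *first* heartbeat,
# so clients that connect together don't all heartbeat at once.
first_delay = interval * random.random()
assert 0 <= first_delay < interval
```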

Standardising data representation

Instead of receiving everything as a string and parsing it from there, we could take a few other routes. Let's consider the following:

  • We can use the builtin json dependency to transform a JSON-serialised string into a dict object.
    • One drawback is that a plain dict can't carry helper methods or properties we might want to define for ourselves.
  • We can utilise the Payload dataclass created earlier to handle value representation.
    • We can define our own methods to help make it easier.

The answer is pretty clear: we're using the Payload dataclass. But to make Python easily transform a JSON-serialised string into an attrs-decorated class, we'll need to use another dependency, cattrs.

Let's create a new private method, _receive() that will handle doing this all for us:

import cattrs
import json

class Gateway(GatewayProto):
  ...
  
  async def _receive(self: typing.Self) -> Payload | None:
    # get_message() blocks until a full message (packet) has been received, which is
    # useful for only running the rest of this method once data has arrived. Once the
    # connection has closed, it raises ConnectionClosed instead, so we catch that and
    # return None to tell the caller there's nothing left to read.
    try:
      packet: str | bytes = await self._conn.get_message()
    except trio_websocket.ConnectionClosed:
      return None

    # Don't name this variable "json": that would shadow the json module inside
    # this function and make json.loads() raise UnboundLocalError.
    data: dict = json.loads(packet)
    return cattrs.structure(data, Payload)

Now we can refactor our connect() method to work with this more structured data. If we ever need a plain dict again, attrs.asdict() converts a Payload back into one:

class Gateway(GatewayProto):
  ...

  async def connect(self: typing.Self) -> None:
    ...
    async with trio_websocket.open_websocket_url(...) as self._conn:
      while not self._conn.closed:
        payload: Payload = await self._receive()
        
        if payload:
          print(payload)

Tracking operation codes

The next major step towards making the Gateway client work is to track operation codes. Why? Well, we have some send events that we need to send at the right moments. Most importantly, HEARTBEAT and IDENTIFY are necessary to implement.

Let's start by creating a new ._track() method that handles all necessary operation codes and how we'll handle them. Some pseudomethods will be written which we'll later implement:

class Gateway(GatewayProto):
  ...
  
  # We'll add a new state boolean for when we can start sending heartbeats.
  # This is necessary since tasks started in a trio Nursery begin running right away,
  # but we can't heartbeat until HELLO has told us the interval.
  _send_heartbeats: bool = False
  
  async def connect(self: typing.Self) -> None:
    ...
    async with trio_websocket.open_websocket_url(...) as self._conn:
      while not self._conn.closed:
        payload: Payload = await self._receive()
        
        if payload:
          await self._track(payload)  # _track() is a coroutine, so it must be awaited.
          
  async def _track(self: typing.Self, packet: Payload) -> None:
    # Discord recommend that we always store the sequence number of the last
    # dispatched payload we received. This is important for two reasons:
    #
    # 1. If we drop the connection and are able to resume it, the sequence tells
    #    the Gateway which events to replay so we don't miss any.
    # 2. Every heartbeat we send includes the last sequence number, keeping the
    #    Gateway up to date on how far along the connection we are.
    #
    # Only dispatches (opcode 0) carry a sequence, so we skip null values
    # instead of overwriting the last known one.
    if packet.sequence is not None:
      self._state.seq = packet.sequence
    
    # Let's track each operation code and add matching cases, allowing us to break
    # up handling into different levels.
    match packet.op:
      # HELLO is the first event we'll ever receive from the Gateway.
      # This event is responsible for telling us our heartbeat_interval value, needed to
      # send accurate heartbeats.
      case OpCode.HELLO:
        # Now we need to check for two conditions:
        # - If we had an existing past connection, we'll try resuming it.
        # - If we have no past connection, we'll identify ourselves.
        if self._state.session_id:
          await self._resume()
        else:
          await self._identify()
        
        self._state.heartbeat_interval = packet.data["heartbeat_interval"] / 1000  # ms to seconds
        self._send_heartbeats = True