Skip to content

Instantly share code, notes, and snippets.

@rumblefrog
Last active December 14, 2020 02:05
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save rumblefrog/d636813413f4d8eb2a8a13c68f7f5675 to your computer and use it in GitHub Desktop.
Save rumblefrog/d636813413f4d8eb2a8a13c68f7f5675 to your computer and use it in GitHub Desktop.

Abstract

Client CIDR filter system with polling centralized TCP polling server.

Motivation

  • Current relational structure dependence of CIDR Blocker requires full-row scan and value operation in order to determine potential candidate
  • Relational structure stores have O(n) storage instead and compute complexity.
  • Radix tree cannot be sanely implemented in such relational dependency, much less providing continuous updates.
  • Radix tree provides O(log n) storage and a worst-case of O(4) access, with the worst storage space cost of 256^4 bytes with every IPv4 address within the filter, with a mean cost of 256^3 bytes.

Guide-level explanation

The client is an opaque entity within this system; as long as the protocol remains consistent, which it must, a client can be any entity. However, in consideration of some of these opaque clients, it should attempt to minimize the number of dependencies.

The client will attempt to connect to the server and maintain the connection for X interval, in which every request renews it. The client-side will consist of soft termination with a shorter interval, with server hard terminating after 1.5*X interval.

Server in its regard operates independently, with non-volatile persistent being saved and loaded from. With the mean storage cost in mind, the server will maintain a volatile copy in memory to allow fast and repeated lookups.

The server may optionally require authorization payload for a higher rate of requests, with allowance for burst requests without authorization. This authorization may in the form of the payload or peer address.

Reference-level explanation

With minimal implementation in mind, the protocol will be in the form of a binary wire structure. With fields being considered for posterity purposes.

Note that the structure is packed in order of priority to avoid complete read when not necessary.

Each payload will consist of a header that identifies the intention and directionality.

ProtocolVersion = 1 Byte

Request (0)

  • IP Address - Desired address to check against the filter
  • Meta - Additional information that the server may use to improve or retarget; this is more vitally used on self-hosted instances.
IP Address - 32 Bits
Meta - Variable length, \0 terminated

Response (1)

  • InFilter - Boolean indicating the existence of address in the filter
  • Limit - Remaining limit for current interval
  • IP Address - Address in which the filter result is for
InFilter = 1 Bit
Limit = 7 Bits
IP Address - 32 Bits

The primary audience of this system is the SP client, although the protocol is non-preferential.

The SP client should only consist of Socket as its only extension dependency.

When the client establishes a global connection, it will remain established for X interval unless renewed by another request. This allows LRU connections to be terminated, freeing file descriptors, as a single listening port can accept more than one connection simultaneously.

After X interval, the client should close its TCP connection to the server and reopen upon new requests. These new requests in the SP client would be on OnClientConnect or any other client-based forwards.

Server upon startup will load a netset of address from a persistent source. This persistence may be implemented via the AddressSource trait and selectable from the server configuration.

The server will maintain a single global instance of a Radix tree, with netset being parsed and loaded into it. The server should maintain a RwLock on this global instance to allow a hot-reload of the list.

The server should maintain a persistent list of all connected clients and a sweeping function to purge expired clients. In addition, the server may choose to log the incoming requests for future analysis.

This Radix tree will utilize Huffman's coding as its branching condition. To assist in visualization, per each of the four maximum heights as IPv4 consists of four octets, there's a maximum width of 2^8-1 nodes per height. Performing a look upon this tree would consist of a maximum of 4 accesses.

Drawbacks

The usage of a fixed binary protocol may hinder future features and expansions. However, it eliminates much of the parsing and dependency overhead.

Rationale and alternatives

This system introduces the ability to centralize netset rules in optimal time and space, with the ability for other clients to connect on a minimal binary protocol.

It is certainly possible to implement the Radix tree structure within MySQL relational structure, in which it would eliminate Socket dependency and use the pre-bundled database driver; it is rather a trivial and non-ideal approach.

Prior art

  • CIDR Blocker operates on a MySQL database of CIDR entries, which performs full-row scan and operation.
  • Source Chat Relay contains an example of binary protocol and socket handling.

Unresolved questions

How to introduce automatic and rolling netset updates to the in-memory Radix tree? Perhaps it may be introduced by crontab sending a signal to the process to initiate the reading of the persistent file.

Future possibilities

The current design of the system is to maintain a single aggregated netset; a possible mechanism may be introduced to allow the selection of filters to check against. However, with the primary audience being the SM ecosystem, this remains a very distant scope.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment