
@andreivasiliu
Last active June 14, 2020 11:16
mudproxy

I want hot-reloading.

In the past I achieved this by self-replacing the process via execve, and letting the new process inherit all the socket file descriptors. This worked on both Linux and Windows, but it's hard to implement.

So I'll do this instead:

  • A thin connection manager whose job is to listen on all the ports, and relay messages
  • The flow: input-socket -> subprocess stdin -> subprocess stdout -> output-socket
  • It can spawn and manage a subprocess. If the subprocess quits, it will restart it.
  • Maybe in the future it can do so for multiple processes.

However, it needs to handle multiple input/output sockets, and listen on multiple ports. It also should accept listen requests from the subprocess, and if a new subprocess is started, it needs to tell it about all the currently active connections.

The communication protocol should be something like:

r__1: There is no newline after this.
rnf1: Here it ends. It has a newline too.
c__0: accept 1234 1

And the subprocess writes:

c__0: listen 1234
snf1: say Hello
snf2: There is no newline after this. Here it ends. It has a newline too.

s/r = send/receive, n = newline, f = flush, 0/1/2 = connection ID.
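As a sanity check, the fixed 6-character prefix could be parsed with something like this (a minimal sketch; the `Header` struct and `parse_header` name are mine, not an existing API, and it assumes an ASCII prefix):

```rust
/// Parsed form of the 6-character prefix, e.g. "snf1: ".
/// Layout assumed from the examples: direction, newline flag,
/// flush flag, single-digit connection ID, then ": ".
#[derive(Debug, PartialEq)]
struct Header {
    direction: char, // 's' = send, 'r' = receive, 'c' = control
    newline: bool,   // message ends with a newline
    flush: bool,     // flush the socket after writing
    connection: u8,  // single-digit connection ID
}

/// Split a message line into its header and payload.
fn parse_header(line: &str) -> Option<(Header, &str)> {
    let bytes = line.as_bytes();
    if bytes.len() < 6 || &line[4..6] != ": " {
        return None;
    }
    let header = Header {
        direction: bytes[0] as char,
        newline: bytes[1] == b'n',
        flush: bytes[2] == b'f',
        connection: (bytes[3] as char).to_digit(10)? as u8,
    };
    Some((header, &line[6..]))
}
```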

Full list of possible messages:

  • Proxy:
    • Read bytes from socket N
    • Accepted connection N on port N
    • Closed connection N
  • Manipulator:
    • Listen on addr:port
    • Connect to addr:port
    • Set character(s) used to split messages
    • Send bytes to socket N
    • Send bytes to sockets tagged S

I think tagged connections might be really good. Also, the previous 6-character prefix could never hold more than 10 connections. So, revised protocol:

s 1 n: Stuff
s srv n: Stuff
r srv 1 n: Stuff
i: accepted clt 2
c: listen clt 0.0.0.0 1903
c: connect srv 1.2.3.4 23
c: close srv

r will show both the connection ID and its tag, but s only needs to specify one of them; when the tag is used, it broadcasts to all connections. Similarly, close can be used to close all connections with a certain tag.
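The revised, whitespace-separated header could then be split generically before interpreting the tokens (a sketch; whether a token is a connection ID, a tag, or a flag is decided per message type, as in the examples above):

```rust
/// Split a protocol line like "r srv 1 n: Stuff" into header tokens
/// and the payload. Token interpretation (tag vs. ID vs. flags) is
/// left to the caller, per message type.
fn split_message(line: &str) -> Option<(Vec<&str>, &str)> {
    let (header, payload) = line.split_once(": ")?;
    Some((header.split_whitespace().collect(), payload))
}
```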

Okay, that (old-thoughts.md below in this gist) didn't work... I stumbled upon a gem of an idea, but then got stuck, unable to find a good algorithm to implement it. It's probably still possible, but for now it might be better to find an alternative.

Maybe I should list the requirements, and find a subset that has no implementation issues.

Here are the cool ideas I found:

  • Multiple senders, their text is automatically tagged
  • Multiple receivers, their text is filtered by tags
  • System events (connected/accepted/disconnected) are represented as tagged text
  • Let modules mark regions of text with any tag
  • Let other modules react on the tag events of other modules
  • One common buffer to hold all text, one common vector (or vecdeque) to hold tag events
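The last idea, one common buffer plus one common vector of tag events, might look roughly like this (all names here are illustrative, not an existing API):

```rust
use std::ops::Range;

/// A tag applied to a byte range of the shared buffer.
struct TagEvent {
    tag: String,
    span: Range<usize>,
}

/// One common buffer for all text, one common vector for all tag events.
#[derive(Default)]
struct EverythingStream {
    buffer: Vec<u8>,
    events: Vec<TagEvent>,
}

impl EverythingStream {
    /// Append a chunk of data, marking the whole chunk with one tag.
    fn push_tagged(&mut self, data: &[u8], tag: &str) -> Range<usize> {
        let start = self.buffer.len();
        self.buffer.extend_from_slice(data);
        let span = start..self.buffer.len();
        self.events.push(TagEvent { tag: tag.to_string(), span: span.clone() });
        span
    }

    /// Mark a sub-range of already-appended data with an additional tag.
    fn mark(&mut self, span: Range<usize>, tag: &str) {
        self.events.push(TagEvent { tag: tag.to_string(), span });
    }
}
```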

Why is this great? Because it allows the whole thing to be used as either a forward or a reverse proxy. Used alongside stdin/stdout, it can easily transform anything into a MUD client or server, and abstract all the networking away.

Back to the beginning though, besides tagging this is also a paragraph rewriting helper; tags are cool because they can detect various things (lines, utf8 lines, prompts, titles, exits, descriptions, player names) that other modules can then act upon.

So let's focus on the "act upon" bit:

  • Insert text at the beginning/end of other text
  • Replace text with other text
  • Create new lines that are or are not suppressed when the line they're based on disappears
  • Insert text between characters (e.g. colors)

The colors bit is odd; feels like it doesn't belong. There's something more though, also related to colors, but also telnet and MUD specific "invisible text" protocols like IAC GA or IAC SB ... IAC SE:

  • Let modules process an uncolored, normalized version of the text, but still able to access color info and metadata from invisible stuff

Suddenly everything is complicated. However, the last requirement also does something completely new; it starts going into the "parsing" of the text, rather than just the "modification" of it. Perhaps they should be kept separate? A fully parsed paragraph would be easier to maneuver if it were a structure of things. As opposed to tags, which are singular marked zones of text, a fully parsed paragraph structure gives the ability to look at multiple things at once.

Okay, looks like I don't know what it is exactly that I need. Time for some user-stories:

  • Mapper:
    • looks at an entire paragraph, emits title, exits, description, players, objects
    • to do that, it uses color information, and pattern matching over all the lines
    • for example: room title has color, there are exits below it
  • Prompt:
    • it's on a single line, ended by the IAC-GA invisible thing

Design

Network

With futures

  • Future code owns state
    • Alternatively, if possible, store a future stream and the Runtime it's registered on somewhere inside a Connections object, and block only until the next future completes
  • state.process_chunk_of_data(chunk, &mut outworld)
  • outworld contains:
    • send_to_server(server_id, ...)
    • send_to_client(client_id, ...)

With select

  • A Connections object with .get_event() and various methods to connect/disconnect/add_timer
  • Can use mio

The happening

These need to happen:

  • disconnect client connections
  • connect to server
  • disconnect server
  • send to client
  • send to server

These too:

  • client accepted
  • server connected
  • client data
  • server data

Components

  • Network code (futures or mio)
    • API: get_event, listen, connect, disconnect, add_timer
  • MCCP gzipper/ungzipper
  • Chunkers
    • Parses a growing BytesMut buffer, knows when to generate an initial chunk
    • Returns Some(a_chunk_ends_here) offset
    • Can detect: line endings, paragraphs (IAC-GA), telnet IAC-SB .. IAC-SE sub-sections?, GMCP sub-sections
  • Session store
    • Each session holds: the Everything Stream, module state, server/client connection filter rules
  • Everything Stream logic
  • mudproxy markup listeners
    • Network API call listener
      • Calls things on the connections objects
      • Changes/configures chunkers
    • tmux-like session API call listener
      • (Re-)Associates clients with servers
        • Each server has its own separate Everything Stream
        • Re-associating moves a client out of a stream and onto another
      • Change filter rules
  • Data formatter/interpolator
    • Filter rule storage and logic
  • modules
    • Made up of mostly markup listeners (event handlers)
      • The module must provide a callback that looks like: handle_markup_event(tag_name, data_slice, output_stream)
      • The module can append bytes onto the output_stream freely, but not recommended
      • There will be a helper library that modules can use to generate and write strings onto the output_stream
        • E.g. connect(SocketAddr) that generates > connect addr port | "mudproxy-command" "sent-by: module"
    • Maybe can also contain custom chunkers?
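The chunker component from the list above can be sketched as a small trait; a chunker inspects a growing buffer and reports the offset where the next complete chunk ends, if any (trait and type names are mine):

```rust
/// Looks at a growing buffer and reports where the next complete chunk ends.
trait Chunker {
    /// Returns Some(end) if buf[..end] is a complete chunk, None otherwise.
    fn find_chunk_end(&mut self, buf: &[u8]) -> Option<usize>;
}

/// Line-based chunker (default for clients): a chunk ends after '\n'.
struct LineChunker;

impl Chunker for LineChunker {
    fn find_chunk_end(&mut self, buf: &[u8]) -> Option<usize> {
        buf.iter().position(|&b| b == b'\n').map(|pos| pos + 1)
    }
}

/// Paragraph chunker (default for servers): a chunk ends after
/// the telnet IAC GA sequence (bytes 255, 249).
struct IacGaChunker;

impl Chunker for IacGaChunker {
    fn find_chunk_end(&mut self, buf: &[u8]) -> Option<usize> {
        buf.windows(2).position(|w| w == [255, 249]).map(|pos| pos + 2)
    }
}
```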

The Everything Stream

Data is pre-processed using separate logic into frames (default to line-based for client and paragraph-based for server).

Whenever a frame is found, the Everything Stream is cleared, and the frame (chunk) is copied (RIP zero-copy) onto the beginning of the Everything Stream, and a new set of event processing rounds begins.

All data frames (from server, from clients), all events (network accept/connect/disconnect, timers), and all commands (connect, disconnect, add timer), are appended as pure text to an "Everything" stream. Any data that is added must also be fully marked with at least one tag at the time it is added. A markup event is generated with a reference to the slice of data and the tag name.

Modules are given read-only access to the existing stream, and append-only access to the end of the stream (but what is written cannot yet be read).

Modules (including mudproxy itself) will then loop through all newly added markup events, and generate new markup events on subslices of existing slices, or generate new blocks of data.

If new markup events were generated, the appended data becomes readable, and another round of looping through all new added markup events begins.

Once a round passes with no new markup events, mudproxy reads the whole stream, generates output for each socket based on filter rules (e.g. server gets everything tagged as "command", client1 gets everything tagged as "paragraph" or "line" except for stuff tagged as "sent-by: client1" or "suppress", etc.). To support inserting text in the middle of other text, mudproxy will interpolate data based on some special markup tags.

The buffer is then cleared entirely, and the process restarts when a Future comes up with a new chunk of data.
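The fixed-point rounds described above boil down to: process every event added since the last round, collect whatever new events that generates, and stop once a round produces nothing. A stripped-down sketch (event payloads reduced to plain strings for brevity; the real events would carry tag names plus data slices):

```rust
/// Run processing rounds until no new markup events are generated.
/// `handle` is the stand-in for all module markup listeners; it
/// receives one event and returns any newly generated events.
fn run_rounds<F>(events: &mut Vec<String>, mut handle: F)
where
    F: FnMut(&str) -> Vec<String>,
{
    let mut cursor = 0; // events before `cursor` were already processed
    while cursor < events.len() {
        let round_end = events.len();
        let mut generated = Vec::new();
        for event in &events[cursor..round_end] {
            generated.extend(handle(event));
        }
        cursor = round_end;
        // Newly generated events become readable in the next round.
        events.extend(generated);
    }
}
```

With a listener that turns "mudproxy-command" into "mudproxy-network-command", this runs exactly two productive rounds and then stops.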


======
Room Title
You are inside of a well-decorated room description.
There are exits: north, west.
Someone is here.
prompt> 
------
block "paragraph" 0..133
block "line" 0..10
block "line" 10..57
block "line" 57..110
block "line" 110..143
block "prompt" 143..159
block "people" 110..143
block "person" 110..117
======
tell someone hello
------
block "command" 450..480
block "sent-by: healer" 450..480
======
[mudproxy] client "blah" accepted
p> 
------
block "mudproxy-event" 610..654
block "mudproxy-network" 610..654
block "mudproxy-prompt" 654..656

Also alias:

> tell someone hello | "command" "sent-by: healer"

to:

======
tell someone hello
------
block "command" x..y
block "sent-by: healer" x..y

Markup events (block ...) don't necessarily need to live in the same buffer as the data chunks, but if they do, the buffer's memory capacity can be reused to store the tag names, which are dynamically sized (i.e. they would need to be pre-allocated or fixed-size if a separate queue of events were used).

The stream buffer is large and resizable, but it cannot be resized while it has references to it (realloc() may move the memory); unsure how to handle this. Perhaps make a separate buffer when a capacity is reached, then resize and copy when everybody is done with the first buffer.

mudproxy needs to know how to skip data chunks, so instead of ====== it'll have to be something like ====== 154 bytes; data will be forcefully terminated by a newline if it doesn't already end in one, but the initial markup will not include the artificial newline.
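The length-prefixed framing with the artificial newline might work like this (the exact header syntax is still undecided; this just illustrates the length and newline bookkeeping):

```rust
use std::ops::Range;

/// Write one data frame: a "====== N bytes" header, then the data,
/// newline-terminated. Returns the span the initial markup should
/// cover, which excludes any artificial newline that was added.
fn write_frame(out: &mut String, data: &str) -> Range<usize> {
    let needs_newline = !data.ends_with('\n');
    let total = data.len() + if needs_newline { 1 } else { 0 };
    out.push_str(&format!("====== {} bytes\n", total));
    let start = out.len();
    out.push_str(data);
    let span = start..start + data.len();
    if needs_newline {
        out.push('\n'); // forceful termination, not covered by `span`
    }
    span
}
```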

Example API call from a module to mudproxy:

> connect 127.0.0.1 1523 | "mudproxy-command"

Modules will see "mudproxy-command" and choose to not do anything with it.

On the next round, mudproxy's own listeners will see "mudproxy-command", parse the first word, and add "mudproxy-network-command" on a subset of the above (for just "127.0.0.1 1523"):

block "mudproxy-network-command" x..y

On the next round, mudproxy's own listeners will see "mudproxy-network-command", parse it as an address and port, and create/spawn a future that will connect there.

Some cool things that this system allows:

  • Can use filter rules to make events and API calls visible to a client
  • Can save the whole stream to a file for logging/debugging of pretty much everything
  • Allows modules to generate and markup fake lines/paragraphs that will then be handled by other modules in the same way as normal ones
  • Allows modules to generate lines/paragraphs that are not marked as "line" or "paragraph", so only interested modules will handle them
  • Allows modules to change filter rules to include or exclude lines/paragraphs from a specific client (i.e. disable mapper room decorations even if they're still being generated)
  • Allows filter rules to be client-specific (e.g. one client will see everything, another client will only see chat channels, another will only see maps); this can make a MUD be multi-window even with just telnet/putty.
  • Adding scripting language interfaces will be trivial (although each language will need some parsing/utility functions to be implemented)
  • Might allow modules over the network/subprocess (although on every round everyone must be able to say they're done, and there might be 5-15-100 rounds per chunk, so this might not be feasible due to round-trip latency)
  • Allows clients to write API calls by hand (e.g. a module which for every "command" markup event that looks like "mudproxy cmd" will mark "cmd" as "mudproxy-command")
  • Allows clients to see the response of API calls as events; allows modules to react on any API calls, or on responses of API calls made by other modules

Sessions

A session is a completely separate set of connections (server+clients), state, and has its own "everything stream". For example, mudproxy might be connected to two servers, each in its own session, and the client would be able to switch between them in a tmux-like fashion.

New clients are assigned to a null-session (one that only understands "connect somewhere" which creates a new session), or perhaps one is created and used by default per listening port.

This allows:

  • Markup tag name reuse between servers without resorting to namespaces
  • Modules to keep state separate between servers

A session is not necessarily limited to one server; if there are more servers, they can be multiplexed to the client, or handled separately by some modules.

Attaching a client to the session or connecting to a server requires specifying the default chunker (frame codec) and default tag(s), specified in configuration or on the connect string (connect 127.0.0.1 1523 chunker:iac-ga tag:paragraph).
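Parsing the key:value options off the connect string is straightforward (a sketch; validating option values, e.g. known chunker names, is left out):

```rust
/// Parse "connect <addr> <port> key:value ...", returning the address,
/// port, and remaining options such as ("chunker", "iac-ga").
fn parse_connect(line: &str) -> Option<(&str, u16, Vec<(&str, &str)>)> {
    let mut words = line.split_whitespace();
    if words.next()? != "connect" {
        return None;
    }
    let addr = words.next()?;
    let port: u16 = words.next()?.parse().ok()?;
    let options = words.filter_map(|w| w.split_once(':')).collect();
    Some((addr, port, options))
}
```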

Text manipulation

I need to be able to:

  • Insert things
  • Suppress things
  • Replace (suppress+insert) things
  • Handle color codes sanely
  • Compose the above across modules in the most sane way

Some ideas:

  • A completely different event type besides "block", like "insert" and "replace"?
  • Or reuse "block", but add another "insert point" parameter at the end: `block "insert" 10..20 5`
  • Or the reverse, bring something here: `block "replace" 5..5 10..20`
    • This means "replace" can do all of insert/suppress/replace
  • Could also be an omnipresent property of the block itself:
    • ====== bytes:32 insert-at:2
    • After all, everything is inserted, by default, at the end

Need a sane process order. What happens if for the text "This is a sample text" I:

  1. Replace "a sample" with "an example"
  2. Insert "simple" inside "a sample"
  3. Insert "complex" inside "an example"
  4. Suppress "This is a sample text"

Least surprising would be to get an empty text, regardless of the order. Maybe I need operator precedence? However, if "suppress" is "replace 10..20 with 70..70" (where 70..70 is an empty block marked with the correct tag) then the rest of the rules should be ignored by nature of never getting to them.

Text manipulation boundaries

There's an issue with prefix/inline/suffix messages; for example:

This is a line[ but a module added this suffix]\n

When suppressing the whole line, the suffix disappears naturally by nature of being inside the text (thanks to the \n).

But for this:

[This is a prefix added by a module\n]
This is a line\n
[This is a suffix added by a module\n]

...what happens when suppressing or replacing "This is a line\n"? Should the prefix/suffix be suppressed as well?

Also, for empty blocks: [This is a prefix][This is a suffix] - since the coordinates of an empty block are n..n, how do I specify whether it's a prefix or suffix? Is it useful to specify that?

Some ideas:

  • The "\n" helped, so perhaps add special unprintable markers at the beginning and end of every block?
    • E.g. "This is a line.\n" becomes "$This is a line.\n$", where '$' is a special character that is never printed
  • Maybe whenever inserting, specify location affinity
    • Could be left-side, center, right-side
    • Or could be inside, outside
      • Unable to differentiate prefixes/suffixes like this
  • Could make paragraphs and lines generate multiple blocks and events
    • E.g. 1. Empty block, 2. paragraph, 3. Empty block
    • Would only solve the issue for paragraphs and lines, not for the general case
      • Unsure if it's even useful in the general case

Filter rules

There are three ways to approach this:

  • Add auto-tags (e.g. everything marked as both "a" and "b" but not "c" will be auto-marked as "d")
    • If "not" is allowed, this might become very unstable (and probably Turing-complete, which is not the point)
    • Since markup is not hierarchical, this will likely be very hard to implement
  • Maintain a filter rule set for all connections
    • Given a set of active markup tags, which connections should we send the next region of text to?
    • How long until the next region changes?
    • Can use bitflags to make things faster
  • No support for "deny", but make suppressing/replacing things be somehow tag-specific
    • In other words, a rule is a simple "Allow" of a single tag and that's it
    • Insertions would be ignored by virtue of their content not having the proper tag
    • Suppression/replace, however, is difficult
      • Maybe I could enable it if at least a part of it contains the tag

Rule-set of Allow/Deny

// All allowable tags are put in here; a bitflag set is created for them
let set = TagSet::new_from(&["paragraph", "line", "send-to-client", "prompt"]);
// Everything's connected with 'AND' operators; simple one-tag auto-tags can be used to emulate 'OR'.
// For example, a listener tags every "paragraph" as "send-to-client" and every "line" as "send-to-client"
let rule = FilterRule::new_from(&[Allow("send-to-client"), Deny("suppress")]);
set.add_tag("custom");
set.enter("paragraph");
set.leave("line");
set.matches_filter_rule(rule);

..no, wrong approach. Need a rule set to know whether to care about a specific tag.

Rule-set of single Allow

End result should be an iterator that gives something like:

  1. Block("This is a simple text.", replace_start, replace_end, block_address_start)
  2. Block(" And this has been appended to it.", ...) // replace_start is end of previous block
  3. Block("gracefully ", ...) // replace_start is inside of previous block
  4. Block("added", ...) // replace_start/end will wrap "appended"

If replace-start == replace-end, then it's a simple insertion. If not, but the block is empty, then it's a simple suppression. Otherwise, it's a full-blown replacement. For this to work, it means that even empty blocks must be taggable, otherwise they won't show up in the iterator's output, therefore will not take effect (i.e. not replace/suppress anything).

The above will likely need to be collected and sorted somehow, and based on it, another iterator will compute the display order:

  1. Text("This is a simple text.")
  2. Text(" And this has been ")
  3. Text("gracefully ")
  4. Text("added")
  5. Text(" to it.")
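For the single-level case, computing that display order is a simple scan, assuming the blocks are already sorted by replace_start and don't overlap (a sketch with illustrative names; nesting is ignored here):

```rust
use std::ops::Range;

/// A generated block that replaces a range of the base text.
/// replace.start == replace.end means a plain insertion.
struct Block<'a> {
    text: &'a str,
    replace: Range<usize>,
}

/// Interleave the base text with replacement blocks, in display order.
/// Assumes `blocks` is sorted by replace.start with no overlaps.
fn display_order<'a>(base: &'a str, blocks: &[Block<'a>]) -> Vec<&'a str> {
    let mut pieces = Vec::new();
    let mut pos = 0;
    for block in blocks {
        if block.replace.start > pos {
            pieces.push(&base[pos..block.replace.start]);
        }
        pieces.push(block.text);
        pos = block.replace.end;
    }
    if pos < base.len() {
        pieces.push(&base[pos..]);
    }
    pieces
}
```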

Renderer/interpolator

Let's assume every block has information for where it is and how to be inserted or replace something inside another block:

0..0

======
This is a block.
------ address:27..42 replace:0..0

Replacing "block" with "piece of text" in this block creates a new block:

======
piece of text
------ address:43..58 replace:36..41

Since a block may only have a single replace, and will replace something that's already defined (i.e. a part of another block above it), we get a tree of blocks.

For example, given these blocks:

  1. "This is a simple text."
  2. "s/simple/complex/"
  3. "s/is a/is supposed to be/"
  4. "s/to/to not/"
  5. "s/supposed/not supposed/"
  6. "s/This/This thing/"
  7. "s/text/string/"

Result: "This thing is not supposed to not be a complex string."

So to render it it should be ordered as: [1 > (6, [3 > (5, 4)], 2, 7)]

So all that's needed is to sort in-place by the expanded address; this can be done with no allocations, and it basically gives the result of a DFS of the tree.

Mapped to replace_start, that's: [0.1, 1.1, 1.2, 3.1, 3.2, 1.3, 1.4] - which if sorted gives a BFS instead. Unsure how to use this yet, but looks promising.

Unsure if the DFS-like result can be reused for iteration; need to know when a list of children ends to go back up. Looks like a high replace-start followed by a lower replace-start can trigger a backtrack, but I need to confirm that.
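As a reference point for checking the flat sorted-by-address iteration, the tree form renders trivially with recursion (the explicit `Node` tree here is mine; the real structure would be the implicit tree formed by the `replace` ranges):

```rust
use std::ops::Range;

/// A block of text whose children each replace a sub-range of it.
struct Node<'a> {
    text: &'a str,
    /// Children sorted by the range they replace; assumed non-overlapping.
    children: Vec<(Range<usize>, Node<'a>)>,
}

/// Depth-first render: emit text up to each child's range, recurse
/// into the child instead of the replaced text, then continue.
fn render(node: &Node, out: &mut String) {
    let mut pos = 0;
    for (range, child) in &node.children {
        out.push_str(&node.text[pos..range.start]);
        render(child, out);
        pos = range.end;
    }
    out.push_str(&node.text[pos..]);
}
```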
