@cablehead
small tools everywhere

What would it look like if we just used small tools, everywhere?

Original revision: Sep 6, 2018

Most developers are familiar with, and proponents of, the Unix philosophy (Unix philosophy - Wikipedia), particularly: "Write programs that do one thing and do it well." In practice though, the tooling just doesn’t exist to build useful network services which follow this approach.

Let’s take a lightweight WebSocket service. In 2018 we have no shortage of languages and frameworks to create the service, but they are largely incompatible with each other.

I’m most familiar with the Python world, so I can break out the different frameworks there that you could use: twisted, eventlet, gevent, tornado, asyncio, sanic. Even though these share the same base language, a library designed for one of these frameworks would likely be difficult to use with another. And then there are also a myriad of options in Java, Golang, Erlang, Rust.

I think it’s telling that when an interesting innovation happens in one of these ecosystems, people who prefer a different one begin porting it (or requesting ports).

I’ve been messing around a bunch recently with https://pptr.dev, a Node.js library to easily automate Chrome (it’s great). It’d be nice to seamlessly process data scraped with Puppeteer using libraries I’m familiar with in Python. As near as I can tell, Puppeteer was first made publicly available around Aug 18, 2017. Come Aug 28, 2017: "Are there any plans to port puppeteer to Python?" (Issue #575 · GoogleChrome/puppeteer · GitHub)

There is now what looks to be a useful port: GitHub - miyakogi/pyppeteer: Headless chrome/chromium automation library (unofficial port of puppeteer)

I run (badly) a few open source projects, and I feel exhausted just thinking about the ongoing effort that’ll be needed to maintain this unofficial port.

Bootstrapping an entirely new method of development, a new language, a new approach to concurrency is even more daunting. Unless you are able to attract sufficient volunteers to flesh out your ecosystem with the essential batteries included, then realistically, even if your approach has significant novel advantages, it won’t be usable for real work.

A quick brain dump of batteries an ecosystem could really use:

  • Protocols: json / msgpack / thrift / grpc
  • Ability to read / write document formats: csv, xls, pdf
  • Network: TCP, HTTP, HTTP/2, WebSockets
  • Bindings for AWS, Kafka, Redis, MySQL, SQLite, Mongo
  • Rich date handling
  • DNS resolution
  • Sane primitives to coordinate async
  • Template rendering
  • Package management
  • Cryptography, TLS, SSH
  • Science and math libraries
  • Heck: even just slugifying a URL using industry best practices (a naive shell attempt below shows why even this needs a battery)
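
As an aside on that last item, here is what a naive small-tool slugify looks like in plain shell. This is a rough sketch, nowhere near industry best practice: it just drops non-ASCII characters instead of transliterating them (the Ü should become a u):

$ echo "Hello, World! Ünicode" | \
	tr '[:upper:]' '[:lower:]' | \
	sed -E 's/[^a-z0-9]+/-/g; s/^-+|-+$//g'
hello-world-nicode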

Pony is a new language that has a lot of interesting qualities. The project lists the batteries required under its reasons not to use it yet: Discover - Pony

So what would it look like if we constructed our systems with small tools that can communicate easily with each other?

The first thing, I think(?), is that this is largely not possible currently. The suite of small tools needed doesn’t exist.

This is a shot at an HTTP server that takes a JSON payload with two keys, a and b, and returns their sum.

$ s6-tcpserver 127.0.0.1 8080 sh -c '
	http2json | \
	jq -r .body | \
	jq "{\"res\": (.a + .b)}" | \
	json2http'

$ jo a=3 b=4 | curl -d @- localhost:8080
{"res": 7}

Some more thoughts looking at this snippet:

  • Bash quoting is prohibitive to building complex systems on the command line.
  • s6’s use of "Bernstein chaining" (Chain loading - Wikipedia) has a lot of advantages, but isn’t as natural as shell pipelines (see the sketch below)
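
For context: in Bernstein chaining, each program performs one setup step and then execs into the rest of its argument list, so composition happens in argv rather than through pipes. A minimal sketch using real s6 tools: s6-applyuidgid drops privileges (the 1000/1000 uid/gid here are arbitrary), then execs into cat, which handles the connection:

$ s6-tcpserver 127.0.0.1 8080 \
	s6-applyuidgid -u 1000 -g 1000 \
	cat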

s6-tcpserver is the TCP socket server: it binds to a port, spawns a process for each connection, and maps the connection’s socket reads to the process’s stdin and the process’s stdout to the socket’s writes.
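
To make that mapping concrete: since the per-connection process’s stdin and stdout are the socket, cat by itself is a complete TCP echo server. Type a line at nc and it comes straight back:

$ s6-tcpserver 127.0.0.1 7777 cat &
$ nc 127.0.0.1 7777
hello
hello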

In this case, the spawned process is a small shell script that starts with an imaginary binary, http2json, that parses HTTP requests from its stdin and translates them to JSON documents, perhaps in the form:

{
	"method": "POST",
	"path": "/",
	"headers": {...},
	"body": "{\"a\":3,\"b\":4}"
}

This is then piped to an instance of jq to extract the body of the request (-r emits it as a raw string), then to a second instance of jq, which parses the body as JSON and sums fields a and b. Finally, the result is piped to a second imaginary binary, json2http, that would turn the JSON payload into an HTTP response.
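
The two jq stages are real, so you can exercise them on their own by feeding in such a document by hand:

$ echo '{"body": "{\"a\":3,\"b\":4}"}' | \
	jq -r .body | jq '{res: (.a + .b)}'
{
  "res": 7
}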

  • Constantly serializing / deserializing data to pass between tools is an issue:
    • obviously, it's really inefficient
    • it kinda defeats the purpose of small tools, as each tool also needs to bundle its own serializer / deserializer
    • this is a bigger deal than it may seem on the surface: one symptom of monolithic ecosystems that need to provide batteries included is that they often ship low-quality solutions, e.g. streaming JSON parsing vs. reading everything into memory as a string and then JSON-decoding it (see the jq --stream example below)
    • alternatives? shared memory? environment variables?
    • it'd be great to get Jeremy's input
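
On the streaming point, jq itself shows what incremental parsing can look like: with --stream it emits [path, value] events as it reads, instead of materializing the whole document first:

$ echo '{"a":3,"b":4}' | jq -c --stream .
[["a"],3]
[["b"],4]
[["b"]]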