@cablehead
small tools everywhere

What would it look like if we just used small tools, everywhere?

Original revision: Sep 6, 2018

Most developers are familiar with, and proponents of, the Unix philosophy (Unix philosophy - Wikipedia), particularly: "Write programs that do one thing and do it well." In practice though, the tooling just doesn’t exist to build useful network services which follow this approach.

Let’s take a lightweight WebSocket service. In 2018 we have no shortage of languages and frameworks to create the service, but they are largely incompatible with each other.

I’m most familiar with the Python world, so I can break out the different frameworks there that you could use: twisted, eventlet, gevent, tornado, asyncio, sanic. Even though these share the same base language, a library designed for one of these frameworks would likely be difficult to use with another. And then there are also a myriad of options in Java, Golang, Erlang, Rust.

I think it’s telling that when an interesting innovation happens in one of these ecosystems, people who prefer a different one begin porting it (or requesting ports).

I’ve been messing around a bunch recently with https://pptr.dev, a Node.js library to easily automate Chrome (it’s great). It’d be nice to seamlessly process data scraped with Puppeteer using libraries I’m familiar with in Python. As near as I can tell, Puppeteer was first made publicly available around Aug 18, 2017. Come Aug 28, 2017: "Are there any plans to port puppeteer to Python?" (Issue #575 · GoogleChrome/puppeteer · GitHub)

There is now what looks to be a useful port: GitHub - miyakogi/pyppeteer: Headless chrome/chromium automation library (unofficial port of puppeteer)

I run (badly) a few open source projects, and I feel exhausted just thinking about the ongoing effort that’ll be needed to maintain this unofficial port.

Bootstrapping an entirely new method of development, a new language, a new approach to concurrency is even more daunting. Unless you are able to attract sufficient volunteers to flesh out your ecosystem with the essential batteries included, then realistically, even if your approach has significant novel advantages, it won’t be usable for real work.

A quick brain dump of batteries an ecosystem could really use:

  • Protocols: json / msgpack / thrift / grpc
  • Ability to read / write document formats: csv, xls, pdf
  • Network: TCP, HTTP, HTTP/2, WebSockets
  • Bindings for AWS, Kafka, Redis, MySQL, SQLite, Mongo
  • Rich date handling
  • DNS resolution
  • Sane primitives to coordinate async
  • Template rendering
  • Package management
  • Cryptography, TLS, SSH
  • Science and math libraries
  • Heck: even just slugifying a URL using industry best practices (a naive shell attempt below shows why even this needs a battery)
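
As an aside on that last item, here is what a naive small-tool slugify looks like in plain shell. This is a rough sketch, nowhere near industry best practice: it just drops non-ASCII characters instead of transliterating them (the Ü should become a u):

$ echo "Hello, World! Ünicode" | \
	tr '[:upper:]' '[:lower:]' | \
	sed -E 's/[^a-z0-9]+/-/g; s/^-+|-+$//g'
hello-world-nicode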

Pony is a new language that has a lot of interesting qualities. The project lists the batteries required under its reasons not to use it yet: Discover - Pony

So what would it look like if we constructed our systems with small tools that can communicate easily with each other?

The first thing, I think(?), is that this is largely not possible currently. The suite of small tools needed doesn’t exist.

This is a shot at an HTTP server that takes a JSON payload with two keys, a and b, and returns their sum.

$ s6-tcpserver 127.0.0.1 8080 sh -c '
	http2json | \
	jq -r .body | \
	jq "{\"res\": (.a + .b)}" | \
	json2http'

$ jo a=3 b=4 | curl -d @- localhost:8080
{"res": 7}

Some more thoughts looking at this snippet:

  • Bash quoting is prohibitive to building complex systems on the command line.
  • s6’s use of "Bernstein chaining" (Chain loading - Wikipedia) has a lot of advantages, but isn’t as natural as shell pipelines (see the sketch below)
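
For context: in Bernstein chaining, each program performs one setup step and then execs into the rest of its argument list, so composition happens in argv rather than through pipes. A minimal sketch using real s6 tools: s6-applyuidgid drops privileges (the 1000/1000 uid/gid here are arbitrary), then execs into cat, which handles the connection:

$ s6-tcpserver 127.0.0.1 8080 \
	s6-applyuidgid -u 1000 -g 1000 \
	cat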

s6-tcpserver is the TCP socket server: it binds to a port, spawns a process for each connection, and maps the connection’s socket reads to the process’s stdin and the process’s stdout to the socket’s writes.
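
To make that mapping concrete: since the per-connection process’s stdin and stdout are the socket, cat by itself is a complete TCP echo server. Type a line at nc and it comes straight back:

$ s6-tcpserver 127.0.0.1 7777 cat &
$ nc 127.0.0.1 7777
hello
hello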

In this case, the spawned process is a small shell script that starts with an imaginary binary, http2json, that parses HTTP requests from its stdin and translates them to JSON documents, perhaps in the form:

{
	"method": "POST",
	"path": "/",
	"headers": {...},
	"body": "{\"a\":3,\"b\":4}"
}

This is then piped to an instance of jq to extract the body of the request (-r emits it as a raw string), then to a second instance of jq, which parses the body as JSON and sums fields a and b. Finally, the result is piped to a second imaginary binary, json2http, that would turn the JSON payload into an HTTP response.
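
The two jq stages are real, so you can exercise them on their own by feeding in such a document by hand:

$ echo '{"body": "{\"a\":3,\"b\":4}"}' | \
	jq -r .body | jq '{res: (.a + .b)}'
{
  "res": 7
}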

  • Constantly serializing / deserializing data to pass between tools is an issue:
    • obviously, it's really inefficient
    • it kinda defeats the purpose of small tools, as each tool also needs to bundle its own serializer / deserializer
    • this is a bigger deal than it may seem on the surface: one symptom of monolithic ecosystems that need to provide batteries included is that they often ship low-quality solutions, e.g. streaming JSON parsing vs. reading everything into memory as a string and then JSON-decoding it (see the jq --stream example below)
    • alternatives? shared memory? environment variables?
    • it'd be great to get Jeremy's input
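
On the streaming point, jq itself shows what incremental parsing can look like: with --stream it emits [path, value] events as it reads, instead of materializing the whole document first:

$ echo '{"a":3,"b":4}' | jq -c --stream .
[["a"],3]
[["b"],4]
[["b"]]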