@paf31
Last active April 14, 2021 18:42
Reimplementing a NodeJS Service in Haskell

Introduction

At DICOM Grid, we recently made the decision to use Haskell for some of our newer projects, mostly small, independent web services. This isn't the first time I've had the opportunity to use Haskell at work - I had previously used Haskell to write tools to automate some processes like generation of documentation for TypeScript code - but this is the first time we will be deploying Haskell code into production.

Over the past few months, I have been working on two Haskell services:

  • A reimplementation of an existing socket.io service, previously written for NodeJS using TypeScript.
  • A new service, which would interact with third-party components using standard data formats from the medical industry.

I will write here mostly about the first project, since it is a self-contained project which provides a good example of the power of Haskell. Moreover, the process of converting from TypeScript to Haskell was interesting in its own right. However, there are some general lessons which I have learned over the course of both projects, which I would also like to write about.

The Project

The original socket.io service had simple requirements:

  • Receive requests from the browser to subscribe to one or more topics.
  • Check that the user has the correct permissions to access those topics.
  • Listen for events on that topic from a Redis PubSub queue.
  • Send messages to the client as they arrive.

The original version was implemented in a single .ts file:

  • Interfaces were used to define messages which would be sent to/from the client.
  • Enum types were used to enumerate the possible message types.
  • The application was written in callback-passing style - actions like checking permissions and subscribing to channels all involved composing callbacks.
  • To avoid making one Redis connection per client, it was necessary to use psubscribe and to manage the relationship between topics and client connections in a ConnectionManager class.

The service worked well most of the time, but would fail intermittently for an unknown reason. It is important to note that I am not an expert when it comes to architecting or deploying Node services, so it is quite likely that the issue was due to my own inexperience. However, I am more interested in adding features than I am in debugging heisenbugs, and given that we had already made the decision to test out Haskell, I had a good candidate for a service to reimplement.

Getting Support for Haskell

Getting support from other developers on the team turned out not to be very difficult. We work in a variety of modern languages already, so members of the team tend to be curious when there is a new tool available. Also, we generally tend to work with one person taking the lead on any given project, with other team members helping out where appropriate, so it was not difficult to start a new project. Two other developers were interested in working on the project, and knew enough Haskell to work on specific subproblems while I fleshed out the main architecture (I am also interested in the approach of using domain-specific languages to define the separation of responsibility between more and less experienced developers).

My first Haskell project was a tool which I needed for my own work - a documentation generator - not a user facing feature, but definitely visible internally. Also, I had already made something of a case for strong types and a functional approach by rewriting a moderately-sized JavaScript client component in functional TypeScript and PureScript, so the team was already aware of some of the possible benefits. On the development team, we work remotely, but meet at least a couple of times a year to share ideas and discuss important topics. At the last developer meeting, I had the opportunity to give a presentation about Haskell, which was well-received and generated enough interest that we decided to reimplement the existing socket.io service in Haskell.

I don't know to what extent we will end up using Haskell at DICOM Grid - we believe in using the right tool for the right job, and for each case that may or may not be Haskell (of course, I have certain biases in this area). However, I think that Haskell has an excellent place in a "microservices" architecture, replacing individual service components where appropriate.

General Notes

  • The first thing which becomes immediately obvious, when reimplementing a project in Haskell, is the massive benefit of using a language with an expressive type system. Even seemingly simple features like sum types, or the ability to newtype strings, provide huge gains.
  • The TypeScript implementation of the socket.io server used a product-as-sum encoding, which resulted in poor error messages when a client sent an incorrect request. One of the first things I decided to do during the Haskell port was to restructure the type of client requests to use a simple sum type. This gave the advantage that I was able to give good error messages when parsing requests, but more importantly, my data represented the domain more closely (to borrow a phrase, we want to make illegal states unrepresentable).
  • Strong types also became useful when I needed to perform IO (read from Redis, write to a socket, log some event, etc.). The IO monad forced me to factor my code into side-effecting and pure components (preferring the latter as much as possible), which led to a more understandable code base overall. Also, it was no longer necessary to write my code in a callback-passing style: GHC's lightweight threads let me write ordinary blocking-style code, while the runtime's IO manager multiplexes the underlying sockets using epoll under the hood.
  • One of my favorite new examples of the benefits of strong typing is the STM monad. In particular, I was able to use STM along with transactional channels to communicate between my Redis code and the socket.io code. In the end, this meant that I didn't even need to implement the equivalent of the old ConnectionManager class, because transactional channels provided the same functionality! I simply create a new broadcast channel with newBroadcastTChan, and then duplicate the channel for each connected client using dupTChan.
  • The service has been running successfully without interruption for a week in our UAT environment, and will be deployed to production soon.
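
Two of the points above can be sketched in a few lines. This is only an illustration, not the code from the service: the Topic and ClientRequest types, the whitespace-separated "wire format", and the helper names parseRequest, newHub and fanOut are all hypothetical. The only names taken from the text are newBroadcastTChan and dupTChan from the stm package.

```haskell
import Control.Concurrent.STM
  (TChan, atomically, dupTChan, newBroadcastTChan, readTChan, writeTChan)
import Control.Monad (replicateM)

-- | A newtype keeps topic names from being mixed up with other strings.
newtype Topic = Topic String
  deriving (Eq, Show)

-- | Hypothetical request type: one constructor per kind of request, each
-- carrying only the fields that are meaningful for it.
data ClientRequest
  = Subscribe [Topic]
  | Unsubscribe [Topic]
  | Ping
  deriving (Eq, Show)

-- | Parse a whitespace-separated command (a stand-in for the real wire
-- format), reporting a specific reason whenever a request is rejected.
parseRequest :: String -> Either String ClientRequest
parseRequest input = case words input of
  []                   -> Left "empty request"
  ["ping"]             -> Right Ping
  ("ping" : _)         -> Left "ping takes no arguments"
  ["subscribe"]        -> Left "subscribe needs at least one topic"
  ("subscribe" : ts)   -> Right (Subscribe (map Topic ts))
  ["unsubscribe"]      -> Left "unsubscribe needs at least one topic"
  ("unsubscribe" : ts) -> Right (Unsubscribe (map Topic ts))
  (other : _)          -> Left ("unknown request type: " ++ other)

-- | Create a write-only broadcast channel and one private read end per
-- client. A read end only sees messages written after it was duplicated.
newHub :: Int -> IO (TChan String, [TChan String])
newHub clientCount = atomically $ do
  hub     <- newBroadcastTChan
  clients <- replicateM clientCount (dupTChan hub)
  pure (hub, clients)

-- | Publish one message, then read it back from every client's channel.
fanOut :: Int -> String -> IO [String]
fanOut clientCount message = do
  (hub, clients) <- newHub clientCount
  atomically (writeTChan hub message)
  mapM (atomically . readTChan) clients
```

Loaded into GHCi, parseRequest "launch" evaluates to Left "unknown request type: launch", and fanOut 3 "study-updated" returns the message once per simulated client. In a real service each client would presumably read from its duplicated channel in its own thread, with the Redis listener as the only writer.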

On "Real World" Haskell

One of the things I found most interesting about the project was the difference between writing the toy Haskell projects I had worked on in the past, and writing a "real world" Haskell project involving a significant amount of IO. Even something relatively complicated like PureScript is essentially one large pure function with a command line user interface on top.

Until now, the only Haskell project I had worked on which interacted with the world in any real way was the tablestorage library for working with the Windows Azure Table Storage API. Certainly, the task of applying knowledge from purely functional programming to the world of IO is a challenge in itself, but building a Haskell project for production use comes with its own set of unique challenges:

  • How to handle real-world data?
  • How to handle failure gracefully?
  • How to provide insights into the behavior of your service at runtime?
  • How to design the code for consumption by other developers?
  • How to deploy to a production environment?

Despite these challenges, I can report that I feel much more confident in my ability to learn new Haskell libraries than in any other language. Over the course of these two projects, I have used more than 20 libraries for the first time. I put this improvement down to the expressiveness of the type language, and the ability to "follow the types" in order to learn a new set of functions. Certainly, there is a steep learning curve, but I find the benefits quickly outweigh the effort required.

Library Support

Generally, I have found library support in Haskell to be excellent. There have been a few cases where I have found existing solutions lacking in some minor way, so one always has to be ready to roll up the sleeves and submit a pull request where necessary, but my impression has been that for most everyday programming tasks, there is some library on Hackage which solves the problem elegantly.

This was probably best illustrated by the fact that I was presented with a choice of not one, but two implementations of the socket.io protocol on Hackage. In the end, I decided to use Oliver Charles' excellent socket-io library, which I was able to use out of the box.

I'll say a little bit about each of the libraries which I have come to regard as indispensable for real-world Haskell programming:

Diagnostics

These two libraries are very useful for getting insight into the behavior of a running service:

  • hslogger is a logging library with a simple API and multiple backends. It is possible to filter out low priority log messages if the service is healthy, or log everything if you are trying to diagnose an issue.
  • ekg provides a web server which serves remote monitoring data over HTTP. It is also possible to define custom counters and gauges which can be displayed in the web UI.

External Services

These libraries provide APIs for integrating with external services:

  • hedis provides a simple API for communicating with a Redis database.
  • amqp provides a simple API for reading and writing messages to/from a queue implementation supporting the AMQP protocol, such as RabbitMQ.

Other libraries such as postgresql-simple fall into this category, but I have not had a chance to use them yet.

HTTP Clients

I tried out other HTTP client libraries before deciding to use http-streams. I needed a combination of features, including support for SSL connections, chunked request and response bodies and multipart requests. I decided to use http-streams because it has a very simple, intuitive API, and because it was the only library available which supported my exact use case out of the box. That said, I also had a very pleasant experience with http-client and http-conduit.

Data Formats

Over the course of the two projects, I had plenty of data formats to deal with, both standard and custom, binary and text. The libraries below provide the means to consume and produce data in a variety of formats. There are other options available, but I found these to be a very good fit for my use cases:

  • I typically use parsec to parse structured textual data, including document templates and configuration files.
  • binary is a library for efficient binary serialization. I use it to define serialization and deserialization code for custom binary file formats.
  • xml is a library for working with XML. I use it in conjunction with a modified version of the text-xml-qq library for lightweight templating of large XML documents.
  • aeson provides the ability to work with JSON documents. It is fast and flexible with a simple API.
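
As an illustration of the binary bullet above, here is a minimal sketch of serialization code for a made-up file header. The Header type, its fields, the "DEMO" magic bytes and the decodeHeader helper are assumptions for the example; they are not taken from the formats used in these projects.

```haskell
import Data.Binary (Binary (..), decodeOrFail, encode)
import Data.Binary.Get (getByteString, getWord16be, getWord32be)
import Data.Binary.Put (putByteString, putWord16be, putWord32be)
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word16, Word32)

-- | Hypothetical header of a custom binary file format.
data Header = Header
  { formatVersion :: Word16
  , recordCount   :: Word32
  } deriving (Eq, Show)

-- | Four magic bytes that identify the (made-up) format.
magic :: BC.ByteString
magic = BC.pack "DEMO"

instance Binary Header where
  -- Layout: 4 magic bytes, then big-endian version and record count.
  put (Header version count) = do
    putByteString magic
    putWord16be version
    putWord32be count
  get = do
    found <- getByteString 4
    if found /= magic
      then fail "bad magic bytes"
      else Header <$> getWord16be <*> getWord32be

-- | Decode a header, returning the parser's error message on failure
-- instead of throwing an exception.
decodeHeader :: BL.ByteString -> Either String Header
decodeHeader bytes = case decodeOrFail bytes of
  Left (_, _, message) -> Left message
  Right (_, _, header) -> Right header
```

Here encode (Header 2 1024) produces ten bytes, and decodeHeader recovers the original value from them. Using decodeOrFail rather than decode is a design choice in this sketch: it turns truncated or malformed input into a Left value that the caller can log or report.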

Testing

test-framework provides a uniform interface to several types of tests, such as HUnit test cases and QuickCheck properties, allowing them to be grouped into test groups. In addition, it provides a clean command-line user interface for running tests, which can be run using cabal test.

Web Frameworks

scotty is a web framework written in Haskell, which can be used with WAI. While it isn't necessarily the most powerful web framework available (see also Yesod, Snap, Happstack), it provides a very straightforward API for defining RESTful services, which allowed me to become productive quickly.

Template Haskell

Template Haskell seems to be one of those tools which is best used sparingly. When overused, it can result in a situation where some of the benefits of Haskell that I have mentioned (namely programming by "following the types") become less useful, since we end up programming in a custom non-Haskell domain-specific language. However, when used well, I think it can be a powerful tool.

During these projects, I found what I thought was a particularly neat application of Template Haskell: using the text-xml-qq library as a lightweight XML templating library to generate large XML documents. Correctly applied, TH can be a great tool for reducing boilerplate code while maintaining type safety, and therefore improving productivity.

Deploying Haskell

One of the most interesting hurdles during these projects was the problem of deploying Haskell code into our production environments. I would be interested to hear any ideas in this area.

My current approach is to simply build a statically-linked binary in a Cabal sandbox, and then to deploy that binary to our servers. However, this approach has some problems:

  • Statically-linked binaries are large and take time to transfer, making the testing cycle inefficient.
  • Different operating systems and versions require recompilation.

Continuous integration will probably solve these issues, but I would also be interested to try out different approaches such as Docker or NixOS. Again, suggestions are welcome.

Conclusion

Using Haskell for real-world work has, for the most part, been a thoroughly enjoyable experience. I would recommend trying to replace a small, independent, non-critical service with a Haskell implementation. If you do not currently enjoy a service-based architecture, maybe try using Haskell to implement some of your tools, or to automate some process which you perform regularly as a part of your work. If nothing else, I have found Haskell to be a great way to test new ideas in my projects.

@ixmatus

ixmatus commented Oct 15, 2014

Good write up.

We use Snap and enjoy it greatly - Snaplets have been an intuitive and well-thought-out abstraction. Snap's entire design is rock solid, plus the work they're doing on io-streams in the Snap server is exciting.

Deployment

We use CircleCI very heavily and I cannot recommend them enough. Configurable in every way imaginable, they use docker containers to do the builds so they're able to parallelize the builds.

NOTE: they support Haskell incredibly well. Better than TravisCI.

I run the builds for master but have deploys (which CircleCI also handles) done only on tagged branches. This is easy: once the build finishes, a Fabric script in the directory bundles up all of the build artifacts, pushes them to the list of servers, and restarts the service, which is managed by SupervisorD.

I also have the versioned artifact archive stored in S3.

This works really well because I've broken out our persistent models into a totally separate package that also tests its migration plan inside of the CircleCI container (you can load up Postgres / MySQL / Mongo, etc...)

You can add keys in CircleCI for the Github repos it needs so it can check those out during the build and add them to the dependency plan with cabal sandbox add-source ./the/models/repo.

It's very elegant and wonderfully suited to Haskell.

Structure

The microservice paradigm fits well with Haskell's philosophy of composition. Treating each program as an "OSS project" and giving it its own repo, cabal file, README, changelog, etc... will really help with maintenance, longevity, isolated testing, etc...

EKG

I like EKG too and I wrote ekg-log so that instead of making it available through HTTP (which is a bit heavy dependency wise) it pushes it out as JSON to a log file and only writes when the file is truncated, which works well for, say, a Server Density plugin which reads the file then truncates it when it's done.

Failure

Coming from Erlang land I've generally adopted the fail fast philosophy. So error conditions that are acceptable (stuff that comes through Either types, typically) get logged as you're doing with hslogger. Anything that ends the world, does. I allow it to crash the whole thing, but I use supervisord to restart the service if it crashes (with a limit of like 100 or something).

This has worked well for two separate problem domains - one in which we were consuming soft-real time energy metrics data from devices we installed in people's homes (ENORMOUS amounts of data + scrubbing then shipping it off to a DB) and another in which we are controlling devices inside the home through an AMQP interface and an HTTP interface.

Fail fast is a great strategy here, and the good news with Haskell is that if you've done a good job of designing first with types, most "end the world" failure cases will be very visible and should be rare, so you can remedy them or fix them effectively.

YAY!

So happy to hear someone else moved towards Haskell. I've done three separate platform re-writes for three different companies now, one from Scala to Haskell and the other two from Python to Haskell. In each instance everything in Haskell was not only possible but has made my life better and the software much better.

@carlpaten

co-dh - every benchmark I've seen showed idiomatic Node.js and Haskell to have similar performance (5-8x slower than native).

I haven't seen any benchmarks comparing the two languages' memory usage, but intuitively I would expect Haskell to use more, unless you have a thorough understanding of the lazy execution model.

@silky

silky commented Oct 15, 2014

for building/ci stuff, there's shake - https://github.com/ndmitchell/shake - and the currently-being developed bake - https://github.com/ndmitchell/bake - that you may like to consider.

@dmjio

dmjio commented Oct 15, 2014

+1 for snap and http-streams

@purcell

purcell commented Oct 15, 2014

Thanks so much for taking the time to do this write-up.

@laser

laser commented Oct 16, 2014

Excellent post. Will most definitely serve as a template for my next web + Haskell project.

@inf0rmer

Awesome, very useful write-up!
