Skip to content

Instantly share code, notes, and snippets.

@halfelf
Created February 11, 2019 02:18
Show Gist options
  • Star 77 You must be signed in to star a gist
  • Fork 21 You must be signed in to fork a gist
  • Save halfelf/db1ae032dc34278968f8bf31ee999a25 to your computer and use it in GitHub Desktop.
Save halfelf/db1ae032dc34278968f8bf31ee999a25 to your computer and use it in GitHub Desktop.
How to Build a Fast Limit Order Book

https://web.archive.org/web/20110219163448/http://howtohft.wordpress.com/2011/02/15/how-to-build-a-fast-limit-order-book/

The response to my first few posts has been much larger than I’d imagined and I’d like to thank everyone for the encouragement.

If you’re interested in building a trading system I recommend first reading my previous post on general ideas to keep in mind.

My first really technical post will be on how to build a limit order book, probably the single most important component of a trading system. Because the data structure chosen to represent the limit order book will be the primary source of market information for trading models, it is important to make it both absolutely correct and extremely fast.

To give some idea of the data volumes, the Nasdaq TotalView ITCH feed, which is every event in every instrument traded on the Nasdaq, can have data rates of 20+ gigabytes/day with spikes of 3 megabytes/second or more. The individual messages average about 20 bytes each so this means handling 100,000-200,000 messages per second during high volume periods.

There are three main operations that a limit order book (LOB) has to implement: add, cancel, and execute. The goal is to implement these operations in O(1) time while making it possible for the trading model to efficiently ask questions like “what are the best bid and offer?”, “how much volume is there between prices A and B?” or “what is order X’s current position in the book?”

The vast majority of the activity in a book is usually made up of add and cancel operations as market makers jockey for position, with executions a distant third (in fact I would argue that the bulk of the useful information on many stocks, particularly in the morning, is in the pattern of adds and cancels, not executions, but that is a topic for another post). An add operation places an order at the end of a list of orders to be executed at a particular limit price, a cancel operation removes an order from anywhere in the book, and an execution removes an order from the inside of the book (the inside of the book is defined as the oldest buy order at the highest buying price and the oldest sell order at the lowest selling price). Each of these operations is keyed off an id number (Order.idNumber in the pseudo-code below), making a hash table a natural structure for tracking them.

Depending on the expected sparsity of the book (sparsity being the average distance in cents between limits that have volume, which is generally positively correlated with the instrument price), there are a number of slightly different implementations I’ve used. First it will help to define a few objects:

     Order
      int idNumber;
      bool buyOrSell;
      int shares;
      int limit;
      int entryTime;
      int eventTime;
      Order *nextOrder;
      Order *prevOrder;
      Limit *parentLimit;

    Limit  // representing a single limit price
      int limitPrice;
      int size;
      int totalVolume;
      Limit *parent;
      Limit *leftChild;
      Limit *rightChild;
      Order *headOrder;
      Order *tailOrder;

    Book
      Limit *buyTree;
      Limit *sellTree;
      Limit *lowestSell;
      Limit *highestBuy;

The idea is to have a binary tree of Limit objects sorted by limitPrice, each of which is itself a doubly linked list of Order objects. Each side of the book, the buy Limits and the sell Limits, should be in separate trees so that the inside of the book corresponds to the end and beginning of the buy Limit tree and sell Limit tree, respectively. Each order is also an entry in a map keyed off idNumber, and each Limit is also an entry in a map keyed off limitPrice.

With this structure you can easily implement these key operations with good performance:

Add – O(log M) for the first order at a limit, O(1) for all others
Cancel – O(1)
Execute – O(1)
GetVolumeAtLimit – O(1)
GetBestBid/Offer – O(1)

where M is the number of price Limits (generally << N the number of orders). Some strategy for keeping the limit tree balanced should be used because the nature of markets is such that orders will be being removed from one side of the tree as they’re being added to the other. Keep in mind, though, that it is important to be able to update Book.lowestSell/highestBuy in O(1) time when a limit is deleted (which is why each Limit has a Limit *parent) so that GetBestBid/Offer can remain O(1).

A variation on this structure is to store the Limits in a sparse array instead of a tree. This will give O(1) always for add operations, but at the cost of making deletion/execution of the last order at the inside limit O(M) as Book.lowestSell/highestBuy have to be updated (for a non-sparse book you will usually get much better than O(M) though). If you store the Limits in a sparse array and linked together in a list then adds become O(log M) again while deletes/executions stay O(1). These are all good implementations; which one is best depends mainly on the sparsity of the book.

Generally, it’s also wise to use batch allocations, or if using a garbage collecting language like Java, object pools for these entities. Java can be made fast enough for HFT as long as the garbage collector isn’t allowed to run.

Strategies for safely and robustly providing access to the book’s data from multiple threads will be the subject of another post.

To conclude, this is how I’ve learned to build a high performance limit order book. If anyone has questions about how specifically to implement some of the operations I talk about, please post in the comments.

@sjtindell
Copy link

Cool content. Where's the rest of your posts? Do you have any other resources I could look at to learn how to build a limit order book? What about a matching engine?

@halfelf
Copy link
Author

halfelf commented Feb 14, 2020

Cool content. Where's the rest of your posts? Do you have any other resources I could look at to learn how to build a limit order book? What about a matching engine?

Sorry this is not my post. It is old and from a deleted blog. I just copied the original text from webarchive.

@JVillella
Copy link

@BARJ
Copy link

BARJ commented Feb 28, 2023

Keep in mind, though, that it is important to be able to update Book.lowestSell/highestBuy in O(1) time when a limit is deleted (which is why each Limit has a Limit *parent) so that GetBestBid/Offer can remain O(1).

Can anyone explain to me how Limit *parent helps with finding the next best bid or offer in O(1) time?

I would use Limit *parent, together with Limit *leftChild and Limit *rightChild, to find the predecessor (the next best buy or sell), but that is done in only O(n) time.

@halfelf
Copy link
Author

halfelf commented Mar 1, 2023

@BARJ The best bid/ask must be a leaf node, which has no children in the binary tree. The second best price must be its parent.

@BARJ
Copy link

BARJ commented Mar 7, 2023

Screenshot from 2023-03-07 15-52-09

@halfelf, please correct me if I am wrong, but the best bid/ask can be an internal node (bid 6), where the second best price can be its child (bid 5).

@halfelf
Copy link
Author

halfelf commented Mar 7, 2023

@BARJ Sorry I haven't revisited this article for years and made a terrible mistake. Let me try to fix this.

We can alway find the best price in O(1) right? By storing a pointer to the smallest node and keep updating it whenever there is an insertion or deletion.

  • For inserting a new price node, it's easy to check if it is the new best.
  • For deleting/removing the best node, we will try to locate the successor one, it must be the parent node if the parent node has no right subtree. Or, the left most node of the right subtree (the sibling of the node which to be deleted).
    • For a balanced binary search tree, the right sibling tree's depth must be a limited number, so that the iteration complexity will be constant.

Think of C++'s std::map, maybe this is the most common (and slowest) structure that fullfills those performance criteria, implemented in RB-tree. std::map::begin() and end() have constant complexity. And its iterator's ++/-- operators have that too.

@tucq88
Copy link

tucq88 commented Feb 16, 2024

On creating orders with market price, What is your suggestion on data structure?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment