@dustin

dustin/tap.org Secret

Created May 11, 2010 21:47
tap protocol specs and what-not

Memcached Tap

1 Overview

Tap provides a mechanism for observing, from the outside, data changes taking place within a memcached server.

2 Use Cases

Tap is a building block for new kinds of tools that need to react to changes within a memcached server without having to modify memcached itself.

2.1 Replication

One simple use case is replication.

Upon initial connect, a client can ask for all existing data within a server as well as to be notified as values change.

We receive all data related to each item that’s being set, so just replaying that data on another node makes replication an easy exercise.

2.2 Observation

Requesting a tap stream of only future changes makes it very easy to see the types of things that are changing within your memcached instance.

2.3 Secondary Layer Cache Invalidation

If you have frontends that are performing their own cache, requesting a tap stream of future changes is useful for invalidating items stored within this cache.

Ideally, such a stream would not include the actual data that had changed. Today, all tap streams include full bodies, but it is straightforward to specify new features for engines to implement, such as requesting the omission of values.

2.4 External Indexing

A tap stream pointed at an index server (e.g. sphinx or solr) will send all data changes to the index allowing for an always-up-to-date full-text search index of your data.

2.5 vbucket transition

For the purposes of vbucket transfer between nodes, a new type of tap request can be created that streams every item stored in a vbucket (or set of vbuckets), both existing and changing, with the ability to terminate the stream and cut over ownership of the vbucket once the last item is enqueued.

3 Protocol

A tap session begins when the client sends a command telling the server what it is interested in receiving. The server then sends commands back to the client across the connection until the connection is terminated.

3.1 Initial Base Message from Client

A tap stream begins with a binary protocol message with the ID of 0x40.

The packet’s key may specify a unique client identifier that can be used to allow reconnects (resumable at the server’s discretion).

A simple base message from a client referring to itself as “node1” would appear as follows.

Byte  0           1           2           3
0     0x80        0x40        0x00        0x05
4     0x00        0x00        0x00        0x00
8     0x00        0x00        0x00        0x05
12    0x00        0x00        0x00        0x00
16    0x00        0x00        0x00        0x00
20    0x00        0x00        0x00        0x00
24    0x6e (‘n’)  0x6f (‘o’)  0x64 (‘d’)  0x65 (‘e’)
28    0x31 (‘1’)

Field        (offset) (value)
Magic        (0)    : 0x80
Opcode       (1)    : 0x40
Key length   (2,3)  : 0x0005
Extra length (4)    : 0x00
Data type    (5)    : 0x00
Reserved     (6,7)  : 0x0000
Total body   (8-11) : 0x00000005
Opaque       (12-15): 0x00000000
CAS          (16-23): 0x0000000000000000
Extras              : None
Key          (24-28): The textual string: "node1"
Value               : None
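
The layout above can be reproduced with Python's struct module. This is a minimal sketch rather than a reference client; the function name build_tap_connect and the constant names are made up for illustration.

```python
import struct

# 24-byte header, big-endian: magic, opcode, key length, extras length,
# data type, reserved, total body length, opaque, CAS.
HEADER = struct.Struct(">BBHBBHIIQ")

REQ_MAGIC = 0x80    # request magic, as in the field listing
TAP_CONNECT = 0x40  # opcode that starts a tap stream


def build_tap_connect(name: bytes, opaque: int = 0) -> bytes:
    """Build a tap request with no options; `name` is the client identifier."""
    header = HEADER.pack(
        REQ_MAGIC,
        TAP_CONNECT,
        len(name),  # key length
        0,          # extras length (no options)
        0,          # data type
        0,          # reserved
        len(name),  # total body = extras + key + value
        opaque,
        0,          # CAS
    )
    return header + name


packet = build_tap_connect(b"node1")
print(len(packet), packet.hex())
```

The total body length covers extras, key, and value together, which is why it equals the key length here.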

3.1.1 Options

Additional tap options may be specified as 32-bit flags. The flags appear in the “extras” section of the request packet. If omitted, all flags are assumed to be 0.

Options may or may not have values. For options that do, the values will appear in the body in the order they’re defined (LSB -> MSB).

3.1.1.1 Backfill

BACKFILL (0x01) carries a single 64-bit value in the body that represents the oldest entry (from epoch) you’re interested in. Specifying a time in the future (for the server you are connecting to) will cause it to start streaming only current changes.

An example tap stream request that specifies a backfill of -1 (meaning future only) would look like this:

Byte  0           1           2           3
0     0x80        0x40        0x00        0x05
4     0x08        0x00        0x00        0x00
8     0x00        0x00        0x00        0x15
12    0x00        0x00        0x00        0x00
16    0x00        0x00        0x00        0x00
20    0x00        0x00        0x00        0x00
24    0x00        0x00        0x00        0x00
28    0x00        0x00        0x00        0x01
32    0x6e (‘n’)  0x6f (‘o’)  0x64 (‘d’)  0x65 (‘e’)
36    0x31 (‘1’)  0x00        0x00        0x00
40    0x00        0xff        0xff        0xff
44    0xff

Field        (offset) (value)
Magic        (0)    : 0x80
Opcode       (1)    : 0x40
Key length   (2,3)  : 0x0005
Extra length (4)    : 0x08
Data type    (5)    : 0x00
Reserved     (6,7)  : 0x0000
Total body   (8-11) : 0x00000015
Opaque       (12-15): 0x00000000
CAS          (16-23): 0x0000000000000000
Extras       (24-31): 0x0000000000000001
Key          (32-36): The textual string: "node1"
Value               : 0x00000000ffffffff
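
The field listing above could be assembled as follows. This is a sketch under stated assumptions: the 8-byte extras width is copied from the listing (the options text describes the flags as 32-bit, so treat the width as unconfirmed), and the names build_backfill_request and TAP_FLAG_BACKFILL are hypothetical.

```python
import struct

HEADER = struct.Struct(">BBHBBHIIQ")  # 24-byte binary protocol header
TAP_FLAG_BACKFILL = 0x01


def build_backfill_request(name: bytes, oldest: int) -> bytes:
    """Tap request with BACKFILL set, laid out as in the listing above."""
    # Assumption: extras are 8 bytes wide, as the listing shows.
    extras = struct.pack(">Q", TAP_FLAG_BACKFILL)
    # The backfill option carries one 64-bit value in the body.
    value = struct.pack(">Q", oldest)
    body = extras + name + value
    header = HEADER.pack(
        0x80, 0x40, len(name), len(extras), 0, 0, len(body), 0, 0
    )
    return header + body


packet = build_backfill_request(b"node1", 0x00000000FFFFFFFF)
print(len(packet), packet.hex())
```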

3.1.1.2 Dump

DUMP (0x02) contains no extra body and will cause the server to transmit only existing items and disconnect after all of the items have been transmitted.

An example tap stream request that specifies only dumping existing records would look like this:

Byte  0           1           2           3
0     0x80        0x40        0x00        0x05
4     0x08        0x00        0x00        0x00
8     0x00        0x00        0x00        0x0d
12    0x00        0x00        0x00        0x00
16    0x00        0x00        0x00        0x00
20    0x00        0x00        0x00        0x00
24    0x00        0x00        0x00        0x00
28    0x00        0x00        0x00        0x02
32    0x6e (‘n’)  0x6f (‘o’)  0x64 (‘d’)  0x65 (‘e’)
36    0x31 (‘1’)

Field        (offset) (value)
Magic        (0)    : 0x80
Opcode       (1)    : 0x40
Key length   (2,3)  : 0x0005
Extra length (4)    : 0x08
Data type    (5)    : 0x00
Reserved     (6,7)  : 0x0000
Total body   (8-11) : 0x0000000d
Opaque       (12-15): 0x00000000
CAS          (16-23): 0x0000000000000000
Extras       (24-31): 0x0000000000000002
Key          (32-36): The textual string: "node1"
Value               : None

3.1.1.3 VBucket List

LIST_BUCKETS (0x04) is used to limit a request to a specific set of vbuckets.

The vbuckets are included as values of 16-bits each, starting with a 16-bit number indicating the number of vbuckets in the list.

An example tap stream request that specifies vbuckets 1, 2, and 5 would look like this:

Byte  0           1           2           3
0     0x80        0x40        0x00        0x05
4     0x08        0x00        0x00        0x00
8     0x00        0x00        0x00        0x15
12    0x00        0x00        0x00        0x00
16    0x00        0x00        0x00        0x00
20    0x00        0x00        0x00        0x00
24    0x00        0x00        0x00        0x00
28    0x00        0x00        0x00        0x04
32    0x6e (‘n’)  0x6f (‘o’)  0x64 (‘d’)  0x65 (‘e’)
36    0x31 (‘1’)  0x00        0x03        0x00
40    0x01        0x00        0x02        0x00
44    0x05

Field        (offset) (value)
Magic        (0)    : 0x80
Opcode       (1)    : 0x40
Key length   (2,3)  : 0x0005
Extra length (4)    : 0x08
Data type    (5)    : 0x00
Reserved     (6,7)  : 0x0000
Total body   (8-11) : 0x00000015
Opaque       (12-15): 0x00000000
CAS          (16-23): 0x0000000000000000
Extras       (24-31): 0x0000000000000004
Key          (32-36): The textual string: "node1"
Value        (37-44): {3, 1, 2, 5} (16-bits each)
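
The vbucket list value is a count followed by the ids, 16 bits each. A small sketch of encoding and decoding it, with big-endian byte order taken from the example bytes; the helper names are hypothetical.

```python
import struct


def encode_vbucket_list(vbuckets):
    """Encode a 16-bit count followed by each vbucket id as 16 bits."""
    return struct.pack(f">H{len(vbuckets)}H", len(vbuckets), *vbuckets)


def decode_vbucket_list(data: bytes):
    """Inverse of encode_vbucket_list."""
    (count,) = struct.unpack_from(">H", data, 0)
    return list(struct.unpack_from(f">{count}H", data, 2))


encoded = encode_vbucket_list([1, 2, 5])
print(encoded.hex())
print(decode_vbucket_list(encoded))
```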

3.1.1.4 Takeover VBuckets

TAKEOVER_VBUCKETS (0x08) is used to indicate that the client wishes to completely take the given vbuckets away from the server.

3.1.1.5 Support ACK

SUPPORT_ACK (0x10) indicates the client supports explicit ACKing. See ACK support for more information on this mechanism.

3.1.1.6 Request Keys Only

REQUEST_KEYS_ONLY (0x20) does pretty much what you’d think: it requests that the server not send the values along with the keys. The server is not required to understand this, so the client shouldn’t assume it will never receive values.
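
The option flags combine as a bitmask, and option values are concatenated in flag order (LSB -> MSB). The sketch below illustrates that ordering for the two options documented as carrying values; the class name TapConnectFlag and the helper build_options are hypothetical, and it assumes no other option carries a value.

```python
import struct
from enum import IntFlag


class TapConnectFlag(IntFlag):
    BACKFILL = 0x01
    DUMP = 0x02
    LIST_BUCKETS = 0x04
    TAKEOVER_VBUCKETS = 0x08
    SUPPORT_ACK = 0x10
    REQUEST_KEYS_ONLY = 0x20


def build_options(backfill=None, vbuckets=None, flags=TapConnectFlag(0)):
    """Return (flags, value bytes) with option values ordered LSB -> MSB."""
    flags = TapConnectFlag(flags)
    value = b""
    if backfill is not None:  # 0x01: its value comes first
        flags |= TapConnectFlag.BACKFILL
        value += struct.pack(">Q", backfill)
    if vbuckets is not None:  # 0x04: its value follows
        flags |= TapConnectFlag.LIST_BUCKETS
        value += struct.pack(f">H{len(vbuckets)}H", len(vbuckets), *vbuckets)
    return flags, value


flags, value = build_options(
    backfill=0xFFFFFFFF,
    vbuckets=[1, 2, 5],
    flags=TapConnectFlag.SUPPORT_ACK,
)
print(hex(flags), value.hex())
```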

3.2 Response Commands

After tap is initiated, the server begins streaming a series of commands back to the caller. These commands are similar to, but not necessarily the same as, existing commands.

In particular, each command includes a section of engine-specific data as well as a TTL to avoid replication loops.

3.2.1 General Response Flags

The flag section of the packet has two defined flags that may be present for any given packet:

3.2.1.1 ACK

0x01 indicates the packet requests an ACK. The client must reply to this packet (i.e. send a packet back upstream with the same opaque) to acknowledge receipt of this packet.

3.2.1.2 NO_VALUE

0x02 indicates the item may have a value, but it is not included in the request.
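
Testing these two flags is a plain bitmask check. The sketch assumes the flags word has already been extracted from the packet, since its position is not described here; the names are hypothetical.

```python
from enum import IntFlag


class TapResponseFlag(IntFlag):
    ACK = 0x01       # sender requests an ACK for this packet
    NO_VALUE = 0x02  # item may have a value, but it was not sent


def needs_ack(flags: int) -> bool:
    """True when the packet asks the client to acknowledge it."""
    return bool(flags & TapResponseFlag.ACK)


def value_omitted(flags: int) -> bool:
    """True when the value was left out of the packet."""
    return bool(flags & TapResponseFlag.NO_VALUE)


print(needs_ack(0x03), value_omitted(0x01))
```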

[describe extended formats]

3.2.2 Mutation

All mutation events arrive as TAP_MUTATION (0x41) events. These are conceptually similar to set commands.

3.2.3 Delete

0x42

3.2.4 Flush

0x43

3.2.5 Opaque

0x44

Engine-specific extensions are sent as tap opaque messages. A client may ignore any of these it doesn’t understand (though it must still honor the ACK if the message has one).

3.2.6 VBucket Set

0x45

When doing a takeover, this message indicates a vbucket state transition is ready.
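
A client typically dispatches on the opcode of each incoming message. The sketch below assumes streamed messages carry the same 24-byte header as the request shown earlier; apart from the opcode values, the enum member names and the describe helper are hypothetical.

```python
import struct
from enum import IntEnum

HEADER = struct.Struct(">BBHBBHIIQ")  # assumed 24-byte header layout


class TapOpcode(IntEnum):
    CONNECT = 0x40
    MUTATION = 0x41
    DELETE = 0x42
    FLUSH = 0x43
    OPAQUE = 0x44
    VBUCKET_SET = 0x45


def describe(header_bytes: bytes):
    """Return (opcode name, opaque, total body length) for one header."""
    (_magic, opcode, _keylen, _extlen, _datatype, _reserved,
     bodylen, opaque, _cas) = HEADER.unpack(header_bytes[:24])
    try:
        name = TapOpcode(opcode).name
    except ValueError:
        # Opcodes the client does not know are reported, not rejected.
        name = f"UNKNOWN(0x{opcode:02x})"
    return name, opaque, bodylen


sample = HEADER.pack(0x80, 0x41, 3, 0, 0, 0, 3, 7, 0)
print(describe(sample))
```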

4 ACK Support

In default mode, all events are streamed out of the server effectively blindly and as fast as possible. With ACKs, a client can more reliably receive messages by letting the server know at a higher protocol level that it has successfully processed tap messages.

TCP, of course, guarantees message delivery, but a message can be delivered to a remote system that can crash before processing it. With ACKs enabled (and if supported by the server), the server can retransmit any messages whose ACKs were pending at the time of a connection drop if the same named connection reattaches to a tap session that is still live.

The server chooses how frequently to request ACKs, and may dynamically adjust that interval to achieve maximum throughput. Clients need not make any assumptions about message ACKs other than that any message requesting an ACK needs a response.
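
A client-side ACK might be built as shown below. The only requirement stated here is that the reply carries the same opaque; the rest of the packet shape (response magic 0x81, echoed opcode, empty body) is an assumption borrowed from the binary protocol's usual response form, and build_ack is a hypothetical name.

```python
import struct

HEADER = struct.Struct(">BBHBBHIIQ")
RES_MAGIC = 0x81  # binary protocol response magic (assumed for ACKs)


def build_ack(message_header: bytes) -> bytes:
    """Build a body-less reply echoing the opcode and opaque of a message."""
    (_magic, opcode, _keylen, _extlen, _datatype, _reserved,
     _bodylen, opaque, _cas) = HEADER.unpack(message_header[:24])
    # Same opaque, so the server can match the ACK to the pending message.
    return HEADER.pack(RES_MAGIC, opcode, 0, 0, 0, 0, 0, opaque, 0)


incoming = HEADER.pack(0x80, 0x41, 0, 0, 0, 0, 0, 0xDEADBEEF, 0)
ack = build_ack(incoming)
print(ack.hex())
```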

5 Implementations

5.1 Java

spymemcached supports a pretty rich API for tap in recent versions.

5.2 Python

ep-engine has a Python implementation that’s used in a lot of utilities.

5.3 go

gotap is a working Go implementation of tap, though it has no real-world deployments.

5.4 C

Trond’s libcouchbase has support for tap.
