@lenards
Last active October 26, 2017 17:44
Understanding the usage of `nodetool getendpoints` with compound partition keys

TL;DR - I think I have an answer that I'm sort of okay with ... and I might have decided that I need to write up a bug regarding a regression.

The whole thing has gotten a bit complicated - so you can skip it if you like...


If you consider nodetool getendpoints as a tool in our toolkit, we'd like to know how to use it on realistic data. The examples in the exercises used simple integer keys, not anything more complex.

So, here's what we need to give nodetool:

$ ./bin/nodetool -h ... getendpoints - Print the end points that owns the key ...

But how do you pass <key> here when you have a compound partition key, like the one in musicdb.album?

The DDL for album is:

CREATE TABLE album (
  title text,
  year int,
  genre text,
  performer text,
  tracks map<int, text>,
  PRIMARY KEY ((title, year))
)

That's the major spirit of the question.

How about we assume that we want the location for the replica set for the 1972 album released by Elvis Presley, "Elvis Sings Hits From His Movies, Volume 1".

How do I pass that to nodetool getendpoints?

I did some wild Googling, and I found a blog post from The Last Pickle about the PRIMARY KEY statement.

For a given table events

CREATE TABLE events (
  device_id   int,
  year_month  int,
  sequence    timestamp,
  pressure    int,
  temperature int,
  is_dam_dirty_apes  boolean,
  PRIMARY KEY ((device_id, year_month), sequence)
);

They peeked under the covers with cassandra-cli, showed the "RowKey", and then used that value in nodetool getendpoints as the <key> argument:

[default@dev] list events;
Using default limit of 100
Using default column limit of 100
-------------------
RowKey: 2:201302
=> (column=2013-02-20 10\:58\:40+1300:, value=, timestamp=1357869160739000)
=> (column=2013-02-20 10\:58\:40+1300:is_dam_dirty_apes, value=01, timestamp=1357869160739000)
=> (column=2013-02-20 10\:58\:40+1300:pressure, value=000011d0, timestamp=1357869160739000)
=> (column=2013-02-20 10\:58\:40+1300:temperature, value=00000015, timestamp=1357869160739000)
-------------------
$ bin/nodetool -h 127.0.0.1 -p 7100 getendpoints dev events 2:201302
127.0.0.2

For our table album, the partition key is (title, year).

CREATE TABLE album (
  title text,
  year int,
  genre text,
  performer text,
  tracks map<int, text>,
  PRIMARY KEY ((title, year))
)

If a title is simple, without spaces - it seems to work like this:

$ ./nodetool -p 7100 getendpoints musicdb album Pinkerton:1996
127.0.0.1

I know that "RowKey" is Pinkerton:1996 from cassandra-cli.
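
The "RowKey" string that cassandra-cli shows for a compound partition key appears to be the component values joined with a colon. A minimal shell sketch of composing that string is below. `row_key` is a hypothetical helper name, and it assumes none of the values contain a colon themselves (how those would need to be escaped is not covered here).

```shell
# Hypothetical helper: join partition key components with ':' to mirror
# the "RowKey" format seen in cassandra-cli (e.g. 2:201302).
# Assumption: no component contains a ':' itself.
row_key() (
    IFS=':'
    printf '%s\n' "$*"   # "$*" joins the arguments with the first character of IFS
)

row_key Pinkerton 1996
row_key "Elvis Sings Hits From His Movies, Volume 1" 1972
```

The first call prints Pinkerton:1996 and the second prints the Elvis key with its spaces intact. Building the string is the easy part; getting it through the nodetool wrapper is where things go wrong.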

But ... what about spaces in our title?

Considering that I know the "RowKey" is: Elvis Sings Hits From His Movies, Volume 1:1972

This works in the lo-fi world of Thrift (like you might expect):

[default@musicdb] get album['Elvis Sings Hits From His Movies, Volume 1:1972'];
=> (name=, value=, timestamp=1410900116143001)
=> (name=genre, value=526f636b, timestamp=1410900116143001)
=> (name=performer, value=456c76697320507265736c6579, timestamp=1410900116143001)
=> (name=tracks:00000001, value=446f776e20627920746865205269766572736964652f5768656e20746865205361696e747320476f206d61726368696e6720496e, timestamp=1410900116143001)
=> (name=tracks:00000002, value=546865792052656d696e64204d6520746f6f206d756368206f6620596f75, timestamp=1410900116143001)
=> (name=tracks:00000003, value=436f6e666964656e6365, timestamp=1410900116143001)
=> (name=tracks:00000004, value=4672616e6b696520616e64204a6f686e6e79, timestamp=1410900116143001)
=> (name=tracks:00000005, value=477569746172204d616e, timestamp=1410900116143001)
=> (name=tracks:00000006, value=4c6f6e672d4c6567676564204769726c, timestamp=1410900116143001)
=> (name=tracks:00000007, value=596f7520446f6e2774204b6e6f77204d65, timestamp=1410900116143001)
=> (name=tracks:00000008, value=486f7720576f756c6420596f75204c696b6520746f204265, timestamp=1410900116143001)
=> (name=tracks:00000009, value=42696720426f7373204d616e, timestamp=1410900116143001)
=> (name=tracks:0000000a, value=4f6c64204d6163446f6e616c64, timestamp=1410900116143001)
Returned 13 results.
Elapsed time: 2.01 msec(s).
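
The `value=` fields in the cassandra-cli output are hex-encoded bytes: the text columns here decode as plain text and the int columns as big-endian integers. A small shell sketch for reading them is below. `hex_to_text` and `hex_to_int` are hypothetical helper names, and the text decoder only aims at plain ASCII values like the ones in these listings.

```shell
# Hypothetical helpers for reading cassandra-cli hex values.

# Decode a hex string to text, one byte (two hex digits) at a time.
# Assumes plain ASCII content and an even number of hex digits;
# trailing newline bytes would be dropped by the command substitution.
hex_to_text() {
    _hex=$1
    _out=''
    [ $(( ${#_hex} % 2 )) -eq 0 ] || return 1
    while [ -n "$_hex" ]; do
        _rest=${_hex#??}            # everything after the first two digits
        _byte=${_hex%"$_rest"}      # the first two digits
        _hex=$_rest
        # Convert the byte to a three-digit octal escape and let printf emit it.
        _out="$_out$(printf "\\$(printf '%03o' "0x$_byte")")"
    done
    printf '%s\n' "$_out"
}

# Decode a hex string as a non-negative integer.
hex_to_int() {
    printf '%d\n' "0x$1"
}

hex_to_text 526f636b     # prints Rock
hex_to_int 000011d0      # prints 4560
```

So the genre value 526f636b is "Rock", and the pressure value 000011d0 from the events example is 4560.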

HOWEVER ... it does not translate to the command line with nodetool:

student@cascor:~/cassandra$ ./bin/nodetool -p 7100 getendpoints musicdb album "Elvis Sings Hits From His Movies, Volume 1:1972"
./bin/nodetool: 61: [: Elvis: unexpected operator
getendpoints requires ks, cf and key args

It seems that this was written up and fixed at one point:

https://issues.apache.org/jira/browse/CASSANDRA-4551

Again, as with "Pinkerton", a key with no spaces works against another table (like albums_by_genre):

student@cascor:~/cassandra$ ./bin/nodetool -p 7100 getendpoints musicdb albums_by_genre "Punk"
127.0.0.1

But, anything with a space, we're DOOMED!

student@cascor:~/cassandra$ ./bin/nodetool -p 7100 getendpoints musicdb albums_by_genre "Middle Eastern"
./bin/nodetool: 61: [: Middle: unexpected operator
getendpoints requires ks, cf and key args
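
The `[: Middle: unexpected operator` message comes from the shell's `[` test at line 61 of the nodetool wrapper script, which suggests that somewhere the key is expanded without quotes and split on whitespace before it ever reaches Java. The script's actual line 61 is not shown here; the functions below are a hypothetical stand-in that only shows how quoting changes the number of words the shell sees.

```shell
# Hypothetical stand-ins, not the real nodetool source.

# Expands $1 without quotes, so the shell splits it on whitespace.
words_unquoted() {
    set -- $1
    echo "$#"
}

# Expands "$1" with quotes, so the value stays a single word.
words_quoted() {
    set -- "$1"
    echo "$#"
}

words_unquoted "Middle Eastern"   # prints 2
words_quoted "Middle Eastern"     # prints 1
```

If the wrapper hands an unquoted key to `[`, then "Middle Eastern" arrives as two words and the test no longer parses, which would match the error above.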

I did notice in CASSANDRA-4551 that the help text for nodetool used to ask for the <key> in hex format. And today, when I saw the token() function, I thought I'd try that out:

cqlsh:musicdb> SELECT title, token(title, year)
               FROM album
               WHERE
                 title = 'Elvis Sings Hits From His Movies, Volume 1' AND
                 year = 1972;


 title                                      | token(title, year)
--------------------------------------------+---------------------
 Elvis Sings Hits From His Movies, Volume 1 | 9124020880974048405

(1 rows)

Which I think works!

$ ./bin/nodetool -p 7100 getendpoints musicdb album  9124020880974048405
127.0.0.1

Funny thing is, being a Bootcamp student, I am not 100% sure that this has identified the node. But given that token() gives us, well, a token, we can divine from nodetool ring whether that value is owned by a node.
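
To make the nodetool ring step concrete: in a token ring, the primary owner of a token is the node with the smallest ring token greater than or equal to it, wrapping around to the lowest ring token when there is none. The sketch below applies that rule to a simplified listing of `<token> <address>` lines on standard input. The input format, the `primary_owner` name, and the ring tokens in the example are all assumptions for illustration, not real nodetool ring output, and the sketch ignores replication entirely.

```shell
# Hypothetical helper: print the address that owns TOKEN, given lines of
# "<token> <address>" on stdin (a simplified stand-in for nodetool ring output).
primary_owner() {
    sort -n | {
        _first=''
        _found=''
        while read -r _token _addr; do
            # Remember the lowest token's node for the wrap-around case.
            if [ -z "$_first" ]; then _first=$_addr; fi
            # The first ring token >= the key's token marks the owner.
            if [ "$1" -le "$_token" ]; then
                _found=$_addr
                break
            fi
        done
        printf '%s\n' "${_found:-$_first}"
    }
}

# Made-up three-node ring; the token is larger than every ring token,
# so ownership wraps around to the node with the lowest token.
printf '%s\n' \
    '-9000000000000000000 127.0.0.1' \
    '0 127.0.0.2' \
    '9000000000000000000 127.0.0.3' | primary_owner 9124020880974048405
```

With this made-up ring the call prints 127.0.0.1. On a single-node cluster every token maps to the same node, so a lone 127.0.0.1 in the output does not prove much either way.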

Putting my customer hat on for a moment (as an ops/admin person) ... it seems like having to run a query to get a token to tell me the endpoint is a bit of a runaround (feels like a M*A*S*H 4077 storyline to me!).
