Skip to content

Instantly share code, notes, and snippets.

@melekes
Last active June 20, 2017 09:36
Show Gist options
  • Save melekes/131291e5144e31d34ac3ed3ab7ff2aca to your computer and use it in GitHub Desktop.
Save melekes/131291e5144e31d34ac3ed3ab7ff2aca to your computer and use it in GitHub Desktop.
Tags in DeliverTx response

Introduction

In the light client (or any other client), the user may want to subscribe to a subset of events (rather than all of them) using /subscribe?event=X. For example, I want to subscribe for all transactions associated with a particular account. Same for fetching. The user may want to fetch transactions based on some filter (rather than fetching all the blocks). For example, I want to get all transactions for a particular account in the last two weeks (tx's block time >= time.Now().Sub(2 * time.Week)).

The goal is a simple and easy to use API for doing that.

Tx Send Flow Diagram - https://www.dropbox.com/s/x45qaj5fq04qo1x/tags1.png?dl=0

Questions

1. Design question: in the above diagram, why the light client connects to the tendermint directly, but not through the ABCI app? @srm's current architecture looks more like https://www.dropbox.com/s/c4r99a7a5p1tpz9/tags2.png?dl=0 where a client is a part of an ABCI app (he is not using the light-client library). 2. Won't we end up in a place where both Tendermint and App have different indexers (Tendermint storing tx results, App storing domain-specific details)? If so, maybe we should let our users do the indexing stuff. Yes, it means in Basecoin we will have to use KV indexer or http://www.blevesearch.com/ or smth else to index accounts. The problem of the tags approach below (see Proposal) is that it doesn't allow complex types (http://www.blevesearch.com/docs/Getting%20Started/). What if the user wants to index some complex struct. How will we encode this and transfer to the Tendermint? go-wire? (means requiring custom encoding) 3. There is a question of who should manage tx indexing keys - app or tendermint. We've discussed it already. But my question is (maybe it is silly) - why we need a hash in the first place? Is the tuple {heigh, index} not enough? It uniquely identifies transaction.(?) So, instead of letting the app manage the keys or saying that your data should not be malleable, we could send an index field with a DeliverTx request and let the app do the indexing (it can add some domain specific details or smth else - we cannot predict really).

# {block height, tx index} => ...
{123, 10} => [{account_holder, "Joe"}, {account_desc, "Private account}]

Proposal

ABCI app return tags with a DeliverTx response inside the data field (for now, later we may create a separate field). Tags is a list of key-value pairs.

Example data:

{
  "channels": ["abci.account_owner.Igor", "abci.account_number.333222111"],
  "work": 10,
  "priority": 5,
  "account.owner": "Igor"
}

Tendermint will most likely have some reserved tags - e.g. "channels" (see below).

1. Subscribing to events

If the user wants to receive only a subset of events of type X, he/she must return channels tag with a DeliverTx response. For every channel in that list, Tendermint will notify the subscribers.

We will need to add an optional channels field:

/subscribe?event=X - events of type X
/subscribe?event=X&channels="abci.account_owner.Igor" - events of type X tagged `abci.account_owner.Igor`
/subscribe?channels="abci.account_owner.Igor"&channels="abci.account_owner.Ivan" - all events tagged `abci.account_owner.Igor` OR `abci.account_owner.Ivan`
/subscribe?channels="abci.account_owner.Igor" - all events tagged `abci.account_owner.Igor`
/subscribe?event=X&channel_patterns="abci.account_owner.*" - events of type X tagged `abci.account_owner.` (e.g. `abci.account_owner.Igor`, `abci.account_owner.Ivan`)
/subscribe?event=X&channel_patterns="abci.account_owner.*" - all events tagged `abci.account_owner.` (e.g. `abci.account_owner.Igor`, `abci.account_owner.Ivan`)

Frey suggested adding wildcard routes to allow clients to subscribe using regexps (e.g. abci.accounts.*). Do we need full regexp syntax or just a strict subset - *?[? Glob-style pattern matching should be enough (https://github.com/antirez/redis/blob/d680eb6dbdf2d2030cb96edfb089be1e2a775ac1/src/util.c#L46).

2. Fetching transactions

This is a bit tricky because a) we want to support a number of indexers, all of which have a different API b) we don't know whenever tags will be sufficient for the most apps (I guess we'll see). c) I am still not convinced this should be on the Tendermint side and not on the ABCI side.

Some indexers (Elasticsearch) require schema (they call it "mapping") to be able to index the data. Where this schema should be defined? And when?

Schema for the account field:

{
  "account" : {
    "properties" : {
      "owner" : {
        "type" : "string"
      },
      "ID" : {
        "type" : "integer"
      },
    }
  }
}

Data:

{
  "channels": ["abci.account_owner.Igor", "abci.account_number.333222111"],
  "work": 10,
  "priority": 5,
  "account": { 
    "owner": "Igor Black",
    "ID": 333222111
  }
}

Or I am digging too deep? What was the original plan? Tags to be a list of strings (["work:5","account_owner.Igor"]), which we index and allow for match queries?

Besides indexing, every indexer has its own query syntax. http://okfnlabs.org/blog/2013/07/01/elasticsearch-query-tutorial.html

Based on the feedback from our users (@srm), we can assume that queries could be arbitrary: tendermint/tendermint#525 (comment).

{
    "query": {
        "term" : { "account": { "owner": "Igor" } }
        "constant_score": {
            "filter": {
                "range": {
                    tx_commited_at: {
                        "from": "2017-01-01",
                        "to": "2017-05-01"
                    }
                }
            }
        }
    }
}

Notes:

  • Trusted vs Untrusted index, that we verify after
  • app duplicating hash implementation

Issues

@ebuchman
Copy link

/subscribe?event=X&channels="abci.account_owner.Igor" - events of type X tagged abci.account_owner.Igor

Originally I was thinking this is just for tx stuff, but I suppose it might be useful to have it on non-tx events too. I still think we need to do work on rpc authentication to limit how much load a client can put on a public server, or how many clients can even connect, and so on.

Do we need full regexp syntax or just a strict subset

Full regex probably overkill. Just * might even be sufficient for now.

I am still not convinced this should be on the Tendermint side and not on the ABCI side.

Still? Open to your thoughts here. Remember clients need to talk to Tendermint for proofs, unless we burden all app devs with exposing Tendermint proof stuff. What do you think is better ?

Or I am digging too deep? What was the original plan?

These are good questions. I'm not sure exactly. Maybe we're trying to do too much with all this super general purpose abstraction (arbitrary tags, arbitrary indexers). The original thinking was that tags would just be set of key-value pairs:

{
  "account.name": "Igor",
  "account.address": "0xdeadbeef",
  "priority": 7
}

I think we can avoid more complex data structure values for now with a convention over keys (ie. account.name and account.address instead of account being some struct).

The channels thing is interesting. But why not allow any tag to be a channel? Is it because it makes it more complicated to subscribe based on the value? Again we could have some standard, like subscribing to account.name=Igor means subscribing to account.name where value is Igor. I could l see folks wanting more complex things too, like priority>7 for txs with priority above 7.

So at what point would we just design to go with a particular indexing solution so we can support some of that sort of complexity, or is that too dangerous a rabbit hole ?

For now, the main priorities are too subscribe to particular classes of app events (I think the channels and * is pretty good for this) and to have some other tags to support mempool behaviour (work, priority). I don't think we need to worry about more datastructures yet. For instance, if we return account.name.Igor as a channel, don't think we need to return "account": {"name": "Igor"} as a tag too.

@ebuchman
Copy link

First review of the new pub sub looks really nice. Is it meant to replace the events package, or will it be a consumer ? Also, could we not extend it to support channels with a convention on the event names? ie. NewBlockEvent.proposer.deadbeef ?

Also realizing now that you can't even subscribe to "all txs" in Tendermint, you have to provide a hash. So we need to fix that to work with this, or just make this channel stuff specific to transactions any ways?

Curious your thoughts. We should talk more about this on Monday.

@srmo
Copy link

srmo commented Jun 16, 2017

you have to provide a hash

Isn't this one of the clues to "why do you need to reimplement the hash algo in the app" or "why do you need the hash in the app"?

@melekes
Copy link
Author

melekes commented Jun 19, 2017

Open to your thoughts here. Remember clients need to talk to Tendermint for proofs, unless we burden all app devs with exposing Tendermint proof stuff. What do you think is better ?

Thought about it today. I think this point: having a very standard rpc interface means the light-client can work with all apps and handle proofs outweighs others.

I still think we need to do work on rpc authentication to limit how much load a client can put on a public server, or how many clients can even connect, and so on.

Yeah, we do. Have you thought about it? Will it be some sort of token-based authentication? (a) Client asks ABCI for token b) Client accesses TM API using it)

The original thinking was that tags would just be set of key-value pairs

Agree that we should not put too much stuff into Tendermint. Key-value pairs seem to be the easiest option where we will be able to query for a specific txs (tied to a particular account).

So at what point would we just design to go with a particular indexing solution so we can support some of that sort of complexity, or is that too dangerous a rabbit hole ?

The point where we will have to define a schema for the tags. With the simple key-value pairs we can just say channels should be an array of strings, work and priority - ints, others - strings.

{
  "account.name": "Igor",
  "account.address": "0xdeadbeef",
  "priority": 7
}

Otherwise, we have to define a schema somewhere OR pass it every time with the tags (WARNING: more space)

{
  {"key": "account.name", "_keyType": "string", "value": "Igor", "_valueType": "string"}
}

But why not allow any tag to be a channel? Is it because it makes it more complicated to subscribe based on the value? Again we could have some standard

Because it is magic? I don't like magic. There are two types of it:

  1. GOOD. The example could be rails new (http://guides.rubyonrails.org/command_line.html) - it just works. Most of the time you don't even need to look for details.
  2. BAD. This is where you need to read something in order to understand it (like "subscribing to account.name=Igor means subscribing to account.name where value is Igor")

I may be wrong here, but having the channels tag and channels field in /subscribe route seems much cleaner.

@melekes
Copy link
Author

melekes commented Jun 19, 2017

Is it meant to replace the events package, or will it be a consumer ?

If we go with the channels approach, it should replace the EventSwitch in RPC.

Also, could we not extend it to support channels with a convention on the event names? ie. NewBlockEvent.proposer.deadbeef ?

But how can I subscribe to a several channels? or a list of patterns?

So we need to fix that to work with this, or just make this channel stuff specific to transactions any ways?

We can strip the tx hash from the event name and add it to channels:

             object -> type -> channels
Publish(Tx, "Tx", "AB0023433CF0334223212243BDD", deliverTx.channels...)

@melekes
Copy link
Author

melekes commented Jun 19, 2017

Frey's comments:

If that isn't too much work for you, it would be helpful to allow the event system to restart easily after a disconnect (send me all such events in the last 100 blocks when I was disconnected).

I agree any more complex query logic should be in the app. Like multiple parameters.

@melekes
Copy link
Author

melekes commented Jun 19, 2017

Q: should /subscribe?query (event notification API) and /txs/search?query (search API) have the same query format?

@melekes
Copy link
Author

melekes commented Jun 19, 2017

Another took on API:

# event notification
/subscribe?event=Tx&query="hash = AB0023433CF0334223212243BDD AND account.invoice.number = 22"
/subscribe?event=Tx&query="hash = AB0023433CF0334223212243BDD AND account.invoice.owner LIKE Igo*"

# search
/txs/search?query="hash = AB0023433CF0334223212243BDD AND account.invoice.owner LIKE Igo*"
/txs/search?query="account.invoice.owner = Igor"

Pros:

  • same format for event notifications and search
  • flexible query

Cons:

@ethanfrey
Copy link

ethanfrey commented Jun 19, 2017

Sorry I didn't make your chat. And just saw this.

What are the types of queries to be allowed?

  • equals
  • prefix equals?
  • contains???
  • integer compares? (actually quite nice for block height > X)

Also what kind of boolean logic? Just "ands", right?

@ethanfrey
Copy link

If it is simpler, maybe no real need for a parser....

like ...?event=Tx&filter="account.invoice.owner=Igo*"&filter="account.invoice.number>22" ???

It is totally valid to have multiple arguments with the same name, and handled by many web frameworks. Then the parsing just has to look for [=<>]+ in order to separate out the key from the value.

Does that make sense? I agree a full-fledged query parser is overkill. Too much work. Too slow to run. We should provide a minimal set of useful commands that work well.

@ethanfrey
Copy link

The point where we will have to define a schema for the tags. With the simple key-value pairs we can just say channels should be an array of strings, work and priority - ints, others - strings.

{
"account.name": "Igor",
"account.address": "0xdeadbeef",
"priority": 7
}
Otherwise, we have to define a schema somewhere OR pass it every time with the tags (WARNING: more space)

{
{"key": "account.name", "_keyType": "string", "value": "Igor", "_valueType": "string"}
}

Yeah, and the schema stuff... knowing which fields are int and which are strings.... real pain.

Especially if it is not hard-coded anywhere, and then an app keeps writing int values in the tag, and suddenly a string and we are just kinda guessing the whole time?

@melekes
Copy link
Author

melekes commented Jun 19, 2017

Notes:

  • let’s go with the 3rd approach - https://gist.github.com/melekes/131291e5144e31d34ac3ed3ab7ff2aca#gistcomment-2127149 and not worry too much about performance
  • tags will be a list of KVPairs (protobuf encoded)
    message KVPair {
        string key = 1;
       optional string value_string = 2;
       optional int value_int = 3;
    }
    
  • disallow duplicate keys for now
  • the work should be done in a separate package
  • what we want is sql like engine (Postgres, SQLlite) (mostly for historic queries)
    | id | tx_result | tx_id | account_address | block height | block creation time |
    
  • new package should be abstract enough so people are free to choose other than default engine

TM config:

  • whitelist keys which can be used in a fetching query (?) see the table ^^
  • flag for enabling indexing (already exist)

Open issues:

  • there is an issue where there are too many txs

@melekes
Copy link
Author

melekes commented Jun 20, 2017

While designing the library, need to think about concurrency model. events v1 was sequential with a "NOTE: EventSwitch callbacks must be nonblocking" https://github.com/tendermint/tendermint/blob/f8035441951a8d1cbdbc97ba295ea3b85e3c1981/rpc/core/events.go#L12

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment