Skip to content

Instantly share code, notes, and snippets.

@antirez

antirez/LPOS.md Secret

Last active June 23, 2020 23:36
Show Gist options
  • Save antirez/3591c5096bc79cad8b5a992e08304f48 to your computer and use it in GitHub Desktop.
Save antirez/3591c5096bc79cad8b5a992e08304f48 to your computer and use it in GitHub Desktop.

Background: this command was proposed as LRANK in PR 7379. I (antirez) modified the implementation to provide the matchpos and ALL option, and renamed the command into LPOS. This Gist was written as a draft of the command documentation, in order to get feedbacks from the community of Redis users about this command. A note about time complexity. This command is O(N) like a few Redis commadns operating on Redis lists. Normally we don't add commands that have worse than logarithmic complexity in Redis without thinking twice, however there are a few reasons why I'm positive about LPOS:

  1. List operations are often used against very small lists, making them effectively O(1), because certain applications always have a maximum size.
  2. It is a fundamental operation. When it's needed, people will resort to Lua scripting which is an order of magnitude slower than a direct implementation.
  3. The new development branch that shows threaded slow operations, not blocking the server when O(N) commands are called, is very promising. This may be one of the commands implemented in that way.

The implementation of LPOS can be found here: https://github.com/antirez/redis/commits/lpos

If you have feedbacks, please add a comment to this Gist or into the original issue.

LPOS key element [FIRST rank] [COUNT num-matches] [MAXLEN len]

The command returns the index of matching elements inside a Redis list. By default, when no options are given, the command will scan the list from head to tail, looking for the first match of "element". If the element is found, its index (the position) is returned, otherwise if no match is found, NULL is returned.

> RPUSH mylist a b c 1 2 3 c c
> LPOS mylist c
2

The optional arguments and options are able to modify the command behavior. The FIRST option specifies the "rank" of the first element to return, in case there are multiple matches. A rank of 1 means to return the first match, 2 to return the second match, and so forth.

For instance in the above example the element "c" is present multiple times, if I want the index of the second match, I'll write:

> LPOS mylist c FIRST 2
6

That is, the second occurrence of "c" is at position 6. A negative "rank" as the FIRST argument tells LPOS to invert the search direction, starting from the tail to the head. However the indexes are normally still reported in the "natural" way, that is, considering the first element starting from the head of the list at index 0, the next element at index 1, and so forth.

So, we want to say, give me the first element starting from the tail of the list:

> LPOS mylist c -1
7

Sometimes we want to return not just the Nth matching element, but the position of all the first N matching elements. This can be achieved using the COUNT option.

> LPOS mylist c COUNT 2
[2,6]

We can combine COUNT and FIRST, so that COUNT will try to return up to the specified number of matches, but starting from the first one specified.

> LPOS mylist c FIRST -1 COUNT 2
[7,6]

When COUNT is used, if the number of matches may be 0, to just tell the command we want all the matches found returned as an array of indexes.

> LPOS mylist COUNT 0
[2,6,7]

When COUNT is used and no match is found, an empty array is returned. However when COUNT is not used and there are no matches, the command returns NULL.

Finally the MAXLEN option, tells the command to compare the provided element only with the given number of list items. So for instance specifying MAXLEN 1000 will make sure that the command performs only 1000 comparison, effectively running the algorithm on a subset of the list. This is useful to limit the maximum complexity of the command.

@mgravell
Copy link

mgravell commented Jun 10, 2020

if I want the index of the second match, I'll write:

LPOS mylist c 2

thoughts: redis uses zero-indexing in many other places - LINDEX, ZRANGE, etc; as such, I would expect LPOS mylist c 0 to return the index of the first match, and LPOS mylist c 2 to return the index of the third match

@mgravell
Copy link

mgravell commented Jun 10, 2020

Rather than ALL, I wonder if:

LPOS key element [matchpos] [COUNT n]

would be clearer and more versatile, where N is asserted as positive (I guess you could no-op an empty array for 0, though)

So if you want the first 3 indexes, you'd use LPOS mylist c COUNT 3 - and if you want the "last 3 except for the final one": LPOS mylist c -2 COUNT 3 - so MATCHPOS defines ONLY where to start, and the optional COUNT defines how many to yield.

This could perhaps also be expressed as:

LPOS key element [matchpos [count]]

so "first 3" is LPOS mylist c 0 3 (no COUNT keyword usage)


When ALL is used and no match is found, an empty array is returned.

Also: needs to be defined: what is the result of the array variant when the key does not exist; null or an empty array?

@mgravell
Copy link

crazy tangential thought for uber-lists (with the O(N) problem): LSCAN key cursor pattern [COUNT count] - not suggesting you should do this, note - just thinking aloud; note that MATCH here is assumed, since you'd use LRANGE for an unrestricted query.

@gsuberland
Copy link

I like Mark's suggested changes. The usage is nice and clear, it doesn't need superfluous keywords, and it's fairly obvious how it works. The only additional potential use-case I could see is perhaps providing the list index to start at when performing the search (not just the match index), but that runs the risk of bloating the syntax somewhat.

An additional thing that is worth defining in the documentation is the order of indices returned when looking for the first/last N matches. I see from the command output that it is intended to be in incrementing order regardless, but it's probably good practice to state it explicitly and not expect users to infer it from the examples.

@bpo
Copy link

bpo commented Jun 10, 2020

Hi there, this looks great. I have seen inefficient workarounds with Lua and client-side scripting.

The command might be more broadly reusable if you allowed the client to specify multiple elements:

> RPUSH mylist a b c 1 2 3 c c
> LPOS mylist b c ALL
[b,1] [c,2,6,7]

This would allow me to easily do two extra things:

  • Detect adjacent matches or otherwise useful ordinal relationships
  • Find all items in a special "allow" or "deny" set in one pass.

@mike-plivo
Copy link

Regarding the response returned by the LPOS command, making it a different type (a single value vs a list) based on the arguments you are passing to the command may lead to some confusion. Usually, developers are not using directly raw Redis commands but libraries like redis-py (Python) or goredis (Golang) for examples. In those libraries, maintainers would need some extra logic around the number of arguments and return different data types based on the arguments. Strongly typed language implementations (like Golang, C, C++, Java) may be challenging. I think regardless of the arguments and the number of values returned in the response, we should always return a list.

@antirez
Copy link
Author

antirez commented Jun 11, 2020

I definitely like @mgravell proposal better than the current implementation, from the POV of zeor-based indexes and the COUNT argument. I'm not sure about the matchpos: it is not really indexing elmeents in a collection, and I don't see it very intuitive that -1 and 1 means a different thing (start from the second element, start from the first). However Marc COUNT option itself, provides a way to fix this! We just need to rename the matchpos argument into skip. That is, the number of matches we want to skip. When skip is negative, we just skip the same number of matches but we start from the other side.

@antirez
Copy link
Author

antirez commented Jun 11, 2020

Another thing that I'm going to add, is a MAXLEN argument, that states that only the first N elements of the list will be tested. This way even with huge lists, we have a way to limit the command execution time.

@antirez
Copy link
Author

antirez commented Jun 11, 2020

Mmm... nope :( "skip" also does not work since we can't write -0, but maybe there are clear alternatives to start from 1 / -1.

@antirez
Copy link
Author

antirez commented Jun 11, 2020

Ok I think I found a good approach, that uses COUNT, converts all the non mandatory arguments to options, and has the 1/-1 symmetry. I updated the implementation and now I'm updating this gist to reflect it.

@CaddyDz
Copy link

CaddyDz commented Jun 11, 2020

This looks like a public RFC!

@antirez
Copy link
Author

antirez commented Jun 11, 2020

Hello @mike-plivo, I'm all for clients that don't mask much Redis internals, but here the solution for clients that want to be higher level, is to just always pass the COUNT option, and the return value will be always the array.

@mgravell
Copy link

With the current edits, there's a slight incongruity because you can't do anything useful with MAXLEN in the case of a miss. You can't check the next chunk, for example.

Purely for discussion, but how about:

LPOS key element [START start] [SKIP skip] [COUNT count] [MAXLEN MAXLEN]`

Where start is an index (0 for first, -1 for end, negative triggers backwards crawl), SKIP is the number of matches to bypass non-negative, etc. This resolves the previous nuances and means that I can do:

LPOS ... MAXLEN 5000
...
LPOS ... START 5000 ... MAXLEN 5000
...

Etc to look iteratively through an uber-list.

@mgravell
Copy link

Also: does COUNT -1 have the semantics that ALL used to?

@antirez
Copy link
Author

antirez commented Jun 11, 2020

@mgravell: Yep we still have "ALL" with COUNT 0 (not -1), since to specify an empty count looks very odd and is unlikely that this is programatically done. However now that you mention this, I would like to check in other Redis commands with a COUNT of 0 does. I remeber that in certain instances it returns all the elements, but in others may not be like that. Better to check and see if -1 is more appropriate. Thaniks.

About MAXLEN, I avoided the START option because the MAXLEN main goal is to reduce the time complexity of the command and avoid it to be unbound. So if you have a large list, but you believe you are interested only in "near" matches, you can do that. START would in some way defeat the original goal of making LPOS "less O(N)".

Thanks!

@antirez
Copy link
Author

antirez commented Jun 11, 2020

XRANGE count 0 means all the elements. In other instances seems like that COUNT <= 0 is not allowed at all.
Looks like COUNT of 0 makes sense as a way to tell the implementation to return all the elements after all.

@mgravell
Copy link

👍

@swayamraina
Copy link

swayamraina commented Jun 11, 2020

IMHO, COUNT = 0 should mean return no match and while COUNT < 0 is illegal. And COUNT = n means return n matches.
Not specifying the COUNT should mean user wants all matches (Currently, not giving COUNT means default value of 1)

regarding @mgravell suggestion on excluding the COUNT keyword, the approach hides the meaning of the request while adding the keyword makes the command verbose plus gives a separator to the command which can be used for future modifications in the command and reordering of options.

Though agreeing to @antirez, the MAXLEN option should be added as part of building a guardrail for abuse.

@antirez
Copy link
Author

antirez commented Jun 11, 2020

@swayamraina what you suggest is not possible becuase COUNT is what triggers the array reply.

@swayamraina
Copy link

Taking this into note that presence of COUNT triggers the array reply, using * or all seems a good choice then.

Also, it seems LPOS will be used for list search of a single element. Will multi-value search make sense in near future or such complex search will have to be done via lua?
What i mean is most of the time one would want to check if certain set of values are present in the list or not i.e. if list contains any or all of the provided values in list.

@antirez
Copy link
Author

antirez commented Jun 12, 2020

Looks strange that the majority of use cases are about multi-value searches. But it will be possible to extend the command later if really needed. IMHO it's better to have a very simple usage for the base case. Then to be more complex only if needed, like with a special option to specify N patterns. We did this for MIGRATE with the "KEYS" option for instance.

@antirez
Copy link
Author

antirez commented Jun 12, 2020

Also note that COUNT 0 meaning "return all the items" is already used in Redis. For instance for the XRANGE command.

@andrewsensus
Copy link

@antirez I think this example didn't get converted from when [FIRST rank] was [matchpos]. This also exists in the documentation here https://redis.io/commands/lpos

- > LPOS mylist c -1
+ > LPOS mylist c FIRST -1
7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment