@brugeman
Created January 13, 2023 21:00
keyword search NIP:
- problems:
- various languages, including ones without spaces between words
- various kinds of search: case-sensitive/case-insensitive, prefix matching, etc.
- various data formats: plain text, JSON, HTML, etc.
- so the meaning of 'search by keywords' depends at least on the event kind
- so a keyword search query must include a kind filter, and if the relay does not support keyword search for at least one of the requested kinds, it replies with an EOSE and a NOTICE of "search:unsupported_kind"
- and a 'keyword' itself is only meaningful in the context of a specific kind; only kind:1 and kind:0 keyword search is defined here
- a new filter field "keywords" is added to the REQ message from clients
- filters with a "keywords" field must also include a "kinds" field; at least one of the listed kinds must be supported by the relay, otherwise the relay returns a NOTICE with "search:unsupported_kind"
- "keywords" contains an array of strings (the keywords); if any keyword matches, the filter matches
- the meaning of a keyword is kind-dependent; below is the definition for kind:1 and kind:0, future NIPs may define it for other kinds
- a keyword is a string consisting of lowercase words (separated by spaces in most languages); all words must match for the keyword to match; punctuation, control and similar characters are ignored
- keyword search is case insensitive
- kind:1 events match a keyword if their content field contains all words from the keyword
- kind:0 events match a keyword if their content.about, content.name or content.display_name JSON fields contain all words from the keyword
- exact implementation details are not defined for relays, since they can't be trusted anyway
- clients should query several relays that support this NIP and the specific kind, and deduplicate the results (thus achieving high recall)
- clients should verify that events returned by a relay actually match the keywords, and may stop querying relays that have low precision (return too much irrelevant stuff)
- what is not supported:
- stop words: all other filters 'include' events, while stop words would exclude them, so let's not introduce them
- "exact match": can be done on the client
- OR/AND query modifiers: not needed; specify several keywords for OR, specify several words in one keyword for AND
- relays should return results ordered by timestamp (created_at) descending as per NIP-01; no other ranking schemes are required for kind:1 and kind:0
- future extension for kind:1 and kind:0:
- words in a keyword can have a lowercase prefix "option:" followed by a string ending at the next space character, describing an additional option to basic keyword search (like option:no_spam); relays should recognize this prefix and ignore an option word if they do not support it
- no option words are defined at the moment
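
A rough client-side sketch of the rules above (building the REQ, verifying relay results, deduplicating across relays), in Python. This is only one possible reading, not a reference implementation. The function names are made up for illustration. Assumptions that the notes above do not settle: words are matched as whole whitespace-separated tokens (so languages without spaces are not handled), ignored characters are treated as word separators, and for kind:0 all words of a keyword must appear within a single profile field.

```python
import json
import unicodedata

# Profile fields searched for kind:0, as listed in the notes above.
PROFILE_FIELDS = ("about", "name", "display_name")

def words(text):
    """Case-fold text and split it into a set of words.
    Assumption: anything that is not a letter, number or combining mark
    (punctuation, control chars, symbols) acts as a word separator."""
    cleaned = "".join(
        ch if unicodedata.category(ch)[0] in "LNM" else " "
        for ch in text.casefold()
    )
    return set(cleaned.split())

def keyword_words(keyword):
    """Words of one keyword, skipping "option:" words (none are defined yet)."""
    plain = " ".join(w for w in keyword.split(" ") if not w.startswith("option:"))
    return words(plain)

def searchable_texts(event):
    """Texts to search for an event, or None if the kind is not covered."""
    if event["kind"] == 1:
        return [event["content"]]
    if event["kind"] == 0:
        try:
            profile = json.loads(event["content"])
        except ValueError:
            return []
        if not isinstance(profile, dict):
            return []
        return [str(profile.get(f) or "") for f in PROFILE_FIELDS]
    return None

def matches(event, keywords):
    """True if any keyword has all of its words in one searchable text."""
    texts = searchable_texts(event)
    if texts is None:
        return False
    word_sets = [words(t) for t in texts]
    for keyword in keywords:
        wanted = keyword_words(keyword)
        if wanted and any(wanted <= have for have in word_sets):
            return True
    return False

def build_req(sub_id, keywords, kinds):
    """REQ message with the proposed "keywords" field; "kinds" is mandatory."""
    return json.dumps(["REQ", sub_id, {"kinds": kinds, "keywords": keywords}])

def merge_results(per_relay_events, keywords):
    """Deduplicate events from several relays by id, drop events that do not
    actually match, and order the rest by created_at descending."""
    seen = {}
    for events in per_relay_events:
        for ev in events:
            if matches(ev, keywords):
                seen.setdefault(ev["id"], ev)
    return sorted(seen.values(), key=lambda e: e["created_at"], reverse=True)
```

A client could also count how many events from each relay fail matches() and stop querying relays where that share is high, which is the low-precision check mentioned above.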
@monlovesmango

hey there! just wanted to leave a comment, though we talked about it a bit on nostr. overall i think the parameters you've laid out are solid, but I just don't see a change like this solving the problems stated. It is only able to satisfy a very particular use case and is pretty inflexible.

@brugeman
Author

Hi! I think it lays out a framework for solving these problems with future extensions: we add the "keywords" field and specify how it should behave in general, and we specify the concrete details for kind:0 and kind:1 for the 'public social network' use case. Further NIPs may define the "keywords" behavior for other kinds to suit other use cases.

@monlovesmango

I do agree this is useful, but I am not sure it is worth the overhead on the relay side (again bc I haven't worked much with the relays). I believe that the original design of nostr filters specifically wanted to avoid content search and only have search by tags.

I don't feel strongly enough about this to propose it as an official NIP, however I am not against this either as long as it is established that this will be the full extent of relay smart features.

the more "smart" non-required features relays have, the more people are incentivized to use those smart relays, which can lead to centralization imo.
