@jasny
Last active July 6, 2021 22:23
TinyTextIndex

Tiny Text Index

Lightweight text indexer for PHP

Uses PHP's dba extension (with the db4 handler).
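The dba extension is a plain key-value store, so each index entry described below is one key mapped to a serialized list. As a rough illustration, and not TinyTextIndex's actual code, here is a Python sketch using the standard `dbm` module, which plays a role similar to PHP's `dba_fetch`/`dba_replace`. JSON as the serialization format and the helper names `fetch_list` and `store_list` are assumptions.

```python
import dbm
import json
import os
import tempfile


def fetch_list(db, key):
    """Return the list stored under key, or an empty list if the key is absent."""
    # Assumption: values are JSON-encoded lists (dbm hands them back as bytes).
    if key in db:
        return json.loads(db[key])
    return []


def store_list(db, key, values):
    """Replace the list stored under key (dbm encodes str values as UTF-8)."""
    db[key] = json.dumps(values)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "tinytextindex")
    with dbm.open(path, "c") as db:  # "c": create the file if it does not exist
        store_list(db, "word:kids", ["somegroup:7", "mygroup:1"])
        print(fetch_list(db, "word:kids"))   # ['somegroup:7', 'mygroup:1']
        print(fetch_list(db, "word:smart"))  # []
```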


Storing

Hello my name is Arnold and I'm not crazy. Arnold's kids are crazy though.

Convert the string to lowercase, then remove all stop words and all words shorter than 3 characters.

hello ~~my~~ name ~~is~~ arnold ~~and~~ ~~i'm~~ ~~not~~ crazy. arnold~~'s~~ kids ~~are~~ crazy ~~though~~.
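A minimal Python sketch of this normalization step. It also splits the text into words, since the filtering works word by word; duplicates are kept until the array_unique step. The stop-word list and the `normalize` name are hypothetical (the gist does not give a list), chosen only so the example sentence reduces to the words shown in the index. Stripping a possessive `'s` is likewise an assumption based on `arnold's` becoming `arnold`.

```python
import re

# Hypothetical stop-word list; TinyTextIndex's real list is not given.
STOP_WORDS = {"and", "are", "i'm", "not", "the", "though"}
MIN_LENGTH = 3  # words shorter than this are dropped


def normalize(text):
    """Lowercase text, split it into words, and drop stop words and short words."""
    words = []
    for word in re.findall(r"[a-z']+", text.lower()):
        word = re.sub(r"'s$", "", word)  # assumption: strip a possessive 's
        if len(word) >= MIN_LENGTH and word not in STOP_WORDS:
            words.append(word)
    return words


print(normalize("Hello my name is Arnold and I'm not crazy. Arnold's kids are crazy though."))
# ['hello', 'name', 'arnold', 'crazy', 'arnold', 'kids', 'crazy']
```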

Split the string into words and pass the array through array_unique. Then add the item to the index: for each word, append a key of the form group:id to that word's entry.

word:hello   ["mygroup:1"]
word:name    ["mygroup:1"]
word:arnold  ["mygroup:1"]
word:crazy   ["mygroup:1"]
word:kids    ["somegroup:7", "mygroup:1"]

Also store the item's word list in the index under its own key, so the old words can be retrieved when the item is updated.

item:mygroup:1  ["hello", "name", "arnold", "crazy", "kids"]
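Both storing steps can be sketched in Python as follows. A plain dict stands in for the dba file and `store_item` is a hypothetical name; `dict.fromkeys` takes the role of `array_unique`, dropping duplicates while keeping first-seen order.

```python
def store_item(index, item_key, words):
    """Add item_key ("group:id") to each word entry and record the item's words."""
    unique_words = list(dict.fromkeys(words))  # like array_unique, first-seen order
    for word in unique_words:
        entry = index.setdefault("word:" + word, [])
        if item_key not in entry:
            entry.append(item_key)
    # Keep the item's own word list so a later update can diff against it.
    index["item:" + item_key] = unique_words


index = {"word:kids": ["somegroup:7"]}
store_item(index, "mygroup:1",
           ["hello", "name", "arnold", "crazy", "arnold", "kids", "crazy"])
print(index["word:kids"])       # ['somegroup:7', 'mygroup:1']
print(index["item:mygroup:1"])  # ['hello', 'name', 'arnold', 'crazy', 'kids']
```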

Searching

My crazy kids

Convert the string to lowercase, then remove all stop words and all words shorter than 3 characters.

~~my~~ crazy kids

Split the string into words, pass the array through array_unique, and look up each word.

word:crazy   ["mygroup:1", "mygroup:22", "somegroup:99"]
word:kids    ["somegroup:7", "mygroup:1"]

Use array_intersect on the lookup results to get the items that match all words; in this example, only mygroup:1.
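A Python sketch of the search, again assuming a dict in place of the dba file and a hypothetical `search` function that takes already-normalized words. Filtering each word's item list against the matches so far yields the same items as `array_intersect`, in the order of the first word's entry.

```python
def search(index, words):
    """Return the items ("group:id") whose entries contain every query word."""
    matches = None
    for word in dict.fromkeys(words):  # de-duplicate the query words
        items = index.get("word:" + word, [])
        # Keep only items seen for every word so far, like array_intersect.
        matches = items if matches is None else [i for i in matches if i in items]
    return list(matches or [])  # copy so callers cannot mutate the index


index = {
    "word:crazy": ["mygroup:1", "mygroup:22", "somegroup:99"],
    "word:kids": ["somegroup:7", "mygroup:1"],
}
print(search(index, ["crazy", "kids"]))  # ['mygroup:1']
```

Returning an empty list for an empty query is a design choice in this sketch; the gist does not say how that case is handled.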


Updating

Hello my name is Arnold and I'm not smart. Arnold's kids are very smart though.

Convert the string to lowercase, then remove all stop words and all words shorter than 3 characters.

hello ~~my~~ name ~~is~~ arnold ~~and~~ ~~i'm~~ ~~not~~ smart. arnold~~'s~~ kids ~~are~~ very smart ~~though~~.

Split the string into words and pass the array through array_unique. Then fetch the old word list stored under item:group:id.

Use array_diff(old, new) to get the words that were removed. For each removed word, remove the group:id key from that word's entry, and delete the word's key from the index entirely if no other item contains the word.

word:crazy ["mygroup:1"]
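The removal step, sketched in Python with a dict standing in for the dba file; `remove_words` is a hypothetical name. The list comprehension selects the same words as `array_diff(old, new)`.

```python
def remove_words(index, item_key, old_words, new_words):
    """Detach item_key from words that are in old_words but not in new_words."""
    new_set = set(new_words)
    removed = [w for w in old_words if w not in new_set]  # like array_diff(old, new)
    for word in removed:
        key = "word:" + word
        entry = [i for i in index.get(key, []) if i != item_key]
        if entry:
            index[key] = entry
        else:
            index.pop(key, None)  # no other item has this word: drop the key
    return removed


index = {"word:crazy": ["mygroup:1"], "word:hello": ["mygroup:1"]}
old = ["hello", "name", "arnold", "crazy", "kids"]
new = ["hello", "name", "arnold", "smart", "kids", "very"]
print(remove_words(index, "mygroup:1", old, new))  # ['crazy']
print(index)  # {'word:hello': ['mygroup:1']}
```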

Use array_diff(new, old) to get the words that were added, and add them to the index.

word:very   ["mygroup:9", "mygroup:1"]
word:smart  ["mygroup:1"]

Finally, update item:group:id with the new word list.

item:mygroup:1  ["hello", "name", "arnold", "smart", "kids", "very"]
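Putting the update steps together, here is a Python sketch of re-indexing one item. As before, a dict replaces the dba file and `update_item` is a hypothetical name; it expects already-normalized words and reads the old words from the `item:` entry written when the item was stored. If no such entry exists, the old list is empty and the update behaves like a first store.

```python
def update_item(index, item_key, new_words):
    """Re-index item_key ("group:id") with a new list of normalized words."""
    new_words = list(dict.fromkeys(new_words))     # like array_unique
    old_words = index.get("item:" + item_key, [])  # word list saved at store time
    old_set, new_set = set(old_words), set(new_words)

    # Words no longer present, like array_diff(old, new).
    for word in (w for w in old_words if w not in new_set):
        key = "word:" + word
        entry = [i for i in index.get(key, []) if i != item_key]
        if entry:
            index[key] = entry
        else:
            index.pop(key, None)  # no other item has this word

    # Newly added words, like array_diff(new, old).
    for word in (w for w in new_words if w not in old_set):
        entry = index.setdefault("word:" + word, [])
        if item_key not in entry:
            entry.append(item_key)

    index["item:" + item_key] = new_words


index = {
    "item:mygroup:1": ["hello", "name", "arnold", "crazy", "kids"],
    "word:hello": ["mygroup:1"], "word:name": ["mygroup:1"],
    "word:arnold": ["mygroup:1"], "word:crazy": ["mygroup:1"],
    "word:kids": ["somegroup:7", "mygroup:1"], "word:very": ["mygroup:9"],
}
update_item(index, "mygroup:1",
            ["hello", "name", "arnold", "smart", "arnold", "kids", "very", "smart"])
print(index["item:mygroup:1"])  # ['hello', 'name', 'arnold', 'smart', 'kids', 'very']
print(index["word:very"])       # ['mygroup:9', 'mygroup:1']
print("word:crazy" in index)    # False
```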