@benoror
Last active January 13, 2022 01:18
Research: PostgreSQL Fuzzy Search

Algorithms

Levenshtein (a.k.a. match difference)

Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other.
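As a rough illustration of the definition above, here is a minimal Python sketch of the classic dynamic-programming approach. The function name `levenshtein` is chosen for this example only; it is not the PostgreSQL implementation, just the same metric computed row by row.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the empty prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory grows with the shorter string rather than with the product of both lengths.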

Soundex (a.k.a. match soundalikes)

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.
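To make the encoding concrete, here is a small Python sketch of the commonly described American Soundex rules: keep the first letter, map consonants to digits, collapse adjacent duplicates, and pad to four characters. The names `CODES` and `soundex` are made up for this example, and the output is not guaranteed to match every database implementation in edge cases.

```python
# Consonant groups and their Soundex digits; vowels, h, w and y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Four-character Soundex code, or an empty string if name has no ASCII letters."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    last = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != last:
            result += code
            if len(result) == 4:
                break
        last = code  # a vowel resets this, so a repeated code is emitted again
    return result.ljust(4, "0")


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Two names match phonetically when their codes are equal, as `Robert` and `Rupert` do here.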

Double Metaphone (a.k.a. better soundex)

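Metaphone is a phonetic algorithm that improves on Soundex by using information about variations and inconsistencies in English spelling and pronunciation, which lets it do a better job of matching words and names that sound similar. Double Metaphone is a later version that can return both a primary and a secondary code for a string.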

Trigrams (match misspellings)

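Trigrams are a special case of the n-gram, where n is 3: groups of three consecutive characters taken from a string. They are often used in natural language processing for statistical analysis of texts. For fuzzy search, two strings are compared by counting how many trigrams they share, so a misspelling that changes a few characters still leaves most trigrams intact.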
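The following Python sketch shows one way trigram matching can work. It assumes the convention described in the pg_trgm documentation (lowercase each word, treat it as prefixed with two spaces and suffixed with one, ignore non-alphanumeric characters) and scores two strings as shared trigrams divided by total distinct trigrams. The function names `trigrams` and `similarity` are chosen for this example, and the code is an approximation rather than the extension's actual implementation.

```python
import re


def trigrams(text: str) -> set:
    """Set of 3-character groups for each word in text, using space padding."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))
print(round(similarity("word", "two words"), 4))  # 0.3636
```

Because the score depends on overlapping character groups rather than exact equality, a transposed or missing letter lowers the similarity without dropping it to zero.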

Resources

PgSearch

Fuzzy Search

fuzzystrmatch - Soundex / Metaphone

pg_trgm - Trigram

Indexes
