Skip to content

Instantly share code, notes, and snippets.

@ncalm
Created April 2, 2024 20:16
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save ncalm/6a20d3345450044fd175089ee91e3b34 to your computer and use it in GitHub Desktop.
Save ncalm/6a20d3345450044fd175089ee91e3b34 to your computer and use it in GitHub Desktop.
This Excel LAMBDA function calculates the Sørensen–Dice coefficient between two text strings
/*
GETBIGRAMS - Get the bigrams of a text string
Inputs: 1. str - a text string
Returns: A vertical array of the bigrams of the text string
Example:
=GETBIGRAMS("banana")
={"ba";"an";"na";"a"}
*/
GETBIGRAMS =LAMBDA(str, UNIQUE(MID(str,SEQUENCE(LEN(str)),2)));
/*
DICECOEFF
Return the Sørensen–Dice coefficient between two text strings
https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient
Inputs: 1. string1 - a text string to compare with string2
2. string2 - a text string to compare with string1
Returns: The Sørensen–Dice coefficient between the strings
Example:
=DICECOEFF("increasing", "decreasing")
=0.842
*/
DICECOEFF = LAMBDA(string1, string2,
LET(
str1grams, GETBIGRAMS(string1),
str2grams, GETBIGRAMS(string2),
intersection, FILTER(str1grams, ISNUMBER(XMATCH(str1grams, str2grams))),
sizesum, ROWS(str1grams) + ROWS(str2grams),
IF(sizesum>0,2 * IFERROR(ROWS(intersection),0) / sizesum,0)
)
);
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment