Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
A BigQuery function that accepts an array of similar strings and returns the string that occurs most within array.
CREATE TEMP FUNCTION GetMainStringValue(arr ARRAY<STRING>) RETURNS STRING AS (
ARRAY(
SELECT list FROM (
SELECT list,
count(*) count,
FROM UNNEST(arr) list
GROUP BY list
ORDER BY count DESC
)
)[SAFE_OFFSET(0)]
);
SELECT GetMainStringValue(["CoolWords", "CoolerWords ", "CoolWord", "CoolWard", "CoolWord", "CoolWord"]) AS string_with_most_occurrences
@calebwaldner
Copy link
Author

calebwaldner commented May 12, 2021

A BigQuery function that accepts an array of similar strings and returns the string that occurs most within array. Intended for use with SOUNDEX() function to facilitate users inputing the similar words but with some possible spelling errors; GetMainStringValue would accept that array of user values and output what seems the be the main intended value (assuming the more users spelled the words correctly than did not).

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment