@avivl
Created December 25, 2017 16:22
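-- BigQuery legacy SQL: count bigrams (adjacent word pairs) in review text.
SELECT COUNT(*) AS num_bigram, bigram
FROM (
  SELECT
    SPLIT(
      -- Pair words starting at the 1st word: 'a b c d' -> 'a b|c d'
      REGEXP_REPLACE(review, '([^\\s]+\\s[^\\s]*)\\s', '\\1|') +
      '|' +
      -- Pair words starting at the 2nd word: 'a b c d' -> 'a|b c|d'
      REGEXP_REPLACE(review, '([^\\s]+)\\s([^\\s]+\\s?)', '\\1|\\2'),
      '|'  -- split on the pipe inserted by the replacements above
    ) AS bigram
  FROM [telegrass_reviews.reviews]
) t
-- Keep only fragments that contain a space, i.e. two-word pairs.
WHERE t.bigram CONTAINS ' '
GROUP BY bigram
ORDER BY num_bigram DESC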