@glamp
Last active December 16, 2015 02:09
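-- Use SQL to parse text: split each of FERRIS's lines into
-- lowercase, whitespace-delimited words (one output row per word).
select
    speaker
  , regexp_split_to_table(lower(line), '\s+') as word
from
    script_ferris_bueller
where
    speaker = 'FERRIS'
limit 10;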
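
-- Word counts by character: the ten most frequent (speaker, word) pairs.
-- "word" in the GROUP BY refers to the output column alias defined above it.
select
    speaker
  , regexp_split_to_table(lower(line), '\s+') as word
  , count(1) as wc
from
    script_ferris_bueller
group by
    speaker
  , word
order by
    wc desc
limit 10;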
--  speaker | word | wc
-- ---------+------+-----
--  FERRIS  | a    | 136
--  FERRIS  | i    | 120
--  FERRIS  | to   | 119
--  FERRIS  | the  | 112
--  FERRIS  | you  | 104
--  FERRIS  | and  | 101
--  FERRIS  | of   |  68
--  FERRIS  | i'm  |  52
--  FERRIS  | in   |  49
--  CAMERON | i    |  49
-- (10 rows)
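
-- A possible variant of the word-count query above (a sketch, not part of
-- the original gist). It assumes the same script_ferris_bueller table and
-- does the split in a subquery, so the outer query groups on a plain column.
-- It also drops the empty strings that regexp_split_to_table can emit when a
-- line starts or ends with whitespace. The "words" subquery alias is
-- illustrative only.
select
    speaker
  , word
  , count(*) as wc
from (
    select
        speaker
      , regexp_split_to_table(lower(line), '\s+') as word
    from
        script_ferris_bueller
) as words
where
    word <> ''
group by
    speaker
  , word
order by
    wc desc
limit 10;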