@myui
Created June 19, 2019 08:52
with tmp as (
  select
    -- deduplication is done with GROUP BY in `mapped` below,
    -- since GROUP BY is sometimes faster than DISTINCT:
    -- distinct extract_feature(feature) as feature
    extract_feature(feature) as feature -- keep the feature name, drop any ":value" part
  from
    test l
    lateral view explode(features) r as feature
),
mapped as (
  select
    feature,
    feature_hashing(feature) as index
  from
    tmp
  group by
    feature
)
-- INSERT OVERWRITE TABLE mapping
select
  index,
  collect_set(feature) as features -- hash collisions can map several features to the same index
from
  mapped
group by
  index
-- order by index asc
-- limit 100
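Because `collect_set` gathers every raw feature that hashed to the same index, colliding indices are exactly the rows whose `features` array has more than one element. A minimal follow-up sketch, assuming the result was materialized with the commented-out `INSERT OVERWRITE TABLE mapping` (so a `mapping` table with columns `index` and `features` exists), could list only those collisions with Hive's built-in `size()`:

```sql
-- Assumes `mapping` was populated by the INSERT OVERWRITE TABLE mapping step
-- and has columns: index, features (array<string>).
select
  index,
  size(features) as num_features, -- how many raw features share this index
  features
from
  mapping
where
  size(features) > 1 -- keep only indices hit by more than one feature
order by
  num_features desc
limit 100
```

An empty result would mean no collisions among the features present in `test`.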