@myui
Last active August 14, 2019 06:33
SELECT
  -- conversion for libsvm format
  label || ' ' || array_join(
    array_sort(
      feature_hashing(features),
      -- order "index:value" features by the numeric index before the ':'
      (x, y) -> if(
        cast(substr(x, 1, strpos(x, ':') - 1) as bigint) < cast(substr(y, 1, strpos(y, ':') - 1) as bigint),
        -1,
        if(substr(x, 1, strpos(x, ':') - 1) = substr(y, 1, strpos(y, ':') - 1), 0, 1)
      )
    ),
    ' '
  ) as line
FROM
  input
SELECT
  -- conversion for libsvm format
  array_sort(
    features,
    -- order "index:value" features by the numeric index before the ':'
    (x, y) -> if(
      cast(substr(x, 1, strpos(x, ':') - 1) as double) < cast(substr(y, 1, strpos(y, ':') - 1) as double),
      -1,
      if(substr(x, 1, strpos(x, ':') - 1) = substr(y, 1, strpos(y, ':') - 1), 0, 1)
    )
  ) as features,
  label
FROM
  rf_input
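
The comparators above cast the index to a number before comparing, presumably because a plain string sort would place "10" before "2". A minimal sketch of a sanity check on the exported lines, assuming each line has the form `label index:value index:value ...`; the function name `check_ascending` and the sample lines are hypothetical and not part of the gist:

```shell
# Hypothetical helper: reads libsvm-style lines on stdin and exits non-zero
# if any line has feature indices that are not strictly ascending.
# Assumption: indices are non-negative numbers; duplicate indices are treated as errors.
check_ascending() {
  awk '{
    prev = -1
    for (i = 2; i <= NF; i++) {      # field 1 is the label
      split($i, kv, ":")             # kv[1] = index, kv[2] = value
      idx = kv[1] + 0                # force a numeric comparison
      if (idx <= prev) { bad = 1; exit }
      prev = idx
    }
  }
  END { exit bad + 0 }'
}

# Made-up sample lines: numerically sorted vs. string-sorted indices.
printf '1 2:0.5 10:1.0\n' | check_ascending && echo "ascending"
printf '1 10:1.0 2:0.5\n' | check_ascending || echo "out of order"
```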
@myui
Author

myui commented Jun 13, 2019

sed -e 's/\"//g' -e 's/\[//' -e 's/\]//' 487600901.tsv | awk '{print $2,$1}' | sed -e 's/,/ /g' > 487600901.libsvm
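
To see what each stage of this pipeline does, here is a small sketch that runs the same substitutions on a single made-up row. It assumes the TSV holds the feature array in the first column and the label in the second, with no spaces inside the array (otherwise `awk` would split it into extra fields); the function name `to_libsvm` and the sample row are hypothetical:

```shell
# Same steps as the one-liner above, wrapped in a function that reads stdin.
to_libsvm() {
  sed -e 's/"//g' -e 's/\[//' -e 's/]//' |   # strip quotes and the array brackets
    awk '{print $2,$1}' |                    # move the label in front of the features
    sed -e 's/,/ /g'                         # separate features with spaces
}

# Hypothetical input row: features array, a tab, then the label.
sample_output=$(printf '["1:0.5","3:1.0"]\t1\n' | to_libsvm)
echo "$sample_output"   # prints: 1 1:0.5 3:1.0
```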
