@timrobertson100
Last active October 19, 2018 09:54
Example: creating a CSV-style (tab-delimited) file in Hue

Step 1: Create an external table stored as delimited text (tab-separated, despite the .csv name)

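-- External table: the data lives in the HDFS directory given by LOCATION,
-- and dropping the table leaves those files in place.
CREATE EXTERNAL TABLE tim.delimiter_csv (
  gbifId INT,
  v_scientificName STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE LOCATION '/user/tim/delimiter.csv';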
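
To confirm the field delimiter and the HDFS directory the table points at, the definition can be inspected with Hive's DESCRIBE FORMATTED. Its output should list the table type, the Location, and the field.delim storage parameter, although the exact layout varies between Hive versions.

```sql
-- Show columns, table type (EXTERNAL_TABLE), HDFS location and SerDe settings.
DESCRIBE FORMATTED tim.delimiter_csv;
```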

Step 2: Populate the table

  1. Add the jar (/user/trobertson/occurrence-hive-0.22-SNAPSHOT-jar-with-dependencies.jar) to the session.
  2. Create the UDF with the name cleanDelimiters and the class org.gbif.occurrence.hive.udf.CleanDelimiterCharsUDF.
  3. Run the SQL:
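-- cleanDelimiters presumably strips delimiter characters such as tabs and
-- line breaks from the name, so no value can split a row or a column.
-- LIMIT caps the export at one million rows.
INSERT OVERWRITE TABLE tim.delimiter_csv
SELECT
  gbifId,
  cleanDelimiters(v_scientificName)
FROM prod_b.occurrence_hdfs
LIMIT 1000000;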
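
Items 1 and 2 can also be written as HiveQL statements, run in the same session before the INSERT (for example in Beeline or the Hive CLI). This is a sketch that assumes the jar sits in HDFS at the path given above; the hdfs:// scheme is an assumption, so adjust the URI for your cluster.

```sql
-- Make the UDF classes available to this session.
-- Assumption: the jar is stored in HDFS, hence the hdfs:// scheme.
ADD JAR hdfs:///user/trobertson/occurrence-hive-0.22-SNAPSHOT-jar-with-dependencies.jar;

-- Register the UDF under the name used in the INSERT query.
-- TEMPORARY: the function exists only for the current session.
CREATE TEMPORARY FUNCTION cleanDelimiters
  AS 'org.gbif.occurrence.hive.udf.CleanDelimiterCharsUDF';
```

Because the function is temporary, it has to be registered again in each new session before the INSERT is rerun.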