Skip to content

Instantly share code, notes, and snippets.

@etheleon
Last active August 11, 2021 01:48
Show Gist options
  • Save etheleon/e1f78c22d31f45623daca22a0305df05 to your computer and use it in GitHub Desktop.
Save etheleon/e1f78c22d31f45623daca22a0305df05 to your computer and use it in GitHub Desktop.
translating PRESTO to SPARKSQL

Common tasks

Presto Spark
array_join(array[year, month, day], '-', 'NA') CONCAT_WS('-', col1, col2, col3)
DATE_ADD('day', -7, date '2021-07-01') date_sub('2021-07-01', 7)
array_join('one', 'two', 'three')[1] # one indexed array_join('one', 'two', 'three')[0] # zero indexed

Partition Naming convention

You might be tempted to save your date partitions as as integers year=2021/month=7/day=1, but i personally believe it causes issues when writing queries when it could be as easy as year||month||day = '20210701 it'll have to be manually crafted

In the case when you have no control over the partition, and it's been formated as integer without 0 padding ie. %c and %e

  • PRESTO

    SELECT * FROM {SCHEMA}.{TBL}
    WHERE DATE(FORMAT('%d-%d-%d', year, month, day)) = DATE '2021-07-01' 
    LIMIT 5
  • spark

    SELECT * FROM {SCHEMA}.{TBL}
    WHERE TO_DATE(FORMAT_STRING("%d-%d-%d", year, month, day)) = '2021-07-01'
    LIMIT 5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment