etheleon/presto2spark.md

## presto2spark.md

      
    Raw
  

              presto2spark.md
            
          
    Common tasks


Presto
Spark


array_join(array[year, month, day], '-', 'NA')
CONCAT_WS('-', col1, col2, col3)


DATE_ADD('day', -7, date '2021-07-01')
date_sub('2021-07-01', 7)


array_join('one', 'two', 'three')[1] # one indexed
array_join('one', 'two', 'three')[0] # zero indexed


Partition Naming convention

You might be tempted to save your date partitions as as integers year=2021/month=7/day=1, but i personally believe it causes issues when writing queries when it could be as easy as year||month||day = '20210701 it'll have to be manually crafted
In the case when you have no control over the partition, and it's been formated as integer without 0 padding ie. %c and %e


PRESTO
SELECT * FROM {SCHEMA}.{TBL}
WHERE DATE(FORMAT('%d-%d-%d', year, month, day)) = DATE '2021-07-01' 
LIMIT 5


spark
SELECT * FROM {SCHEMA}.{TBL}
WHERE TO_DATE(FORMAT_STRING("%d-%d-%d", year, month, day)) = '2021-07-01'
LIMIT 5
Presto	Spark
`array_join(array[year, month, day], '-', 'NA')`	`CONCAT_WS('-', col1, col2, col3)`
`DATE_ADD('day', -7, date '2021-07-01')`	`date_sub('2021-07-01', 7)`
`array_join('one', 'two', 'three')[1]` # one indexed	`array_join('one', 'two', 'three')[0]` # zero indexed