Skip to content

Instantly share code, notes, and snippets.

@razhangwei
Last active February 17, 2023 06:51
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save razhangwei/7b90d08ca90fa93682b492e806fb97c7 to your computer and use it in GitHub Desktop.
Save razhangwei/7b90d08ca90fa93682b492e806fb97c7 to your computer and use it in GitHub Desktop.
Presto cheatsheet #Presto #SQL

Table/partition management

  • drop table: drop table table_name
  • rename: ALTER TABLE old_table RENAME TO new_table
  • delete certain partitions: delete from shop_mall_post_attributes where ds <= '2020-06-14';
  • insert rows: INSERT INTO table_name QUERY

Important Topics

  • UDFs
  • array
  • flatten complex type
  • Json
  • Map
  • Lambda
  • Row
  • Enum

Useful UDFs

  • JSON related: json_parse, json_extract_scalar, json_extract, json_format
  • arrauy operations: ARRAY_UNION, ARRAY_INTERSECT
  • transform, reduce
  • try/try_cast:
    • Evaluate an expression and handle certain types of errors by returning NULL.
    • Example: TRY(JSON_PARSE(sparse_features)) IS NOT NULL (JSON_PARSE is likely to fail due to data corruption. )

Flatten Complex Type: UNNEST

FROM table
CROSS JOIN UNNEST(array_col) AS t1 (col_val)
CROSS JOIN UNNEST(map_col) AS t2 (map_key, map_val)

ROW: default field name: field0, field1, ...

Type Conversion

  • Convert json to array: CAST(JSON_PARSE(embedding_str) as ARRAY<REAL>) as embedding
  • int tiemstamp to date CAST(DATE(PARSE_DATETIME(create_time, 'YYYY-MM-dd HH:mm:ss')) AS VARCHAR)

Manually define a small table

SELECT
    *
FROM a 
CROSS JOIN (
    VALUES
        (0, 'imp'),
        (1, 'clk')
) AS t (event, event_name)

SQL fundamental

  • put filtering conditions of the join table to ON.
    • it means only join when conditions are met
1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. WITH CUBE or WITH ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP
  • what would happen for LEFT JOIN if conditions are applied on WHERE
    • conditions on left table: they will still be applied on every row in left table, so it is equivalent to filtering first.
    • conditions on right table: as the table will joined first, then essentally all failed joined rows, i.e., b.col is null will be not met the conditions, so that only the joined row will be left; LEFT JOIN essentially becomes INNERT JOIN
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment