Skip to content

Instantly share code, notes, and snippets.

@ian-whitestone
Last active March 1, 2023 01:45
Show Gist options
  • Save ian-whitestone/211d0a27687ac3daa3f49699b4606613 to your computer and use it in GitHub Desktop.
Save ian-whitestone/211d0a27687ac3daa3f49699b4606613 to your computer and use it in GitHub Desktop.
Best practices for presto sql

Presto Specific

  • Don’t SELECT *, Specify explicit column names (columnar store)
  • Avoid large JOINs (filter each table first)
    • In PRESTO tables are joined in the order they are listed!!
    • Join small tables earlier in the plan and leave larger fact tables to the end
    • Avoid cross joins or 1 to many joins as these can degrade performance
  • Order by and group by take time
    • only use order by in subqueries if it is really necessary
  • When using GROUP BY, order the columns by the highest cardinality (that is, most number of unique values) to the lowest.
  • Use approx_distinct() instead of count(distinct) for very large datasets
  • Use approx_percentile(metric, 0.5) for median
  • Avoid UNIONs where possible
  • Use WITH statements vs. nested subqueries
  • Presto broadcasts the right side table in joins, declare larger tables first and filter right side tables to as small as possible
  • LIKE takes time, in particular when you add %s on both sides
    • Use REGEXP_LIKE() if multiple like statements
    • l_comment LIKE '%wake%' OR l_comment LIKE '%regular%' OR l_comment LIKE '%express%' --> regexp_like(l_comment, 'wake|regular|express')
  • SHOW COLUMNS FROM table_name;
  • SELECT * SYSTEM.JDBC.TABLES
    • Can also use COLUMNS, CATALOGS, SCHEMAS
    • Use WHERE COLUMN_NAME/TABLE_NAME Like ‘%client%’
  • SHOW TABLES FROM SYSTEM.JDBC

References

General SQL tips

  • start analysis on a small dataset
  • Use the query plan (EXPLAIN)
  • ALIAS tables in the case of multiple joins
@ripley
Copy link

ripley commented Feb 21, 2023

Looks like this statement:
- Join small tables earlier in the plan and leave larger fact tables to the end
is in contrary of the aws document.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment