Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save kenjyco/1f9c06bc603a1944803855bb50e89204 to your computer and use it in GitHub Desktop.
Save kenjyco/1f9c06bc603a1944803855bb50e89204 to your computer and use it in GitHub Desktop.

Trino with SQLAlchemy and Great Expectations

1. Sign up for a free Starburst Galaxy account (if you don't already have one)

2. Get your Starburst Galaxy connection string components for SQLAlchemy

  • Visit https://_____.galaxy.starburst.io/cluster
    • Click the vertical 3 dots next to your desired cluster (i.e. "sample")
    • Click "Get connection"
      • (As of July 2022, there is no connection sample for SQLAlchemy, so we must construct it)
      • In the right side panel, copy the User, Host, and Port fields
    • Click "Catalogs" from the left navigation area
      • Copy the name of the catalog you wish to connect to (i.e. "sample")
  • Clean the connection string components
    • The User you copied is likely your email address with a little extra (i.e. you@email.com/accountadmin)
      • Remove the extra stuff
      • Convert the @ in your email address to %40 (i.e. you%40email.com)
    • If your password contains any of @, :, /, ?, #, you will need to percent-encode those characters as well
  • Assemble your connection string
    • trino://{username}:{password}@{hostname}:{port}/{catalog}
    • (i.e. trino://you%40email.com:s0meSw33tPassw0rd@yourdomain-sample.galaxy.starburst.io:443/sample)

3. Create a Python virtual environment and install Great Expectations, SQLAlchemy, and Trino

% mkdir ~/my-gx-trino-playground

% cd ~/my-gx-trino-playground

% python3 -m venv venv

% venv/bin/pip install "great_expectations[trino]"

% source venv/bin/activate

4. Initialize Great Expectations

% great_expectations init
  • Type y and hit <enter/return> at the prompt

5. Add your Starburst Galaxy password as a config variable

% vim great_expectations/uncommitted/config_variables.yml

...

starburst_pass: s0meSw33tPassw0rd

6. Add your Starburst Galaxy cluster as a new datasource

% great_expectations datasource new
  • Select 2 for "Relational database" and hit <enter/return>
  • Select 6 for "other" and hit <enter/return>
  • Edit the jupyter notebook that was just opened
    • In cell 2, change the datasource_name from "my_datasource" to something else (i.e. "starburst_sample")
    • In cell 3, change the connection_string from "YOUR_CONNECTION_STRING" to the connection string you constructed above
      • Remember to substitute your password with ${starburst_pass} (i.e. "trino://you%40email.com:${starburst_pass}@yourdomain-sample.galaxy.starburst.io:443/sample")
  • Run all notebook cells
    • In cell 5 when running context.test_yaml_config(...), you should see the two "data asset names" are detected (i.e. "demo.astronauts" and "demo.missions")
    • After the last cell has run, you can check that the great_expectations/great_expectations.yml file was updated

7. Create an Expectation Suite

% great_expectations suite new
  • Select 3 create automatically using a profiler and hit <enter/return>
  • Select 1 to use the "demo.astronauts" data asset and hit <enter/return>
  • Enter a name for the new Expectation Suite (i.e. astro-suite) and hit <enter/return>)
  • Select y to proceed and hit <enter/return>
  • Edit the jupyter notebook that was just opened
    • In cell 1, you can edit the batch_request, but this is not required
    • In cell 2, comment out some column names that you don't want excluded (i.e. nationality, total_number_of_missions, total_hrs_sum)
  • Run all notebook cells

8. Edit the suite as needed

% great_expectations suite edit astro-suite
  • Select 2 to interactively edit with a sample batch of data and hit <enter/return>
  • Adjust any Expectations
  • Run all notebook cells for the jupyter notebook that was just opened

9. Create a new checkpoint

% great_expectations checkpoint new my-checkpoint

10. Run your checkpoint

% great_expectations checkpoint run my-checkpoint

11. Build your Data Docs

% great_expectations docs build
  • Select y to proceed and hit <enter/return>

Resources

Gallery

Vids

Docs

Repos

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment