Skip to content

Instantly share code, notes, and snippets.

@gabegm
Last active June 9, 2021 15:48
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save gabegm/458288fa55437314b7b41b63a53a13a1 to your computer and use it in GitHub Desktop.
Save gabegm/458288fa55437314b7b41b63a53a13a1 to your computer and use it in GitHub Desktop.
Querying AWS Athena using Julia ODBC

Athena.jl

Add the following to ./configs/default.yml.

Don't forget to add your own credentials and database settings!

default.yml

database:
  name: "SimbaAthenaODBCConnector"
  path: "/Library/simba/athenaodbc/lib/libathenaodbc_sb64.dylib"
  driver: "SimbaAthenaODBCConnector"
  region: "eu-west-1"
  s3_location: ""
  authentication_type: "IAM Credentials"
  uid: ""
  pwd: ""

Make sure you install the ODBC driver for your OS which you can find here.

Find out more about connection strings for Athena here.

example.jl

import Pkg
Pkg.activate(".")

Pkg.add(["ODBC", "DataFrames", "Configs"])

using ODBC, DataFrames, Configs

database = getconfig("database")
name = database.name
path = database.path
driver = database.driver
region = database.region
s3_location = database.s3_location
authentication_type = database.authentication_type
uid = database.uid
pwd = database.pwd
sql = open("src/sql/query.sql" ) do file
    read(file, String)
end

# locate existing ODBC driver shared libraries or download new, then configure
ODBC.adddriver(name, path)

# build connection string
connection_string = """
Driver=$driver;
AwsRegion=$region;
S3OutputLocation=$s3_location;
AuthenticationType=$authentication_type;
UID=$uid;
PWD=$pwd;
"""

conn = DBInterface.connect(ODBC.Connection, connection_string)

# execute sql statement directly, then materialize results in a DataFrame
df = DBInterface.execute(
    conn,
    sql
) |> DataFrame

297×3 DataFrame
 Row │ dt                       table_name  n_rows
     │ DateTime?               String?     Int64?
─────┼────────────────────────────────────────────────
   12021-05-08T06:46:24.183  Table_A     196040
   22021-05-08T06:46:24.183  Table_B     28172242
   32021-05-08T06:46:24.183  Table_C     27111764
   42021-05-06T06:46:29.916  Table_A     196041
   52021-05-06T06:46:29.916  Table_C     27080936
   62021-05-23T06:46:26.201  Table_A     196034
                               
 2932021-03-03T14:47:56.910  Table_B     27421193
 2942021-03-03T14:47:56.910  Table_C     26379105
 2952021-04-27T06:46:34.887  Table_A     196046
 2962021-04-27T06:46:34.887  Table_B     28016354
 2972021-04-27T06:46:34.887  Table_C     26960853
                                286 rows omitted
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment