As a reminder, Looker has three core components. First, it provides a modeling layer, LookML, which one uses to abstract SQL generation and define complex transforms. Second, Looker consumes the LookML model and generates dialect-specific SQL, which it issues to the database via JDBC. Third, it provides a web-based UI that serves as both a LookML IDE and the primary means of exploring data.
Let's consider a simple schema containing two tables, job and warehouse, which we'll use as an example. In Looker, our representation of these tables would look like this:
- view: job
  sql_table_name: public.job
  fields:
    # dimensions
    - dimension: id
      description: This is the primary key of the job table.
      type: number
      primary_key: true
      sql: ${TABLE}.id
    - dimension_group: created
      description: |
        This dimension group parses out various time attributes from the job's timestamp.
      type: time
      sql: ${TABLE}.created_at
    - dimension: type
      description: This is the type of job—e.g., select, copy, create, etc.
      type: string
      sql: ${TABLE}.type
    - dimension: bytes
      description: This is the number of bytes processed by the job.
      type: number
      sql: ${TABLE}.bytes
    - dimension: warehouse_id
      description: This is a foreign key to the warehouse table.
      type: number
      sql: ${TABLE}.warehouse_id
    # measures
    - measure: count
      type: count
    - measure: total_bytes
      type: sum
      sql: ${bytes}

- view: warehouse
  sql_table_name: public.warehouse
  fields:
    # dimensions
    - dimension: id
      description: This is the primary key of the warehouse table.
      type: number
      primary_key: true
      sql: ${TABLE}.id
    - dimension: size
      description: This is the size of the warehouse—e.g., small, medium, large, etc.
      type: string
      sql: ${TABLE}.size
    # measures
    - measure: count
      type: count

- explore: job
  joins:
    - join: warehouse
      type: left_outer
      relationship: many_to_one
      sql_on: ${job.warehouse_id} = ${warehouse.id}
Having built our simple LookML model, all we have to do is explore it. If we choose a measure, Looker generates our aggregate, as we'd expect.
If we pull in fields from warehouse, Looker makes the join and we're off to the races. In this case, Looker dispatches the SQL to the database and renders the response in tabular or visual format. Simple.
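To make this concrete, here's a sketch of the SQL Looker might generate when we select the count measure alongside the warehouse size dimension. The exact aliasing and grouping vary by dialect, so treat this as illustrative rather than verbatim:
select warehouse.size as "warehouse.size"
    , count(*) as "job.count"
from public.job as job
left join public.warehouse as warehouse
    on job.warehouse_id = warehouse.id
group by 1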
Looker recently broadened its offering of RESTful APIs. This is interesting: all of the features in the UI are now accessible programmatically, including building and saving reports, scheduling reports, and downloading PDFs of dashboards and reports. Considering that these APIs rely on the LookML model(s) one defines, it's as though we have APIs into our own modeled data.
The specifics of generating SDKs using Swagger Codegen are outlined here; I won't step through the details of that process. Instead, let me walk through how one might interface with these APIs using a Python SDK.
import looker_python as looker
import pandas as pd
# preliminaries
base_url = 'your_looker_api_endpoint'
client_id = 'your_api_id'
client_secret = 'your_api_secret'
# instantiate Auth API
unauthenticated_client = looker.ApiClient(base_url)
unauthenticated_authApi = looker.ApiAuthApi(unauthenticated_client)
# authenticate client
token = unauthenticated_authApi.login(client_id = client_id, client_secret = client_secret)
client = looker.ApiClient(base_url, 'Authorization', 'token ' + token.access_token)
# instantiate Query API
query = looker.QueryApi(client)
# build up query with dictionary literal
query_elements = {
    "model": "test_model",
    "view": "job",
    "fields": ["job.total_bytes", "warehouse.size"],
    # filters maps each field to a Looker filter expression
    "filters": {"job.type": "select", "job.created_date": "past 3 days"},
}
'''
As a reminder, Looker should generate SQL along the following lines:

select warehouse.size as "warehouse.size"
    , sum(job.bytes) as "job.total_bytes"
from job
left join warehouse
    on job.warehouse_id = warehouse.id
where job.created_at::date between dateadd(day, -3, current_date) and current_date
    and job.type = 'select'
group by 1
'''
# run query
response = query.run_inline_query('csv', query_elements)
# print response to screen
print(response)
# 'Warehouse Size,Job Total Bytes\nXLARGE,22987304107894660\nMEDIUM,5722835289218486\nLARGE,4579839923756189\nXXLARGE,3914624544979625\nSMALL,3862319027214091\nXSMALL,1178807351459083\nX4LARGE,981361345867807\nXXXLARGE,736888834334599\n'
# helper functions to convert the API's CSV response to a pandas dataframe
def header_cleanup(header):
    # 'Warehouse Size,Job Total Bytes' -> ['warehouse_size', 'job_total_bytes']
    return header.replace(" ", "_").lower().split(',')

def response_to_dataframe(body):
    lines = body.split('\n')[:-1]      # drop the trailing empty string
    header = header_cleanup(lines[0])  # the first line carries the column names
    rows = [line.split(',') for line in lines[1:]]
    return pd.DataFrame(rows, columns=header)
# convert response to pandas dataframe
df = response_to_dataframe(response)
# print dataframe to screen
print(df)
'''
  warehouse_size    job_total_bytes
0         XLARGE  22987304107894660
1         MEDIUM   5722835289218486
...
7       XXXLARGE    736888834334599
'''
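As an aside, pandas can parse the CSV payload directly, which makes the hand-rolled helpers unnecessary. Here's a minimal sketch, assuming response is the same CSV string returned by run_inline_query above:
import io

import pandas as pd

# read the CSV response straight into a dataframe; read_csv consumes the
# header row and infers numeric dtypes (the manual approach leaves strings)
df = pd.read_csv(io.StringIO(response))

# normalize column names the same way header_cleanup does
df.columns = [c.replace(" ", "_").lower() for c in df.columns]

print(df)
One nicety of this route is that job_total_bytes comes back as a numeric column, whereas the manual string-splitting approach leaves every cell as a string.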