A Quick Walkthrough of Looker and Accessing its APIs

Looker Overview

As a reminder, Looker has three core components: first, it provides a physical modeling layer (LookML), which one uses to abstract SQL generation and define complex transforms; second, Looker consumes the LookML model and generates dialect-specific SQL, which it then issues to the database via JDBC; third, it provides a web-based UI which acts both as a LookML IDE and as the primary means of exploring data.

The LookML Model

Let's consider a simple schema containing two tables, job and warehouse, which we'll use as an example. In Looker, our representation of these tables would look like this:

job definition

- view: job
  sql_table_name: public.job
  fields:

# dimensions

      - dimension: id
        description: This is the primary key of the job table.
        type: number
        primary_key: true
        sql: ${TABLE}.id

      - dimension_group: created
        description: |
            This dimension group parses out various time attributes from the job's timestamp.
        type: time
        sql: ${TABLE}.created_at

      - dimension: type
        description: This is the type of job—e.g., select, copy, create, etc.
        type: string
        sql: ${TABLE}.type

      - dimension: bytes
        description: This is the number of bytes processed by the job.
        type: number
        sql: ${TABLE}.bytes

      - dimension: warehouse_id
        description: This is a foreign key to the warehouse table.
        type: number
        sql: ${TABLE}.warehouse_id

# measures

      - measure: count
        type: count

      - measure: total_bytes
        type: sum
        sql: ${bytes}

warehouse definition

- view: warehouse
  sql_table_name: public.warehouse
  fields:

# dimensions

      - dimension: id
        description: This is the primary key of the warehouse table.
        type: number
        primary_key: true
        sql: ${TABLE}.id

      - dimension: size
        description: This is the size of the warehouse—e.g., small, medium, large, etc.
        type: string
        sql: ${TABLE}.size

# measures

      - measure: count
        type: count

Expose job for exploration and join in warehouse

- explore: job
  joins:
    - join: warehouse
      type: left_outer
      relationship: many_to_one
      sql_on: ${job.warehouse_id} = ${warehouse.id}

The Standard Workflow

Having built our simple LookML model, all we have to do is explore it. If we choose a measure, Looker generates our aggregate, as we'd expect.

[screenshot: Explore results after selecting a measure]

If we pull in fields from warehouse, Looker makes the join and we're off to the races.

[screenshot: Explore results after joining in fields from warehouse]

In either case, Looker dispatches the SQL to the database and renders the response in tabular or visual form. Simple.

The Looker APIs

Looker recently broadened its offering of RESTful APIs. This is interesting: all of the features in the UI are now accessible programmatically, including building and saving reports, scheduling reports, downloading PDFs of dashboards and reports, and so on. Because the APIs rely on the LookML model(s) one defines, it's as though we have APIs into our own modeled data.
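
To get a feel for how thin that surface is, here is a minimal sketch that hits the REST endpoints directly with the requests library; it assumes the API 3.0 path layout (POST /login and POST /queries/run/{result_format}), the default API port, and placeholder credentials. The SDK used in the rest of this walkthrough wraps these same calls.

import requests

# assumptions: a standard Looker host exposing API 3.0 on port 19999; credentials are placeholders
base_url = 'https://your-instance.looker.com:19999/api/3.0'

# exchange API credentials for a short-lived access token
auth = requests.post(base_url + '/login',
                     data={'client_id': 'your_api_id', 'client_secret': 'your_api_secret'})
headers = {'Authorization': 'token ' + auth.json()['access_token']}

# run an inline query against the LookML model defined above
query_body = {
    'model': 'test_model',
    'view': 'job',
    'fields': ['job.total_bytes', 'warehouse.size'],
    'filters': {'job.type': 'select', 'job.created_date': '3 days'}
}
print(requests.post(base_url + '/queries/run/csv', json=query_body, headers=headers).text)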

The specifics of generating SDKs using Swagger Codegen are outlined here. I won't rehash the details of that process. Instead, let me walk through how one might interface with these APIs using a Python SDK.

import looker_python as looker
import pandas as pd
import numpy as np

# preliminaries
base_url = 'your_looker_api_endpoint'
client_id = 'your_api_id'
client_secret = 'your_api_secret'

# instantiate Auth API
unauthenticated_client = looker.ApiClient(base_url)
unauthenticated_authApi = looker.ApiAuthApi(unauthenticated_client)

# authenticate client
token = unauthenticated_authApi.login(client_id = client_id, client_secret = client_secret)
client = looker.ApiClient(base_url, 'Authorization', 'token ' + token.access_token)

# instantiate Query API
query = looker.QueryApi(client)

# build up the query with a dictionary literal; note that "filters" is a single
# map of field name to filter expression
query_elements = {
  "model": "test_model",
  "view": "job",
  "fields": ["job.total_bytes", "warehouse.size"],
  "filters": {"job.type": "select", "job.created_date": "past 3 days"}
}
'''
    As a reminder, Looker should generate SQL along these lines:

    select warehouse.size as "warehouse.size"
        , sum(job.bytes) as "job.total_bytes"
    from job
    left join warehouse
    on job.warehouse_id = warehouse.id
    where job.created_at::date between dateadd(day, -3, current_date) and current_date
    and job.type = 'select'
    group by 1
'''

# run query
response = query.run_inline_query('csv', query_elements)

# print response to screen
print(response)
# 'Warehouse Size,Job Total Bytes\nXLARGE,22987304107894660\nMEDIUM,5722835289218486\nLARGE,4579839923756189\nXXLARGE,3914624544979625\nSMALL,3862319027214091\nXSMALL,1178807351459083\nX4LARGE,981361345867807\nXXXLARGE,736888834334599\n'
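# aside: since the response is already a CSV string, pandas can also parse it
# directly (io is from the standard library); the helpers below show the manual route
import io
alternative_df = pd.read_csv(io.StringIO(response))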

# helper functions to convert the CSV response to a pandas dataframe
def header_cleanup(header):
  return(header.replace(" ", "_").lower().split(','))

def response_to_dataframe(body):
  a = []
  for i, v in enumerate(body.split('\n')[:-1]):
    if i == 0:
      a.append(header_cleanup(v))
    else:
      a.append(v.split(','))
  header = a.pop(0)
  return(pd.DataFrame(a, columns = header))
 
# convert response to pandas dataframe
df = response_to_dataframe(response)

# print dataframe to screen
print(df)
'''
    warehouse_size, job_total_bytes
    XLARGE,22987304107894660
    MEDIUM,5722835289218486
    ...
    XXXLARGE,736888834334599
'''
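
From here, the result behaves like any other pandas dataframe. As a quick, minimal sketch (it simply continues from the df built above; the column names come from the cleaned-up CSV header), you might cast the byte counts to a numeric dtype before aggregating or plotting:

# cast the string column returned by the API to a numeric dtype
df['job_total_bytes'] = pd.to_numeric(df['job_total_bytes'])

# e.g., each warehouse size's share of total bytes processed
df['pct_of_bytes'] = df['job_total_bytes'] / df['job_total_bytes'].sum()
print(df.sort_values('pct_of_bytes', ascending=False))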