@andrewdoss-bit
Created August 12, 2021 20:42
Main
"""This is an example of a simple ETL pipeline for loading data into bit.io.
This example omits many best practices (e.g. logging, error handling,
parameterization + config files, etc.) for the sake of a brief, minimal example.
"""
import os
import sys
from dotenv import load_dotenv
import extract
import transform
import load
# Load credentials from ENV
load_dotenv()
PG_CONN_STRING = os.getenv('PG_CONN_STRING')
def main(src, dest, local_src, options):
    """
    Executes ETL pipeline for a single table.

    Extracts source data, (optionally) transforms the data, and loads the data
    to a Postgres database on bit.io.

    Parameters
    ----------
    src : str
        URL for source data extraction (or a local file path when
        local_src is True).
    dest : str
        Fully-qualified table for load into bit.io.
    local_src : bool
        True if src is path to a local csv file.
    options : dict
        Option - argument map from the user command.
    """
    # EXTRACT data
    if local_src:
        df = extract.csv_from_local(src)
    else:
        df = extract.csv_from_get_request(src)

    # TRANSFORM data (optional): look up the function by name on the
    # transform module and apply it to the extracted data.
    if 'name' in options:
        if hasattr(transform, options['name']):
            df = getattr(transform, options['name'])(df)
        else:
            raise ValueError("Specified transformation name not found.")

    # LOAD data
    load.to_table(df, dest, PG_CONN_STRING)
# Truncated for Medium, see github.com/bitdotioinc/simple-pipeline
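
The transform step above picks a function by name from the `transform` module, which is not shown in this gist. The sketch below illustrates that name-based dispatch in isolation. Everything in it is an assumption made for illustration: the two transformation functions and the `apply_transform` helper are hypothetical, a `SimpleNamespace` stands in for the real module, and plain lists of dicts stand in for the data that the pipeline passes around as `df`.

```python
from types import SimpleNamespace


def drop_empty_rows(rows):
    """Hypothetical transformation: drop rows whose values are all empty."""
    return [row for row in rows if any(v not in ("", None) for v in row.values())]


def uppercase_keys(rows):
    """Hypothetical transformation: upper-case every column name."""
    return [{key.upper(): value for key, value in row.items()} for row in rows]


# Stand-in for the real `transform` module, which is not shown in the gist.
transform = SimpleNamespace(
    drop_empty_rows=drop_empty_rows,
    uppercase_keys=uppercase_keys,
)


def apply_transform(rows, options):
    """Apply the transformation named in options['name'], if one was given."""
    name = options.get("name")
    if name is None:
        return rows  # no transformation requested
    func = getattr(transform, name, None)
    if not callable(func):
        raise ValueError(f"Transformation {name!r} not found.")
    return func(rows)


rows = [{"city": "Oslo", "temp": "4"}, {"city": "", "temp": ""}]
cleaned = apply_transform(rows, {"name": "drop_empty_rows"})
print(cleaned)
```

Looking the function up with `getattr` keeps the pipeline open to new transformations: adding a function to the module is enough to make it selectable by name, at the cost of only discovering an unknown name at run time.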