Stefan Urbanek Stiivi

from bubbles import Pipeline
stores = {
"target": {"type": "csv", "path": "."}
}
p = Pipeline(stores=stores)
p.source_object("xls", resource="cpv_2008_ver_2013.xlsx")
p.transpose_by("CODE", "country", "label")
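The snippet does not show what `transpose_by("CODE", "country", "label")` produces. Assuming it unpivots a wide table, keeping `CODE` as the key and turning every other column into a (`country`, `label`) pair, here is a plain-Python sketch of that reshaping. The function signature and the sample rows are hypothetical and are not the bubbles implementation.

```python
def transpose_by(rows, fields, key, name_field, value_field):
    """Unpivot wide rows into one (key, name, value) row per non-key column.

    Assumed semantics for illustration; not the bubbles implementation.
    """
    key_index = fields.index(key)
    out_fields = [key, name_field, value_field]
    out_rows = []
    for row in rows:
        for index, field in enumerate(fields):
            if index == key_index:
                continue  # the key is repeated on every output row instead
            out_rows.append([row[key_index], field, row[index]])
    return out_fields, out_rows


# Hypothetical wide table: one label column per language/country
fields = ["CODE", "EN", "DE"]
rows = [["C1", "apple", "Apfel"], ["C2", "pear", "Birne"]]
out_fields, out_rows = transpose_by(rows, fields, "CODE", "country", "label")
```

Each input row with N non-key columns yields N output rows, so the two rows above become four (`CODE`, `country`, `label`) rows.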
# Demo:
#
# Aggregate population per independence type for every year
# Sources: Population and Country Codes datasets
#
from bubbles import Pipeline
from bubbles import get_logger
logger = get_logger()
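The demo's goal, aggregating population per independence type for every year, amounts to joining the population rows with the country codes and then grouping by (independence type, year). A minimal stdlib sketch follows, assuming hypothetical field names `code`, `year`, `population` and `independence`; the real dataset columns may differ.

```python
from collections import defaultdict


def population_by_independence(population, countries):
    # Lookup side of the join: country code -> independence type
    independence = {c["code"]: c["independence"] for c in countries}
    totals = defaultdict(int)
    for row in population:
        kind = independence.get(row["code"])
        if kind is None:
            continue  # inner join: skip rows without a matching country code
        totals[(kind, row["year"])] += row["population"]
    return dict(totals)


# Hypothetical sample data
population = [
    {"code": "AA", "year": 2000, "population": 10},
    {"code": "BB", "year": 2000, "population": 5},
    {"code": "AA", "year": 2001, "population": 11},
    {"code": "CC", "year": 2000, "population": 7},
]
countries = [
    {"code": "AA", "independence": "sovereign"},
    {"code": "BB", "independence": "sovereign"},
    {"code": "CC", "independence": "territory"},
]
totals = population_by_independence(population, countries)
```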
# Demo:
#
# Aggregate population per independence type for every year
# Sources: Population and Country Codes datasets
#
from bubbles import Pipeline
# List of stores with datasets. In this example we are using the "datapackage"
# store
Stiivi / sqlalchemy_expression_compiler.py
Last active December 20, 2015 10:49
Lightweight Expressions: example of a simple SQLAlchemy expression compiler. For more information, see https://github.com/Stiivi/expressions
class SQLAlchemyExpressionCompiler(object):
    def __init__(self, statement):
        # Context of this compiler is a SQLAlchemy statement object
        self.statement = statement

    def compile_literal(self, literal):
        return literal

    def compile_variable(self, variable):
        # Get a column object from the statement
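The gist compiles expression nodes into SQLAlchemy objects, with the statement as context. To show the same compile-per-node-type pattern without depending on SQLAlchemy, this sketch compiles a tiny expression tree into plain Python closures evaluated against a row dict. The node classes and `RowExpressionCompiler` are hypothetical and are not the API of the expressions library.

```python
import operator
from dataclasses import dataclass
from typing import Any


@dataclass
class Literal:
    value: Any


@dataclass
class Variable:
    name: str


@dataclass
class Binary:
    op: str
    left: Any
    right: Any


class RowExpressionCompiler:
    """Compile an expression tree into a function of a row (dict)."""

    OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
                 "<": operator.lt, ">": operator.gt, "==": operator.eq}

    def compile(self, node):
        # Dispatch on node type, one compile_* method per kind of node
        if isinstance(node, Literal):
            return self.compile_literal(node.value)
        if isinstance(node, Variable):
            return self.compile_variable(node.name)
        if isinstance(node, Binary):
            return self.compile_binary(node.op, self.compile(node.left),
                                       self.compile(node.right))
        raise TypeError(f"unknown node: {node!r}")

    def compile_literal(self, literal):
        return lambda row: literal

    def compile_variable(self, variable):
        # Variables are resolved against the row at evaluation time
        return lambda row: row[variable]

    def compile_binary(self, op, left, right):
        func = self.OPERATORS[op]
        return lambda row: func(left(row), right(row))


# amount * 2 > 10
expr = Binary(">", Binary("*", Variable("amount"), Literal(2)), Literal(10))
predicate = RowExpressionCompiler().compile(expr)
```

A SQLAlchemy-targeting compiler would return column expressions from the same methods instead of closures. The tree-walking structure stays the same.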
Stiivi / aggregate_over_window.py
Created July 17, 2013 05:22
Bubbles recipe: aggregate over a window. Assign a value aggregated over a window specified by a key (possibly compound) to every row. The current example assigns each customer's latest purchase year to every order. Source: https://github.com/Stiivi/bubbles/blob/master/examples
from bubbles import Pipeline, FieldList, data_object, open_store
# Sample order data with fields:
fields = FieldList(
["id", "integer"],
["customer_id", "integer"],
["year", "integer"],
["amount", "integer"]
)
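The recipe's idea can be sketched in plain Python with two passes: first aggregate the value per key (the window), then attach that aggregate to every row sharing the key. The function name and the `last_purchase_year` output field are hypothetical; the sample orders reuse the field names from the FieldList.

```python
from collections import defaultdict


def aggregate_over_window(rows, key_fields, value_field, func, target_field):
    """Attach func(values in the row's window) to every row.

    A window is the set of rows sharing the same (possibly compound) key.
    """
    windows = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        windows[key].append(row[value_field])
    aggregates = {key: func(values) for key, values in windows.items()}
    # Second pass: copy each row and add the window's aggregate
    return [
        {**row, target_field: aggregates[tuple(row[f] for f in key_fields)]}
        for row in rows
    ]


orders = [
    {"id": 1, "customer_id": 10, "year": 2011, "amount": 100},
    {"id": 2, "customer_id": 10, "year": 2013, "amount": 50},
    {"id": 3, "customer_id": 20, "year": 2012, "amount": 70},
]
result = aggregate_over_window(orders, ["customer_id"], "year", max,
                               "last_purchase_year")
```

Two passes keep every input row in the output, unlike a plain GROUP BY, which collapses each window to a single row.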
Stiivi / customers_who_ordered-sql.py
Last active March 17, 2017 15:41
Another simple Bubbles example: use two data sources, customers and orders CSV files (exports from the Volusion platform), to get a list of customers (name, email address) who made orders in a certain range of years. There are two versions of the same process: one loads the data into a SQL table and performs the operations using SQL, the other…
from bubbles import Pipeline, open_store
stores = {
"source": open_store("csv", "data/source", encoding="utf16", infer_fields=True),
"target": open_store("sql", "sqlite:///data.sqlite")
}
p = Pipeline(stores=stores)
# Load customers into a SQL table
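For the SQL variant, once both CSV files are loaded into tables, the query reduces to a join filtered by year. This sketch uses the standard-library sqlite3 module with an in-memory database. The table and column names (`customers.id`, `orders.customer_id`, `orders.year`) and the sample rows are hypothetical; real Volusion exports use their own column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, year INTEGER);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "Alice", "alice@example.com"),
    (2, "Bob", "bob@example.com"),
    (3, "Carol", "carol@example.com"),
])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 1, 2010), (2, 1, 2012), (3, 2, 2008), (4, 3, 2011),
])


def customers_who_ordered(conn, first_year, last_year):
    # DISTINCT: a customer with several orders in the range is listed once
    query = """
        SELECT DISTINCT c.name, c.email
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        WHERE o.year BETWEEN ? AND ?
        ORDER BY c.name
    """
    return conn.execute(query, (first_year, last_year)).fetchall()


result = customers_who_ordered(conn, 2010, 2012)
```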
Stiivi / bubbles_pipeline_join.py
Last active January 19, 2018 12:00
Demonstration of the new graph-based pipeline, graph execution (see the debug logs) and joins in the pipeline. It also demonstrates dynamic dispatch when the pipeline is redirected to a SQL table. Works with bubbles commit 5e108ad3a3f46580ebfe16168c58308bc914cf30 from Jul 2, 2013.
import bubbles
stores = { "target": bubbles.open_store("sql", "sqlite:///") }
p = bubbles.Pipeline(stores=stores)
p.source_object("csv_source", resource="data.csv", infer_fields=True)
# Uncomment this and compare the logs: SQL will be used
# p.create("target", "data")
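Dynamic dispatch here means that the same pipeline operation runs as SQL when the data lives in a SQL table and falls back to row iteration otherwise. The following is a minimal sketch of that idea, assuming a registry keyed by (operation, representation) and data objects that list their representations in order of preference. `DataObject`, `operation` and `dispatch` are hypothetical names, not the bubbles API.

```python
class DataObject:
    """Hypothetical data object listing the representations it can provide."""

    def __init__(self, name, representations, rows=None):
        self.name = name
        self.representations = representations  # in order of preference
        self.rows = rows if rows is not None else []


_registry = {}


def operation(name, representation):
    """Register an implementation of operation `name` for one representation."""
    def decorator(func):
        _registry[(name, representation)] = func
        return func
    return decorator


def dispatch(name, obj, *args):
    # Try the object's representations in order; use the first implemented one
    for representation in obj.representations:
        func = _registry.get((name, representation))
        if func is not None:
            return func(obj, *args)
    raise LookupError(f"no {name!r} implementation for {obj.representations}")


@operation("distinct", "sql")
def distinct_sql(obj, field):
    # Only builds the statement; a real backend would run it in the database
    return ("sql", f"SELECT DISTINCT {field} FROM {obj.name}")


@operation("distinct", "rows")
def distinct_rows(obj, field):
    # Python fallback: iterate over rows, preserving first-seen order
    seen = []
    for row in obj.rows:
        if row[field] not in seen:
            seen.append(row[field])
    return ("rows", seen)


csv_source = DataObject("data.csv", ["rows"],
                        rows=[{"year": 2012}, {"year": 2013}, {"year": 2012}])
sql_table = DataObject("data", ["sql", "rows"])
```

Calling `dispatch("distinct", ...)` on the CSV object uses the row implementation, while the same call on the SQL table builds a `SELECT DISTINCT` statement.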
Stiivi / gist:5602392
Last active December 17, 2015 11:29
Brewery2 pipelines – revival of the former "forking forks" [1], now using the operation kernel and virtual data objects. This example just denormalizes two tables into one dimension with field selection. Note that the target table does not contain the detail keys used for the join. References: [1] http://blog.databrewery.org/posts/forking-forks-with-higher-ord…
from brewery2 import Pipeline, open_store
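# Open the source store first so that the target can reuse its connection
source_store = open_store("sql", "postgres://localhost/crm", schema="app")
target_store = open_store("sql", connectable=source_store.connectable, schema="cubes")
stores = {
    "source": source_store,
    "target": target_store
}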
p = Pipeline(stores=stores)
p.source("source", "crm_contact")
p.field_filter(keep=["id",
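The denormalization can be sketched in plain Python: join detail rows onto master rows by key, then keep only the selected fields, which also drops the detail keys used for the join. This sketch assumes a left join (master rows without a match keep `None` in the detail fields); whether brewery2 used an inner or a left join is not shown in the snippet. The function name, field names and sample rows are hypothetical.

```python
def denormalize(master, detail, master_key, detail_key, keep):
    """Left-join detail rows onto master rows, then keep only selected fields."""
    index = {row[detail_key]: row for row in detail}
    result = []
    for row in master:
        joined = dict(row)
        match = index.get(row[master_key])
        if match is not None:
            # Add detail fields without overwriting master fields
            for field, value in match.items():
                joined.setdefault(field, value)
        # Field selection drops everything else, including the join keys
        result.append({field: joined.get(field) for field in keep})
    return result


contacts = [
    {"id": 1, "name": "Ann", "country_id": 7},
    {"id": 2, "name": "Ben", "country_id": 9},
]
countries = [{"code": 7, "country_name": "Slovakia"}]
dimension = denormalize(contacts, countries, "country_id", "code",
                        keep=["id", "name", "country_name"])
```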
Stiivi / dallas_data_brewery-answers.markdown
Last active June 23, 2018 04:15
Dallas Data Brewery meetup group answers

What tools do you use?

  • Proprietary Software, R, Python, SQL, Gephi
  • Tableau; Excel; Access
  • SPSS in the application of psych statistics and research methods
  • Tableau, SQL, SPSS, R and other statistical tools.
  • SSMS, R, SSAS
  • Python, Matplotlib, Disco...
  • Proprietary
  • R, SPSS, SAS, Relational DB
Stiivi / brewery2-added_rows.py
Created April 30, 2013 06:03
Brewery2 example: using the kernel functions added_keys and added_rows. added_rows is called in two flavours: first as (sql, sql), then it is retried as (rows, sql).
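The snippet is cut off before the kernel functions are called. As a plain-Python sketch of what added_keys and added_rows plausibly compute, assume both compare source and target rows by a key in the first column and return what the source has that the target lacks (in SQL this is typically an anti-join such as `NOT EXISTS`). The function bodies are assumptions, not brewery2 code. The target rows come from the gist, and the source row with key 4 is hypothetical.

```python
def added_keys(source, target, key_index=0):
    """Keys present in source rows but missing from target rows (assumed)."""
    target_keys = {row[key_index] for row in target}
    return [row[key_index] for row in source if row[key_index] not in target_keys]


def added_rows(source, target, key_index=0):
    """Source rows whose key does not appear in target (assumed)."""
    target_keys = {row[key_index] for row in target}
    return [row for row in source if row[key_index] not in target_keys]


DATA_TARGET = [
    [1, "Janko", "Bratislava"],
    [2, "Marienka", "Bratislava"],
    [3, "Jaga", "Zvolen"],
]
DATA_SRC = [
    [1, "Janko", "Bratislava"],
    [4, "Jozef", "Nitra"],  # hypothetical new row
]
new_keys = added_keys(DATA_SRC, DATA_TARGET)
new_rows = added_rows(DATA_SRC, DATA_TARGET)
```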
from brewery2 import k, open_store, FieldList
DATA_TARGET = [
[1, "Janko", "Bratislava"],
[2, "Marienka", "Bratislava"],
[3, "Jaga", "Zvolen"]
]
DATA_SRC = [
[1, "Janko", "Bratislava"],