Sam Bail spbail

@spbail
spbail / Mapping column names in an expectation suite
Created January 21, 2021 22:15
Mapping column names in an expectation suite
# This is just hacked together and there's probably some ways to make it nicer, but it works for me.
# The mapping method
def map_column_names(expectation_suite, mapping_dict):
    for exp in expectation_suite.expectations:
        if 'column' in exp.get_domain_kwargs():
            source_col = exp.get_domain_kwargs()['column']
            if source_col in mapping_dict:
                target_col = mapping_dict[source_col]
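
The preview above is cut off before the renamed column is written back. As a rough illustration of the same idea, here is a minimal sketch that works on plain dictionaries instead of Great Expectations objects. The function name `map_column_names_in_configs` and the dictionary layout (an `expectation_type` key plus a `kwargs` dict that may hold a `column` entry) are assumptions made for this example; they are not taken from the gist and are not the library's API.

```python
import copy


def map_column_names_in_configs(expectation_configs, mapping_dict):
    """Return copies of the configs with 'column' kwargs renamed via mapping_dict.

    Hypothetical helper: each config is assumed to be a plain dict with a
    'kwargs' dict inside it. The input list is left unmodified.
    """
    mapped = []
    for config in expectation_configs:
        new_config = copy.deepcopy(config)
        kwargs = new_config.get("kwargs", {})
        source_col = kwargs.get("column")
        # Only rename when the expectation targets a column we have a mapping for.
        if source_col is not None and source_col in mapping_dict:
            kwargs["column"] = mapping_dict[source_col]
        mapped.append(new_config)
    return mapped


# Usage: rename "id" to "customer_id"; table-level expectations pass through unchanged.
configs = [
    {"expectation_type": "expect_column_values_to_not_be_null", "kwargs": {"column": "id"}},
    {"expectation_type": "expect_table_row_count_to_be_between", "kwargs": {"min_value": 1}},
]
mapped_configs = map_column_names_in_configs(configs, {"id": "customer_id"})
```

Returning copies rather than mutating in place keeps the source suite reusable for the original column names.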
spbail / GE airflow operator examples
Last active November 3, 2020 22:31
Examples for invoking the GE Airflow operator
import airflow
from airflow import AirflowException
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.greatexpectations.operators.greatexpectations import GreatExpectationsOperator
default_args = {
    "owner": "Airflow",
    "start_date": airflow.utils.dates.days_ago(1)
}
spbail / airflow_operator_mockup.py
Last active October 22, 2020 14:54
Mockup of a Great Expectations Airflow operator and stubs for calling the operator in a DAG file
# This is a super hacky non-working mockup of a GE operator that runs validation
# It's making pretty heavy use of overloading the init in order to provide a single
# interface for a user for any of the following permutations:
# A. data context instantiation:
#    1. use the default data context (current working directory)
#    2. provide a path to a data context
#    3. pass in a data context object that's been created from a dictionary (or some other way)
# B. what to validate:
#    1. pass in a suite name + batch kwargs to validate a single batch with that expectation suite
#    2. pass in a list of suites+batch kwargs (since this is what ge.validation_operator([list_of_batches]) accepts
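
The permutations listed above can be sketched as a single argument-resolution step. This is only an illustration in plain Python, not the operator from the gist: the function `resolve_operator_args` and all of its parameter names (`data_context`, `data_context_root_dir`, `expectation_suite_name`, `batch_kwargs`, `assets_to_validate`) are hypothetical, and the sketch deliberately avoids importing Airflow or Great Expectations.

```python
import os
from typing import Any, Dict, List, Optional, Tuple


def resolve_operator_args(
    data_context: Optional[Any] = None,
    data_context_root_dir: Optional[str] = None,
    expectation_suite_name: Optional[str] = None,
    batch_kwargs: Optional[Dict[str, Any]] = None,
    assets_to_validate: Optional[List[Dict[str, Any]]] = None,
) -> Tuple[Tuple[str, Any], List[Dict[str, Any]]]:
    """Normalize the overloaded arguments into (context_source, batches)."""
    # A. Decide where the data context comes from.
    if data_context is not None and data_context_root_dir is not None:
        raise ValueError("Pass either a data context object or a path, not both.")
    if data_context is not None:
        context_source = ("object", data_context)           # A.3
    elif data_context_root_dir is not None:
        context_source = ("path", data_context_root_dir)    # A.2
    else:
        context_source = ("path", os.getcwd())              # A.1 (default)

    # B. Decide what to validate; always return a list of batches.
    single_given = expectation_suite_name is not None or batch_kwargs is not None
    if single_given and assets_to_validate is not None:
        raise ValueError("Pass either a single suite + batch kwargs or a list, not both.")
    if assets_to_validate is not None:
        batches = list(assets_to_validate)                  # B.2
    elif expectation_suite_name is not None and batch_kwargs is not None:
        batches = [{                                        # B.1
            "expectation_suite_name": expectation_suite_name,
            "batch_kwargs": batch_kwargs,
        }]
    else:
        raise ValueError("Need a suite name with batch kwargs, or a list of both.")
    return context_source, batches


# Usage: a single batch against a context stored at an explicit (made-up) path.
context_source, batches = resolve_operator_args(
    data_context_root_dir="/tmp/great_expectations",
    expectation_suite_name="taxi.demo",
    batch_kwargs={"table": "yellow_tripdata_staging"},
)
```

Normalizing the single-batch case into a one-element list means the validation step downstream only has to handle one shape of input.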
spbail / ge_excel.py
Created October 13, 2020 16:06
Code snippet that demonstrates how to use Great Expectations to validate Excel files
import datetime
import pandas as pd
import great_expectations as ge
import great_expectations.jupyter_ux
from great_expectations.data_context.types.resource_identifiers import (
    ValidationResultIdentifier,
)
# You'll have to run `great_expectations init` to create a context that can be loaded here
# I also had to set up a "dummy" datasource called data__dir pointing at a directory
spbail / gist:7b2002dc02e4a94ddd6179745ab9290a
Created August 11, 2020 18:07
minimal example for validation result URL in data docs
import great_expectations as ge
context = ge.data_context.DataContext()
expectation_suite_name = 'taxi.demo'
datasource_name = 'my_postgres_db'
batch_kwargs = {'table': "yellow_tripdata_staging", 'datasource': datasource_name}
batch = context.get_batch(batch_kwargs, expectation_suite_name)
results = context.run_validation_operator(
project_config = DataContextConfig(
    config_version=2,
    plugins_directory=None,
    config_variables_file_path=None,
    datasources={
        'test_datasource': {
            'class_name': 'PandasDatasource',
            'data_asset_type': {
                'class_name': 'PandasDataset'
            }
spbail / description
Created August 22, 2013 19:46
Some fun OWL / OWL API problems.
For the ORE 2013 reasoner competition, the Bio KB 101 team at SRI submitted 432 versions of Bio KB which were different OWL approximations to a FOL KB. We ran the OWL API's profile checker and found that only around 70 or so were OWL 2 DL (and thus used in the competition), the rest fell into OWL Full.
Unfortunately I didn't have time to look into the reasons for the ontologies being in Full before the competition, but I've managed to catch up now that we're planning some reruns. It turns out the "fullness" was caused by the following axiom:
SubClassOf(:Receptor-Tyrosine-Kinase-Dimer
    ObjectIntersectionOf(:Protein-Complex :Organic-Molecule
        ObjectIntersectionOf(
            ObjectSomeValuesFrom(:element
                ObjectIntersectionOf(:Monomer :Receptor-Tyrosine-Kinase))
            ObjectSomeValuesFrom(:element
spbail / meteor-async.md
Last active December 18, 2015 13:29 — forked from joscha/meteor-async.md

From Meteor's documentation:

In Meteor, your server code runs in a single thread per request, not in the asynchronous callback style typical of Node. We find the linear execution model a better fit for the typical server code in a Meteor application.

This guide serves as a mini-tour of tools, tricks, and patterns that can be used to run async code in Meteor.

Basic async

Sometimes we need to run async code in Meteor.methods. For this we create a Future, which blocks the method until the async code has finished.

spbail / add a Tomcat user
Last active December 18, 2015 00:29
Install Tomcat (6) on Mac OS X (I used this on a Mac Mini running Lion, but I presume it should work on other versions as well). The last line starts up the server.
emacs /Library/Tomcat/conf/tomcat-users.xml
then add to the file:
<role rolename="manager-gui"/>
<user username="myusername" password="mypassword" roles="manager-gui"/>
spbail / test.html
Last active December 17, 2015 06:19
Just a small test
<html>
  <head>
    <title>Just a small test</title>
  </head>
  <body>
    <h1>Hello world.</h1>
    <p>This will show up in <a href="http://bl.ocks.org/spbail/5564513">http://bl.ocks.org/spbail/5564513</a></p>
  </body>
</html>