Jo Stichbury stichbury
def create_master_table(
    shuttles: pd.DataFrame, companies: pd.DataFrame, reviews: pd.DataFrame
) -> pd.DataFrame:
    """Combines all data to create a master table.
    Args:
        shuttles: Preprocessed data for shuttles.
        companies: Preprocessed data for companies.
        reviews: Raw data for reviews.
    Returns:
def create_pipeline(**kwargs):
    return Pipeline(
        [
            node(
                func=preprocess_companies,
                inputs="companies",
                outputs="preprocessed_companies",
                name="preprocess_companies_node",
            ),
            node(
def preprocess_companies(companies: pd.DataFrame) -> pd.DataFrame:
    """Preprocesses the data for companies.
    Args:
        companies: Raw data.
    Returns:
        Preprocessed data, with `company_rating` converted to a float and
        `iata_approved` converted to boolean.
    """
    companies["iata_approved"] = _is_true(companies["iata_approved"])
from kedro.io import DataCatalog, MemoryDataSet
from kedro.pipeline import node, Pipeline
from kedro.runner import SequentialRunner
# Prepare a data catalog
data_catalog = DataCatalog({"my_salutation": MemoryDataSet()})
# Prepare first node
def return_greeting():
    return "Hello"
stichbury / Hooks_example_table.md
Last active May 13, 2020 10:10
Hooks example table
| Additional behavior | Hook implementations | Additional tools used | Example
| --- | --- | --- | ---
| Pipeline performance monitoring | before_node_run, after_node_run, after_pipeline_run | StatsD, Grafana | PipelineMonitoringHooks
| Data validation for node inputs and outputs | before_node_run, after_node_run | Great Expectations | DataValidationHooks
| Experiment tracking | after_node_run, before_pipeline_run, after_pipeline_run | MLflow | [ModelTrackingHooks](https://github.com/quantumblacklabs/kedro-examples/blob/master/kedro-hooks-tutorial/src/kedro_hooks_tutorial/hoo
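The pairing of `before_node_run` and `after_node_run` for performance monitoring can be illustrated with a small timing class. This is only a sketch and not the linked `PipelineMonitoringHooks` example: the class name `PipelineMonitoringSketch` and the stand-in node object are hypothetical, and it records durations in a dictionary rather than sending them to StatsD. In a real Kedro project the two methods would be decorated with `@hook_impl` from `kedro.framework.hooks`; the decorator is left out here so the code runs without Kedro installed.

```python
import time
from types import SimpleNamespace


class PipelineMonitoringSketch:
    """Times each node between its before_node_run and after_node_run calls."""

    def __init__(self):
        self._started = {}   # node name -> start timestamp
        self.durations = {}  # node name -> elapsed seconds

    # In Kedro, this method would carry the @hook_impl decorator.
    def before_node_run(self, node):
        self._started[node.name] = time.perf_counter()

    # In Kedro, this method would carry the @hook_impl decorator.
    def after_node_run(self, node):
        start = self._started.pop(node.name)
        self.durations[node.name] = time.perf_counter() - start


hooks = PipelineMonitoringSketch()
# Hypothetical stand-in for a Kedro node; only the `name` attribute is used.
fake_node = SimpleNamespace(name="preprocess_companies_node")
hooks.before_node_run(fake_node)
hooks.after_node_run(fake_node)
print(hooks.durations)
```

Keeping the start times keyed by node name lets the same hooks object serve every node in a run; a monitoring backend could then read `durations` once the pipeline finishes.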
+-----+--------+-------------+-----------------+------------+
| ID  | Name   | CountryCode | District        | Population |
+-----+--------+-------------+-----------------+------------+
| 130 | Sydney | AUS         | New South Wales |    3276207 |
+-----+--------+-------------+-----------------+------------+
1 row in set (0.00 sec)
./bin/graql.sh
>>> match $x isa country; limit 10;
$x id "country-AGO" isa country;
$x id "country-ARE" isa country;
$x id "country-ANT" isa country;
$x id "country-ARG" isa country;
$x id "country-DZA" isa country;
$x id "country-ATG" isa country;
$x id "country-ASM" isa country;
$x id "country-NLD" isa country;
CountryCode-parent isa role-type;
CountryCode-child isa role-type;
CountryCode-relation isa relation-type,
    has-role CountryCode-child,
    has-role CountryCode-parent;
city plays-role CountryCode-parent;
country plays-role CountryCode-child;
insert
city isa entity-type,
    has-resource ID,
    has-resource Name,
    has-resource CountryCode,
    has-resource District,
    has-resource Population;
country isa entity-type,
./migration.sh sql -driver <jdbcDriver> -user <username> -pass <password> -database <url> -graph <graphname> [-engine <url>]
    -driver      JDBC driver
    -user        username for SQL database
    -pass        password for SQL database
    -database    URL to SQL database
    -graph       graph name
    -engine      MindmapsDB engine URL, default localhost