
@slopp
slopp / README.md
Last active November 19, 2025 18:18
Dynamic pipeline that invokes k8 ops

Dynamic Pipeline

This example shows the pseudo-code for a Dagster pipeline that:

  1. Accepts the path to a raw dataset as a string
  2. Runs a step to break the raw dataset into partitions
  3. For each partition, runs a series of two processing steps. Each processing step calls out to a Docker container, supplying the partition key as an input argument. The partitions run in parallel before being collected in a final processing step that operates on all of them.

To run the pipeline:

@slopp
slopp / penguins.csv
Created March 31, 2021 20:07
Palmer Penguins Dataset as CSV
rowid,species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex,year
1,Adelie,Torgersen,39.1,18.7,181,3750,male,2007
2,Adelie,Torgersen,39.5,17.4,186,3800,female,2007
3,Adelie,Torgersen,40.3,18,195,3250,female,2007
4,Adelie,Torgersen,NA,NA,NA,NA,NA,2007
5,Adelie,Torgersen,36.7,19.3,193,3450,female,2007
6,Adelie,Torgersen,39.3,20.6,190,3650,male,2007
7,Adelie,Torgersen,38.9,17.8,181,3625,female,2007
8,Adelie,Torgersen,39.2,19.6,195,4675,male,2007
9,Adelie,Torgersen,34.1,18.1,193,3475,NA,2007
@slopp
slopp / credits.sql
Created September 28, 2023 21:01
Credits Dagster OSS
with events as (
    select distinct
        DATE_FORMAT(timestamp, '%Y-%m') as event_month,
        dagster_event_type,
        concat(run_id, '||', step_key) as step_id,
        count(1) as credits
    from event_logs
    where dagster_event_type = 'STEP_START'

Dynamo

Executive Summary

This report provides a comprehensive comparison of NVIDIA Dynamo with other frameworks, including vLLM and NVIDIA Triton Server, for large language model (LLM) inference workloads. The key findings highlight Dynamo's superior performance in multi-GPU setups, achieving higher throughput and lower latency than vLLM. Additionally, Dynamo's disaggregated serving approach offers more flexible performance tuning, especially for large models and variable workload conditions.

Compared to NVIDIA Triton Server, Dynamo is optimized for low-latency generative AI/LLM workloads, while Triton Server excels at multi-model inference serving.

The report also delves into Dynamo's technical architecture, which accelerates inference workloads through components such as disaggregated serving, smart routing, and distributed KV cache management. Benchmarking methodologies and key performance metrics are also discussed, emphasizing the importance of standardized evaluation for

@slopp
slopp / handout.Rmd
Created October 27, 2017 12:51
LaTex + RMD
---
title: \textbf{USING RSTUDIO WITH TERADATA \\ \large RStudio makes it easy to access and analyze your data with R}
geometry: margin=0.6in
output:
  pdf_document:
    fig_caption: false
    pandoc_args: [
      "-V", "classoption=twocolumn"
    ]
---
@slopp
slopp / users_by_role.py
Created July 3, 2023 17:25
Get Users by Role
from gql import Client, gql
from gql.transport.requests import RequestsHTTPTransport
import os
import pandas as pd
from datetime import datetime, timedelta
USER_GRANTS_QUERY = """
query UsersByRole {
usersOrError {
@slopp
slopp / README.md
Created January 2, 2025 22:37
LangGraph Exploration

A modification of the LangChain SQL Q&A tutorial https://python.langchain.com/docs/tutorials/sql_qa/.

The changes are:

  • uses Pydantic to type the state and inputs/outputs
  • uses DuckDB on the Palmer Penguins dataset
  • uses an NVIDIA NIM for the LLM
  • instead of the sequence write_query -> run_query -> gen_answer, this graph adds an LLM that checks the write_query output for validity and for whether it answers the question, leading to a more dynamic graph
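
The check-and-retry control flow can be sketched in plain Python, without the LangGraph dependency. Every function body here is a hypothetical stand-in for an LLM or database call; only the routing logic is the point.

```python
def write_query(state):
    # Stand-in for the LLM generating SQL from the question.
    state["query"] = f"SELECT count(*) FROM penguins  -- attempt {state['attempts']}"
    state["attempts"] += 1
    return state

def check_query(state):
    # Stand-in for the LLM validity check; here it approves
    # on the second attempt to exercise the retry edge.
    return "run" if state["attempts"] >= 2 else "rewrite"

def run_query(state):
    # Stand-in for executing the SQL against DuckDB.
    state["result"] = f"rows for: {state['query']}"
    return state

def gen_answer(state):
    # Stand-in for the LLM composing a final answer.
    state["answer"] = f"answer based on {state['result']}"
    return state

def run_graph(question):
    state = {"question": question, "attempts": 0}
    state = write_query(state)
    while check_query(state) == "rewrite":
        # The checker routed us back: regenerate the query.
        state = write_query(state)
    state = run_query(state)
    return gen_answer(state)
```

In LangGraph terms, the `while` loop corresponds to a conditional edge from the checker node back to `write_query`, which is what makes the graph dynamic rather than a fixed sequence.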

@slopp
slopp / README.md
Last active January 2, 2025 22:32
Simple LLM agent to recommend fake coffee shops via tool calling
@slopp
slopp / README.md
Last active January 2, 2025 22:32
Code for creating a RAG chatbot based on theradavist
@slopp
slopp / README.md
Created December 31, 2024 00:10
AI Coffee Shop Streamlit App

This simple streamlit app uses the Google Maps and Places API, along with a hosted Nvidia NIM wrapper of the Llama model, to help you find coffee shops near an address.
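
The Maps side of such an app boils down to two HTTP calls: geocode the address, then search for nearby coffee shops. A minimal sketch of the request builders is below; the endpoint URLs are the public Google Maps Platform ones, but the function names and the radius default are assumptions, not the app's actual code.

```python
import urllib.parse

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
PLACES_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"

def geocode_request(address: str, api_key: str) -> str:
    # Build the Geocoding API request URL that resolves an
    # address string to a latitude/longitude pair.
    params = urllib.parse.urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_URL}?{params}"

def nearby_coffee_request(lat: float, lng: float, api_key: str,
                          radius_m: int = 1500) -> str:
    # Build a Places Nearby Search request for coffee shops
    # within radius_m meters of the geocoded point.
    params = urllib.parse.urlencode({
        "location": f"{lat},{lng}",
        "radius": radius_m,
        "keyword": "coffee",
        "key": api_key,
    })
    return f"{PLACES_URL}?{params}"
```

The LLM's role in the app is then to summarize and rank the Places results for the user, rather than to make these calls itself.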


To run

  1. Install dependencies