Jake Ferriero jaketf
jaketf / reschedule_hell_dag.py
Last active January 13, 2023 01:15
DAG to test reschedule mode sensors with lots of reschedules
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.sensors.external_task_sensor import ExternalTaskSensor
# Default Arguments for the DAG
default_args = {
    "owner": "me",
    "start_date": datetime(2022, 1, 1),
jaketf / dry_run_queries.py
Last active March 5, 2021 23:02
dry_run_queries
#!/usr/bin/env python3
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
jaketf / split_dml.sql
Last active February 18, 2021 18:41
BQ DML to split DML statement to modify < 4k partitions
/*
* Copyright 2021 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
jaketf / pre-commit.log
Created November 18, 2020 02:04
bigquery-utils initial pre-commit log
No-tabs checker..........................................................Failed
- hook id: forbid-tabs
- exit code: 1
Tabs detected in file: tools/vscode_sql_extraction/.vscode/tasks.json
Tabs detected in file: tools/template_based_query_generation/src/test/java/graph/MarkovChainTest.java
Tabs detected in file: udfs/community/README.md
Tabs detected in file: tools/vscode_query_breakdown/src/test/suite/index.ts
Tabs detected in file: tools/unsupervised_dataset/sql_classifier/classifier/src/test/resources/queries_large.csv
Tabs detected in file: udfs/tests/run.sh
jaketf / README.md
Last active November 18, 2023 17:00
TPT to GCS script
jaketf / spark_example.py
Last active September 29, 2020 04:47
Spark DAG for validating Spark Connection
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
jaketf / bq_job_lables_monkey_patch.py
Last active July 31, 2020 19:34
(untested) monkey patch BigQuerySource to add bigquery job labels kwarg
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
jaketf / test_pandas_over_dicts.py
Created July 22, 2020 00:42
Using Pandas in an Apache Beam PTransform
# Copyright 2020 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
jaketf / run_relevant_pre_cloudbuilds.sh
Last active August 1, 2023 17:00
Run nested Cloud Build files if there is any diff in their directory tree
#!/bin/bash
###############################################################################################################
# UPDATE: #
# This has been merged to a more complete example: #
# https://github.com/jaketf/ci-cd-for-data-processing-workflow/blob/master/helpers/run_relevant_cloudbuilds.sh#
###############################################################################################################
jaketf / lookup_side_input_with_cache.py
Last active June 9, 2020 03:18
Apache Beam Python Example: Side Input look up with cache
# Copyright 2020 Google Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,