View slack_message_table_schema.json
{
"fields": [
{
"name": "index",
"type": "integer"
},
{
"name": "type",
"type": "string"
},
pascalwhoop / dataflow_job.tf
Created Jul 8, 2020
sample pipeline for data cleaning
View dataflow_job.tf
import apache_beam as beam
import logging
import json
from apache_beam.io import ReadFromText
from apache_beam.io import BigQuerySource
from apache_beam.io import BigQuerySink
from apache_beam.io import WriteToText
from apache_beam.io.gcp.bigquery_tools import parse_table_schema_from_json
from apache_beam.io.gcp.internal.clients import bigquery
View dataflow_job.tf
locals {
foo_directory = "../../../../services/foo/target"
jobName = "company-${var.environment}-datalayer-foo-app-v1"
region = "europe-west1"
//parameters for the java jar
dataflow_parameters = {
runner = "DataflowRunner"
jobName = local.jobName
update = data.external.running_job.result.name == local.jobName
project = var.project
View fix.sh
mkdir -p /tmp/bucket && gsutil rsync -r gs://our-bucket /tmp/bucket #back up the bucket contents locally
cd 0-bootstrap && vim backend.tf #comment out backend
terraform init #approve copying state to local
gsutil rm -r gs://our-bucket #delete all data
terraform apply #force recreates bucket in different region
gsutil rsync -r /tmp/bucket gs://our-bucket #bring state files back
gsutil rm -r gs://our-bucket/terraform/state/bootstrap #clean old state
vim backend.tf #uncomment the backend again
terraform init #copy state back to bucket
pascalwhoop / PR-bootstrap.yaml
Created Apr 26, 2020
gcp foundation github actions
View PR-bootstrap.yaml
name: 'Bootstrap Terraform'
on:
- pull_request
env:
tf_version: 'latest'
tf_working_dir: '0-bootstrap'
GOOGLE_CREDENTIALS: ${{secrets.GOOGLE_CREDENTIALS}}
jobs:
terraform:
name: 'Terraform'
View populations.csv
"Country Name","Country Code","Indicator Name","Indicator Code","1960","1961","1962","1963","1964","1965","1966","1967","1968","1969","1970","1971","1972","1973","1974","1975","1976","1977","1978","1979","1980","1981","1982","1983","1984","1985","1986","1987","1988","1989","1990","1991","1992","1993","1994","1995","1996","1997","1998","1999","2000","2001","2002","2003","2004","2005","2006","2007","2008","2009","2010","2011","2012","2013","2014","2015","2016","2017","2018","2019",
"Aruba","ABW","Population, total","SP.POP.TOTL","54211","55438","56225","56695","57032","57360","57715","58055","58386","58726","59063","59440","59840","60243","60528","60657","60586","60366","60103","59980","60096","60567","61345","62201","62836","63026","62644","61833","61079","61032","62149","64622","68235","72504","76700","80324","83200","85451","87277","89005","90853","92898","94992","97017","98737","100031","100834","101222","101358","101455","101669","102046","102560","103159","103774","104341","104872","105366","105845","",
"
View occurences.txt
0 161769
1 16771
2 1397
3 315
4 167
5 123
6 75
7 41
8 46
9 33
View state.json
"20200217:0.hist
"20200217:0.last": "248",
"20200217:0.ooi": "247",
"20200217:0.ooo": "1",
"20200217:0.total": "497",
"20200217:1.hist": "...................................................................................................................................................................................................................................................................................................................................................................................
View both_runs.log
[pascalwhoop@pascalwhoop-xps example]$ cd a
[pascalwhoop@pascalwhoop-xps a]$ terraform init
Initializing modules...
Initializing the backend...
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
pascalwhoop / cleanup.sh
Created Nov 20, 2019
Deletes finished (non-running) pods older than three days from a Kubernetes namespace (in our case airflow)
View cleanup.sh
NAMESPACE=airflow
#change the 3 in the jq filter (60*60*24*3) to the number of days you want to keep finished pods around for
kubectl delete pod -n $NAMESPACE $(kubectl get pods -n $NAMESPACE -o json | jq -r '.items[] |
select(.status.phase != "Running") |
select(last(.status.containerStatuses)[].state | has("terminated")) |
select((last(.status.containerStatuses)[].state.terminated.finishedAt | fromdate) < (now - 60*60*24*3)) |
.metadata.name')
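
The selection logic of the jq filter above can also be expressed in Python, which makes it easier to dry-run before deleting anything. This is a minimal sketch, not the author's method: the function names `parse_k8s_time` and `stale_pod_names` are hypothetical, the input is assumed to be the parsed output of `kubectl get pods -n airflow -o json`, and timestamps are assumed to use the `YYYY-MM-DDTHH:MM:SSZ` form. Unlike the jq filter, it requires every container of a pod to have terminated before the cutoff, and it skips pods without container statuses instead of failing on them.

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value):
    """Parse a timestamp such as 2019-11-20T10:00:00Z into an aware datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stale_pod_names(pod_list, now, keep_days=3):
    """Return names of non-running pods whose containers all finished before the cutoff."""
    cutoff = now - timedelta(days=keep_days)
    names = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") == "Running":
            continue
        statuses = status.get("containerStatuses") or []
        finished = [
            s.get("state", {}).get("terminated", {}).get("finishedAt")
            for s in statuses
        ]
        # Skip pods with no container statuses or with a container that has not terminated.
        if not finished or any(f is None for f in finished):
            continue
        if all(parse_k8s_time(f) < cutoff for f in finished):
            names.append(pod["metadata"]["name"])
    return names
```

To use it, load the kubectl JSON output with `json.load`, call `stale_pod_names(pods, datetime.now(timezone.utc))`, and review the returned names before passing them to `kubectl delete pod`.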