polleyg / beam_sql_part_2.java
Created January 23, 2019 06:37
beam_sql_part_2
[..]
public static final Schema SCHEMA = Schema.builder()
.addStringField("lang")
.addInt32Field("views")
.build();
[..]
.apply("transform_to_row", ParDo.of(new RowParDo())).setRowSchema(SCHEMA)
[..]
//ParDo for String -> Row (SQL)
public static class RowParDo extends DoFn<String, Row> {
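The listing cuts off at the class declaration. A minimal sketch of how `RowParDo` might continue, assuming each input line is a comma-separated `lang,views` pair (e.g. `en,2345`) matching the field order of `SCHEMA` above — this is an illustration, not the gist's actual implementation:

```java
// Sketch only: assumes "lang,views" input lines and the two-field SCHEMA above.
// Requires the Beam SDK on the classpath.
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.Row;

public static class RowParDo extends DoFn<String, Row> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        // Split the raw line and build a Row in the same order as SCHEMA
        String[] fields = c.element().split(",");
        c.output(Row.withSchema(SCHEMA)
                .addValues(fields[0], Integer.parseInt(fields[1]))
                .build());
    }
}
```

`Row.withSchema(...).addValues(...)` is the standard Beam way to construct schema-aware rows; the full source linked in the gist's repo would be authoritative.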
polleyg / beam_sql_part1.java
Last active January 24, 2019 03:00
beam_sql_part1
[..]
PipelineOptionsFactory.register(DataflowPipelineOptions.class);
DataflowPipelineOptions options = PipelineOptionsFactory
.fromArgs(args)
.withValidation()
.as(DataflowPipelineOptions.class);
Pipeline pipeline = Pipeline.create(options);
pipeline.apply("read_from_gcs", TextIO.read().from("gs://batch-pipeline-sql/input/*"))
[..]
polleyg / SO_54226149.md
Last active January 17, 2019 11:14
SO_54226149.md


polleyg / pull_the_trigger.sh
Created October 19, 2018 05:28
Pull the trigger
gcloud builds submit --config=cloudbuild.yaml .
polleyg / cloudbuild.yaml
Created October 19, 2018 03:18
Cloud Build file for copying BQ tables using Dataflow
steps:
- name: gcr.io/cloud-builders/git
args: ['clone', 'https://github.com/polleyg/gcp-dataflow-copy-bigquery.git']
- name: gcr.io/cloud-builders/gradle
args: ['build', 'run']
polleyg / DataflowCopyBQ_part_1.java
Last active June 29, 2019 10:41
This code works out the location of the buckets, as well as the storage class
//imports & doc omitted for brevity. See repo for full source file.
//https://github.com/polleyg/gcp-dataflow-copy-bigquery/blob/master/src/main/java/org/polleyg/BQTableCopyPipeline.java
public class BQTableCopyPipeline {
private static final Logger LOG = LoggerFactory.getLogger(BQTableCopyPipeline.class);
private static final String DEFAULT_NUM_WORKERS = "1";
private static final String DEFAULT_MAX_WORKERS = "3";
private static final String DEFAULT_TYPE_WORKERS = "n1-standard-1";
private static final String DEFAULT_ZONE = "australia-southeast1-a";
private static final String DEFAULT_WRITE_DISPOSITION = "truncate";
private static final String DEFAULT_DETECT_SCHEMA = "true";
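The constants above read as fallback settings for each table copy. A hypothetical illustration of how per-table values from `config.yaml` could be merged with these defaults via `Map.getOrDefault` — the helper name and keys here are assumptions, not the gist's actual API:

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultsDemo {
    static final String DEFAULT_NUM_WORKERS = "1";
    static final String DEFAULT_WRITE_DISPOSITION = "truncate";

    // Hypothetical helper: a value parsed from config.yaml wins over the default.
    static String setting(Map<String, String> tableConfig, String key, String fallback) {
        return tableConfig.getOrDefault(key, fallback);
    }

    public static void main(String[] args) {
        Map<String, String> tableConfig = new HashMap<>();
        tableConfig.put("writeDisposition", "append"); // explicit override
        // "numWorkers" is absent, so the default applies
        System.out.println(setting(tableConfig, "numWorkers", DEFAULT_NUM_WORKERS));
        System.out.println(setting(tableConfig, "writeDisposition", DEFAULT_WRITE_DISPOSITION));
    }
}
```

Keeping the defaults as `String` constants, as the gist does, lets them flow into the same map-merge path as the YAML values without extra type handling.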
polleyg / config.yaml
Last active November 7, 2018 23:20
YAML config for reading some BigQuery tables
# [required] The GCP project id (not the number). You can find this in the GCP console.
project: grey-sort-challenge
# [required] The type of runner. One of:
# - dataflow (runs on GCP)
# - local (runs on local machine)
runner: dataflow
# The actual tables to copy. Options:
#
polleyg / build_output.log
Created September 30, 2018 12:22
Log of the build
SVN-18-148:gcp-tweets-streaming-pipeline grahampolley$ gcloud builds submit --config=cloudbuild.yaml .
Creating temporary tarball archive of 15 file(s) totalling 77.5 KiB before compression.
Some files were not included in the source upload.
Check the gcloud log [/Users/grahampolley/.config/gcloud/logs/2018.09.30/22.13.22.932440.log] to see which files and the contents of the
default gcloudignore file used (see `$ gcloud topic gcloudignore` to learn
more).
Uploading tarball of [.] to [gs://grey-sort-challenge_cloudbuild/source/1538309603.86-62473cec2d1f41a69edff2d7304b48e2.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/grey-sort-challenge/builds/81befc56-b3b6-4377-ae77-a2e7a30301b6].
polleyg / build_and_deploy.sh
Created September 30, 2018 12:18
Cloud Build command for build and deploy
gcloud builds submit --config=cloudbuild.yaml .
polleyg / cloudbuild.yaml
Created September 30, 2018 12:16
Config for building and deploying this app
steps:
- name: gcr.io/cloud-builders/git
args: ['clone', 'https://github.com/polleyg/gcp-tweets-streaming-pipeline.git']
- name: gcr.io/cloud-builders/gcloud
args: ['app', 'deploy', '--version=tweets']
dir: 'twitter-to-pubsub'
- name: gcr.io/cloud-builders/gradle
args: ['build', 'run']