john bush johntbush

@johntbush
johntbush / gist:fb21ca23f4fad44bb0227fcfc39513f8
Last active June 21, 2022 16:58
how to communicate in IT

How to communicate like a champ in the IT world

Learn when and how to use these phrases:

  • herding cats
  • is the juice worth the squeeze
  • when you've got a hammer, everything looks like a nail
  • that's a solution looking for a problem
  • works on my machine!
  • two birds with one stone

Apache Beam pipelines

Experiments beyond Java to create pipelines that are semantically more familiar to SQL developers, functional programmers, and others with big data backgrounds.

The goal is to build pipelines in less time and make them easier to read. That delivers value sooner and lowers our maintenance costs in the long run.

The best way to explain this is with an example. We take a simple made-up model of orders and refunds. An order can have 0 to N refunds. A customer can have 0 to N orders. We want to total the amount a
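
As a point of comparison for the pipeline samples, here is a minimal plain-Java sketch of the orders and refunds model with no Beam involved. It assumes the goal is a net total per customer (order amounts minus refund amounts); the description above is cut off, so that goal is an assumption. The class `OrderTotals`, the method `netTotalByCustomer`, and the field `refundAmt` are hypothetical names introduced for illustration; the other fields mirror the case classes in the Scala sample.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory version of the orders/refunds example (not the Beam pipeline).
final class OrderTotals {

    // Mirrors the Order case class: order_id, customer_id, order_amt.
    static final class Order {
        final String orderId;
        final String customerId;
        final long orderAmt;

        Order(String orderId, String customerId, long orderAmt) {
            this.orderId = orderId;
            this.customerId = customerId;
            this.orderAmt = orderAmt;
        }
    }

    // Mirrors the visible Refund fields; refundAmt is an assumed field.
    static final class Refund {
        final String refundOrderId;
        final String originalOrderId;
        final String customerId;
        final long refundAmt;

        Refund(String refundOrderId, String originalOrderId, String customerId, long refundAmt) {
            this.refundOrderId = refundOrderId;
            this.originalOrderId = originalOrderId;
            this.customerId = customerId;
            this.refundAmt = refundAmt;
        }
    }

    // Adds every order amount and subtracts every refund amount, keyed by customer id.
    static Map<String, Long> netTotalByCustomer(List<Order> orders, List<Refund> refunds) {
        Map<String, Long> totals = new TreeMap<>();
        for (Order o : orders) {
            totals.merge(o.customerId, o.orderAmt, Long::sum);
        }
        for (Refund r : refunds) {
            totals.merge(r.customerId, -r.refundAmt, Long::sum);
        }
        return totals;
    }
}
```

A customer who appears only in the refunds list ends up with a negative total in this sketch; a real pipeline would need to decide how to treat that case.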

johntbush / ordersum.scala
Created August 13, 2019 02:27
scala beam sample
package example.scala
import com.spotify.scio._
import com.spotify.scio.extra.json._
case class Orders(orders: List[Order])
case class Order(order_id:String, customer_id:String, order_amt:Long)
case class Refunds(refunds: List[Refund])
case class Refund(refund_order_id:String, original_order_id:String, customer_id:String,
johntbush / ordersum.java
Created August 13, 2019 02:25
beam java sample
package example.java;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.common.collect.Iterables;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.DoFn;
johntbush / gist:9313269a9482d037136439cab34c54c6
Last active March 26, 2019 18:36
account provisioning

Existing problems:

  • the way StackGroups are laid out, there is a lot of copy-and-paste repetition between envs and regions. This is cumbersome and prone to mistakes (let's get DRY)
  • certain infrastructure changes need to go out with every deployment because they change rapidly or are dependencies for code changes. These often get missed right now, causing churn.
  • we need our CI servers to automate all provisioning, as the prod creds aren't broadly known (in prod you can't just run sceptre from your machine; it has to be initiated from CI builds)

ckp-aws-deploy

templates (no changes)

account_provisioning/sceptre/templates

# install the cli
https://docs.aws.amazon.com/cli/latest/userguide/installing.html
## for windows
https://docs.aws.amazon.com/cli/latest/userguide/awscli-install-windows.html
# configure your keys
You should have received an access key and a secret key
spark-submit \
  --class com.mycompany.Job \
  --deploy-mode cluster \
  --master yarn \
  --conf spark.yarn.submit.waitAppCompletion=false \
  --driver-memory 4g \
  --num-executors 4 \
  --executor-memory 2g \
  --executor-cores 5 \
  s3://mycompany/artifact.jar
com.trax.platform.fps.auditreview:audit-review-recon:jar:1.0-SNAPSHOT
+- com.trax.platform:trax-platform-utils:jar:1.3.40:compile
| +- com.fasterxml.jackson.core:jackson-core:jar:2.4.4:compile
| +- com.fasterxml.jackson.core:jackson-databind:jar:2.4.4:compile
| | \- com.fasterxml.jackson.core:jackson-annotations:jar:2.4.0:compile
| +- com.fasterxml.jackson.module:jackson-module-scala_2.11:jar:2.4.4:compile
| | +- com.thoughtworks.paranamer:paranamer:jar:2.6:compile
| | \- com.google.code.findbugs:jsr305:jar:2.0.1:compile
| +- nl.grons:metrics-scala_2.11:jar:3.5.1:compile
| | \- io.dropwizard.metrics:metrics-healthchecks:jar:3.1.2:compile
December 2017 Freight Bills with changes
Time to extract 16183567 bills from TraxDW - 22 minutes
found and grouped 16183567 records in 12 ms
+---------+----------------+
|OWNER_KEY|count(OWNER_KEY)|
+---------+----------------+
| 000-1001| 1|
select count(*) as payr_dtl from dw.paYR_DTL; -- 171,280,225
select count(*) from dw.remitDTL; -- 392,726,040
select count(*) from dw.frght_bl; -- 408,302,966
select count(*) from dw.exceptions; -- 201,285,497
select count(*) from dw.invoice; -- 598,03,644
select count(*) from dw.fb_ln; -- 1,226,455,963
select count(*) from dw.frghtBlMaster; -- 686,564,991
select count(*) from dw.ca_ELEM; -- 554,904,231
select count(*) from dw.veNDOR_REMIT; -- 9697
select count(*) from dw.veNDOR; -- 8379