Holden Karau (holdenk)



Keybase proof

I hereby claim:

  • I am holdenk on github.
  • I am holden (https://keybase.io/holden) on keybase.
  • I have a public key ASAn6L1PbB8-ZtgGZx6XAMfgLZR_r0s7K2W5RHwIZQ77jQo
holdenk / go_demo.sh
Created April 10, 2018 18:00
Run Beam Go Demo
#!/bin/bash
export TESTBUCKET=YOURTESTBUCKETGOESHEREEH
set -x
echo "Starting job server"
# Launch the Beam Flink job server in the background; its output goes to ./logs
java -cp runners/flink/build/libs/flink-deploy.jar org.apache.beam.runners.flink.FlinkJobServerDriver --job-host localhost:3000 &> logs &
# Record the job server's PID so it can be shut down later
echo $! > jobserver.pid
echo "Making the output bucket world writeable, because credentials are hard"
gsutil acl ch -u AllUsers:W gs://$TESTBUCKET
pushd ./sdks/go/examples/build/bin/linux_amd64/
./wordcount --input gs://apache-beam-samples/shakespeare/kinglear.txt --output gs://$TESTBUCKET/output.txt --runner=flink --endpoint=localhost:3000 --worker_binary ./wordcount
holdenk / 2.1.2 rc2 h2.7 bl
Created September 29, 2017 15:12
2.1.2 rc2 h2.7 build log
Spark version is 2.1.2
Making spark-2.1.2-bin-hadoop2.7.tgz
Building with...
$ /home/holden/Downloads/apache-maven-3.3.9/bin/mvn -T 1C clean package -DskipTests -Phadoop-2.7 -Phadoop-2.6 -Phadoop-2.4 -Phadoop-2.3 -Psparkr -Phive -Phive-thriftserver -Pyarn -Pmesos -DzincPort=3036
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.spark:spark-sketch_2.11:jar:2.1.2
[WARNING] 'parent.relativePath' of POM org.apache.spark:spark-parent_2.11:2.1.2 (/home/holden/repos/spark/spark-2.1.2-bin-hadoop2.7/pom.xml) points at org.apache.spark:spark-parent_2.11 instead of org.apache:apache, please verify your project structure @ org.apache.spark:spark-parent_2.11:2.1.2, /home/holden/repos/spark/spark-2.1.2-bin-hadoop2.7/pom.xml, line 22, column 11
scala> val df = spark.read.format("csv").option("header", "false").option("inferSchema", "true").load("/home/holden/Downloads/ex*.csv")
df: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 2125 more fields]
scala> df.collect()
16/07/25 12:53:40 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
res9: Array[org.apache.spark.sql.Row] = Array([Date,Lifetime Total Likes,Daily New Likes,Daily Unlikes,Daily Page Engaged Users,Weekly Page Engaged Users,28 Days Page Engaged Users,Daily Like Sources - On Your Page,Daily Total Reach,Weekly Total Reach,28 Days Total Reach,Daily Organic Reach,Weekly Organic Reach,28 Days Organic Reach,Daily Total Impressions,Weekly Total Impressions,28 Days Total Impressions,Daily Organic impressions,Weekly Organic impressions,28 Days Organic impressions,Daily Reach of page posts,Weekly Reach of page posts,28 Days Reach of page posts,Daily Organic Reach

Keybase proof

I hereby claim:

  • I am holdenk on github.
  • I am holden (https://keybase.io/holden) on keybase.
  • I have a public key whose fingerprint is 3500 21E4 501A 7823 1AB0 F73C C810 79AC 1BAE 73FC

To claim this, I am signing this object:

holdenk / gist:2362150
Created April 11, 2012 20:16
a venue ES index
{"name":"Sunglass Hut","aliases":"sunglass hut","tags":"","id":"4bdf5019e75c0f4758afca03","category_string":"Shops & Services:Clothing Stores:Accessories Stores","text":"Sunglass Hut sunglass hut Shops & Services:Clothing Stores:Accessories Stores","userid":119128,"mayorid":0,"checkinGeoS2CellIds":"","checkinInfo":"{\"checkins\":[]}","dtzone":"America/Los_Angeles","geomobile":false,"address":"","city":"","country":"","categoryIds":"4d4b7105d754a06378d81259 4bf58dd8d48988d103951735 4bf58dd8d48988d102951735","categoryId0":"4bf58dd8d48988d102951735","categoryId1":"","categoryId2":"","metaCategories":"Shops All","popularity":15,"decayedPopularity1":3.2959330147373876E-12,"partitionedPopularity":"0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0","neighborhoodIds":"2149 31055","geoS2CellIds":"80858088D1D00000 80858088D1C00000 80858088D1000000 80858088D4000000 80858088D0000000 80858088C0000000 8085808900000000 8085808C00000000 8085809000000000 808580C000000000 808581000
holdenk / gist:2361952
Created April 11, 2012 19:53
evenue query
"custom_score" : {
  "query" : {
    "bool" : {
      "should" : [ {
        "query_string" : {
          "query" : "\"808F7E2050000000\"",
          "fields" : [ "geoS2CellIds^1.0" ]
        }
      }, {
        "query_string" : {
holdenk / ze error
Created March 17, 2012 02:52
Clearly I don't understand twitter futures correctly, why doesn't this work?
@Test
def zeFutures {
  val executor = Executors.newCachedThreadPool()
  val esfp = FuturePool(executor)
  val future: Future[Int] = esfp({
    println("bots")
    Thread.sleep(100)
    println("have feelings")
    1
  })
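The likely answer to the question above: nothing ever blocks on the future, so the test method returns (and the test runner moves on) before the future's body gets a chance to run. The same pattern and its fix can be sketched in plain Java's `ExecutorService` futures — this is an illustration of the pitfall, not Twitter's `FuturePool` API, and the class and method names here are made up for the example:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ZeFutures {
    public static int zeFuture() throws Exception {
        // Back the future with a cached thread pool, mirroring the gist.
        ExecutorService executor = Executors.newCachedThreadPool();
        Future<Integer> future = executor.submit(() -> {
            System.out.println("bots");
            Thread.sleep(100);
            System.out.println("have feelings");
            return 1;
        });
        // The step the original test skips: block until the task completes.
        // Without this, the enclosing method can return before either
        // println ever fires, making it look like the future "doesn't work".
        int result = future.get();
        executor.shutdown();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(zeFuture());
    }
}
```

In Twitter's util library the equivalent blocking step would be an `Await`-style call on the returned future; the key point is that constructing a future only schedules the work, and something must wait on it before the test exits.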