
Jeff Evans (jeff303)

jeff303 / redact_with_specter.clj
Created April 21, 2021 20:11
Remove keys with the word "password" from deeply nested maps using Specter
(require '[com.rpl.specter :refer :all])
(require '[clojure.string :as str])
(def data {:a 1 :b 2 :c {:x 14 :my-password "foo" :y {:foo "bar" :baz-password-value "secret" :z 141}}})
;; from https://github.com/redplanetlabs/specter/wiki/Using-Specter-Recursively#recursively-navigate-to-every-map-in-a-map-of-maps
(def MAP-NODES
  (recursive-path [] p
    (if-path map?
      (continue-then-stay MAP-VALS p))))
;; the preview stops above; a sketch of the redaction step itself: navigate to
;; every key (at any depth) whose name contains "password" and remove that
;; entry by setting it to NONE
(setval [MAP-NODES MAP-KEYS #(str/includes? (name %) "password")] NONE data)
;; => {:a 1, :b 2, :c {:x 14, :y {:foo "bar", :z 141}}}
jeff303 / useful_spark-shell_examples.scala
Last active September 18, 2020 18:02
A simple spark-shell session showing how to do useful things
// some useful imports
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
// start with some very simple JSON
val simpleJsonStr = """{"foo": 42, "bar": "baz"}"""
// just read; schema will be inferred
val simpleDf = spark.read.json(Seq(simpleJsonStr).toDS())
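A couple of natural next steps in the same session (these lines aren't part of the gist preview; the commented output is what Spark's JSON schema inference produces for this document):
// inspect the inferred schema; JSON fields come back in alphabetical order
simpleDf.printSchema()
// root
//  |-- bar: string (nullable = true)
//  |-- foo: long (nullable = true)
// query it like any other DataFrame
simpleDf.select($"bar", $"foo" + 1).show()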
jeff303 / unsubscribe_Spark_mailing_lists.md
Last active August 27, 2020 16:23
How to unsubscribe from Apache Spark mailing lists

The Problem

It looks like you're trying to unsubscribe from an Apache Spark mailing list. Unfortunately, you have used the main mailing list address for your request (i.e. the one that is supposed to be for actual content and discussion, as opposed to administrative requests). This is bad for a few reasons.

  1. It clutters up the archive, as these messages stick around forever.
  2. It generates an unnecessary email to many thousands of people.
  3. It doesn't actually do what you want, which is to unsubscribe from the list.

The Solution

The good news is that the instructions for unsubscribing are documented in a few places. You have probably come across them before, but just in case you need a refresher: you unsubscribe by mailing the list's -unsubscribe address (for example, user-unsubscribe@spark.apache.org for the user list) rather than the list itself.

jeff303 / Field Restructuring.json
Created May 8, 2020 18:06
StreamSets Data Collector field restructuring example
{
  "pipelineConfig" : {
    "schemaVersion" : 6,
    "version" : 16,
    "pipelineId" : "FieldRestructuring5b840130-fc26-4363-8e1d-b075a6db10a2",
    "title" : "Field Restructuring",
    "description" : "",
    "uuid" : "23890b10-6da9-4de1-b7e0-a8c32bd3e679",
    "configuration" : [ {
      "name" : "executionMode",
jeff303 / SparkUtils.java
Last active January 22, 2020 22:36
Miscellaneous Spark utilities
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class SparkUtils {
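The preview ends at the class declaration, but the imports suggest utilities for inspecting Spark applications running on YARN. A hypothetical sketch of that kind of helper (in Scala for brevity; the method name and filtering logic are assumptions, not the gist's actual code):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.yarn.client.api.YarnClient
import scala.collection.JavaConverters._

// list the IDs of YARN applications whose name contains the given substring
def findYarnAppIds(nameSubstring: String): Seq[String] = {
  val client = YarnClient.createYarnClient()
  client.init(new Configuration())  // picks up yarn-site.xml from the classpath
  client.start()
  try {
    client.getApplications().asScala.toSeq
      .filter(_.getName.contains(nameSubstring))
      .map(_.getApplicationId.toString)
  } finally {
    client.stop()
  }
}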
jeff303 / print-spark-version.sh
Created August 29, 2019 20:12
Print installed Apache Spark version to stdout
#!/bin/bash
# run a one-liner through spark-shell; the greps strip the echoed expression,
# leaving just the printed version
echo 'System.out.println(sc.version)' | spark-shell 2>/dev/null | grep -A2 'System.out.println' | grep -v 'System.out.println'
jeff303 / ScalaProcessRunner.scala
Last active April 4, 2019 16:19
Code to check Spark version from Scala
import scala.sys.process._

object ScalaProcessRunner {
  def main(args: Array[String]): Unit = {
    // pipe the spark-shell one-liner through the same grep filters as the
    // shell version; the pipeline's stderr is logged separately
    val cmd = "echo System.out.println(sc.version)" #| "spark-shell" #|
      "grep -A2 System.out.println" #| "grep -v System.out.println"
    val output = cmd.lineStream_!(ProcessLogger(line => System.err.println(s"stderr: $line")))
    output.foreach(line => System.out.println(s"stdout: $line"))
  }
}
jeff303 / ProcessBuilderRunner.java
Last active April 4, 2019 15:53
Capture output of process launched by ProcessBuilder
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringJoiner;

public class ProcessBuilderRunner {
    public static void main(String[] args) {
        final ProcessBuilder pb = new ProcessBuilder(args);
        final String output = runCommandForOutput(pb);
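        System.out.println(output);
    }

    // the gist preview cuts off above; what follows is a minimal sketch of the
    // elided helper, assuming it simply joins the process's stdout lines into
    // one string (not necessarily the gist's exact implementation)
    static String runCommandForOutput(ProcessBuilder pb) {
        final StringJoiner output = new StringJoiner(System.lineSeparator());
        try {
            final Process p = pb.start();
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                reader.lines().forEach(output::add);  // collect each line of stdout
            }
            p.waitFor();  // let the process finish before returning
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
        return output.toString();
    }
}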