
Jeff Evans jeff303

jeff303 / redact_with_specter.clj
Created April 21, 2021 20:11
Remove keys with the word "password" from deeply nested maps using Specter
(require '[com.rpl.specter :refer :all])

(def data {:a 1 :b 2 :c {:x 14 :my-password "foo" :y {:foo "bar" :baz-password-value "secret" :z 141}}})

;; recursive path that navigates to every nested map (innermost maps first)
(recursive-path [] p
  (if-path map?
    (continue-then-stay MAP-VALS p)))
jeff303 / useful_spark-shell_examples.scala
Last active September 18, 2020 18:02
A simple spark-shell session showing how to do useful things
// some useful imports
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
// start with some very simple JSON
val simpleJsonStr = """{"foo": 42, "bar": "baz"}"""
// just read; schema will be inferred
val simpleDf = spark.read.json(Seq(simpleJsonStr).toDS)
simpleDf.printSchema()
jeff303 /
Last active August 27, 2020 16:23
How to unsubscribe from Apache Spark mailing lists

The Problem

It looks like you're trying to unsubscribe from an Apache Spark mailing list. Unfortunately, you have sent your request to the main mailing list address (i.e. the one that is meant for actual content and discussion, as opposed to administrative requests). This is bad for a few reasons.

  1. It clutters up the archive, since these messages stick around forever.
  2. It generates an unnecessary email to many thousands of subscribers.
  3. It doesn't actually do what you want, which is to unsubscribe from the mailing list.

The Solution

The good news is there are a few places where the instructions for unsubscribing are documented. You have probably come across them before, but just in case you need a refresher:
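In the meantime, the address pattern itself is easy to derive. As a sketch (assuming the standard Apache mailing-list convention, where `-unsubscribe` is appended to the local part of the list address), sending any email to the derived address from your subscribed account triggers the unsubscribe flow:

```shell
# Assumption: the standard Apache convention applies to the Spark lists.
# Append "-unsubscribe" to the local part of the list address, then send
# any email to that address from the account that is subscribed.
list="user@spark.apache.org"
unsubscribe="${list%@*}-unsubscribe@${list#*@}"
echo "$unsubscribe"    # user-unsubscribe@spark.apache.org
```

The same derivation works for `dev@spark.apache.org` or any other list on the same host.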

jeff303 / Field Restructuring.json
Created May 8, 2020 18:06
StreamSets Data Collector field restructuring example
"pipelineConfig" : {
"schemaVersion" : 6,
"version" : 16,
"pipelineId" : "FieldRestructuring5b840130-fc26-4363-8e1d-b075a6db10a2",
"title" : "Field Restructuring",
"description" : "",
"uuid" : "23890b10-6da9-4de1-b7e0-a8c32bd3e679",
"configuration" : [ {
"name" : "executionMode",
jeff303 /
Last active January 22, 2020 22:36
Miscellaneous Spark utilities
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SparkUtils {
  private static final Logger LOG = LoggerFactory.getLogger(SparkUtils.class);

  // One utility consistent with the imports above: look up the YARN
  // ApplicationReport for an application ID such as "application_1234_0001".
  public static ApplicationReport getApplicationReport(YarnClient yarnClient, String appId)
      throws IOException, YarnException {
    LOG.debug("fetching report for {}", appId);
    return yarnClient.getApplicationReport(ConverterUtils.toApplicationId(appId));
  }
}
jeff303 /
Created August 29, 2019 20:12
Print installed Apache Spark version to stdout
echo 'System.out.println(sc.version)' | spark-shell 2>/dev/null | grep -A2 'System.out.println' | grep -v 'System.out.println'
jeff303 / ScalaProcessRunner.scala
Last active April 4, 2019 16:19
Code to check Spark version from Scala
import scala.sys.process._
import scala.language.postfixOps
object ScalaProcessRunner {
  def main(args: Array[String]): Unit = {
    val cmd = "echo System.out.println(sc.version)" #| "spark-shell" #| "grep -A2 System.out.println" #| "grep -v System.out.println"
    val output = cmd lineStream_! ProcessLogger(line => System.err.println(s"stderr: $line"))
    output.foreach(line => System.out.println(s"stdout: $line"))
  }
}
jeff303 /
Last active April 4, 2019 15:53
Capture output of process launched by ProcessBuilder
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.StringJoiner;

public class ProcessBuilderRunner {
  public static void main(String[] args) throws Exception {
    final ProcessBuilder pb = new ProcessBuilder(args);
    final String output = runCommandForOutput(pb);
    System.out.println(output);
  }

  // start the process, fold stderr into stdout, and join the output lines
  static String runCommandForOutput(ProcessBuilder pb) throws Exception {
    pb.redirectErrorStream(true);
    final Process p = pb.start();
    final StringJoiner out = new StringJoiner(System.lineSeparator());
    try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
      r.lines().forEach(out::add);
    }
    p.waitFor();
    return out.toString();
  }
}
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configured;