Gists by Jeff Evans (jeff303)
jeff303 / redact_with_specter.clj
Created Apr 21, 2021
Remove keys with the word "password" from deeply nested maps using Specter
(require '[com.rpl.specter :refer :all])
(require '[clojure.string :as str])

(def data {:a 1 :b 2 :c {:x 14 :my-password "foo" :y {:foo "bar" :baz-password-value "secret" :z 141}}})

;; from
(def MAP-NODES
  (recursive-path [] p
    (if-path map?
      (continue-then-stay MAP-VALS p))))

;; drop every entry whose key name contains "password", at any depth
(setval [MAP-NODES MAP-KEYS #(str/includes? (name %) "password")] NONE data)
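The same redaction can be sketched without Specter as a plain recursive walk over nested maps; here is a minimal Java version (the class and method names are illustrative, not from the gist):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Redactor {
    /**
     * Returns a copy of the map with every entry whose key contains
     * "password" removed, recursing into nested map values.
     */
    @SuppressWarnings("unchecked")
    public static Map<String, Object> redactPasswords(Map<String, Object> m) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : m.entrySet()) {
            if (e.getKey().contains("password")) {
                continue; // drop matching keys entirely
            }
            Object v = e.getValue();
            out.put(e.getKey(), v instanceof Map
                    ? redactPasswords((Map<String, Object>) v) // recurse into nested maps
                    : v);
        }
        return out;
    }
}
```

Unlike the Specter version, this copies the structure by hand, but the effect is the same: matching keys disappear at every nesting level.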
jeff303 / useful_spark-shell_examples.scala
Last active Sep 18, 2020
A simple spark-shell session showing how to do useful things
// some useful imports
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
// start with some very simple JSON
val simpleJsonStr = """{"foo": 42, "bar": "baz"}"""
// just read; schema will be inferred
val simpleDf = spark.read.json(Seq(simpleJsonStr).toDS) // spark-shell auto-imports spark.implicits._ for .toDS
simpleDf.show()
jeff303 /
Last active Aug 27, 2020
How to unsubscribe from Apache Spark mailing lists

The Problem

It looks like you're trying to unsubscribe from an Apache Spark mailing list. Unfortunately, you have used the main mailing list address for your request (i.e. the one that is supposed to be for actual content and discussion, as opposed to administrative requests). This is bad for a few reasons.

  1. It clutters up the archive, since these messages stick around forever.
  2. It generates an unnecessary email to many thousands of subscribers.
  3. It doesn't actually do what you want, which is to unsubscribe you from the list.

The Solution

The good news is that the instructions for unsubscribing are documented in a few places. You have probably come across them before, but in case you need a refresher: send an empty email to user-unsubscribe@spark.apache.org to leave the user list, or to dev-unsubscribe@spark.apache.org to leave the dev list, from the same address you subscribed with. The Apache Spark community page (https://spark.apache.org/community.html) lists all of the project's mailing list addresses.

jeff303 / Field Restructuring.json
Created May 8, 2020
StreamSets Data Collector field restructuring example
"pipelineConfig" : {
"schemaVersion" : 6,
"version" : 16,
"pipelineId" : "FieldRestructuring5b840130-fc26-4363-8e1d-b075a6db10a2",
"title" : "Field Restructuring",
"description" : "",
"uuid" : "23890b10-6da9-4de1-b7e0-a8c32bd3e679",
"configuration" : [ {
"name" : "executionMode",
jeff303 /
Last active Jan 22, 2020
Miscellaneous Spark utilities
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SparkUtils {
    private static final Logger LOG = LoggerFactory.getLogger(SparkUtils.class);

    // minimal completion sketch; the gist preview is truncated after the class declaration
    /** Fetches the YARN application report for the given application ID string. */
    public static ApplicationReport getApplicationReport(YarnClient yarnClient, String appId)
            throws YarnException, IOException {
        return yarnClient.getApplicationReport(ConverterUtils.toApplicationId(appId));
    }
}
jeff303 /
Created Aug 29, 2019
Print installed Apache Spark version to stdout
echo 'System.out.println(sc.version)' | spark-shell 2>/dev/null | grep -A2 'System.out.println' | grep -v 'System.out.println'
jeff303 / ScalaProcessRunner.scala
Last active Apr 4, 2019
Code to check Spark version from Scala
import scala.sys.process._
import scala.language.postfixOps

object ScalaProcessRunner {
  def main(args: Array[String]): Unit = {
    // pipe a println of sc.version through spark-shell, then grep away the surrounding output
    val cmd = "echo System.out.println(sc.version)" #| "spark-shell" #|
      "grep -A2 System.out.println" #| "grep -v System.out.println"
    val output = cmd.lineStream_!(ProcessLogger(line => System.err.println(s"stderr: $line")))
    output.foreach(line => System.out.println(s"stdout: $line"))
  }
}
jeff303 /
Last active Apr 4, 2019
Capture output of process launched by ProcessBuilder
import java.util.StringJoiner;
public class ProcessBuilderRunner {
public static void main(String[] args) {
final ProcessBuilder pb = new ProcessBuilder(args);
final String output = runCommandForOutput(pb);
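The gist preview cuts off before runCommandForOutput is shown; a self-contained sketch of how that method might read the launched process's output (the method body here is an assumption, not the gist's actual code):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringJoiner;

public class ProcessBuilderRunner {
    public static void main(String[] args) throws IOException, InterruptedException {
        final ProcessBuilder pb = new ProcessBuilder(args);
        System.out.println(runCommandForOutput(pb));
    }

    /** Runs the command and joins its output lines with newlines (stderr is merged into stdout). */
    static String runCommandForOutput(ProcessBuilder pb) throws IOException, InterruptedException {
        pb.redirectErrorStream(true); // fold stderr into stdout so one stream captures everything
        final Process process = pb.start();
        final StringJoiner joiner = new StringJoiner(System.lineSeparator());
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                joiner.add(line);
            }
        }
        process.waitFor();
        return joiner.toString();
    }
}
```

Reading stdout on the same thread is fine here because redirectErrorStream(true) leaves only one pipe to drain; with separate streams you would need a second reader thread to avoid the child blocking on a full stderr buffer.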