Jeff Evans (jeff303)
jeff303 / redact_with_specter.clj
Created Apr 21, 2021
Remove keys with the word "password" from deeply nested maps using Specter
(require '[com.rpl.specter :refer :all])
(def data {:a 1 :b 2 :c {:x 14 :my-password "foo" :y {:foo "bar" :baz-password-value "secret" :z 141}}})
;; from https://github.com/redplanetlabs/specter/wiki/Using-Specter-Recursively#recursively-navigate-to-every-map-in-a-map-of-maps
(def MAP-NODES
  (recursive-path [] p
    (if-path map?
      (continue-then-stay MAP-VALS p))))
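The `MAP-NODES` path can then drive the redaction itself. A minimal sketch of how it might be used (the predicate and the call below are my addition, not part of the gist):

```clojure
(require '[clojure.string :as str])

;; Remove every entry whose key name contains "password",
;; at any nesting depth. transform visits child maps before
;; their parents (continue-then-stay), so nested redactions
;; happen first.
(transform MAP-NODES
  (fn [m]
    (into {} (remove (fn [[k _]] (str/includes? (name k) "password")) m)))
  data)
;; => {:a 1, :b 2, :c {:x 14, :y {:foo "bar", :z 141}}}
```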
jeff303 / useful_spark-shell_examples.scala
Last active Sep 18, 2020
A simple spark-shell session showing how to do useful things
// some useful imports
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
// start with some very simple JSON
val simpleJsonStr = """{"foo": 42, "bar": "baz"}"""
// just read; schema will be inferred
val simpleDf = spark.read.json(Seq(simpleJsonStr).toDS())
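A natural next step in the same spark-shell session (my addition, not part of the gist) is to inspect what was inferred:

```scala
// Inspect the inferred schema: spark.read.json sorts fields
// alphabetically and infers whole JSON numbers as long.
simpleDf.printSchema()
// root
//  |-- bar: string (nullable = true)
//  |-- foo: long (nullable = true)

// And look at the row itself.
simpleDf.show()
```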
jeff303 / unsubscribe_Spark_mailing_lists.md
Last active Aug 27, 2020
How to unsubscribe from Apache Spark mailing lists

The Problem

It looks like you're trying to unsubscribe from an Apache Spark mailing list. Unfortunately, you have used the main mailing list address for your request (i.e. the one that is supposed to be for actual content and discussion, as opposed to administrative requests). This is bad for a few reasons.

  1. It clutters up the archive, since these messages stick around forever.
  2. It generates unnecessary email to many thousands of subscribers.
  3. It doesn't actually do what you want, which is to unsubscribe you from the mailing list.

The Solution

The good news is there are a few places where the instructions for unsubscribing are documented. You have probably come across them before, but just in case you need a refresher:
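For reference, Apache projects follow a `<list>-unsubscribe@` address convention; a sketch of the request is below (check the Spark website for the authoritative instructions):

```shell
# Apache mailing lists use the <list>-unsubscribe@ convention.
# Send an empty message, from the address you subscribed with, to:
#   user list: user-unsubscribe@spark.apache.org
#   dev list:  dev-unsubscribe@spark.apache.org
# For example, with a local mail command available:
#   echo | mail -s unsubscribe user-unsubscribe@spark.apache.org
```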

jeff303 / Field Restructuring.json
Created May 8, 2020
StreamSets Data Collector field restructuring example
{
  "pipelineConfig" : {
    "schemaVersion" : 6,
    "version" : 16,
    "pipelineId" : "FieldRestructuring5b840130-fc26-4363-8e1d-b075a6db10a2",
    "title" : "Field Restructuring",
    "description" : "",
    "uuid" : "23890b10-6da9-4de1-b7e0-a8c32bd3e679",
    "configuration" : [ {
      "name" : "executionMode",
jeff303 / SparkUtils.java
Last active Jan 22, 2020
Miscellaneous Spark utilities
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class SparkUtils {
jeff303 / print-spark-version.sh
Created Aug 29, 2019
Print installed Apache Spark version to stdout
#!/bin/bash
echo 'System.out.println(sc.version)' | spark-shell 2>/dev/null | grep -A2 'System.out.println' | grep -v 'System.out.println'
jeff303 / ScalaProcessRunner.scala
Last active Apr 4, 2019
Code to check Spark version from Scala
import scala.sys.process._
import scala.language.postfixOps
object ScalaProcessRunner {
  def main(args: Array[String]) = {
    val output = "echo System.out.println(sc.version)" #| "spark-shell" #| "grep -A2 System.out.println" #| "grep -v System.out.println" lineStream_! ProcessLogger(line => System.err.println(s"stderr: $line"))
    output.foreach(line => System.out.println(s"stdout: $line"))
  }
}
jeff303 / ProcessBuilderRunner.java
Last active Apr 4, 2019
Capture output of process launched by ProcessBuilder
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringJoiner;
public class ProcessBuilderRunner {
  public static void main(String[] args) {
    final ProcessBuilder pb = new ProcessBuilder(args);
    final String output = runCommandForOutput(pb);
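The snippet is cut off before `runCommandForOutput` is shown. A minimal sketch of reading a process's stdout this way (class and method names here are my own, not necessarily the gist's implementation):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringJoiner;

public class ProcessOutputSketch {

    // Run the command and collect its stdout into one newline-joined string.
    static String runCommandForOutput(ProcessBuilder pb) {
        try {
            pb.redirectErrorStream(true); // fold stderr into stdout
            final Process p = pb.start();
            final StringJoiner joiner = new StringJoiner("\n");
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    joiner.add(line);
                }
            }
            p.waitFor();
            return joiner.toString();
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runCommandForOutput(new ProcessBuilder("echo", "hello")));
    }
}
```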
jeff303 / Tutorial1.java
import java.io.IOException;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configured;