Pat Patterson metadaddy

metadaddy / S3 to Azure Synapse.json
Last active November 7, 2019 21:54
Use StreamSets Cloud to read data from Amazon S3 and write to Azure Synapse (formerly SQL DW)
{
"ciConfig" : null,
"currentExternalCIStatus" : null,
"currentRules" : {
"commitId" : "55b9a63d-8712-4a8d-96bf-511618889159:dpmfrancois",
"definitionRemoveMessage" : null,
"definitionRemoveTime" : 0,
"definitionRemover" : null,
"id" : "d53e9e53-eb4c-4204-9227-fd40d3a1b440:dpmfrancois",
"message" : "Committed with pipeline",
{
"pipelineConfig" : {
"schemaVersion" : 6,
"version" : 12,
"pipelineId" : "personalbox4f3590a2-1b4e-40ac-beae-600e976ef496",
"title" : "personal_box",
"description" : "",
"uuid" : "043a4e85-7989-4761-90de-e68401257e9e",
"configuration" : [ {
"name" : "executionMode",
{
"pipelineConfig" : {
"schemaVersion" : 6,
"version" : 12,
"pipelineId" : "Metrics635c20ed-872c-48d7-87fb-7043ab937ae6",
"title" : "Metrics",
"description" : "",
"uuid" : "08bce59b-151b-47d4-97dc-5d4f58da9672",
"configuration" : [ {
"name" : "executionMode",
metadaddy / Validate JSON Data.json
Created May 9, 2019 04:25
StreamSets Data Collector 3.8.0 pipeline to validate JSON against a schema
{
"pipelineConfig" : {
"schemaVersion" : 6,
"version" : 12,
"pipelineId" : "ValidateJSONDatab0dfa94e-faf3-42ec-9a02-5122d048fe4c",
"title" : "Validate JSON Data",
"description" : "",
"uuid" : "d504e9d3-d4ea-4d52-8fd8-565592400d2e",
"configuration" : [ {
"name" : "executionMode",
metadaddy / CustomTransformer.java
Created September 22, 2018 00:52
Creating a StreamSets Spark Transformer in Java - after third code expansion
package com.streamsets.spark;
import com.streamsets.pipeline.api.Record;
import com.streamsets.pipeline.spark.api.SparkTransformer;
import com.streamsets.pipeline.spark.api.TransformResult;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import scala.Tuple2;
metadaddy / CustomTransformer.java
Created September 22, 2018 00:49
Creating a StreamSets Spark Transformer in Java - after second code expansion
package com.streamsets.spark;
import com.streamsets.pipeline.api.Record;
import com.streamsets.pipeline.spark.api.SparkTransformer;
import com.streamsets.pipeline.spark.api.TransformResult;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import scala.Tuple2;
metadaddy / CustomTransformer.java
Created September 22, 2018 00:43
Creating a StreamSets Spark Transformer in Java - after first code expansion
package com.streamsets.spark;
import com.streamsets.pipeline.api.Record;
import com.streamsets.pipeline.spark.api.SparkTransformer;
import com.streamsets.pipeline.spark.api.TransformResult;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import scala.Tuple2;
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (mmap) failed to map 3046768640 bytes for committing reserved memory.
> # Possible reasons:
> # The system is out of physical RAM or swap space
> # In 32 bit mode, the process size limit was hit
> # Possible solutions:
> # Reduce memory load on the system
> # Increase physical memory or swap space
#
# Copyright 2017 StreamSets Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
metadaddy / CustomTransformer.scala
Last active September 27, 2018 03:42
Creating a StreamSets Spark Transformer in Scala - after third code expansion
package com.streamsets.spark.scala
import com.streamsets.pipeline.api.Field
import com.streamsets.pipeline.api.Record
import com.streamsets.pipeline.spark.api.SparkTransformer
import com.streamsets.pipeline.spark.api.TransformResult
import org.apache.spark.api.java.JavaPairRDD
import org.apache.spark.api.java.JavaRDD
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.rdd.RDD