
@mattyb149
mattyb149 / CDC_Replication.xml
Created March 23, 2017 23:14
A test script for the GetChangeDataCaptureMySQL NiFi processor that takes CDC (binlog) events and transforms them into target SQL statements
<?xml version="1.0" ?>
<template encoding-version="1.0">
<description></description>
<groupId>faf788c5-015a-1000-f344-de24ceb9d7e7</groupId>
<name>CDC_Replication</name>
<snippet>
<connections>
<id>d21bc8ee-015a-1000-0000-000000000000</id>
<parentGroupId>faf788c5-015a-1000-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
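The core transformation in this template is rendering each row-change event as a SQL statement for the target database. A minimal Java sketch of that idea, under stated assumptions: the event is modeled as a hypothetical operation name (`insert`, `update`, `delete`), a table name, an ordered column-to-value map, and a key column. This is not the event format the NiFi processor emits, and values are inlined as literals rather than bound as parameters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: render one decoded CDC row event as a target SQL statement.
// The event shape (operation name, table, ordered column->value map, key column)
// is a hypothetical stand-in, not the format the NiFi processor emits.
class CdcToSql {

    // Render a value as a SQL literal: NULL unquoted, numbers as-is,
    // everything else single-quoted with embedded quotes doubled.
    static String literal(Object v) {
        if (v == null) return "NULL";
        if (v instanceof Number) return v.toString();
        return "'" + v.toString().replace("'", "''") + "'";
    }

    // Dispatch on the (hypothetical) operation name carried by the event.
    static String toSql(String op, String table, Map<String, Object> row, String keyColumn) {
        switch (op) {
            case "insert": {
                StringJoiner cols = new StringJoiner(", ");
                StringJoiner vals = new StringJoiner(", ");
                for (Map.Entry<String, Object> e : row.entrySet()) {
                    cols.add(e.getKey());
                    vals.add(literal(e.getValue()));
                }
                return "INSERT INTO " + table + " (" + cols + ") VALUES (" + vals + ")";
            }
            case "update": {
                // Every non-key column goes in SET; assumes the key value itself did not change.
                StringJoiner sets = new StringJoiner(", ");
                for (Map.Entry<String, Object> e : row.entrySet()) {
                    if (!e.getKey().equals(keyColumn)) {
                        sets.add(e.getKey() + " = " + literal(e.getValue()));
                    }
                }
                return "UPDATE " + table + " SET " + sets
                        + " WHERE " + keyColumn + " = " + literal(row.get(keyColumn));
            }
            case "delete":
                return "DELETE FROM " + table
                        + " WHERE " + keyColumn + " = " + literal(row.get(keyColumn));
            default:
                throw new IllegalArgumentException("Unsupported operation: " + op);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 7);
        row.put("name", "O'Brien");
        for (String op : new String[] {"insert", "update", "delete"}) {
            System.out.println(toSql(op, "users", row, "id"));
        }
    }
}
```

Inlining literals keeps the sketch short; a real replication flow would more safely bind values as statement parameters.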
mattyb149 / binlog_connector.groovy
Last active May 18, 2017 15:44
A Groovy script to dump MySQL binlog events to the console
@Grab('com.github.shyiko:mysql-binlog-connector-java:0.8.1')
import com.github.shyiko.mysql.binlog.event.*
import com.github.shyiko.mysql.binlog.*
// Connection settings come from the command line; a hard-coded
// args = [...] override here would make the usage check unreachable.
if (!args || args.length < 3) {
    println 'Usage: groovy binlog_connector.groovy <host> <port> <username> [<password>]'
    System.exit(1)
}
def client = new BinaryLogClient(args[0], Integer.parseInt(args[1]), args[2], args.length == 4 ? args[3] : '')
def recordCount = 0
mattyb149 / GenerateTableFetchExample.xml
Created January 2, 2017 17:43
NiFi template for using GenerateTableFetch with a Remote Process Group to fetch in parallel with ExecuteSQL
<?xml version="1.0" ?>
<template encoding-version="1.0">
<description>This template provides a pattern for using GenerateTableFetch on the primary node
to generate multiple flow files, each one containing a SQL query to be executed in parallel by a cluster.
The flow files are transported using a Remote Process Group back to the same cluster, where they
can be executed in parallel by the ExecuteSQL processor. To increase parallelism, you can add more nodes
to the cluster. To increase concurrency, you can increase the number of
concurrent tasks for each ExecuteSQL instance.</description>
<groupId>03a0dc51-0159-1000-87c0-b1527877d72e</groupId>
<name>GenerateTableFetchExample</name>
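GenerateTableFetch's role in this pattern is to split one large table read into several independent queries that the cluster can run in parallel. A minimal sketch of that partitioning, assuming a known row count, a stable ordering column, and generic `LIMIT`/`OFFSET` syntax (the SQL the processor actually generates depends on its configured database type):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split one table read into fixed-size pages so each query can run
// on a different node or concurrent task. LIMIT/OFFSET syntax and the
// ordering column are assumptions for illustration.
class TableFetchPartitioner {

    static List<String> pageQueries(String table, String orderBy, long rowCount, long partitionSize) {
        if (partitionSize <= 0) {
            throw new IllegalArgumentException("partitionSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        // One query per page; the last page may return fewer than partitionSize rows.
        for (long offset = 0; offset < rowCount; offset += partitionSize) {
            queries.add("SELECT * FROM " + table + " ORDER BY " + orderBy
                    + " LIMIT " + partitionSize + " OFFSET " + offset);
        }
        return queries;
    }

    public static void main(String[] args) {
        // 25 rows in pages of 10 gives three queries, at offsets 0, 10 and 20.
        for (String q : pageQueries("orders", "id", 25, 10)) {
            System.out.println(q);
        }
    }
}
```

A stable ORDER BY matters here: without it, pages fetched by different nodes could overlap or skip rows.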
mattyb149 / DatabaseLookupExample.xml
Created December 14, 2016 16:49
NiFi Template showing how to populate a Map from a DB table, and how to use the Map for lookups
<?xml version="1.0" ?>
<template encoding-version="1.0">
<description>This template illustrates how to populate a DistributedMapCacheServer with values from an RDBMS, and how to use those values as a lookup for incoming flow files.</description>
<groupId>f962a447-0158-1000-3c38-bc764f9c916d</groupId>
<name>DatabaseLookupExample</name>
<snippet>
<processGroups>
<id>fe2190e7-0158-1000-0000-000000000000</id>
<parentGroupId>f962a447-0158-1000-0000-000000000000</parentGroupId>
<position>
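The lookup pattern has two phases: load key/value pairs from a table once, then enrich each incoming record by key. In this sketch a plain `HashMap` stands in for the distributed map cache, the rows are hard-coded rather than read over JDBC, and the attribute names (`country.code`, `country.name`) and the `UNKNOWN` default are hypothetical:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the populate-then-lookup pattern. A HashMap stands in for the
// distributed map cache; rows and attribute names are hypothetical.
class DbLookup {

    // Build the lookup map from (key, value) rows, e.g. the result of a
    // two-column query such as "SELECT code, name FROM countries" (hypothetical table).
    static Map<String, String> buildCache(List<String[]> rows) {
        Map<String, String> cache = new HashMap<>();
        for (String[] row : rows) {
            cache.put(row[0], row[1]);
        }
        return cache;
    }

    // Return a copy of the incoming attributes with the looked-up value added,
    // or "UNKNOWN" when the key is missing from the cache. The input map is not modified.
    static Map<String, String> enrich(Map<String, String> attributes, String keyAttr,
                                      String targetAttr, Map<String, String> cache) {
        Map<String, String> out = new HashMap<>(attributes);
        out.put(targetAttr, cache.getOrDefault(attributes.get(keyAttr), "UNKNOWN"));
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> cache = buildCache(Arrays.asList(
                new String[] {"US", "United States"},
                new String[] {"FR", "France"}));
        Map<String, String> attrs = new HashMap<>();
        attrs.put("country.code", "FR");
        System.out.println(enrich(attrs, "country.code", "country.name", cache).get("country.name"));
    }
}
```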
mattyb149 / FetchOnFileExists.xml
Created December 12, 2016 14:35
NiFi template to check for the existence of a file, then transfer all flow files to success
<?xml version="1.0" ?>
<template encoding-version="1.0">
<description>This template uses ExecuteScript to check for the existence of a file and, while the file exists, transfers all flow files to success. Creating or deleting the file from outside NiFi can therefore start and stop the movement of flow files.</description>
<groupId>f35864f5-0158-1000-f261-0c18ddb7fb1b</groupId>
<name>FetchOnFileExists</name>
<snippet>
<connections>
<id>f3597301-0158-1000-0000-000000000000</id>
<parentGroupId>f35864f5-0158-1000-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
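The gating behavior of this template can be sketched as follows, assuming a `Deque` of strings stands in for the queue of flow files and the marker path is a hypothetical parameter. While the marker file exists, everything queued is released to success; while it is absent, nothing moves and the items stay queued:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Sketch: release queued items only while a marker file exists.
// The Deque stands in for the NiFi queue; the marker path is hypothetical.
class FileGate {

    // Drain the whole queue into the returned "success" list if the marker exists;
    // otherwise return an empty list and leave the queue untouched.
    static List<String> releaseIfPresent(Path marker, Deque<String> queue) {
        List<String> success = new ArrayList<>();
        if (Files.exists(marker)) {
            while (!queue.isEmpty()) {
                success.add(queue.poll());
            }
        }
        return success;
    }

    public static void main(String[] args) throws IOException {
        Path marker = Files.createTempFile("gate", ".flag");
        Deque<String> queue = new ArrayDeque<>(Arrays.asList("a", "b"));
        System.out.println(releaseIfPresent(marker, queue)); // [a, b]
        Files.delete(marker);
        queue.add("c");
        System.out.println(releaseIfPresent(marker, queue)); // []
        System.out.println(queue); // [c]
    }
}
```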
mattyb149 / AttributeMapping.xml
Created November 21, 2016 14:53
Template with a Groovy script to change attribute values based on a given mapping
<?xml version="1.0" ?>
<template encoding-version="1.0">
<description>This template includes a Groovy script to change attribute values based on a given mapping of incoming values to outgoing values.</description>
<groupId>79013fce-0158-1000-02dc-db4db4159c44</groupId>
<name>AttributeMapping</name>
<snippet>
<connections>
<id>875595ac-0158-1000-0000-000000000000</id>
<parentGroupId>79013fce-0158-1000-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
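A minimal sketch of the mapping step, assuming the mapping is a simple in-memory table of incoming value to outgoing value and only a chosen set of attribute names is rewritten (the names and values here are hypothetical). Values with no entry in the mapping pass through unchanged:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: rewrite selected attribute values through a lookup table.
// Attribute names and mapping contents are hypothetical.
class AttributeMapper {

    // Return a copy of the attributes in which each named attribute whose current
    // value appears in the mapping is replaced by the mapped value.
    static Map<String, String> remap(Map<String, String> attributes,
                                     Collection<String> names,
                                     Map<String, String> mapping) {
        Map<String, String> out = new LinkedHashMap<>(attributes);
        for (String name : names) {
            String value = out.get(name);
            if (value != null && mapping.containsKey(value)) {
                out.put(name, mapping.get(value));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("A", "Active");
        mapping.put("I", "Inactive");
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("status", "A");
        attrs.put("region", "A");
        // Only "status" is remapped; "region" keeps its original value.
        System.out.println(remap(attrs, Arrays.asList("status"), mapping));
    }
}
```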
mattyb149 / Elasticsearch_content_from_search_results.xml
Created October 26, 2016 17:14
An Apache NiFi template showing how to use InvokeHTTP and the Elasticsearch processors to query, fetch, and process Elasticsearch documents
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><template><description></description><name>Elasticsearch_content_from_search_results</name><snippet><connections><id>44c1ba1c-f3dd-4f60-8ae7-8e88931bfca5</id><parentGroupId>3101a656-8265-42ce-95e7-0881c817da32</parentGroupId><backPressureDataSizeThreshold>0 MB</backPressureDataSizeThreshold><backPressureObjectThreshold>0</backPressureObjectThreshold><destination><groupId>3101a656-8265-42ce-95e7-0881c817da32</groupId><id>d74ccaf6-3846-4007-8d86-33d6f8985b5b</id><type>PROCESSOR</type></destination><flowFileExpiration>0 sec</flowFileExpiration><labelIndex>1</labelIndex><name></name><selectedRelationships>success</selectedRelationships><source><groupId>3101a656-8265-42ce-95e7-0881c817da32</groupId><id>88a38e8c-5fa1-4463-9996-cfe2a6b26bec</id><type>PROCESSOR</type></source><zIndex>0</zIndex></connections><connections><id>dafff567-98be-4644-b14a-091c6cd8753d</id><parentGroupId>3101a656-8265-42ce-95e7-0881c817da32</parentGroupId><backPressureDataSizeThreshold>0
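The query-then-fetch pattern can be sketched as one search request followed by one document request per hit. This assumes Elasticsearch's URI search (`/{index}/_search?q=...&size=N`) and document URLs of the form `/{index}/{type}/{id}` as used by Elasticsearch versions of that time; the base URL, index, type, and IDs are hypothetical, and parsing the hit IDs out of the search response is not shown:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the two request shapes: one URI search, then one GET per hit.
// Base URL, index, type and ids are hypothetical.
class EsRequests {

    // URI search: the query string is form-encoded (e.g. ':' becomes %3A).
    static String searchUrl(String baseUrl, String index, String query, int size) {
        try {
            return baseUrl + "/" + index + "/_search?q="
                    + URLEncoder.encode(query, "UTF-8") + "&size=" + size;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // One document URL per hit id; ids are assumed to be URL-safe already.
    static List<String> fetchUrls(String baseUrl, String index, String type, List<String> ids) {
        List<String> urls = new ArrayList<>();
        for (String id : ids) {
            urls.add(baseUrl + "/" + index + "/" + type + "/" + id);
        }
        return urls;
    }

    public static void main(String[] args) {
        String base = "http://localhost:9200";
        System.out.println(searchUrl(base, "logs", "status:open", 10));
        for (String url : fetchUrls(base, "logs", "doc", Arrays.asList("1", "2"))) {
            System.out.println(url);
        }
    }
}
```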
mattyb149 / LookupFilter.xml
Created October 10, 2016 16:50
Template to look up table names from a file and match them against ListDatabaseTables, to filter which tables to send to ExecuteSQL
<?xml version="1.0" ?>
<template encoding-version="1.0">
<description>This template uses ExecuteScript and Groovy to read in (from a file) a list of tables to fetch. If the incoming table name (from ListDatabaseTables) is in that list, the flow file is sent to success (for use by ExecuteSQL); otherwise it is sent to failure.</description>
<groupId>af725e75-0157-1000-3844-d085884a56db</groupId>
<name>LookupFilter</name>
<snippet>
<connections>
<id>af73282a-0157-1000-0000-000000000000</id>
<parentGroupId>af725e75-0157-1000-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
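The filtering rule reduces to a set-membership test. A minimal sketch, assuming the list file holds one table name per line and names are matched exactly after trimming whitespace; the route names mirror the template's success and failure relationships:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: load the tables-to-fetch list, then route each incoming table name.
class TableFilter {

    // One table name per line; blank lines are ignored and whitespace is trimmed.
    static Set<String> parseTableList(String fileContent) {
        Set<String> tables = new HashSet<>();
        for (String line : fileContent.split("\\R")) {
            String name = line.trim();
            if (!name.isEmpty()) {
                tables.add(name);
            }
        }
        return tables;
    }

    // "success" if the table is in the list (to be fetched by ExecuteSQL), else "failure".
    static String route(String tableName, Set<String> tablesToFetch) {
        return tableName != null && tablesToFetch.contains(tableName) ? "success" : "failure";
    }

    public static void main(String[] args) {
        Set<String> tables = parseTableList("orders\ncustomers\n");
        System.out.println(route("orders", tables));    // success
        System.out.println(route("audit_log", tables)); // failure
    }
}
```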
mattyb149 / ConvertCSVtoCQL.xml
Created August 29, 2016 15:40
Apache NiFi 1.0 template to convert CSV files to Cassandra Query Language (CQL) statements and execute them
<?xml version="1.0" ?>
<template encoding-version="1.0">
<description>This template describes a flow in which a CSV file is processed (both its filename and its content contribute to the fields in a Cassandra table), then CQL statements are constructed and executed.</description>
<groupId>d6aa94a0-0156-1000-71a4-a96b6da4672f</groupId>
<name>ConvertCSVtoCQL</name>
<snippet>
<connections>
<id>b7b54e60-d92f-4bb1-0000-000000000000</id>
<parentGroupId>d6aa94a0-0156-1000-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>0 MB</backPressureDataSizeThreshold>
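A minimal sketch of the CSV-to-CQL step, assuming the first CSV line names the Cassandra columns, fields contain no embedded commas or quotes, every value is written as a text literal, and the filename-derived field is modeled as one extra column (`source_file` and `ks.readings` here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Sketch: turn a CSV header plus rows into CQL INSERT statements.
// Naive comma splitting; all values are emitted as quoted text literals.
class CsvToCql {

    // CQL string literal: single quotes are escaped by doubling them.
    static String quote(String v) {
        return "'" + v.replace("'", "''") + "'";
    }

    static List<String> toInserts(String keyspaceTable, String csv,
                                  String extraColumn, String extraValue) {
        String[] lines = csv.split("\\R");
        String[] header = lines[0].split(",", -1);
        List<String> out = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) continue; // skip blank rows
            String[] fields = lines[i].split(",", -1);
            StringJoiner cols = new StringJoiner(", ");
            StringJoiner vals = new StringJoiner(", ");
            // The extra (e.g. filename-derived) column comes first.
            cols.add(extraColumn);
            vals.add(quote(extraValue));
            for (int c = 0; c < header.length; c++) {
                cols.add(header[c].trim());
                // Missing trailing fields become empty strings.
                vals.add(quote(c < fields.length ? fields[c].trim() : ""));
            }
            out.add("INSERT INTO " + keyspaceTable + " (" + cols + ") VALUES (" + vals + ");");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String cql : toInserts("ks.readings", "sensor,temp\ns1,20\ns2,21\n", "source_file", "day1.csv")) {
            System.out.println(cql);
        }
    }
}
```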
mattyb149 / ExecuteScriptRemoteCommands.xml
Created August 10, 2016 19:18
An Apache NiFi template using ExecuteScript with Groovy and Sshoogr to execute commands on a remote node
<?xml version="1.0" ?>
<template encoding-version="1.0">
<description>This template shows how to use ExecuteScript with Groovy and Sshoogr to execute commands (from an incoming flow file) on a remote system (via SSH).
To use, replace the values in ExecuteScript for your remote system (sshHostname, sshUsername, sshPassword, sshPort), and replace the GenerateFlowFile processor (and the associated ReplaceText processor) with your flow that produces files with lines of commands to run remotely.</description>
<groupId>71f9bd06-0156-1000-783e-41e29ccff3d8</groupId>
<name>ExecuteScriptRemoteCommands</name>
<snippet>
<connections>
<id>7492dda1-0156-1000-0000-000000000000</id>
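Before anything runs over SSH, the flow file content has to be turned into a list of commands. A minimal sketch of that step, assuming one command per line with blank lines and `#` comment lines skipped (the comment convention is an assumption). The SSH execution itself, done with Sshoogr in the template, is not shown, and the `chained` helper is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: extract the remote commands from flow file content.
class RemoteCommandParser {

    // One command per line; blank lines and lines starting with '#' are skipped.
    static List<String> commands(String content) {
        List<String> cmds = new ArrayList<>();
        for (String line : content.split("\\R")) {
            String cmd = line.trim();
            if (!cmd.isEmpty() && !cmd.startsWith("#")) {
                cmds.add(cmd);
            }
        }
        return cmds;
    }

    // Hypothetical helper: chain the commands with && so the remote shell
    // stops at the first command that fails.
    static String chained(List<String> cmds) {
        return String.join(" && ", cmds);
    }

    public static void main(String[] args) {
        List<String> cmds = commands("uptime\n\n# disk usage\ndf -h\n");
        System.out.println(cmds);          // [uptime, df -h]
        System.out.println(chained(cmds)); // uptime && df -h
    }
}
```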