
@ijokarumawak
Last active February 10, 2018 03:27
NIFI-4866 HBasePut performance test

NIFI-4866: HBase_1_1_2_ClientService performance dropped due to accessing HBase admin upon every FlowFile transfer

NIFI-4543 added a call to "connection.getAdmin().getClusterStatus().getMaster().getHostAndPort()" in the HBase_1_1_2_ClientService.toTransitUri method to obtain the HBase master address. Because toTransitUri is called upon every FlowFile transfer, this extra admin call essentially doubles the traffic through the HBase connector. The performance of our PutHBaseJSON processor dropped to 1/3 after deploying NiFi 1.5.0. Those lines of code should be moved from toTransitUri to createConnection, so that the master address is looked up once per connection instead of once per FlowFile transfer.
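The difference between the two placements can be sketched as follows. This is a minimal, self-contained Java illustration of the pattern, not the actual NiFi patch. `lookupMasterAddress()` is a hypothetical stand-in for the remote `connection.getAdmin().getClusterStatus().getMaster().getHostAndPort()` round trip. The class names, the example host/port and the `hbase://master/table/row` URI layout are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the NIFI-4866 fix pattern; this is NOT the real HBase or NiFi API.
public class TransitUriSketch {

    // Counts how many (simulated) admin round trips are made.
    static final AtomicInteger adminCalls = new AtomicInteger();

    // Simulated stand-in for the remote master-address lookup via the HBase admin.
    static String lookupMasterAddress() {
        adminCalls.incrementAndGet();
        return "hbase-master.example.com:16000"; // assumed example address
    }

    // Before the fix: the master address is looked up on every FlowFile transfer.
    static class PerCallService {
        String toTransitUri(String tableName, String rowKey) {
            String master = lookupMasterAddress();
            return "hbase://" + master + "/" + tableName + "/" + rowKey;
        }
    }

    // After the fix: the address is looked up once, when the connection is created.
    static class CachedService {
        private String masterAddress;

        void createConnection() {
            masterAddress = lookupMasterAddress();
        }

        String toTransitUri(String tableName, String rowKey) {
            return "hbase://" + masterAddress + "/" + tableName + "/" + rowKey;
        }
    }

    public static void main(String[] args) {
        int transfers = 1000;

        adminCalls.set(0);
        PerCallService before = new PerCallService();
        for (int i = 0; i < transfers; i++) {
            before.toTransitUri("nifi-4866", "row" + i);
        }
        System.out.println("per-call admin lookups: " + adminCalls.get());

        adminCalls.set(0);
        CachedService after = new CachedService();
        after.createConnection();
        for (int i = 0; i < transfers; i++) {
            after.toTransitUri("nifi-4866", "row" + i);
        }
        System.out.println("cached admin lookups: " + adminCalls.get());
    }
}
```

Running `main` prints 1000 lookups for the per-call version and 1 for the cached version: the cost of the admin call moves from the per-FlowFile path to connection setup.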

Workaround

If you use NiFi 1.5.0 and are impacted by this performance degradation, you need to apply the patch manually by cherry-picking the proposed fix onto the NiFi 1.5.0 release branch.

I have done this myself; the following branch contains the fix on top of NiFi 1.5.0: https://github.com/ijokarumawak/nifi/tree/nifi-1.5.0_nifi-4866

For those who are not familiar with Git or with building NiFi NAR files from source using Maven, I've attached an updated NAR file here. It is built from the nifi-1.5.0_nifi-4866 branch above.

Disclaimer: the following custom nifi-hbase_1_1_2-client-service-nar-1.5.0_nifi-4866.nar is NOT an official binary released by the Apache Software Foundation. Use it at your own risk: https://gist.github.com/ijokarumawak/85db60ca71f1825f543c18c62bf7c3fd/raw/8687a420a24604ad26da94f49a193c406e75a203/nifi-hbase_1_1_2-client-service-nar-1.5.0_nifi-4866.nar

Without applying the patch:

With the patch:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<template encoding-version="1.2">
<description></description>
<groupId>7d6e8840-0161-1000-3135-187e19ec893e</groupId>
<name>nifi-4866</name>
<snippet>
<connections>
<id>886f875d-d4b3-3ab5-0000-000000000000</id>
<parentGroupId>11639ff4-20f4-3488-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<destination>
<groupId>11639ff4-20f4-3488-0000-000000000000</groupId>
<id>9707806b-4dcb-3ea7-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<name></name>
<selectedRelationships>success</selectedRelationships>
<source>
<groupId>11639ff4-20f4-3488-0000-000000000000</groupId>
<id>ff250d5f-c7d8-357d-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<connections>
<id>afc42e9f-b65e-3726-0000-000000000000</id>
<parentGroupId>11639ff4-20f4-3488-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<destination>
<groupId>11639ff4-20f4-3488-0000-000000000000</groupId>
<id>ff250d5f-c7d8-357d-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<name></name>
<selectedRelationships>success</selectedRelationships>
<source>
<groupId>11639ff4-20f4-3488-0000-000000000000</groupId>
<id>f797a34a-ae5e-3572-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<connections>
<id>be156c34-e9f3-3f61-0000-000000000000</id>
<parentGroupId>11639ff4-20f4-3488-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
<backPressureObjectThreshold>10000</backPressureObjectThreshold>
<bends>
<x>0.0</x>
<y>481.1903533935547</y>
</bends>
<destination>
<groupId>11639ff4-20f4-3488-0000-000000000000</groupId>
<id>9707806b-4dcb-3ea7-0000-000000000000</id>
<type>PROCESSOR</type>
</destination>
<flowFileExpiration>0 sec</flowFileExpiration>
<labelIndex>1</labelIndex>
<name></name>
<selectedRelationships>failure</selectedRelationships>
<source>
<groupId>11639ff4-20f4-3488-0000-000000000000</groupId>
<id>ff250d5f-c7d8-357d-0000-000000000000</id>
<type>PROCESSOR</type>
</source>
<zIndex>0</zIndex>
</connections>
<controllerServices>
<id>16807846-3484-3619-0000-000000000000</id>
<parentGroupId>11639ff4-20f4-3488-0000-000000000000</parentGroupId>
<bundle>
<artifact>nifi-hbase_1_1_2-client-service-nar</artifact>
<group>org.apache.nifi</group>
<version>1.6.0-SNAPSHOT</version>
</bundle>
<comments></comments>
<descriptors>
<entry>
<key>Hadoop Configuration Files</key>
<value>
<name>Hadoop Configuration Files</name>
</value>
</entry>
<entry>
<key>Kerberos Principal</key>
<value>
<name>Kerberos Principal</name>
</value>
</entry>
<entry>
<key>Kerberos Keytab</key>
<value>
<name>Kerberos Keytab</name>
</value>
</entry>
<entry>
<key>ZooKeeper Quorum</key>
<value>
<name>ZooKeeper Quorum</name>
</value>
</entry>
<entry>
<key>ZooKeeper Client Port</key>
<value>
<name>ZooKeeper Client Port</name>
</value>
</entry>
<entry>
<key>ZooKeeper ZNode Parent</key>
<value>
<name>ZooKeeper ZNode Parent</name>
</value>
</entry>
<entry>
<key>HBase Client Retries</key>
<value>
<name>HBase Client Retries</name>
</value>
</entry>
<entry>
<key>Phoenix Client JAR Location</key>
<value>
<name>Phoenix Client JAR Location</name>
</value>
</entry>
</descriptors>
<name>HBase_1_1_2_ClientService 1.6</name>
<persistsState>false</persistsState>
<properties>
<entry>
<key>Hadoop Configuration Files</key>
<value>/Users/koji/dev/nifi-4866/core-site.xml,/Users/koji/dev/nifi-4866/hbase-site.xml</value>
</entry>
<entry>
<key>Kerberos Principal</key>
</entry>
<entry>
<key>Kerberos Keytab</key>
</entry>
<entry>
<key>ZooKeeper Quorum</key>
</entry>
<entry>
<key>ZooKeeper Client Port</key>
</entry>
<entry>
<key>ZooKeeper ZNode Parent</key>
</entry>
<entry>
<key>HBase Client Retries</key>
<value>1</value>
</entry>
<entry>
<key>Phoenix Client JAR Location</key>
</entry>
</properties>
<state>ENABLED</state>
<type>org.apache.nifi.hbase.HBase_1_1_2_ClientService</type>
</controllerServices>
<processors>
<id>9707806b-4dcb-3ea7-0000-000000000000</id>
<parentGroupId>11639ff4-20f4-3488-0000-000000000000</parentGroupId>
<position>
<x>28.0</x>
<y>569.1903533935547</y>
</position>
<bundle>
<artifact>nifi-standard-nar</artifact>
<group>org.apache.nifi</group>
<version>1.6.0-SNAPSHOT</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>Log Level</key>
<value>
<name>Log Level</name>
</value>
</entry>
<entry>
<key>Log Payload</key>
<value>
<name>Log Payload</name>
</value>
</entry>
<entry>
<key>Attributes to Log</key>
<value>
<name>Attributes to Log</name>
</value>
</entry>
<entry>
<key>attributes-to-log-regex</key>
<value>
<name>attributes-to-log-regex</name>
</value>
</entry>
<entry>
<key>Attributes to Ignore</key>
<value>
<name>Attributes to Ignore</name>
</value>
</entry>
<entry>
<key>attributes-to-ignore-regex</key>
<value>
<name>attributes-to-ignore-regex</name>
</value>
</entry>
<entry>
<key>Log prefix</key>
<value>
<name>Log prefix</name>
</value>
</entry>
<entry>
<key>character-set</key>
<value>
<name>character-set</name>
</value>
</entry>
</descriptors>
<executionNode>ALL</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>Log Level</key>
<value>info</value>
</entry>
<entry>
<key>Log Payload</key>
<value>false</value>
</entry>
<entry>
<key>Attributes to Log</key>
</entry>
<entry>
<key>attributes-to-log-regex</key>
<value>.*</value>
</entry>
<entry>
<key>Attributes to Ignore</key>
</entry>
<entry>
<key>attributes-to-ignore-regex</key>
</entry>
<entry>
<key>Log prefix</key>
</entry>
<entry>
<key>character-set</key>
<value>UTF-8</value>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>0 sec</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<name>LogAttribute</name>
<relationships>
<autoTerminate>false</autoTerminate>
<name>success</name>
</relationships>
<state>STOPPED</state>
<style/>
<type>org.apache.nifi.processors.standard.LogAttribute</type>
</processors>
<processors>
<id>f797a34a-ae5e-3572-0000-000000000000</id>
<parentGroupId>11639ff4-20f4-3488-0000-000000000000</parentGroupId>
<position>
<x>23.2034912109375</x>
<y>0.0</y>
</position>
<bundle>
<artifact>nifi-standard-nar</artifact>
<group>org.apache.nifi</group>
<version>1.6.0-SNAPSHOT</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>File Size</key>
<value>
<name>File Size</name>
</value>
</entry>
<entry>
<key>Batch Size</key>
<value>
<name>Batch Size</name>
</value>
</entry>
<entry>
<key>Data Format</key>
<value>
<name>Data Format</name>
</value>
</entry>
<entry>
<key>Unique FlowFiles</key>
<value>
<name>Unique FlowFiles</name>
</value>
</entry>
<entry>
<key>generate-ff-custom-text</key>
<value>
<name>generate-ff-custom-text</name>
</value>
</entry>
<entry>
<key>character-set</key>
<value>
<name>character-set</name>
</value>
</entry>
</descriptors>
<executionNode>ALL</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>File Size</key>
<value>0B</value>
</entry>
<entry>
<key>Batch Size</key>
<value>1000</value>
</entry>
<entry>
<key>Data Format</key>
<value>Text</value>
</entry>
<entry>
<key>Unique FlowFiles</key>
<value>false</value>
</entry>
<entry>
<key>generate-ff-custom-text</key>
<value>test</value>
</entry>
<entry>
<key>character-set</key>
<value>UTF-8</value>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>0s</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<name>GenerateFlowFile</name>
<relationships>
<autoTerminate>false</autoTerminate>
<name>success</name>
</relationships>
<state>STOPPED</state>
<style/>
<type>org.apache.nifi.processors.standard.GenerateFlowFile</type>
</processors>
<processors>
<id>ff250d5f-c7d8-357d-0000-000000000000</id>
<parentGroupId>11639ff4-20f4-3488-0000-000000000000</parentGroupId>
<position>
<x>22.0</x>
<y>263.1903533935547</y>
</position>
<bundle>
<artifact>nifi-hbase-nar</artifact>
<group>org.apache.nifi</group>
<version>1.6.0-SNAPSHOT</version>
</bundle>
<config>
<bulletinLevel>WARN</bulletinLevel>
<comments></comments>
<concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount>
<descriptors>
<entry>
<key>HBase Client Service</key>
<value>
<identifiesControllerService>org.apache.nifi.hbase.HBaseClientService</identifiesControllerService>
<name>HBase Client Service</name>
</value>
</entry>
<entry>
<key>Table Name</key>
<value>
<name>Table Name</name>
</value>
</entry>
<entry>
<key>Row Identifier</key>
<value>
<name>Row Identifier</name>
</value>
</entry>
<entry>
<key>Row Identifier Encoding Strategy</key>
<value>
<name>Row Identifier Encoding Strategy</name>
</value>
</entry>
<entry>
<key>Column Family</key>
<value>
<name>Column Family</name>
</value>
</entry>
<entry>
<key>Column Qualifier</key>
<value>
<name>Column Qualifier</name>
</value>
</entry>
<entry>
<key>timestamp</key>
<value>
<name>timestamp</name>
</value>
</entry>
<entry>
<key>Batch Size</key>
<value>
<name>Batch Size</name>
</value>
</entry>
</descriptors>
<executionNode>ALL</executionNode>
<lossTolerant>false</lossTolerant>
<penaltyDuration>30 sec</penaltyDuration>
<properties>
<entry>
<key>HBase Client Service</key>
<value>16807846-3484-3619-0000-000000000000</value>
</entry>
<entry>
<key>Table Name</key>
<value>nifi-4866</value>
</entry>
<entry>
<key>Row Identifier</key>
<value>${uuid}</value>
</entry>
<entry>
<key>Row Identifier Encoding Strategy</key>
<value>String</value>
</entry>
<entry>
<key>Column Family</key>
<value>v</value>
</entry>
<entry>
<key>Column Qualifier</key>
<value>t</value>
</entry>
<entry>
<key>timestamp</key>
</entry>
<entry>
<key>Batch Size</key>
<value>25</value>
</entry>
</properties>
<runDurationMillis>0</runDurationMillis>
<schedulingPeriod>0 sec</schedulingPeriod>
<schedulingStrategy>TIMER_DRIVEN</schedulingStrategy>
<yieldDuration>1 sec</yieldDuration>
</config>
<name>PutHBaseCell 1.6</name>
<relationships>
<autoTerminate>false</autoTerminate>
<name>failure</name>
</relationships>
<relationships>
<autoTerminate>false</autoTerminate>
<name>success</name>
</relationships>
<state>RUNNING</state>
<style/>
<type>org.apache.nifi.hbase.PutHBaseCell</type>
</processors>
</snippet>
<timestamp>02/10/2018 11:15:53 JST</timestamp>
</template>