NiFi 1.0.0 Site-to-Site performance test

Key findings

  • Measuring the performance of a streaming application is difficult. GenerateFlowFile can be useful, but understanding NiFi backpressure and scheduling is important.
  • Push provides better load distribution than Pull.
  • Pull can provide the same level of throughput as Push, but latency is higher. Increasing the backpressure threshold is encouraged.
  • Fewer, larger flow-files provide better throughput than many smaller flow-files.
  • HTTP provides throughput identical to RAW Site-to-Site, but uses slightly more CPU resources.
  • Be careful with the Provenance repository max.storage.time; if it's too long for your use-case, the CPU will be busy rolling over provenance storage and other tasks can't be executed. Once the provenance storage accumulates too many journal files, its backpressure mechanism kicks in and holds a lock until old events are cleared (see the property sketch below).
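The nifi.properties entry in question is the provenance retention time; keeping it short makes rollover cheap. A sketch of what a safer setting might look like for this kind of test (the value is my assumption, not what was used here):

# Keep provenance retention short so rollover stays cheap (value is an assumption)
nifi.provenance.repository.max.storage.time=3 hours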

Environment

Master

  • EC2, m3.large
  • Ganglia gmetad
  • Apache HTTP server
  • Zookeeper

Nodes

  • EC2, m3.large

  • NiFi 1.0.0-SNAPSHOT

  • Java OpenJDK 1.8.0_101-b13

  • 4GB available. Set a soft limit of 2GB for NiFi data, since other data such as logs and indices also need to be persisted.

Data                    Limit   Config
Flow File Repository    0.5GB
Content Repository      1GB     Archiving disabled. e.g. 1KB * 1,000,000, or 1MB * 1,000
Provenance Repository   0.5GB
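These limits presumably map onto nifi.properties entries along the following lines; the property names are standard NiFi ones, but the exact values used in this test are my assumption:

# Content repository: archiving disabled, as noted in the table above
nifi.content.repository.archive.enabled=false
# Provenance repository capped at roughly 0.5GB
nifi.provenance.repository.max.storage.size=500 MB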

When 1MB * 1,000 flow-files are queued:

1007M  ./content_repository
 540K  ./provenance_repository
 2.6M  ./flowfile_repository
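The sizes above look like du output; a minimal way to collect them from the NiFi home directory (my assumption, the exact command isn't in the original notes):

du -sh ./content_repository ./provenance_repository ./flowfile_repository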

  • p.nifi
    • push-data-generator: GenerateFlowFile
    • relationship: backpressure thresholds of 1,000,000 objects and 1GB data size
    • RPG: to 'input'
  • q.nifi
    • Input Port: 'input'
    • relationship: backpressure thresholds of 1,000,000 objects and 1GB data size
    • push-data-terminator: UpdateAttribute

nifi.properties

$ diff nifi.properties nifi.properties.org |grep '<'
< nifi.remote.input.host=0.p.nifi.aws.mine
< nifi.remote.input.socket.port=8081
< nifi.web.http.host=0.p.nifi.aws.mine
< nifi.cluster.is.node=true
< nifi.cluster.node.address=0.p.nifi.aws.mine
< nifi.cluster.node.protocol.port=9091
< nifi.zookeeper.connect.string=0.master.aws.mine:2181
< nifi.zookeeper.root.node=/p.nifi.aws.mine
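The q.nifi cluster presumably mirrors these settings with its own hostnames; a sketch based on the naming scheme above (not taken from the actual q.nifi configuration):

nifi.remote.input.host=0.q.nifi.aws.mine
nifi.remote.input.socket.port=8081
nifi.web.http.host=0.q.nifi.aws.mine
nifi.cluster.is.node=true
nifi.cluster.node.address=0.q.nifi.aws.mine
nifi.cluster.node.protocol.port=9091
nifi.zookeeper.connect.string=0.master.aws.mine:2181
nifi.zookeeper.root.node=/q.nifi.aws.mine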

logback.xml

Default (unmodified).

Commands

Build

# Build the latest NiFi SNAPSHOT, based on 09840027a37c076f5df6239c669fc77315b761d9 with PR714 (cherry-pick 79521d8cd01c0675bd8bd4d6a9f9382e11ca9d6b)
git checkout master
git cherry-pick 79521d8cd01c0675bd8bd4d6a9f9382e11ca9d6b
nifi-clean-install
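nifi-clean-install is a local alias rather than a standard command; it presumably expands to an ordinary Maven clean build, something like this sketch (the exact flags are an assumption):

# Assumed expansion of the nifi-clean-install alias: a clean build skipping tests
alias nifi-clean-install='mvn clean install -DskipTests'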

How to Start a NiFi node

./request-spot-fleet master
./request-spot-fleet p.nifi
./request-spot-fleet q.nifi
./generate-hosts
# Add generated hosts
sudo vi /etc/hosts
./update-route53-records
# Update the hostname setting for the new node; this also starts gmond
./update-hostname 0.p.nifi
./execute-nifish 0.p.nifi restart
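The ./ scripts above are local helper scripts not included in this gist. execute-nifish presumably just runs bin/nifi.sh on the target node over SSH; a hypothetical sketch (host suffix and install path are assumptions):

# Hypothetical sketch of ./execute-nifish <node> <command>
ssh "$1.aws.mine" "~/nifi-1.0.0-SNAPSHOT/bin/nifi.sh $2"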

Provenance Repository rollover stacktrace

"Provenance Maintenance Thread-2" #41 prio=5 os_prio=0 tid=0x00007fc731dd2000 nid=0x1abb runnable [0x00007fc72f2f9000]
   java.lang.Thread.State: RUNNABLE
     at java.io.UnixFileSystem.getLength(Native Method)
     at java.io.File.length(File.java:974)
     at org.apache.nifi.provenance.IndexConfiguration.getSize(IndexConfiguration.java:333)
     at org.apache.nifi.provenance.IndexConfiguration.getIndexSize(IndexConfiguration.java:347)
     at org.apache.nifi.provenance.PersistentProvenanceRepository.getSize(PersistentProvenanceRepository.java:863)
     at org.apache.nifi.provenance.PersistentProvenanceRepository.rollover(PersistentProvenanceRepository.java:1371)
     at org.apache.nifi.provenance.PersistentProvenanceRepository.access$300(PersistentProvenanceRepository.java:116)
     at org.apache.nifi.provenance.PersistentProvenanceRepository$1.run(PersistentProvenanceRepository.java:258)
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

How to set up keystores and truststores?

  • p.nifi

    • keystore
    • truststore
      • q.nifi's cert
      • admin user's cert
  • q.nifi

    • keystore
    • truststore
      • p.nifi's cert
      • admin user's cert
# create private key and certificate
echo -n 'hostname:'; \
read k; \
openssl req -x509 -newkey rsa:2048 \
 -keyout $k.pem \
 -out $k.crt \
 -days 365 \
 -nodes \
 -subj "/CN=$k/C=US/L=$k"
# convert those into p12 key store file, in order to import those into java keytool keystore
echo -n 'hostname:'; \
read k; \
openssl pkcs12 -export \
 -in $k.crt \
 -inkey $k.pem \
 -out $k.p12 \
 -passout pass:"pfxpassword" \
 -name $k
# import p12 into java keystore
echo -n 'hostname:'; \
read k; \
keytool -importkeystore \
 -deststorepass keystorepass \
 -destkeypass keystorepass \
 -destkeystore $k-keystore.jks \
 -srckeystore $k.p12 \
 -srcstoretype PKCS12 \
 -srcstorepass pfxpassword -alias $k
# add nodes into truststore
i=0.p.nifi.aws.mine; \
k=0.q.nifi.aws.mine; \
keytool -importcert \
 -v -trustcacerts \
 -alias $i \
 -file $i.crt \
 -keystore $k-truststore.jks \
 -storepass truststorepass \
 -noprompt
# list keys
keytool -list -storepass truststorepass -keystore 0.q.nifi.aws.mine-truststore.jks
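The generated stores then have to be referenced from nifi.properties on each node. A sketch using NiFi's standard security properties; the file locations follow the commands above and the passwords are the example ones used there:

nifi.security.keystore=./conf/0.p.nifi.aws.mine-keystore.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=keystorepass
nifi.security.keyPasswd=keystorepass
nifi.security.truststore=./conf/0.p.nifi.aws.mine-truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=truststorepass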

How to set up: Apache Web Server

Most of this setup was originally described in a StackOverflow question; I added a few commands to set up a forward proxy with authentication from scratch.

Install Apache Web Server:

sudo yum install httpd24

Create a file /etc/httpd/conf.d/proxy.conf:

ProxyRequests On
ProxyVia On

# Only 443 and 563 are supported by default, custom ports need to be added.
AllowCONNECT 443 563 8443

<Proxy "*">
  Order deny,allow
  Allow from all
  # AuthType Basic
  AuthType Digest
  # Specify auth realm
  AuthName "aws.mine"
  # AuthUserFile basic_password.file
  AuthUserFile digest_password.file
  AuthGroupFile group.file
  Require group usergroup
</Proxy>

Create group and password file:

# Create a group file:
vi /etc/httpd/group.file (Add following entry)
usergroup: nifi

# Create password files for basic and digest auth:
htpasswd -c /etc/httpd/basic_password.file nifi (I used 'nifi proxy password' as a password here)
htdigest -c /etc/httpd/digest_password.file aws.mine nifi (I used 'nifi proxy password' as a password here)

Restart Apache Web Server:

service httpd restart
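Before pointing NiFi at the proxy, a quick check with curl confirms that CONNECT with digest auth works. This is a sketch; the proxy port (80) and the target URL are assumptions based on this setup:

# CONNECT through the forward proxy with digest auth
curl -v --proxy-digest --proxy-user nifi:'nifi proxy password' \
 -x http://0.master.aws.mine:80 \
 https://0.q.nifi.aws.mine:8443/nifi-api/site-to-site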

How to set up: Apache Traffic Server

I used the following commands to install Apache Traffic Server, based on the Administration Guide:

curl -OL http://apache.claz.org/trafficserver/trafficserver-6.1.1.tar.bz2
tar xvf trafficserver-6.1.1.tar.bz2
cd trafficserver-6.1.1
./configure --help

sudo yum install gcc-c++
sudo yum install openssl-devel
sudo yum install tcl-devel
sudo yum install libxml2-devel
sudo yum install pcre-devel
sudo ./configure --prefix=/usr/local/ats --with-user=tserver

sudo useradd -M --shell /bin/false tserver
sudo usermod -L tserver
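The actual compile and install step isn't shown above; presumably it's the usual make sequence after configure:

# Build and install into the --prefix chosen above (assumed step, not in the original notes)
make
sudo make install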

DNS Server

In order to keep using the same environment while shutting EC2 instances down when they are not in use, I need to use host names instead of private IP addresses so that nodes can talk to each other. I originally tried to maintain hostnames in the hosts file, but it seems ATS doesn't use the hosts file; instead it supports HostDB. I ended up using a Route53 private hosted zone because it's easier to manage hostnames and IP addresses among the nodes in the environment.

I wrote a script to update Route53 records.
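The script itself isn't included here; updating a record in a private hosted zone comes down to an aws route53 change-resource-record-sets call, roughly like this sketch (the zone ID, record name, and IP are placeholders):

# Hypothetical single-record upsert; the real script presumably loops over all nodes
aws route53 change-resource-record-sets \
 --hosted-zone-id ZONEID \
 --change-batch '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"0.p.nifi.aws.mine","Type":"A","TTL":60,"ResourceRecords":[{"Value":"10.0.0.10"}]}}]}'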

Authentication

While it's possible to configure an ACL by IP address out of the box [1], user authentication has to be added as a plug-in. There was a discussion on the TS mailing list [2] mentioning that TS doesn't support digest auth and that it should happen at the HTTP server. There was no complete auth plugin (only a sample) at that time, and I think that is still the case. There is an experimental plugin which redirects auth requests to the origin or an auth server [3].

[1] https://docs.trafficserver.apache.org/en/latest/admin-guide/security/index.en.html#controlling-access
[2] https://mail-archives.apache.org/mod_mbox/trafficserver-users/201305.mbox/%3CCAB1tU+cvLY_bJ6wz8YjfQF26NvV=uxse1QsXMv7jNMeS_Txi4A@mail.gmail.com%3E
[3] https://docs.trafficserver.apache.org/en/5.3.x/reference/plugins/authproxy.en.html

For the above reasons, I haven't tested user auth with ATS.

Be careful with Default ip_allow setting

The default ip_allow.config is configured something like below:

src_ip=127.0.0.1                                  action=ip_allow method=ALL
src_ip=::1                                        action=ip_allow method=ALL
# Deny PURGE, DELETE, and PUSH for all (this implies allow other methods for all)
src_ip=0.0.0.0-255.255.255.255                    action=ip_deny  method=PUSH|PURGE|DELETE
src_ip=::-ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff action=ip_deny  method=PUSH|PURGE|DELETE

The above setting allows every IP to GET and POST, but only localhost can perform PUSH, PURGE, and DELETE. This makes HTTP Site-to-Site fail in the middle of data transport, since HTTP Site-to-Site uses the DELETE method to finalize a transaction.

So, I've changed the setting as follows. Access to this proxy is protected by AWS Security Groups instead:

src_ip=::-ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff action=ip_allow  method=ALL
src_ip=0.0.0.0-255.255.255.255                    action=ip_allow  method=ALL

records.config

##############################################################################
# Specify server addresses and ports to bind for HTTP and HTTPS. Docs:
#    https://docs.trafficserver.apache.org/records.config#proxy-config-http-server-ports
##############################################################################
CONFIG proxy.config.http.server_ports STRING 8080
# Added this to use 8443
CONFIG proxy.config.http.connect_ports STRING 443 563 8443