- Measuring the performance of a streaming application is difficult. GenerateFlowFile can be useful, but understanding NiFi's backpressure and scheduling behavior is important.
- Push provides better load distribution than Pull.
- Pull can provide the same throughput as Push, but with higher latency. Increasing the backpressure threshold is encouraged.
- Fewer, larger flow files provide better throughput than many smaller flow files.
- HTTP provides throughput comparable to RAW Site-to-Site, but uses slightly more CPU.
- Be careful with the Provenance repository's max.storage.time: if it is too long for your use case, CPU is spent rolling over the provenance storage and other tasks cannot run. Once the provenance storage accumulates too many journal files, its backpressure mechanism kicks in and holds a lock until old events are cleared.
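The retention settings in question live in nifi.properties; the property names below are from the stock file, while the values are illustrative choices, not recommendations:

```properties
# Keep provenance retention short enough that rollover stays cheap
nifi.provenance.repository.max.storage.time=6 hours
nifi.provenance.repository.max.storage.size=1 GB
nifi.provenance.repository.rollover.time=30 secs
nifi.provenance.repository.rollover.size=100 MB
```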
- EC2, m3.large
- Ganglia gmetad
- Apache HTTP server
- Zookeeper
- EC2, m3.large
- NiFi 1.0.0-SNAPSHOT
- Java OpenJDK 1.8.0_101-b13
4GB of storage is available. Let's set a soft limit of 2GB for NiFi data, since other data such as logs and indices also need to be persisted.
Data | Limit | Config |
---|---|---|
Flow File Repository | 0.5GB | |
Content Repository | 1GB | Disabled archiving. E.g. 1KB * 1,000,000, or 1MB * 1,000 |
Provenance Repository | 0.5GB | |
With 1MB * 1,000 flow files queued:

1007M ./content_repository
540K ./provenance_repository
2.6M ./flowfile_repository
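The limits above map onto nifi.properties roughly as follows (property names are from the stock file; the flow file repository has no direct size cap, so its 0.5GB figure is a planning number rather than a setting):

```properties
# Archiving off so content is deleted as soon as no flow file references it
nifi.content.repository.archive.enabled=false
# Cap provenance storage at the 0.5GB budget from the table above
nifi.provenance.repository.max.storage.size=500 MB
```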
- p.nifi
- push-data-generator: GenerateFlowFile
- relationship: backpressure threshold object: 1,000,000, data size: 1GB
- RPG: to 'input'
- q.nifi
- Input Port: 'input'
- relationship: backpressure threshold object: 1,000,000, data size: 1GB
- push-data-terminator: UpdateAttribute
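With these thresholds, backpressure on a queue of 1MB flow files is triggered by the data-size limit long before the object-count limit, which a quick sanity check confirms:

```shell
# Backpressure engages when either threshold is exceeded; for 1MB flow
# files the 1GB data-size limit binds first.
FLOWFILE_BYTES=$((1024 * 1024))        # 1MB per flow file
SIZE_LIMIT=$((1024 * 1024 * 1024))     # backpressure data-size threshold: 1GB
OBJECT_LIMIT=1000000                   # backpressure object-count threshold
MAX_BY_SIZE=$((SIZE_LIMIT / FLOWFILE_BYTES))
if [ "$MAX_BY_SIZE" -lt "$OBJECT_LIMIT" ]; then BINDING=$MAX_BY_SIZE; else BINDING=$OBJECT_LIMIT; fi
echo "backpressure at ~$BINDING flow files"
# → backpressure at ~1024 flow files
```

This matches the repository sizes measured earlier, where roughly 1,000 * 1MB flow files were queued against an almost-full 1GB content repository.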
$ diff nifi.properties nifi.properties.org |grep '<'
< nifi.remote.input.host=0.p.nifi.aws.mine
< nifi.remote.input.socket.port=8081
< nifi.web.http.host=0.p.nifi.aws.mine
< nifi.cluster.is.node=true
< nifi.cluster.node.address=0.p.nifi.aws.mine
< nifi.cluster.node.protocol.port=9091
< nifi.zookeeper.connect.string=0.master.aws.mine:2181
< nifi.zookeeper.root.node=/p.nifi.aws.mine
Other properties are left at their defaults.
# Build the latest NiFi SNAPSHOT, based on 09840027a37c076f5df6239c669fc77315b761d9 with PR714 (cherry-pick 79521d8cd01c0675bd8bd4d6a9f9382e11ca9d6b)
git checkout master
git cherry-pick 79521d8cd01c0675bd8bd4d6a9f9382e11ca9d6b
nifi-clean-install
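`nifi-clean-install` is a local helper, not a standard command; a plausible definition (an assumption, since the actual script is not shown here) is simply a Maven build that skips tests:

```shell
# Hypothetical helper: build NiFi from the checked-out source without
# running the test suite. -T 1C runs one Maven thread per CPU core.
nifi-clean-install() {
  mvn -T 1C clean install -DskipTests
}
```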
./request-spot-fleet master
./request-spot-fleet p.nifi
./request-spot-fleet q.nifi
./generate-hosts
# Add generated hosts
sudo vi /etc/hosts
./update-route53-records
# Update the hostname setting for the new node; this also starts gmond
./update-hostname 0.p.nifi
./execute-nifish 0.p.nifi restart
A thread dump taken while the provenance repository is rolling over shows the maintenance thread busy computing index sizes:

"Provenance Maintenance Thread-2" #41 prio=5 os_prio=0 tid=0x00007fc731dd2000 nid=0x1abb runnable [0x00007fc72f2f9000]
java.lang.Thread.State: RUNNABLE
at java.io.UnixFileSystem.getLength(Native Method)
at java.io.File.length(File.java:974)
at org.apache.nifi.provenance.IndexConfiguration.getSize(IndexConfiguration.java:333)
at org.apache.nifi.provenance.IndexConfiguration.getIndexSize(IndexConfiguration.java:347)
at org.apache.nifi.provenance.PersistentProvenanceRepository.getSize(PersistentProvenanceRepository.java:863)
at org.apache.nifi.provenance.PersistentProvenanceRepository.rollover(PersistentProvenanceRepository.java:1371)
at org.apache.nifi.provenance.PersistentProvenanceRepository.access$300(PersistentProvenanceRepository.java:116)
at org.apache.nifi.provenance.PersistentProvenanceRepository$1.run(PersistentProvenanceRepository.java:258)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)