Before the fix, the coordinator stream was read redundantly 5 times by the metadata store, as illustrated by the following logs:
xiliu-mn4:logs xiliu$ cat job-activity-aggregation-runner.out |grep "CoordinatorStreamStore" |grep "Registering system stream partition"
2019-03-31 08:55:09.655 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 0.
2019-03-31 08:55:18.733 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 0.
2019-03-31 08:55:26.661 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 0.
2019-03-31 08:55:34.163 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 0.
2019-03-31 08:55:34.163 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 0.
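The redundant reads above can be counted directly with `grep -c` instead of eyeballing the lines. A minimal sketch; the log contents are embedded below so the snippet is self-contained, and the temp-file setup is purely illustrative:

```shell
# Hypothetical sketch: count coordinator-stream reads in the runner log.
# The sample lines are copied from the before-fix output above.
log=$(mktemp)
cat > "$log" <<'EOF'
2019-03-31 08:55:09.655 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 0.
2019-03-31 08:55:18.733 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 0.
2019-03-31 08:55:26.661 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 0.
2019-03-31 08:55:34.163 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 0.
2019-03-31 08:55:34.163 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 0.
EOF
# -c prints the number of matching lines rather than the lines themselves.
count=$(grep -c "Registering system stream partition" "$log")
echo "coordinator stream read $count times"   # prints: coordinator stream read 5 times
rm -f "$log"
```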
After the fix, the coordinator stream is read only once by the metadata store, as illustrated by the following logs:
[svenkata@svenkata-ld2 logs]$ grep 'CoordinatorStreamStore' job-activity-aggregation-runner.out |grep "Registering system stream partition"
2019-04-08 16:06:21.989 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 7646.
Before the fix, the coordinator stream metadata was fetched 5 times by the metadata store, as illustrated by the following logs:
xiliu-mn4:logs xiliu$ cat job-activity-aggregation-runner.out |grep "Fetching system stream metadata"
2019-03-31 08:54:28.078 [main] KafkaSystemAdmin [INFO] Fetching system stream metadata for [JobApplyClickEventEIWithData] from system queuing
2019-03-31 08:54:51.857 [main] KafkaSystemAdmin [INFO] Fetching system stream metadata for [__samza_coordinator_job-activity-aggregation_i001] from system samzametadatasystem
2019-03-31 08:55:00.650 [main] KafkaSystemAdmin [INFO] Fetching system stream metadata for [__samza_coordinator_job-activity-aggregation_i001] from system samzametadatasystem
2019-03-31 08:55:08.603 [main] KafkaSystemAdmin [INFO] Fetching system stream metadata for [__samza_coordinator_job-activity-aggregation_i001] from system samzametadatasystem
2019-03-31 08:55:17.481 [main] KafkaSystemAdmin [INFO] Fetching system stream metadata for [__samza_coordinator_job-activity-aggregation_i001] from system samzametadatasystem
2019-03-31 08:55:25.548 [main] KafkaSystemAdmin [INFO] Fetching system stream metadata for [__samza_coordinator_job-activity-aggregation_i001] from system samzametadatasystem
After the fix, the coordinator stream metadata is fetched only once by the metadata store, as illustrated by the following logs:
[svenkata@svenkata-ld2 logs]$ cat job-activity-aggregation-runner.out |grep "Fetching system stream metadata"
2019-04-08 16:06:20.734 [main] KafkaSystemAdmin [INFO] Fetching system stream metadata for [__samza_coordinator_samza-count-by-device_i001] from system samzametadatasystem
Before the fix, the total startup time of the ApplicationMaster was 70 seconds (first ClusterBasedJobCoordinator log line to first YarnClusterResourceManager log line). Here are the relevant logs that illustrate this:
[svenkata@svenkata-ld2 logs]$ cat job-activity-aggregation-runner.out |grep "ClusterBasedJobCoordinator" | head -n 1
2019-04-08 15:36:19.338 [main] ClusterBasedJobCoordinator [INFO] Parsing coordinator system config {systems.queuing.producer.bootstrap.servers=localhost:16637, systems.queuing.consumer.deserialization.mode=specific, systems.queuing.samza.msg.serde=, systems.queuing.samza.factory=com.linkedin.samza.system.kafka.SamzaLiKafkaSystemFactory, systems.queuing.kafka.cluster=queuing, systems.queuing.consumer.zookeeper.connect=localhost:12913/kafka-queuing, systems.queuing.samza.key.serde=}
[svenkata@svenkata-ld2 logs]$ cat job-activity-aggregation-runner.out |grep "YarnContainerManager" | head -n 1
2019-04-08 15:37:29.694 [main] YarnClusterResourceManager [INFO] Starting YarnContainerManager.
After the fix, the total startup time of the ApplicationMaster was about 20 seconds, a roughly 3.5x speedup (i.e., a ~70% reduction in startup time).
[svenkata@svenkata-ld2 logs]$ cat job-activity-aggregation-runner.out |grep "ClusterBasedJobCoordinator" | head -n 1
2019-04-08 16:06:19.338 [main] ClusterBasedJobCoordinator [INFO] Parsing coordinator system config {systems.queuing.producer.bootstrap.servers=localhost:16637, systems.queuing.consumer.deserialization.mode=specific, systems.queuing.samza.msg.serde=, systems.queuing.samza.factory=com.linkedin.samza.system.kafka.SamzaLiKafkaSystemFactory, systems.queuing.kafka.cluster=queuing, systems.queuing.consumer.zookeeper.connect=localhost:12913/kafka-queuing, systems.queuing.samza.key.serde=}
[svenkata@svenkata-ld2 logs]$ cat job-activity-aggregation-runner.out |grep "YarnContainerManager" | head -n 1
2019-04-08 16:06:41.694 [main] YarnClusterResourceManager [INFO] Starting YarnContainerManager.
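The startup duration can be recomputed from the two log timestamps above. A minimal sketch, assuming GNU `date -d` (the after-fix timestamps are hard-coded here for illustration):

```shell
# Hypothetical sketch: measure ApplicationMaster startup time as the gap
# between the first ClusterBasedJobCoordinator line and the first
# YarnClusterResourceManager line, using the after-fix timestamps above.
# Requires GNU date (the -d flag); BSD/macOS date uses different options.
start="2019-04-08 16:06:19"
end="2019-04-08 16:06:41"
elapsed=$(( $(date -d "$end" +%s) - $(date -d "$start" +%s) ))
echo "ApplicationMaster startup took ${elapsed}s"   # prints: ApplicationMaster startup took 22s
```

The same calculation on the before-fix timestamps (15:36:19 to 15:37:29) yields the 70 seconds quoted earlier.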