Skip to content

Instantly share code, notes, and snippets.

@shanthoosh
Created April 8, 2019 17:40
Show Gist options
  • Save shanthoosh/384254f3692e78af09eb87088cc5f656 to your computer and use it in GitHub Desktop.
Save shanthoosh/384254f3692e78af09eb87088cc5f656 to your computer and use it in GitHub Desktop.
Testing the redundant coordinator stream reads fix by samza ApplicationMaster

Before the fix, coordinator stream was read redundantly 5 times by the metadata store as illustrated by the following logs:

xiliu-mn4:logs xiliu$ cat job-activity-aggregation-runner.out |grep "CoordinatorStreamStore" |grep "Registering system stream partition"

2019-03-31 08:55:09.655 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 0.

2019-03-31 08:55:18.733 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 0.

2019-03-31 08:55:26.661 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 0.

2019-03-31 08:55:34.163 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 0.

2019-03-31 08:55:34.163 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 0.

After the fix, coordinator stream is read once by the metadata-store as illustrated by the following logs.

[svenkata@svenkata-ld2 logs]$  grep 'CoordinatorStreamStore' job-activity-aggregation-runner.out |grep "Registering system stream partition"
2019-04-08 16:06:21.989 [main] CoordinatorStreamStore [INFO] Registering system stream partition: SystemStreamPartition [samzametadatasystem, __samza_coordinator_job-activity-aggregation_i001, 0] with offset: 7646.

Before the fix, coordinator stream metadata was fetched 5 times by the metadata-store as illustrated by the following logs:

xiliu-mn4:logs xiliu$ cat job-activity-aggregation-runner.out |grep "Fetching system stream metadata"
2019-03-31 08:54:28.078 [main] KafkaSystemAdmin [INFO] Fetching system stream metadata for [JobApplyClickEventEIWithData] from system queuing

2019-03-31 08:54:51.857 [main] KafkaSystemAdmin [INFO] Fetching system stream metadata for [__samza_coordinator_job-activity-aggregation_i001] from system samzametadatasystem

2019-03-31 08:55:00.650 [main] KafkaSystemAdmin [INFO] Fetching system stream metadata for [__samza_coordinator_job-activity-aggregation_i001] from system samzametadatasystem

2019-03-31 08:55:08.603 [main] KafkaSystemAdmin [INFO] Fetching system stream metadata for [__samza_coordinator_job-activity-aggregation_i001] from system samzametadatasystem

2019-03-31 08:55:17.481 [main] KafkaSystemAdmin [INFO] Fetching system stream metadata for [__samza_coordinator_job-activity-aggregation_i001] from system samzametadatasystem

2019-03-31 08:55:25.548 [main] KafkaSystemAdmin [INFO] Fetching system stream metadata for [__samza_coordinator_job-activity-aggregation_i001] from system samzametadatasystem

After the fix, coordinator stream metadata is read once by the metadata-store as illustrated by the following logs.

[svenkata@svenkata-ld2 logs]$ cat job-activity-aggregation-runner.out |grep "Fetching system stream metadata"
2019-04-08 16:06:20.734 [main] KafkaSystemAdmin [INFO] Fetching system stream metadata for [__samza_coordinator_samza-count-by-device_i001] from system samzametadatasystem

Before the fix, total startup time of ApplicationMaster was 70 seconds. Here're the relevant logs to illustrate it.

[svenkata@svenkata-ld2 logs]$ cat job-activity-aggregation-runner.out |grep "ClusterBasedJobCoordinator" | head -n 1

2019-04-08 15:36:19.338 [main] ClusterBasedJobCoordinator [INFO] Parsing coordinator system config {systems.queuing.producer.bootstrap.servers=localhost:16637, systems.queuing.consumer.deserialization.mode=specific, systems.queuing.samza.msg.serde=, systems.queuing.samza.factory=com.linkedin.samza.system.kafka.SamzaLiKafkaSystemFactory, systems.queuing.kafka.cluster=queuing, systems.queuing.consumer.zookeeper.connect=localhost:12913/kafka-queuing, systems.queuing.samza.key.serde=}


[svenkata@svenkata-ld2 logs]$ cat job-activity-aggregation-runner.out |grep "YarnContainerManager" | head -n 1

2019-04-08 15:37:29.694 [main] YarnClusterResourceManager [INFO] Starting YarnContainerManager.

After the fix, total startup time of ApplicationMaster was 20 seconds. This reduces the ApplicationMaster startup time by 350%

[svenkata@svenkata-ld2 logs]$ cat job-activity-aggregation-runner.out |grep "ClusterBasedJobCoordinator" | head -n 1
2019-04-08 16:06:19.338 [main] ClusterBasedJobCoordinator [INFO] Parsing coordinator system config {systems.queuing.producer.bootstrap.servers=localhost:16637, systems.queuing.consumer.deserialization.mode=specific, systems.queuing.samza.msg.serde=, systems.queuing.samza.factory=com.linkedin.samza.system.kafka.SamzaLiKafkaSystemFactory, systems.queuing.kafka.cluster=queuing, systems.queuing.consumer.zookeeper.connect=localhost:12913/kafka-queuing, systems.queuing.samza.key.serde=}

[svenkata@svenkata-ld2 logs]$ cat job-activity-aggregation-runner.out |grep "YarnContainerManager" | head -n 1
2019-04-08 16:06:41.694 [main] YarnClusterResourceManager [INFO] Starting YarnContainerManager.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment