On June 28th we upgraded several libraries in our service. The upgrade included a bump of google-cloud-pubsub from 1.115.5 to 1.119.0. The change caused some of the workloads to crash with OutOfMemoryError. Not all workloads were affected; the ones affected most consume from topics with an average message size ranging from 10 KiB to 100 KiB.
Investigation:
When taking a heap histogram (with jmap -histo) we noticed significantly more com.google.pubsub.v1.PubsubMessage and com.google.protobuf.ByteString$LiteralByteString objects when using the newer google-cloud-pubsub library version.
Hypotheses: a) Could the large changes in googleapis/java-pubsub#1022 have introduced an issue? b) Perhaps a memory leak was introduced?
To reproduce the problem we built a simpler sample program that emulates our application.
We tested it with different versions of the library, using a subscription with real data and an average message size of about 70 KiB.
We ran PubSubConsumerExample1.java with -XX:ActiveProcessorCount=2 -Xmx2g.
Then to get the heap histogram, we used:
jmap -histo $(jps -l -m | grep PubSubConsumerExample1 | head -1 | cut -d' ' -f1) | grep -e 'pubsub\|ByteString\|B \|I \|nio' | head -n 12
We also used JMC to take a heap dump.
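When comparing histograms between library versions, it can help to normalise each row to bytes per instance, since the instance counts differ from run to run. The following is a minimal sketch of that arithmetic, not part of our reproduction code: the HistogramRow class and its methods are hypothetical helpers, and the sample rows are copied from the 1.119.0 and patched histograms in this report.

```java
import java.util.List;

// Hypothetical helper for this report; not part of jmap or google-cloud-pubsub.
final class HistogramRow {
    final long instances;
    final long bytes;
    final String className;

    HistogramRow(long instances, long bytes, String className) {
        this.instances = instances;
        this.bytes = bytes;
        this.className = className;
    }

    // Parses one data row of `jmap -histo` output:
    // "<num>: <#instances> <#bytes> <class name> [(module)]"
    static HistogramRow parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 4 || !parts[0].endsWith(":")) {
            throw new IllegalArgumentException("Not a histogram row: " + line);
        }
        return new HistogramRow(
                Long.parseLong(parts[1]), Long.parseLong(parts[2]), parts[3]);
    }

    // Shallow bytes per instance (integer division). This is exact for
    // fixed-size classes and only an average for arrays such as [B.
    long bytesPerInstance() {
        return bytes / instances;
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
                "23: 3361 107552 com.google.cloud.pubsub.v1.MessageDispatcher$3",
                "46: 4407 70512 com.google.cloud.pubsub.v1.AckReplyConsumerImpl");
        for (String row : rows) {
            HistogramRow r = parse(row);
            System.out.println(r.className + " -> " + r.bytesPerInstance() + " bytes/instance");
        }
    }
}
```

For the two sample rows this gives 32 and 16 bytes per instance, which is where the per-instance figures quoted for MessageDispatcher$3 and AckReplyConsumerImpl come from.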
# 1.115.5
num #instances #bytes class name (module)
8: 12243 587664 java.nio.HeapByteBuffer (java.base@11.0.15)
10: 10093 403720 com.google.cloud.pubsub.v1.MessageDispatcher$AckHandler
14: 12218 293232 com.google.cloud.pubsub.it.PubSubConsumerExample1$ContainerObj1
19: 9751 234024 com.google.cloud.pubsub.v1.MessageDispatcher$3
22: 3002 192128 java.nio.DirectByteBuffer (java.base@11.0.15)
25: 9750 156000 com.google.cloud.pubsub.it.PubSubConsumerExample1$$Lambda$155/0x0000000800420840
# 1.119.0
# Note the significantly higher number of PubsubMessage and ByteString objects
num #instances #bytes class name (module)
1: 54045 493944336 [B (java.base@11.0.15)
2: 4469 26687888 [I (java.base@11.0.15)
8: 6201 396864 java.nio.DirectByteBuffer (java.base@11.0.15)
14: 3386 162528 java.nio.HeapByteBuffer (java.base@11.0.15)
15: 3362 161376 com.google.pubsub.v1.PubsubMessage
19: 3361 134440 com.google.cloud.pubsub.v1.MessageDispatcher$AckHandler
23: 3361 107552 com.google.cloud.pubsub.v1.MessageDispatcher$3
26: 3851 92424 com.google.protobuf.ByteString$LiteralByteString
34: 3361 80664 com.google.cloud.pubsub.it.PubSubConsumerExample1$ContainerObj1
35: 3361 80664 com.google.cloud.pubsub.v1.AckRequestData
40: 2629 61976 [Ljava.nio.ByteBuffer; (java.base@11.0.15)
46: 3361 53776 com.google.cloud.pubsub.it.PubSubConsumerExample1$$Lambda$175/0x000000080043b840
# custom patched 1.119.0 (avoids anonymous AckReplyConsumer instances)
# Note MessageDispatcher$3 was replaced by AckReplyConsumerImpl, which has less overhead (32 bytes per instance before, 16 bytes now)
# Also no leak of PubsubMessage and ByteString$LiteralByteString
num #instances #bytes class name (module)
1: 56048 382810944 [B (java.base@11.0.15)
2: 4016 7620824 [I (java.base@11.0.15)
6: 15395 985280 java.nio.DirectByteBuffer (java.base@11.0.15)
15: 4456 213888 java.nio.HeapByteBuffer (java.base@11.0.15)
18: 4407 176280 com.google.cloud.pubsub.v1.MessageDispatcher$AckHandler
22: 6014 131696 [Ljava.nio.ByteBuffer; (java.base@11.0.15)
30: 4407 105768 com.google.cloud.pubsub.it.PubSubConsumerExample1$ContainerObj1
31: 4407 105768 com.google.cloud.pubsub.v1.AckRequestData
45: 4407 70512 com.google.cloud.pubsub.it.PubSubConsumerExample1$$Lambda$180/0x00000008004b8440
46: 4407 70512 com.google.cloud.pubsub.v1.AckReplyConsumerImpl
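The size difference between the anonymous MessageDispatcher$3 and the named AckReplyConsumerImpl is consistent with how Java compiles anonymous classes: the compiler adds hidden fields for the enclosing instance and for each captured local variable, and every one of those references keeps its target reachable. The sketch below illustrates that general mechanism only. It is not the library's code, we have not confirmed which references MessageDispatcher$3 actually captures, and all names in it (CaptureDemo, ReplyConsumer, SlimConsumer) are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; none of these types come from google-cloud-pubsub.
final class CaptureDemo {
    interface ReplyConsumer {
        String ack();
    }

    private final List<String> acked = new ArrayList<>();

    // Anonymous class: the body uses the enclosing instance (acked) and two
    // captured locals, so the compiler stores a reference to each of them.
    // The payload stays reachable for as long as the consumer is.
    ReplyConsumer anonymousConsumer(String ackId, byte[] payload) {
        return new ReplyConsumer() {
            @Override
            public String ack() {
                acked.add(ackId);
                return ackId + " (" + payload.length + " bytes)";
            }
        };
    }

    // Named static nested class: it holds only what it is given explicitly.
    static final class SlimConsumer implements ReplyConsumer {
        private final String ackId;

        SlimConsumer(String ackId) {
            this.ackId = ackId;
        }

        @Override
        public String ack() {
            return ackId;
        }
    }

    // Counts non-static fields declared by the object's class, including the
    // synthetic ones the compiler generates for captured references.
    static int instanceFieldCount(Object o) {
        int count = 0;
        for (Field f : o.getClass().getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        CaptureDemo demo = new CaptureDemo();
        ReplyConsumer anonymous = demo.anonymousConsumer("id-1", new byte[70 * 1024]);
        ReplyConsumer slim = new SlimConsumer("id-1");
        System.out.println("anonymous consumer fields: " + instanceFieldCount(anonymous));
        System.out.println("named consumer fields:     " + instanceFieldCount(slim));
    }
}
```

In this sketch the anonymous consumer carries more reference fields than the named one, including one to the 70 KiB payload. If something similar happens in the dispatcher, a long-lived consumer would pin the message bytes, which would match the extra PubsubMessage and ByteString$LiteralByteString instances seen with 1.119.0.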