CASSANDRA-14503 Review
* Significant Bugs
  * OutboundMessagingConnection
    * No limit on the number of threads processing a queue that requires a single consumer (undefined behaviour: possibly lost messages, duplicate messages, corrupted state, and a connection killed indefinitely)
    * OMC.close() invokes a blocking action on the eventLoop. On the large message connection this could be a temporary deadlock, as the message processing thread cannot make progress until the eventLoop flushes, but the eventLoop is blocked on the large message connection thread terminating and running its cleanup. This is only avoided because there is a 5s limit to how long the eventLoop will pause, but for those five seconds all work stops for this event loop (see the sketch after this block)
      * If this deadlock occurs, cleanup is never run and the backlog remains forever, with a permanent GC root preventing it from being collected
    * Large message delivery with a backlog might never shed messages, as the current time is only refreshed once the queue is exhausted. Since expiry can only happen on this thread, the backlog could grow unboundedly if the outbound network falls behind.
    * (A) BBDOSP.close() could be invoked from the eventLoop while in use by the message delivery thread
      * This could result in a use-after-free bug; and, if the memory is re-allocated to another message in time, corruption
    * purgeBacklog() can be called by the eventLoop while large message delivery is in progress; the backlog requires single-threaded access by its consumer, so invoking this on the eventLoop at the same time is undefined behaviour
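A minimal, hand-rolled model of the close() stall described above, using a single-threaded executor to stand in for the Netty eventLoop (all names here are illustrative, not the real OMC code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class CloseDeadlockSketch
{
    public static void main(String[] args) throws Exception
    {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        CountDownLatch deliveryThreadDone = new CountDownLatch(1);
        CountDownLatch flushed = new CountDownLatch(1);

        // large-message delivery thread: cannot finish until the eventLoop performs its flush
        Thread deliveryThread = new Thread(() -> {
            try { flushed.await(); } catch (InterruptedException ignore) {}
            deliveryThreadDone.countDown();
        });
        deliveryThread.start();

        // close() runs on the eventLoop and blocks waiting for the delivery thread to terminate,
        // so the flush task queued behind it cannot run; only the 5s cap breaks the stall, and
        // for those five seconds every other channel served by this eventLoop is starved too
        eventLoop.execute(() -> {
            try { deliveryThreadDone.await(5, TimeUnit.SECONDS); } catch (InterruptedException ignore) {}
        });
        eventLoop.execute(flushed::countDown); // the flush the delivery thread is waiting on

        deliveryThread.join();
        eventLoop.shutdown();
    }
}
```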
  * ByteBufferDataOutputStreamPlus
    * No mechanism for writers to synchronise with flush progress, i.e. streaming would be able to “complete” successfully before all flushes have happened
    * (Bug?; A) Exceptions during flush are not propagated to the writer
  * RebufferingByteBufDataInputPlus
    * consumeUntil() can corrupt the buffer if writer.applyToChannel() throws
    * A timeout may corrupt the stream if it occurs mid-message; the processor assumes the stream is safe to continue with, when it is not, and it should be terminated immediately
  * InboundMessageHandler
    * Uses normal-mode handlers for LARGE connections from 3.0 and 3.11, meaning potentially unbounded buffer growth and network stalls during deserialization
  * MixedMode
    * MessageOut.serializedSize would cache the wrong size if the protocol version was negotiated and changed after the message was enqueued. This would lead to a corrupted stream (see the sketch after this section).
  * Error handling
    * Swallowing ClosedChannelException in BBDOSP.close(), so the above bugs of racing to close a channel that was in use for streaming or large messages would go entirely unreported
    * Swallowing IOException in handleFailure, simply closing the connection without any user-visible notification that anything was wrong, e.g. for the mixed-mode bug with an incorrect serializedSize
    * Reporting a non-IOException but taking no action, probably leaving the connection in an unusable state and potentially ceasing communication between the nodes indefinitely
  * Streaming
    * Partitions that crossed compressed chunk boundaries would corrupt the stream reader state
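A simplified, illustrative sketch of the mixed-mode serializedSize hazard above: the size is cached once, ignoring the protocol version it was computed for, so a renegotiation after enqueue leaves the frame header disagreeing with the bytes actually written (the class and field names are hypothetical, not the real MessageOut):

```java
// Hypothetical sketch only; MessageOut's real fields and serializers differ.
final class MessageSketch
{
    private long cachedSize = -1; // cached without recording which protocol version it was computed for

    long serializedSize(int version)
    {
        if (cachedSize < 0)
            cachedSize = computeSize(version); // the first caller's version wins...
        return cachedSize;                     // ...so a later caller with a renegotiated version gets a stale size
    }

    // pretend newer protocol versions add an extra 8-byte field
    private long computeSize(int version)
    {
        return version >= 12 ? 24 : 16;
    }

    // Safer shape: remember the version alongside the size, e.g.
    //   if (cachedVersion != version) { cachedSize = computeSize(version); cachedVersion = version; }
    // so the declared size and the serialized payload can never disagree.
}
```

If the header carries a size computed for the old version while the body is serialized with the new one, the receiver's framing is off by the difference and the rest of the stream is corrupted.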
* Undiagnosed Impact
  * OutboundMessagingConnection
    * queue.isEmpty() invoked on a non-consumer thread (in multiple places)
    * Does not guarantee that all actions declared to need to run on the eventLoop actually do (see the sketch after this section)
      * connect() invoked outside of the eventLoop
      * callbacks from close() and bootstrap() may be invoked on the GlobalEventExecutor
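A minimal sketch of the kind of guard the above implies is missing: work that touches loop-confined state is re-dispatched onto the channel's event loop when called from any other thread (this is standard Netty usage, not code from the patch):

```java
import io.netty.channel.Channel;

final class EventLoopGuard
{
    // Run the task inline if we are already on the channel's event loop,
    // otherwise hand it off so loop-confined state is never touched from a foreign thread.
    static void runOnEventLoop(Channel channel, Runnable task)
    {
        if (channel.eventLoop().inEventLoop())
            task.run();
        else
            channel.eventLoop().execute(task);
    }
}
```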
* Modest Impact
  * Load Shedding
    * Can only happen on the eventLoop, which is redundant since delivery sheds messages anyway, but it also offers no guarantee of preventing overload: in an overloaded system we cannot depend on the eventLoop running (or getting to us promptly)
    * Shedding can experience head-of-line blocking (see the sketch after this block)
      * A single call with the default 10s timeout (i.e. any response) can permit 10 seconds of sheddable messages to accumulate
      * Undroppable messages (such as Paxos) sent on a regular connection may block shedding forever (droppable messages may still be dropped on enqueue, but until the backlog of undeliverable messages clears, no message besides other undroppable messages may be enqueued)
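An illustrative-only model of that head-of-line blocking: shedding walks from the head of the queue and stops at the first message it cannot drop, so one undroppable or long-timeout message at the front shields everything queued behind it (types and names here are hypothetical):

```java
import java.util.Queue;

final class SheddingSketch
{
    static final class Queued
    {
        final long expiresAtNanos;
        final boolean droppable;
        Queued(long expiresAtNanos, boolean droppable) { this.expiresAtNanos = expiresAtNanos; this.droppable = droppable; }
    }

    // Drops expired droppable messages from the head only; returns how many were shed.
    static int shedFromHead(Queue<Queued> backlog, long nowNanos)
    {
        int shed = 0;
        Queued head;
        while ((head = backlog.peek()) != null)
        {
            if (!head.droppable || head.expiresAtNanos > nowNanos)
                break; // stop at the first message we cannot drop, even if later ones have already expired
            backlog.poll();
            ++shed;
        }
        return shed;
    }
}
```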
  * Lost Messages
    * OutboundMessagingConnection
      * The serializedSize used to select the message queue could assign a message to the small-message connection when it requires the large one; in that case the buffer for small message delivery will never be large enough to serialise the message, and it will be dropped (see the sketch after this block)
      * Could purge messages from a perfectly healthy queue under no load, via a race condition and incorrect usage of Netty's WRITER_IDLE notification
      * reconnectWithNewIp, handleWithFailure and close may all close a channel that is mid-delivery of one or more large messages, which may then never be successfully delivered
      * A transient failure to authenticate will lose all queued messages to the endpoint on all connections
      * Failure to serialise a single small message will invalidate up to 64 messages or 64KiB (whichever is smaller) being serialised alongside it, and will not report the failure in any way
    * InboundMessageHandler
      * Doesn’t handle an unknown table exception, leading to loss of all packets on the wire and in memory after this message is received
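A hypothetical illustration of the small/large routing hazard above: the message is routed by an estimated size, but if the bytes actually produced exceed the fixed small-message buffer, the message cannot be serialised there and is dropped (the threshold and names are made up for the example):

```java
final class RoutingSketch
{
    static final int LARGE_THRESHOLD = 64 * 1024; // hypothetical cutoff between small and large connections

    static String route(long estimatedSize)
    {
        return estimatedSize > LARGE_THRESHOLD ? "LARGE" : "SMALL";
    }

    public static void main(String[] args)
    {
        long estimated = 60 * 1024; // under the threshold, so routed to the small-message connection
        long actual    = 80 * 1024; // but serialisation actually produces more than the small buffer can hold
        System.out.println("routed " + route(estimated) + ", wrote " + actual + " bytes: message dropped");
    }
}
```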
  * Resource Leak / Consumption
    * Does not shut down the large message executor on closing a connection (thread leak). Connections are cycled periodically, so the number of threads would steadily grow on systems sending large messages (see the sketch after this block).
    * BBDOSP: leak of the buffer if an exception is thrown in close()
    * The streaming default high water mark could cause a lot of memory usage for non-zero-copy. With a limit of 4MiB, and a maximum in-flight of _availableProcessors_ *per session*, this could plausibly reach 1GiB
    * InboundHandshakeHandler catches Exception instead of Throwable; parsing of the endpoint can throw AssertionError, and other places could throw e.g. OOM. Depending on the location of the failure, the connection is leaked.
    * RBBBDIP
      * The last ByteBuf may leak, depending on the relative timing of the close of the InboundMessageHandler and the background task
      * append() and close() may race, leaking a ByteBuf provided by an append() that occurs logically after close() but with overlapping execution
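A minimal sketch of the executor leak above, with illustrative names: each connection owns a single-threaded executor for large-message delivery, and because close() never shuts it down, the worker thread (a GC root) outlives every connection cycle:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class LargeMessageConnectionSketch
{
    private final ExecutorService largeMessageExecutor = Executors.newSingleThreadExecutor();

    void close()
    {
        // channel close, backlog purge, etc. elided...
        // Missing: largeMessageExecutor.shutdown();
        // Without it the worker thread survives each reconnect, so on a node that regularly
        // sends large messages the thread count (and whatever those threads pin) grows steadily.
    }
}
```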
  * Compatibility
    * Not forward compatible with unknown parameters (see the sketch below)
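One common way to provide that forward compatibility, sketched with hypothetical framing (not the actual wire format of the patch): length-prefix each parameter so a node that does not recognise an id can skip it rather than fail:

```java
import java.io.DataInputStream;
import java.io.IOException;

final class ParamSkipSketch
{
    static void readParams(DataInputStream in, int count) throws IOException
    {
        for (int i = 0; i < count; i++)
        {
            int id = in.readUnsignedShort();
            int length = in.readUnsignedShort();
            if (isKnown(id))
                readKnown(in, id, length);
            else
                in.skipBytes(length); // unknown parameter: skip by its declared length instead of failing
                                      // (a real decoder would loop until the full length is consumed)
        }
    }

    private static boolean isKnown(int id) { return id < 4; }  // placeholder registry
    private static void readKnown(DataInputStream in, int id, int length) throws IOException { in.skipBytes(length); } // placeholder decoder
}
```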
* Probably Minimal Impact
  * OutboundMessagingConnection
    * softClose could race with actions on the eventLoop that set connectionId; it should run on the eventLoop, and wait for success
    * Unsafe publication of changes to Channel and targetVersion, both of which are used by large message delivery without necessarily any explicit synchronisation
    * Netty high/low watermarks not honoured; particularly for streaming, which specifies several MiB but treats the low water mark as full minus 32KiB (hence the hack to stop Netty spamming errors that we were abusing the API). Consequence: wasted memory for streaming, plus unnecessary thread wake-up coordination costs.
      * The low watermark is also not honoured for eventLoop delivery
    * close()
      * with flush
        * Does not honour its contract and drain outstanding queued messages; it simply ensures _in progress_ flushes complete before closing (which should happen in any scenario)
        * Does not wait for queued messages to drain
        * Does not immediately prevent new messages from being sent
      * without flush: does not wait for close to complete
      * If multiple callers invoke close(), later callers may believe the connection is closed when it is not
      * Does not ensure in-progress connection attempts are aborted before completing the close()
    * Metrics are being registered under the ‘preferred’ IP address, which is not the correct identifier for the node
      * When the ‘preferred’ IP was updated, the metric IP was not, so this could also remain stale indefinitely
      * Streaming also uses the ‘preferred’ IP for metrics
    * Not properly honouring the high water mark, as message headers are not counted when imposing the limit (may lead to occasional warning messages)
    * Not releasing metrics when we close the connections
  * InboundHandshakeHandler
    * Timeout not started immediately; if no bytes ever arrive, the connection never times out (see the sketch at the end of this review)
    * Not logging connection attempts that fail to authenticate
    * Not logging failure to flush the second handshake message
    * Adds the streaming pipeline on a separate eventExecutor instead of the channel’s eventLoop, incurring probably non-trivial synchronisation overheads, with the messaging loop still performing most of the work, just not the blocking work. This will also have meant cross-thread recycling etc.
  * RebufferingDataInputPlus
    * Large messages hold onto the last ByteBuf of a message until a new message arrives, leaking up to 64KiB per quiet connection
    * JVMStabilityInspector is not invoked on exceptions in the large delivery thread, potentially leaving the JVM in an unstable state after an OOM
  * ByteBufDataOutputStreamPlus
    * newDefaultChannel.write did not write the ByteBuf; presumably unused
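A short sketch of the handshake-timeout point above, using plain Netty APIs (the handler and field names are illustrative, not the patch's InboundHandshakeHandler): arm the timeout when the channel becomes active, so a peer that connects but never sends a byte is still closed:

```java
import java.util.concurrent.TimeUnit;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

final class HandshakeTimeoutSketch extends ChannelInboundHandlerAdapter
{
    private volatile boolean handshakeComplete; // set once the handshake message is fully received (elided)

    @Override
    public void channelActive(ChannelHandlerContext ctx) throws Exception
    {
        // Schedule immediately on activation; if the timer were only started from channelRead,
        // a peer that never sends anything would hold the connection open forever.
        ctx.executor().schedule(() -> {
            if (!handshakeComplete)
                ctx.close();
        }, 10, TimeUnit.SECONDS);
        super.channelActive(ctx);
    }
}
```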