@saurabh500
Last active June 12, 2024 14:30
Some SqlClient ideas

Evolution of SqlClient with DbDataSource

What do we want to do?

  1. Provide a high-throughput connection pool in SqlClient.
  2. The connection pool should increase throughput when connections are being opened successfully and within a consistent time.
  3. The connection pool should decrease throughput when connections start failing or the latency of connection establishment increases (a sketch of such an adaptive policy follows this list).
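
As a rough illustration of items 2 and 3, here is a minimal sketch of an AIMD-style (additive increase, multiplicative decrease) controller that a pool could consult before starting another physical open. The type and all of its members are hypothetical and not part of SqlClient:

```csharp
using System;
using System.Threading;

// Hypothetical sketch: a controller the pool could consult to decide how many
// physical connections it is allowed to open concurrently. Not part of SqlClient.
public sealed class OpenRateController
{
    private const int MinConcurrentOpens = 1;
    private const int MaxConcurrentOpens = 64;
    private static readonly TimeSpan LatencyCeiling = TimeSpan.FromSeconds(1);

    private int _allowedConcurrentOpens = 4;

    // The pool reads this before deciding whether to start another physical open.
    public int AllowedConcurrentOpens => Volatile.Read(ref _allowedConcurrentOpens);

    // Called after every physical open attempt with its outcome and latency.
    public void Report(bool succeeded, TimeSpan latency)
    {
        if (succeeded && latency < LatencyCeiling)
        {
            // Opens are succeeding within a consistent time: grow additively.
            int current = Volatile.Read(ref _allowedConcurrentOpens);
            if (current < MaxConcurrentOpens)
                Interlocked.CompareExchange(ref _allowedConcurrentOpens, current + 1, current);
        }
        else
        {
            // Failures or rising latency: back off multiplicatively to ease pressure.
            int current = Volatile.Read(ref _allowedConcurrentOpens);
            int target = Math.Max(MinConcurrentOpens, current / 2);
            Interlocked.CompareExchange(ref _allowedConcurrentOpens, target, current);
        }
    }
}
```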

Improve the performance of Async I/O with SqlClient

Async I/O in SqlClient is slow and in some cases buggy. There are many instances where cancellation of commands during async execution does not work correctly.

Performance degradations have been observed with the async APIs, even without any load on the client application, when compared to their sync counterparts.

What exists today?

Changing these two fundamentals of SqlClient requires a lot of re-engineering of components. The current connection pool is designed to open only a single connection at a time.

This was a good design choice in the past, when applications were prepared to warm up the connection pool and let it grow iteratively instead of demanding high connection throughput. However, in the age of the cloud, where applications need to be spun up, do their work quickly, and then be shut down, warming up the connection pool is considered a problem.

Warming up the connection pool also doesn't fit serverless scenarios, whether it is SQL Server that is serverless or the application itself.

As for async performance being slow, a lot of it comes down to the following:

  1. Async was written on top of sync, with some flags to toggle behaviors.
  2. Async had to work with the functionality and APIs exposed by Native SNI (sni.dll), which cannot be open sourced.
  3. Async in Managed SNI had to stick to the APIs exposed by sni.dll, and hence is not written in a way that can leverage the power of async/await in the CLR.

What do we want to do?

We want to invest in improving the two major problem areas above in SqlClient.

How do we stage the changes for the driver?

We want to be able to contribute code to Microsoft.Data.SqlClient, but at the same time be able to toggle between the existing client and the improved client.

The improved client may not be a complete rewrite of the existing client, but it will replace the current I/O with async/await. This by itself is a lot of work and will change the guts of the driver.

DbDataSource as a solution

SqlClient needs to expose a DbDataSource because of the many benefits it can provide. DbDataSource allows configuring connections and using them for command execution.

Assume that we provide a SqlDbDataSource which allows for the creation of open connections. At the same time, SqlDbDataSource can be considered a pool of connections.

A DbDataSource can be seen as a factory of open functional connections, which can be used to execute commands.

Following the builder pattern, which has proven quite powerful in the Npgsql driver for configuring a DbDataSource, SqlClient needs to expose a SqlDbDataSourceBuilder. When the SqlDbDataSourceBuilder is used, it offers the same features that today's SqlClient offers.
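
To make the proposal concrete, here is a usage sketch. SqlDbDataSourceBuilder does not exist in Microsoft.Data.SqlClient today; its shape below is an assumption modeled on Npgsql's NpgsqlDataSourceBuilder, while DbDataSource, DbConnection, and DbCommand are the existing System.Data.Common types:

```csharp
// Hypothetical sketch of the proposed builder surface and how application code
// would use it. SqlDbDataSourceBuilder is not a real Microsoft.Data.SqlClient type.
using System;
using System.Data.Common;
using System.Threading.Tasks;

public sealed class SqlDbDataSourceBuilder          // hypothetical
{
    public string ConnectionString { get; }

    public SqlDbDataSourceBuilder(string connectionString) => ConnectionString = connectionString;

    // Build() would hand back a DbDataSource configured with today's behavior.
    public DbDataSource Build() => throw new NotImplementedException("sketch only");
}

public static class DataSourceExample
{
    public static async Task QueryAsync(string connectionString)
    {
        var builder = new SqlDbDataSourceBuilder(connectionString);
        await using DbDataSource dataSource = builder.Build();

        // The data source acts as a factory of open, functional connections.
        await using DbConnection connection = await dataSource.OpenConnectionAsync();
        await using DbCommand command = connection.CreateCommand();
        command.CommandText = "SELECT 1";
        object result = await command.ExecuteScalarAsync();
    }
}
```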

We want to propose the introduction of ExperimentalDbDataSourceBuilder in SqlClient. This will be used to expose connections which are capable of some experimental features.

ExperimentalDbDataSourceBuilder will provide an instance of SqlDbDataSource, but it will configure it differently with internal flags. As a result, connections provided by a SqlDbDataSource created from ExperimentalDbDataSourceBuilder will behave differently from connections created using new SqlConnection and from connections provided by SqlDbDataSourceBuilder.
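
A minimal sketch of the internal-flags idea, under the assumption of hypothetical flag names and builder internals (none of this exists in the driver today):

```csharp
// Hypothetical sketch: the experimental builder produces the same SqlDbDataSource
// type as SqlDbDataSourceBuilder, but with internal flags that route connections
// to the experimental code paths. Flag names are invented for illustration.
public sealed class SqlDataSourceFlags
{
    public bool UseAdaptiveConnectionPool { get; init; }
    public bool UseAsyncAwaitIoPath { get; init; }
}

public sealed class ExperimentalDbDataSourceBuilder
{
    private readonly string _connectionString;

    public ExperimentalDbDataSourceBuilder(string connectionString) =>
        _connectionString = connectionString;

    // SqlDbDataSourceBuilder (previous sketch) would leave these flags at their
    // defaults (off); the experimental builder switches them on.
    internal SqlDataSourceFlags Flags { get; } = new SqlDataSourceFlags
    {
        UseAdaptiveConnectionPool = true,   // new pooling algorithm (SqlConnection')
        UseAsyncAwaitIoPath = true          // async/await based I/O path
    };
}
```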

```mermaid
graph LR
    A[ExperimentalDbDataSourceBuilder] -->|provides| B[SqlDbDataSource']
    B -->|has different behavior| C{SqlConnection'}
    D[SqlDbDataSourceBuilder] -->|provides| E[SqlDbDataSource]
    F[new SqlConnection] -->|creates| E
    E -->|provides| G
    F[new SqlConnection] -->|creates| G[SqlConnection]
    A -->|configures differently| B
```


Different behavior?

  1. SqlConnection' follows a new connection pooling algorithm, which adapts to errors and eases backpressure when needed, but also provides higher throughput when it can.
  2. SqlConnection' creates commands which can then be used to execute readers, and it allows a different implementation to be used under the covers. The driver knows about the difference in implementation because the DbDataSourceBuilder can set various parameters to signal which code path to use.

Future

When the capabilities in SqlConnection and the other ADO.NET objects in SqlClient that are enabled by creating connections through the DbDataSourceBuilder reach 100% feature fidelity with the current API surface and behavior of the driver, we need to make sure that those flags become the defaults in SqlDbDataSourceBuilder.

For the connection pool, we need to decide whether the new implementation becomes the default or remains opt-in behavior.

Streams in SqlClientX

Streams are going to be one of the foundational pieces of SqlClientX.

Streams allow layering, and SqlClientX will incorporate the use of a TdsStream in its implementation.

Reads

TdsStream will allow reading packet data from the TDS payload. The TDS payload consists of logical TDS messages, which are grouped into TDS packets. The packet size is negotiated during the login handshake.
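
For reference, every TDS packet starts with an 8-byte header carrying the packet type, status bits, and the total packet length. The helper type below is a hypothetical sketch, but the field layout follows the MS-TDS packet header:

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper type; the field layout follows the 8-byte MS-TDS packet header.
public readonly struct TdsPacketHeader
{
    public const int Size = 8;

    public byte Type { get; }          // packet type, e.g. SQL batch, RPC, tabular result
    public byte Status { get; }        // bit 0x01 = end of message (last packet)
    public ushort Length { get; }      // big-endian, includes the 8-byte header
    public ushort Spid { get; }
    public byte PacketId { get; }
    public byte Window { get; }

    public int PayloadLength => Length - Size;
    public bool IsEndOfMessage => (Status & 0x01) != 0;

    public TdsPacketHeader(ReadOnlySpan<byte> header)
    {
        if (header.Length < Size)
            throw new ArgumentException("A full 8-byte header is required.", nameof(header));

        Type = header[0];
        Status = header[1];
        Length = BinaryPrimitives.ReadUInt16BigEndian(header.Slice(2, 2));
        Spid = BinaryPrimitives.ReadUInt16BigEndian(header.Slice(4, 2));
        PacketId = header[6];
        Window = header[7];
    }
}
```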

While reading, the caller of the TdsStream would need to make sure that it requests the right number of bytes from the stream.

The stream's read logic would be aware of the packet header and would handle partial reads of packet data. The stream would call the underlying I/O for more data if there isn't enough in the packet buffer, or if only part of the packet has been read so far.

The reads in the stream would hide the complexity of any network I/O from the consumer. The stream's read capability will make sure that the information being requested is readily available.
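
A minimal sketch of that read path, reusing the TdsPacketHeader sketch above. TdsStream and its members here are hypothetical, and for brevity the sketch reads straight from the transport instead of buffering whole packets (the worked example further down walks through the buffered case):

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the TdsStream read path. It hides network I/O and packet
// framing from the caller: partial headers and partially received packets are
// handled internally by reading more data from the underlying transport.
public sealed class TdsStream
{
    private readonly Stream _transport;        // e.g. NetworkStream or SslStream
    private int _payloadBytesLeftInPacket;     // payload bytes of the current packet not yet returned

    public TdsStream(Stream transport) => _transport = transport;

    // Returns exactly buffer.Length bytes of TDS payload, spanning packet
    // boundaries and network reads as needed.
    public async ValueTask ReadExactlyAsync(Memory<byte> buffer, CancellationToken ct = default)
    {
        while (!buffer.IsEmpty)
        {
            if (_payloadBytesLeftInPacket == 0)
            {
                // Start of a new packet: read the full 8-byte header first,
                // even if the transport hands it to us in pieces.
                byte[] headerBytes = new byte[TdsPacketHeader.Size];
                await FillAsync(headerBytes, ct).ConfigureAwait(false);
                var header = new TdsPacketHeader(headerBytes);
                _payloadBytesLeftInPacket = header.PayloadLength;
            }

            // Hand back as much of the current packet's payload as the caller asked for.
            int toRead = Math.Min(buffer.Length, _payloadBytesLeftInPacket);
            await FillAsync(buffer.Slice(0, toRead), ct).ConfigureAwait(false);
            _payloadBytesLeftInPacket -= toRead;
            buffer = buffer.Slice(toRead);
        }
    }

    // Keeps calling the underlying I/O until the destination is completely filled.
    private async ValueTask FillAsync(Memory<byte> destination, CancellationToken ct)
    {
        while (!destination.IsEmpty)
        {
            int read = await _transport.ReadAsync(destination, ct).ConfigureAwait(false);
            if (read == 0)
                throw new EndOfStreamException("Connection closed mid-packet.");
            destination = destination.Slice(read);
        }
    }
}
```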

Writes

While writing packets in the stream, the TDS implementation would notify the stream about the packet type being written.

The stream would chunk the data into multiple packets, and would allow flushing a packet, which marks it as the last packet of the stream.
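
A minimal sketch of that write path, again with hypothetical types (TdsStreamWriter stands in for the write half of TdsStream). It chunks the payload into packets of the negotiated size and sets the end-of-message status bit (0x01) on the packet sent by the flush:

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the TdsStream write path: payload is chunked into packets
// of the negotiated size, and flushing marks the buffered packet as the last one.
public sealed class TdsStreamWriter
{
    private const int HeaderSize = 8;

    private readonly Stream _transport;
    private readonly byte[] _packet;          // header + payload for one packet
    private int _payloadCount;                // payload bytes buffered so far
    private byte _packetId;

    public byte PacketType { get; set; }      // set by the TDS layer per message, e.g. RPC

    public TdsStreamWriter(Stream transport, int negotiatedPacketSize)
    {
        _transport = transport;
        _packet = new byte[negotiatedPacketSize];
    }

    public async ValueTask WriteAsync(ReadOnlyMemory<byte> payload, CancellationToken ct = default)
    {
        while (!payload.IsEmpty)
        {
            int space = _packet.Length - HeaderSize - _payloadCount;
            if (space == 0)
            {
                // Packet is full but the message continues: send it with EOM cleared.
                await SendPacketAsync(endOfMessage: false, ct).ConfigureAwait(false);
                continue;
            }

            int toCopy = Math.Min(space, payload.Length);
            payload.Span.Slice(0, toCopy).CopyTo(_packet.AsSpan(HeaderSize + _payloadCount));
            _payloadCount += toCopy;
            payload = payload.Slice(toCopy);
        }
    }

    // Flush marks whatever is buffered as the last packet of the message.
    public ValueTask FlushAsync(CancellationToken ct = default) =>
        SendPacketAsync(endOfMessage: true, ct);

    private async ValueTask SendPacketAsync(bool endOfMessage, CancellationToken ct)
    {
        int length = HeaderSize + _payloadCount;
        _packet[0] = PacketType;
        _packet[1] = (byte)(endOfMessage ? 0x01 : 0x00);   // status: 0x01 = end of message
        _packet[2] = (byte)(length >> 8);                   // big-endian length incl. header
        _packet[3] = (byte)length;
        _packet[4] = 0; _packet[5] = 0;                     // SPID (clients send 0)
        _packet[6] = ++_packetId;
        _packet[7] = 0;                                     // window, unused

        await _transport.WriteAsync(_packet.AsMemory(0, length), ct).ConfigureAwait(false);
        if (endOfMessage)
            await _transport.FlushAsync(ct).ConfigureAwait(false);
        _payloadCount = 0;
    }
}
```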

Some examples of read complexities

TdsStream.Read(10 bytes)

Initial state: no data

| Header | Packet Data |
| ------ | ----------- |
|        |             |

Network read: returns 5 bytes

State: we have 5 bytes of data

| Header  | Packet Data |
| ------- | ----------- |
| 5 bytes |             |

Not enough data in the header to take any decisions.

Read from the network again (packet size - 5): returns 20 bytes. The header is complete and we extract the data length from the header.

| Header  | Packet Data | Length found in header |
| ------- | ----------- | ---------------------- |
| 8 bytes | 17 bytes    | 200                    |

The caller had asked for 10 bytes: we have enough data. We return 10 bytes and are left with 7 bytes buffered. However, the header promises that the packet carries 200 - 8 = 192 bytes of data, so 200 - 8 - 10 = 182 bytes of this packet's data are still to be consumed (7 of them already buffered, the rest not yet received).

The next call to read asks for 100 bytes of data:

Keep reading from the network until we have 100 bytes.
