Skip to content

Instantly share code, notes, and snippets.

@hyrmn
Last active December 18, 2023 09:16
Show Gist options
  • Save hyrmn/ce124e9b1f50dbf9d241390ebc8f6df3 to your computer and use it in GitHub Desktop.
Save hyrmn/ce124e9b1f50dbf9d241390ebc8f6df3 to your computer and use it in GitHub Desktop.
Extension methods for calling Dapper asynchronously with a Polly retry
public static class DapperExtensions
{
private static readonly IEnumerable<TimeSpan> RetryTimes = new[]
{
TimeSpan.FromSeconds(1),
TimeSpan.FromSeconds(2),
TimeSpan.FromSeconds(3)
};
private static readonly AsyncRetryPolicy RetryPolicy = Policy
.Handle<SqlException>(SqlServerTransientExceptionDetector.ShouldRetryOn)
.Or<TimeoutException>()
.OrInner<Win32Exception>(SqlServerTransientExceptionDetector.ShouldRetryOn)
.WaitAndRetryAsync(RetryTimes,
(exception, timeSpan, retryCount, context) =>
{
LogTo.Warning(
exception,
"WARNING: Error talking to ReportingDb, will retry after {RetryTimeSpan}. Retry attempt {RetryCount}",
timeSpan,
retryCount
);
});
public static async Task<int> ExecuteAsyncWithRetry(this IDbConnection cnn, string sql, object param = null,
IDbTransaction transaction = null, int? commandTimeout = null,
CommandType? commandType = null) =>
await RetryPolicy.ExecuteAsync(async () => await cnn.ExecuteAsync(sql, param, transaction, commandTimeout, commandType));
public static async Task<IEnumerable<T>> QueryAsyncWithRetry<T>(this IDbConnection cnn, string sql, object param = null,
IDbTransaction transaction = null, int? commandTimeout = null,
CommandType? commandType = null) =>
await RetryPolicy.ExecuteAsync(async () => await cnn.QueryAsync<T>(sql, param, transaction, commandTimeout, commandType));
}
public static class SampleCaller
{
public async Task UpsertPerson(string personId, string businessId, string firstName, string lastName, string suffix)
{
await _conn.ExecuteAsyncWithRetry(_upsertSql,
new
{
PersonId = personId,
BusinessId = businessId,
FirstName = firstName.Truncate(50),
LastName = lastName.Truncate(50),
Suffix = suffix.Truncate(10)
});
}
}
// This file comes from:
// https://raw.githubusercontent.com/aspnet/EntityFrameworkCore/master/src/EFCore.SqlServer/Storage/Internal/SqlServerTransientExceptionDetector.cs
// and
// https://github.com/Azure/elastic-db-tools/blob/master/Src/ElasticScale.Client/ElasticScale.Common/TransientFaultHandling/Implementation/SqlDatabaseTransientErrorDetectionStrategy.cs
// With the addition of
// SQL Error 11001 (connection failed)
/// <summary>
/// Detects the exceptions caused by SQL Server transient failures.
/// </summary>
public static class SqlServerTransientExceptionDetector
{
public static bool ShouldRetryOn(SqlException ex)
{
foreach (SqlError err in ex.Errors)
{
switch (err.Number)
{
// SQL Error Code: 49920
// Cannot process request. Too many operations in progress for subscription "%ld".
// The service is busy processing multiple requests for this subscription.
// Requests are currently blocked for resource optimization. Query sys.dm_operation_status for operation status.
// Wait until pending requests are complete or delete one of your pending requests and retry your request later.
case 49920:
// SQL Error Code: 49919
// Cannot process create or update request. Too many create or update operations in progress for subscription "%ld".
// The service is busy processing multiple create or update requests for your subscription or server.
// Requests are currently blocked for resource optimization. Query sys.dm_operation_status for pending operations.
// Wait till pending create or update requests are complete or delete one of your pending requests and
// retry your request later.
case 49919:
// SQL Error Code: 49918
// Cannot process request. Not enough resources to process request.
// The service is currently busy.Please retry the request later.
case 49918:
// SQL Error Code: 41839
// Transaction exceeded the maximum number of commit dependencies.
case 41839:
// SQL Error Code: 41325
// The current transaction failed to commit due to a serializable validation failure.
case 41325:
// SQL Error Code: 41305
// The current transaction failed to commit due to a repeatable read validation failure.
case 41305:
// SQL Error Code: 41302
// The current transaction attempted to update a record that has been updated since the transaction started.
case 41302:
// SQL Error Code: 41301
// Dependency failure: a dependency was taken on another transaction that later failed to commit.
case 41301:
// SQL Error Code: 40613
// Database XXXX on server YYYY is not currently available. Please retry the connection later.
// If the problem persists, contact customer support, and provide them the session tracing ID of ZZZZZ.
case 40613:
// SQL Error Code: 40501
// The service is currently busy. Retry the request after 10 seconds. Code: (reason code to be decoded).
case 40501:
// SQL Error Code: 40197
// The service has encountered an error processing your request. Please try again.
case 40197:
// SQL Error Code: 11001
// A connection attempt failed
case 11001:
// SQL Error Code: 10929
// Resource ID: %d. The %s minimum guarantee is %d, maximum limit is %d and the current usage for the database is %d.
// However, the server is currently too busy to support requests greater than %d for this database.
// For more information, see http://go.microsoft.com/fwlink/?LinkId=267637. Otherwise, please try again.
case 10929:
// SQL Error Code: 10928
// Resource ID: %d. The %s limit for the database is %d and has been reached. For more information,
// see http://go.microsoft.com/fwlink/?LinkId=267637.
case 10928:
// SQL Error Code: 10060
// A network-related or instance-specific error occurred while establishing a connection to SQL Server.
// The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server
// is configured to allow remote connections. (provider: TCP Provider, error: 0 - A connection attempt failed
// because the connected party did not properly respond after a period of time, or established connection failed
// because connected host has failed to respond.)"}
case 10060:
// SQL Error Code: 10054
// A transport-level error has occurred when sending the request to the server.
// (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
case 10054:
// SQL Error Code: 10053
// A transport-level error has occurred when receiving results from the server.
// An established connection was aborted by the software in your host machine.
case 10053:
// SQL Error Code: 1205
// Deadlock
case 1205:
// SQL Error Code: 233
// The client was unable to establish a connection because of an error during connection initialization process before login.
// Possible causes include the following: the client tried to connect to an unsupported version of SQL Server;
// the server was too busy to accept new connections; or there was a resource limitation (insufficient memory or maximum
// allowed connections) on the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by
// the remote host.)
case 233:
// SQL Error Code: 121
// The semaphore timeout period has expired
case 121:
// SQL Error Code: 64
// A connection was successfully established with the server, but then an error occurred during the login process.
// (provider: TCP Provider, error: 0 - The specified network name is no longer available.)
case 64:
// DBNETLIB Error Code: 20
// The instance of SQL Server you attempted to connect to does not support encryption.
case 20:
return true;
// This exception can be thrown even if the operation completed successfully, so it's safer to let the application fail.
// DBNETLIB Error Code: -2
// Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. The statement has been terminated.
//case -2:
}
}
return false;
}
public static bool ShouldRetryOn(Win32Exception ex)
{
switch (ex.NativeErrorCode)
{
// Timeout expired
case 0x102:
// Semaphore timeout expired
case 0x121:
return true;
default:
return false;
}
}
}
@LectorNoticio
Copy link

Hi Ben,
Thanks for sharing your code, very useful as reference for something similar I'm trying to do. I do have a question, in case you have time to check it. Using as an example line 28, on file DapperExtensions, why is it required to use an async lambda in this case?
await RetryPolicy.ExecuteAsync(async () => await cnn.ExecuteAsync(sql, param, transaction, commandTimeout, commandType));
Wouldn't it be sufficient a syntax like this:
await RetryPolicy.ExecuteAsync(() => cnn.ExecuteAsync(sql, param, transaction, commandTimeout, commandType));
given that cnn.ExecuteAsync is already awaitable?

Thanks!

@michaelakin
Copy link

michaelakin commented Jul 27, 2021

Thanks, had to modify this some to catch .ef core DbUpdate Exceptions

      public static readonly AsyncRetryPolicy RetryPolicy = Policy
            .Handle<SqlException>(SqlServerTransientExceptionDetector.ShouldRetryOn)
            .Or<DbUpdateException>(SqlServerDbUpdateExcecptionDetector.ShouldRetryOn)
            .Or<TimeoutException>()
            .WaitAndRetryAsync(RetryTimes,
            
`    public static class SqlServerDbUpdateExcecptionDetector
    {
        public static bool ShouldRetryOn(DbUpdateException ex)
        {
            if (ex.InnerException != null && ex.InnerException.GetType() == typeof(SqlException))
            {
                return SqlServerTransientExceptionDetector.ShouldRetryOn((ex.InnerException as SqlException));
            }

            return false;
        }
    }`    

and I had to enable the -2 Timeout exception above for what I was doing.

@hyrmn
Copy link
Author

hyrmn commented Aug 31, 2021

Hi Ben,
Thanks for sharing your code, very useful as reference for something similar I'm trying to do. I do have a question, in case you have time to check it. Using as an example line 28, on file DapperExtensions, why is it required to use an async lambda in this case?
await RetryPolicy.ExecuteAsync(async () => await cnn.ExecuteAsync(sql, param, transaction, commandTimeout, commandType));
Wouldn't it be sufficient a syntax like this:
await RetryPolicy.ExecuteAsync(() => cnn.ExecuteAsync(sql, param, transaction, commandTimeout, commandType));
given that cnn.ExecuteAsync is already awaitable?

Thanks!

Hi @LectorNoticio,

Sorry for the late reply. I didn't realize there were any comments here until @djhmateer alerted me.

I would preface this and say that all of this code is from late 2019.

At the time... my understanding from @reisenberger's comment on App-vNext/Polly#483 was that it was necessary because, otherwise, your call target (cnn.ExecuteAsync in this case) returns a task to your parent scope and then that gets awaited. Because the Task is returned immediately, the call was successful and Polly is now out of the loop. If your Task then faults then the Polly policy won't be triggered.

But, if you're on a current version of Polly then you definitely should follow the patterns on https://github.com/App-vNext/Polly/blob/master/README.md#asynchronous-support which looks to be wired up like you show.

And a quick xunit project to show it:

using System.Threading;
using System.Threading.Tasks;

using Polly;

using Xunit;
using Xunit.Abstractions;

namespace PollyRetryTests
{
    public class SomeNetworkThing
    {
        public async Task DoWork(int count, CancellationToken ct)
        {
            await Task.Delay(100);
            if (count % 2 == 1)
            {
                throw new TimeoutException("Some sort of network fault happened");
            }
        }
    }

    public class UnitTest1
    {
        private readonly ITestOutputHelper output;

        public UnitTest1(ITestOutputHelper output)
        {
            this.output = output;
        }

        [Fact]
        public async Task Test1()
        {
            var policy = Policy.Handle<TimeoutException>()
                .WaitAndRetryAsync(new[] { TimeSpan.FromSeconds(1) },
                onRetry: (exception, timeSpan, retryCount, context) => output.WriteLine("Retrying..."));

            var sut = new SomeNetworkThing();

            var exception = await Record.ExceptionAsync(() => policy.ExecuteAsync(()=> sut.DoWork(1, default)));
            Assert.Null(exception);
        }
    }
}

@ajbeaven
Copy link

ajbeaven commented Dec 2, 2021

Hi @Hyrmm

This is an article written recently about handling transient connection errors: https://docs.microsoft.com/en-us/azure/azure-sql/database/troubleshoot-common-connectivity-issues

I note the following:

Principles for retry

  • If the error is transient, retry to open a connection.
  • Do not directly retry a SQL Database or SQL Managed Instance SELECT statement that failed with a transient error. Instead, establish a fresh connection, and then retry the SELECT.
  • When a SQL Database or SQL Managed Instance UPDATE statement fails with a transient error, establish a fresh connection before you retry the UPDATE. The retry logic must ensure that either the entire database transaction finished or that the entire transaction is rolled back.

Since the code in this gist doesn't allow for the re-opening of the connection, I would note that this approach goes against MS recommendations. I totally understand the desire to do this though, since it makes the code a lot cleaner than additional retry blocks all throughout your data access code 🤔

They also recommend a wait time of at least 5 seconds before your first retry after the connection fails, to avoid overwhelming any cloud service (if you're using Azure SQL database). This strikes me as a really long time to wait, but there you go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment