jehugaleahsa/MEH.md

## MEH.md

      
    Raw
  

              MEH.md
            
          
    Minimal Exception Handling

Introduction

With so many other aspects of daily development completely unattainable by your average developer, exception handling is often ignore in lieu of bigger fish to fry. On the surface, exception handling appears sufficiently complicated enough that most developers have simply resolved to ignore it on a day-to-day basis or altogether. Only during the last weeks of development do they tack on a top-level try/catch block to avoid complete system failure or to log that "some" error occurred. Rather than a simple error dialog appearing to users, they are greeted with the "yellow screen of death". Even when the developer does manage to display an error dialog, the message is often vague or cryptic, e.g., "An error occurred".
We can do better than this. I believe the issue is that not a lot of focus has been given to providing simple, bare-minimum exception handling. For the majority of applications, exception handling only needs to satisfy the following requirements:

Provide insight as to the cause of the error.
Gracefully let users know something went wrong.
Avoid catastrophic failure.

A small set of patterns can make exception handling manageable. Incorporating those patterns into daily development can avoid unnecessary pain at the end of a project or errors in production that cannot be tracked down.
The hope is the sections below demystify exception handling and provide a basis for incorporating exception handling into your daily practice. In some sections, I point out various alternatives and you can choose which makes more sense for your particular situation.
"Catch-all" is okay, when...

Unfortunately, many books and articles talk about how bad it is to catch generic exception types. For example:
try
{
    // something that throws
}
catch (Exception exception)
{
    logger.ErrorException(exception);
}
In reality, catch-alls are often the only option when working with real-world projects. Take a look at Entity Framework 6; it attempts to only expose exceptions that subclass from EntityException. However, certain low-level database-specific exception types can make their way out of Entity Framework. Hence, simply catching EntityException isn't sufficient.
You could have a chain of exception handlers, but ultimately you'll end up handling each exception the same way, at which point you're just duplicating code. For example:
try
{
    // ... some EF query
}
catch (EntityException exception)
{
    throw new DataException("Failed to retrieve something", exception);
}
catch (SqlException exception)
{
    // Uh, I just wrote this...
    // Should I really know I'm working with SQL Server?
    // Maybe I should catch System.Data.Common.DbException, instead.
    // Does it matter? I'll end up handling it the same way.
    throw new DataException("Failed to retrieve something", exception);
}
catch (Exception exception)
{
    // Okay, just being pedantic now. This is silly.
    throw new DataException("Failed to retrieve something", exception);
}
Within a limited scope

As it turns out, the less code we wrap with a try/catch, the more likely we can determine the root cause of an exception. Consider this code:
try
{
    j = i / x;
    ++j;
}
catch (Exception exception)
{
    // ...
}
Assume i, j and x are integers. An experienced developer knows that even this innocent looking code might throw errors, such as DivideByZeroException or OverflowException. It's literally impossible for any other error to escape this block. In that case, it may make sense to explicitly handle those cases. On the other hand, if the way you handle the exception in either block is the same, is there any harm in catching a more generic exception type?
If the only thing you are doing within a try/catch is hitting a database, you can almost guarantee any errors are database-related. So long as your error handling strategy remains unchanged, catching a more generic exception type results in less code and potentially captures unexpected exceptions.
Avoiding leaky layers

Below, I will talk more about where exception classes should live in your code base. I will talk about how exceptions make up part of the interface of your methods, classes and libraries. For now, know it okay to put catch-alls at the top of each layer in your code.
At the very top

Another place where catch-alls make sense is at the top of an application. Even if you follow the patterns in this document perfectly, there's always the chance you aren't handling every possible exception. In fact, certain types of exceptions can never be handled or you shouldn't be trying to handle them (see below). It's a matter of due-diligence to put a catch-all at the top of your application to make sure nothing gets through without being logged.
The issue arises when this is the only error handling employed by developers. Below, I go into the idea of locality and identifying information to explain how to make a top-level catch-all useful.
Recoverable vs unrecoverable

Going back to the Entity Framework example, one thing that would dramatically improve Entity Framework's exception handling would be to distinguish between recoverable and unrecoverable errors.
Consider two possibilities: 1) the database is temporarily unavailable and 2) your LINQ query is invalid. In the case of the database being down, you can safely assume that your code will function again once the database comes back online. In the case of bad LINQ, however, your code may never work.
A 3rd possibility is a LINQ command that may work sometimes, depending on the parameters. SQL Server will often complain if a command is too complex, such as having too many arguments within an IN clause. Personally, I would group this with #2, since the system allowed for it to happen.
In large systems, being able to respond differently to recoverable vs unrecoverable errors can be very important. For example, if a system has a built-in retry mechanism (think message queues...), it could repeatedly retry to execute over a given time period. On the other hand, if it detected an unrecoverable error, it could immediately log the error and stop trying. This is good to avoid unnecessary processing and filling up logs.
In Java, there is a distinction between "checked" vs "unchecked" exceptions, not to be confused with recoverable and unrecoverable exceptions. The language has a debatably annoying feature, requiring developers to manually list out every "checked" exception that can possibly be exposed by a method inside of the throws list. The only way to avoid this list from growing larger the further you go up the stack is to wrap exceptions or opt-out altogether, e.g., something like throws Exception. Most unchecked exceptions (e.g., OutOfMemoryException, NullPointerException, IndexOutOfBoundsException) are unrecoverable, resulting from a lack of system resources or bad code. Similarly, most checked exceptions are recoverable (e.g., SQLException, IOException), but can also be the result of bad code or unexpected inputs.
Providing better insight/feedback

Debatably, the main goal of any exception handling strategy is providing yourself the information you need to diagnose issues in production. That means logging errors, but also providing enough information with those errors to understand the context in which it happened.
How do you associate that contextual information with an error? What information is useful to understanding that context? Is it beneficial to expose contextual information to the user when an error occurs?
Identifying information

Some examples of contextual information include identifying information, such as primary keys or names. Recording the current user may also be useful. Fortunately, most environments include the stack trace as well as a timestamp.
The choice becomes how this information makes its way into logs. In .NET, exceptions have a Data property that allows adding arbitrary key/value pairs to an exception. It would be fairly easy to write a utility to loop through these lookups and include this data in the log. In fact, my colleague Brian Engel wrote a useful NLog extension for doing just that: https://gist.github.com/jehugaleahsa/feea687b55207e3faa3b64170744f3c2
Another simple option is to include contextual information in the error message itself. For example, Could not find the requested user (User ID = 123). If these messages make their way to the user, including this kind of information may or may not be acceptable. However, I've found it makes it easier to diagnose issues when users are in the habit of sending screenshots, and most users really don't notice. I think it is because they are already surprised about the error and probably confused about what they did wrong (probably nothing), so some cryptic identifiers are the least of their concerns. Obviously, care should be taken to avoid exposing any sensitive information.
Locality

Of course, the only way to include contextual information is capture it at the source of the error. As soon as an exception is thrown, it pops up the stack until it finds a matching catch block. Any local variables are lost. For that reason, try/catch statements should have limited scopes where it is clear what the context was when the error occurred.
This is why last-minute approaches to exception handling never really work. A stack trace will tell you the proximity of where an error occurred. What it won't tell you is the context in which it happened. "What account was this? Who was the user at the time? Were they impersonating another role?" I've made it my personal duty to notice whenever I ask myself something like, "I wish I knew what blah blah blah was", and make sure I update my error handling so that information is available the next time. This can take multiple iterations.
Consider:
public Account GetAccount(int accountId)
{
    try
    {
        var query = from account in context.Accounts
                    where account.AccountId == accountId
                    select account;
        return query.SingleOrDefault();
    }
    catch (Exception exception)
    {
        throw new DataException($"Could not retrieve the account (Account ID = {accountId}).", exception);
    }
}
Assume this block of code results in an exception being thrown. What are the chances it is unrelated to a database error? Pretty slim, right? More importantly, when we go to review our logs, we'll see exactly which account caused the error.
Also notice that exception is passed as the second argument to the DataException constructor. Here, DataException is a custom exception class we created for any errors coming out of our data layer. The built-in Exception class has a contructor overload accepting an inner exception (the exception that caused the excepton). This is very important because it ensures our stack trace is complete. Without it, we will know the query failed, but we won't know why.
Custom exception types

Creating your own exception types is easy, on the surface. Unless you need to serialize exceptions, it is usually just a matter of deriving from Exception and calling into the base constructor.
public class DataException: Exception
{
    public DataException(string message, Exception innerException)
        : base(message, innerException)
    {
    }
}
Notice that this class only exposes a single constructor. In my experience, including a message describing the cause of the exception is a good idea. In my previous example, I was declaring DataException to capture any errors coming out of the data layer. Since it is simply wrapping these errors, I am forcing anyone who creates a DataException to pass in the original data layer exception. This makes it clear to anyone using this class that it is a wrapper type.
You might be tempted to include extra properties on your exception types, but my experience is that, often, those properties are not available when you might need them. You might know it is an instance of MyException when you throw it, but you're less likely to know it's type where you ultimate end up catching it. For that reason, I would stick to whatever built-in mechanism your language has for forwarding information, e.g., Data or Message.
As a counter-example, I often represent HTTP status codes with different exception types in my web applications. They have a StatusCode property that gets the underlying status code. In this case, I have built into my application layer knowledge of this status code.
public abstract class RestException : Exception
{
    protected RestException(HttpStatusCode statusCode, string message)
        : base(message)
    {
        this.StatusCode = statusCode;
    }

    public HttpStatusCode StatusCode { get; private set; }
}
In this case, my code is the source of the exception; I'm not wrapping a previously caught exception. I might create a derived class called BadRequestException that resolves to a 401 or a NotFoundException that resolves to a 404. Those types of errors would likely be the result of validation, so my code would be the source of the exception, a.k.a., be at the bottom of the stack trace.
As another counter-example, if you are authoring a reusable library, giving users more details via exception properties again might makes sense (e.g., SqlException.ErrorCode).
Argument handling outside of try/catch

Many developers do not bother checking arguments coming into their methods. They feel that this adds unnecessary overhead at runtime or that any issues will be discovered during development/QA. Argument checking does provide additional benefits, such as early error detection and as documentation about pre-conditions. I find it easier to diagnose an ArgumentNullException than a NullReferenceException, simply because the error message can include the name of the argument that was null.
If you are the type that does argument checking, it may not be entirely obvious to you to keep argument exceptions outside your try/catch block.
Prefer this:
void foo(MyObj obj)
{
    if (obj == null)
    {
        throw new ArgumentNullException(nameof(obj));
    }
    try
    {
        ...
    }
    catch
    {
        ...
    }
}
to this:
void foo(MyObj obj)
{
    try
    {
        if (obj == null)
        {
            throw new ArgumentNullException(nameof(obj));
        }
        ...
    }
    catch
    {
        ...
    }
}
The reason may be obvious. Instances of ArgumentException indicate a programming error. They are pretty much unrecoverable. Your system may be able to hobble along with such a bug in place, but you can't expect your code to handle that type of error gracefully.
This may seem to be in contradiction to my earlier advice about localizing errors, but there's an important distinction. Consider this method:
public IEnumerable<Account> GetRelatedAccounts(Client client)
{
    try
    {
        if (client == null)
        {
            throw new ArgumentNullException(nameof(client));
        }
        // hit database
    }
    catch (Exception exception)
    {
        string message = $"Could not retrieve the client accounts (Client ID = {client.ClientId}).";
        throw new DataException(message, exception);
    }
}
Two things should immediately call out to you: 1) the code that generates the exception message uses client, which would result in a NullReferenceException and 2) the catch can't distiguish between an ArgumentNullException and a database-related error. Even if this code didn't result in a NullReferenceException, the error message would be misleading. Only by inspecting the full stack trace would you see the DataException was wrapping an ArgumentNullException. This ends up breaking locality; you can no longer guarantee the error is caused by the database.
What about this alternative?
public IEnumerable<Account> GetRelatedAccounts(Client client)
{
    try
    {
        if (client == null)
        {
            throw new ArgumentNullException(nameof(client));
        }
        // hit database
    }
    catch (ArgumentException)
    {
        throw;
    }
    catch (Exception exception)
    {
        string message = $"Could not retrieve the client accounts (Client ID = {client.ClientId}).";
        throw new DataException(message, exception);
    }
}
This is wasteful. The ArgumentNullException has to take an unnecessary detour. This code actually equivalent to just moving the argument check outside the try/catch! Worse, what if the data layer tool you're using throws ArgumentNullException? Would you want to wrap it in that case?
Rethrow vs throw

In the previous example, you notice I say throw; instead of throw exception;. In fact, I didn't even create a variable for the exception - there is no "exception". There's a significant difference. By saying throw exception, you'd be throwing away any previous stack trace information. Don't let this catch you by surprise!
The keyword throw followed by an exception simply tells the C# compiler you want to start tracking the stack trace from that point up the stack. Any strack trace information available prior to this line is essentially wiped out. This is intentional: for proprietary or security reasons, some vendors do not want to reveal (even in their logs) where an error originated.
I've personally not dedicated a lot of memory to this quirk. I just write "rethrow" as throw;.
Avoid double wrapping

Business logic is rarely ever paper thin. One object calls another in an intricate web of interactions. That makes it really easy to accidentally capture another exception originating in the same layer of code. Let's say all your business logic uses exceptions inheriting from a BusinessException class. How would you avoid wrapping a BusinessException with another BusinessException? Are there cases when that would make sense?
Answering the second question first, yes there are times when wrapping an exception with another instance of the same type makes sense. It's all about adding contextual information, again. Often, lower down in the stack, your code is dealing with a sub-problem and might not know about the overall process taking place. If code higher up in the stack can provide more insight, wrapping the original exception makes sense. An example might be a workflow that kicks off a restocking request when inventory gets below a certain threshold. If the restocking logic fails, it would be useful to say it failed while responding to a change in inventory caused by a purchase.
Most of the time, though, double wrapping an exception is simply a mistake. The following pattern will help to avoid it:
try
{
}
catch (BusinessException)
{
    throw;
}
catch (Exception exception)
{
    throw new BusinessException("Failed to perform business function XYZ.", exception);
}
Wrapping exceptions at the top of your layers

The code above might cause you some concerns. We are capturing a generic exception and wrapping it with our BusinessException class. In this case, the message conveys the error occurred while trying to perform operation "XYZ". This may be a tool to ease debugging, but depending on how well you handle exceptions, this might also be an anti-pattern. It would handle an ArgumentNullException no differently than a DataException. Good? Bad?
Looking back at our example using DataException, we could almost guarantee the cause of the exception was database-related because the scope of the surrounding try/catch was very small. We kept ArgumentExceptions out of the try/catch to make sure there was no ambiguity between a programming error and a database error.
If we can make these kind of guarantees all the way up our programming stack, wouldn't it be better if we wrote our code like this:
try
{
}
catch (BusinessException)
{
    throw;
}
catch (DataException exception)
{
    throw new BusinessException("Failed to perform business function XYZ.", exception);
}
If something other than a BusinessException or a DataException makes its way to the top of the layer, then we just let it bubble up unhindered.
My experience is that developers are never as vigilent or thorough as they think they are. If you overlook one line of code that might throw an exception, you are right back to a contextless stack trace. On the flip side, there are definite disadvantages to capturing unrecoverable exceptions and treating them the same as any other exception.
If your exception messages show up in front of the user, providing a high-level "An error occurred while saving the account" message is extremely user-friendly regardless of the underlying cause. Otherwise, you may have no other option but to display a generic "An error occurred". Now neither you or the user has any context to go off of!
Given the two options, I tend to feel it is okay to make sure any exception escaping a layer in your code is wrapped with the corresponding exception type. I can live with an ArgumentException or an InvalidCastException being wrapped. I'm still going to have to research the issue and I will still ultimately discover it's a programming error. I might have an easier time diagnosing how the method was called with an invalid argument if I know the context in which it happened.
Exceptions can be a leaky abstraction

I come across a lot of code that will allow lower-layer exceptions to be thrown just about anywhere in the codebase. Consider a repository that returns an IQueryable<T> from an Entity Framework context. An IQueryable delays the execution of the underlying database query until you actually start enumerating over the results (e.g., calling ToList() or looping over the results with a foreach). At that point, your code may thrown an exception. But of what type of exception?
If you designed your application correctly, your business logic should be completely unaware of Entity Framework, since that is a data layer concern. Ideally, your business logic is housed in a separate DLL. So how do you know what type of exception to capture? Even if you try to capture EntityException, you'll see your business logic project doesn't have a reference to Entity Framework, so it doesn't know what type you're talking about!
You're left with no other option than to do a catch-all. Wrapping a foreach loop or a ToList() with a try/catch is a bit awkward; you typically end up wrapping a lot more than just the enumeration itself. Consider:
List<Account> accounts;
try
{
    accounts = query.ToList();  // gross, don't do this!
}
catch (Exception exception)
{
    throw new DataException("Could not retrieve the accounts.", exception);
}
foreach (var account in accounts)
{
    ...
}
Even moving the foreach loop inside the try/catch doesn't make this code any less difficult to follow. In fact, you're increasing the scope, which makes it harder to reason about/guarantee what caused the exception in the first place. You could move the try/catch into a separate method, called getAccounts, but then why not just put that in your data layer to begin with?
Also, remember the whole point of using an IQueryable is to allow your business logic to build up the query piece-by-piece. It can be extremely difficult to include the context (in this case, the filters, sorting, paging, etc.) that resulted in the error in the first place. Having well-defined methods with well-defined parameters makes it much easier to include this information in your exception details.
For that reason, I recommend not exposing IQueryable to your business layer and performing all queries within the context of your repositories/data layer. The same generalization applies to querying other services or using external libraries.
Public vs. private

I find it jarring when exception handling ends up all over a chunk of code. When you start trying to keep the scope of try/catch blocks small, they tend to start cropping up everywhere. They tend to obscure the meaning of the code, so I adopted some practices to help keep the code readable.
Resource management

I find it surprising that so many developers practice no resource management whatsoever in their code. I blame this on an industry made up of developers who treat examples off websites as production-grade software. Regardless, it's an enormously popular topic that comes up a lot in forums. Most languages have started adding direct syntactic support for resource management, such as the using statement in C#. Simply using using covers the majority of resource leaks that can happen. For that reason, I've found the finally block has lost some of its original charm.
I find I do less manual resource management than before, relying on Dependency Injection frameworks to do that on my behalf. I will often design classes so connection-like objects are passed in as dependencies, as opposed to just passing a connection string. This approach applies mostly to typical business applications relying on databases and APIs.
Separation of concerns

I have found that normal code is composed of two concerns: the core functionality and everything else. Argument checking, exception handling and resource management don't ultimately affect what your code is doing. For a long time I debated how to make clear this separation of concerns. Furthermore, I found my code doing unnecessary argument checking. Consider this example:
public class CustomerCollection
{
    private HashSet<Customer> customers = new HashSet<Customer>();

    // ..

    public void Add(Custom customer)
    {
        if (customer == null)
        {
            throw new ArgumentNullException(nameof(customer));
        }
        this.customers.Add(customer);
    }

    public void AddRange(IEnumerable<T> customers)
    {
        if (items == null)
        {
            throw new ArgumentNullException(nameof(customers));
        }
        foreach (Custom customer in customers)
        {
            Add(customer);
        }
    }
}
Both Add and AddRange methods do argument checking. I decided to implement AddRange in terms of Add. You might argue that my implementation would be better off if both methods worked directly against the customers backing fields. However, if the logic for adding a customer were more involved, duplicating that code would be bad.
Unfortunately, the way this code is set up, you end up doing an argument check for every customer added. That may be a good thing because a null customer really doesn't make sense. But, it might also be confusing to users of your CustomerCollection class to receive an ArgumentNullException with the name customer when the AddRange argument is customers, plural.
This might be a better approach:
public class CustomerCollection
{
    private HashSet<Customer> customers = new HashSet<Customer>();

    // ..

    public void Add(Custom customer)
    {
        if (customer == null)
        {
            throw new ArgumentNullException(nameof(customer));
        }
        add(customer);
    }

    public void AddRange(IEnumerable<T> customers)
    {
        if (customers == null)
        {
            throw new ArgumentNullException(nameof(customers));
        }
        if (customers.Contains(null))
        {
            throw new ArgumentException("Encountered null customer.", nameof(customers))
        }
        foreach (Custom customer in customers)
        {
            add(customer);
        }
    }

    private void add(Customer customer)
    {
        customers.Add(customer);
    }
}
Here, all argument checking was kept in the public methods and the shared code was moved to a private method. Over the years, I have slowly adopted the practice of implementing my private methods under the assumption I do not need to check arguments. If the same code needs called twice, I never call a public method from another public method; I move the shared code into private method.
I find having more than a couple lines between a try/catch or using ugly. I'll extract the inner code into a separate method just to avoid deep indentation and obscurity.
Closing thoughts

I found that exception handling in async or promise-base code works similarly to try/catch exception handling. I discovered this pattern when working in Node.js the first time. As promises became more standardized, the pattern really stood out. In some ways, it is a much more elegant way to handle errors, as every method can opt-in to error handling without a need for special syntax. Similar to try/catch semantics, the further you get from the source of the error, the less information you tend to have and the harder it is to handle the error.
Logging has somewhat taken on a standard look in .NET and in other environments, as well. I like that most libraries today make it easy to add additional key/value pairs to the log output. This practice should eventually make including contextual information a non-brainer. I've recently switched to a pseudo JSON format for my logs, which makes working with arbitrary key/value pairs much easier.
I feel as a community we as developers need to start emphasizing that proper error handling is as important as good naming conventions. There are too many development activities that get pushed to the end of projects, exception handling and logging often being the worst. Keeping all these concerns in mind can seem overwhelming when starting a new project. However, I've found that once I put in the effort that "one time", I was able to take what I did in previous projects and bring that forward to new projects more or less unmodified. Solid software is the result of gradual, incremental improvements. You just need to start somewhere.