ghostsquad/error_handing_best_practices.md

## error_handing_best_practices.md

      
Exception/Error Handling and Logging

Log Levels

First, let's define various error levels (log levels).
Fatal/Critical: The application composition root should look something like this:
def main(*args)
  begin
    # compose and start
  rescue StandardError => err
    log.fatal(err)
    exit(1)
  end
end
Fatal should only ever be used to indicate an uncaught exception that crashes the application.
Error: This is a problem that should be investigated immediately. It could indicate:

A situation which is fatal to the operation, but not the service or application (can't open a required file, missing data, etc.). These errors will force user (administrator, or direct user) intervention. These should be reserved for incorrect connection strings, missing services, etc.
A scenario that is correctable but was not validated at the proper level/depth, so the resolution may not be immediately clear.
As an example: Data too long for column 'foo' at row 1: INSERT INTO 'bar'

Warn: This might be a problem, or it might not. Use this in situations such as:

Transient environment conditions, such as network/DB connectivity problems. These issues should be escalated to error if they do not recover.

Info: Generally useful information to log (service start/stop, configuration assumptions, etc.). Info I want to always have available but usually don't care about under normal circumstances. Allows insight into the coarse-grained progress of an operation.
Trace: Fine-grained progress of an operation, including context. Typically this would not be enabled in production.
Debug: Debug < Trace. Debug messages are discouraged; use meaningful trace messages instead. These should never be enabled in production.
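As a rough illustration of how a level threshold behaves, the sketch below uses Ruby's standard-library `Logger`. Note that the stdlib logger has no trace level (its severities are debug, info, warn, error, fatal and unknown), so a trace level as described above would need a different logging library or a convention of your own. The `StringIO` sink and the messages are made up for the example.

```ruby
require 'logger'
require 'stringio'

# Capture output in memory so the effect of the threshold is easy to inspect.
sink = StringIO.new
log = Logger.new(sink)

# A production-style threshold: info and above are written, debug is dropped.
log.level = Logger::INFO

log.debug('row 17 parsed')                  # below the threshold, dropped
log.info('service started')                 # coarse-grained progress
log.warn('db connection lost, will retry')  # may recover on its own
log.error('config file missing')            # needs operator intervention

puts sink.string
```

With the threshold at info, the debug message never reaches the sink, while the info, warn and error lines do.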
When to catch an exception

Top Level Operations (after application composition)

The top level of an operation should contain a try/catch to log when the operation fails fatally and cannot recover.
In most circumstances, it is preferable to not crash the application if the operation fails.
We'll want to wrap individual operations with specific rescues in order to provide the proper exit code to the user.
Example:
def main(*args)
  begin
    # note on exit codes in this example:
    # 0: success
    # 1: failure due to external circumstances (something that can be fixed by the operator):
    #    * configuration
    #    * network connection
    #    * external dependent service problem
    # 2: failure due to bug in code

    # if the config file is missing, or malformed, we cannot continue
    config, err = try_initialize_config
    if err
      log.error(err)
      exit(1)
    end

    # the above code could also be written like this
    # if initialize_config could throw an error instead of return success/failure
    begin
      config = initialize_config
    rescue InitializationFailedError => err
      log.error(err)
      exit(1)
    end

    # this is not included in the log.error try catch blocks because 
    # if something happens during initialization of the class,
    # it indicates a code bug, not something the operator can fix
    thing_doer = ThingDoer.new(config.stuff)

    begin
      thing_doer.do_thing
    rescue ThingDoerError => err
      log.error(err)
      exit(1)
    end

    exit(0)
  rescue StandardError => err
    log.fatal(err)

    # set an exit code that means something disastrous happened
    # this is different than the operation failed.
    # this should almost _always_ mean there's a bug in the code.
    exit(2)
  end
end
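The exit-code mapping above can be sketched in a form that is easy to test by returning the code instead of exiting. This is only one possible arrangement: `OperatorError` and `exit_code_for` are hypothetical names introduced for the illustration, not part of the original example.

```ruby
require 'logger'

# Hypothetical error class standing in for operator-fixable failures
# (bad configuration, network problems, a missing external service).
class OperatorError < StandardError; end

# Runs the given block and returns an exit code instead of exiting.
def exit_code_for(log)
  yield
  0
rescue OperatorError => e
  log.error(e.message) # the operation failed; the operator can fix it
  1
rescue StandardError => e
  log.fatal(e.message) # anything unexpected is treated as a bug
  2
end

log = Logger.new($stderr)

ok_code       = exit_code_for(log) { :done }
operator_code = exit_code_for(log) { raise OperatorError, 'config file missing' }
bug_code      = exit_code_for(log) { raise NoMethodError, 'undefined method' }

puts [ok_code, operator_code, bug_code].inspect
```

A real entry point would finish with something like `exit(exit_code_for(log) { run_operation })`, where `run_operation` is again a placeholder. Calling `exit` inside a `rescue StandardError` region, as the example above does, is safe in Ruby because `exit` raises `SystemExit`, which is not a `StandardError` and so is not swallowed by the outer rescue.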
Within an operation

You should only ever catch an exception inside an operation if you plan on doing something about it.
If you plan on reraising, don't log the error; it will be logged upstream, where it's appropriate.
def publish_to_api(data={})
  tries ||= 3
  log.info "attempting to publish to api"
  DataLibrary.publish(data)
rescue DataLibraryFailureException => e
  if (tries -= 1).positive?
    log.warn "publish to api failed, will attempt retry"
    retry
  else
    # we'll reraise here because we don't know what else we can do
    # unless upstream has another method for handling this situation,
    # we'll end up logging this error at the _operation_ level, 
    # so we don't need to explicitly log here
    raise
  end
else
  log.info "success!"
end
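To see the retry-then-reraise behaviour end to end, here is a self-contained variant that can be run as-is. It is a sketch under assumptions: `FlakyLibrary` is a made-up stand-in for `DataLibrary`, `publish_with_retry` and its `max_tries` parameter are hypothetical, and the exception class is defined locally only so the example runs.

```ruby
require 'logger'
require 'stringio'

# Local stand-in for the library's exception, defined so the sketch runs.
class DataLibraryFailureException < StandardError; end

# Hypothetical library double: fails the first `failures` calls, then succeeds.
class FlakyLibrary
  attr_reader :calls

  def initialize(failures)
    @failures = failures
    @calls = 0
  end

  def publish(_data)
    @calls += 1
    raise DataLibraryFailureException, 'api unavailable' if @calls <= @failures

    :ok
  end
end

def publish_with_retry(library, log, data = {}, max_tries: 3)
  tries = 0
  begin
    tries += 1
    library.publish(data)
  rescue DataLibraryFailureException
    raise if tries >= max_tries # out of attempts: reraise without logging

    log.warn("publish failed (attempt #{tries}), retrying")
    retry
  end
end

sink = StringIO.new
log = Logger.new(sink)

# Two transient failures: the third attempt succeeds.
recovering = FlakyLibrary.new(2)
result = publish_with_retry(recovering, log)

# Permanent failure: the exception reaches the caller, which logs it once.
broken = FlakyLibrary.new(10)
begin
  publish_with_retry(broken, log)
rescue DataLibraryFailureException => e
  log.error(e.message)
end

puts sink.string
```

The retries show up as warnings, and the permanent failure is logged as an error exactly once, by the caller that decides the operation has failed.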

References


https://stackoverflow.com/questions/2031163/when-to-use-the-different-log-levels
https://dave.cheney.net/2012/01/18/why-go-gets-exceptions-right