jasom/errexit.asciidoc

## errexit.asciidoc

      
    Raw
  

              errexit.asciidoc
            
          
    Thoughts on bourne shell error handling


POSIX errexit is a terrible, no good, very bad thing because of its non-local action.


errexit feels like typical exception handling, but isn’t


Consider the following python from the days-of-yore before the with statement:


def deployWithLogging():
    f = open(SOME_PATH,'w')
    print >> f, 'Starting deployment at', datetime.datetime.now()
    try:
        deploy(logfile=f)
    finally:
        f.close()
        email_log(SOME_PATH)
        os.remove(SOME_PATH)


Now let’s translate this to shell with errexit assumed.  Using filenames rather than open-files is more idiomatic, and there is no try/finally.  Here’s one workaround, but it causes the entirety of deploy to be evaluated with errexit ignored (not disabled; there is no way for deploy to opt-back in to errexit other than tricks like "$0" deploy "$@"


deployWithLogging() {
  printf 'Starting deployment %s\n' "$(date)" > "$SOME_PATH"
  local success
  if deploy; then
    success=0
  else
    success=$?
  fi
  emal_log "$SOME_PATH"
  rm "$SOME_PATH"
  return $success
}


Andy has suggested the following pattern, which behaves basically the same, except for the fact that deploy enables set -e in which case deployWithLogging never runs the cleanup code:


deployWithLogging() {
  printf 'Starting deployment %s\n' "$(date)" > "$SOME_PATH"
  set +e
  deploy
  success=$?
  set -e
  emal_log "$SOME_PATH"
  rm "$SOME_PATH"
  return $success
}


The best solution, IMO, is the following.  Assuming we live in a world where set -e is expected for flow-control, deploy will run as expected, and deployWithLogging will capture and propagate the return code.  I think this is a good default behavior to have in osh, but with a lot fewer things to get wrong.


deployWithLogging() {
  printf 'Starting deployment %s\n' "$(date)" > "$SOME_PATH"
  set +e
  (set -e; deploy)
  success=$?
  set -e
  emal_log "$SOME_PATH"
  rm "$SOME_PATH"
  return $success
}


Obvious to me bad things


Non-local effects on callee by caller


Let’s consider this simple shell function:


deploy() {
  prepareToDeploy
  actualDeploy
}


Assuming we want "abort on any error" to be the default, then we would expect that actualDeploy will never run if prepareToDeploy fails.  This is not true for many cases.


Nonlocal effects of callee on caller


The next issue is that the only action it takes is to exit the entire program; osh has options to pass blocks to builtins so one could do:


deploy() {
  set -e {
    prepareToDeploy
    actualDeploy
  }
}


But now consider:


deployAndCleanup() {
  set +e {
    deploy
    cleanup
  }
}


deployAndCleanup expects to unconditionally run cleanup but instead the whole program is exited in the middle of deploy.  Sure, cleanup could do a prepare || exit 1, but that is an explicit exit that will be noticible.  Exiting the entire program purely do to a choice of a function buried deep is wrong.  Adding insult to injury, if the errcheck in deploy merely caused the function to exit with non-zero status, then by default a function invoking it would propagate, so exiting in the inner function gains nothing in the common case.  Subshells do solve this particular problem, but:


Wrapping every function call in a subshell is gross


Per Andy Chu, I don’t think subshells are that useful a concept… They have a confusing syntax. Most people think () means grouping, when it means subshell


Proposed solution for above


I don’t like errexit and I think it is so fundamentally broken in POSIX that it is unfixable.  However, the rest of the world does exist and widely disagrees.


Andy wants to limit new modes that make changes other than "Exit and complain loudly because this is a bug waiting to happen."  This makes sense.


So I propose a new grouping operator, here shown as {{.  It could use other things, but {{ should be safe because { and } are not tokenized specially in sh (which is, IMO, a major reason why subshells are used by many for grouping).


syntax of {{


'{{' 'WHITESPACE' compound-list ';' '}}'


There is also a function-definition version:


identifier '()' '{{' WHITESPACE compound-list ';' '}}'


Semantics of {{


Inside the lexical scope of a {{ block, errexit is completely ignored.


When any command in compound-list completes with a non-zero exit status, that exit status is returned from the function


TBD Either {{ is disallowed at toplevel or the return from step #2 is treated as exit


TBD command expansion; whatever solution is ultimately good for set -e will be good for this.  Probably running the subshell with set -e enabled, but local foo="$(..)" still needs to be solved independent of this proposal.


The function version is just sugar for:


identifier '()' '{' WHITESPACE '{{' compound-list ';' '}}' WHITESPACE '}'


Examples


quasi-real-world:


set -e

deploy() {{
  step1
  step2 # Assume this fails
  step3
}}


if deploy; then
  echo success
else
  echo fail
fi

deploy

echo "If deploy fails, then this never runs because of set -e"


some non-real-world examples for clarification


set -e
plainFunction() {
  false
  echo "hi"
}

foo() {{
  {
    false; # we don't exit nor return here
    true; # this compound command has a zero exit status
  }
  if false; then :; fi # we don't exit nor return here
  false && true; # we *do* return here
  plainFunction # if we reached here, we would exit
  if plainFunction; then :; fi # if we reached here, we would not exit and would echo hi
}}

foo # since `set -e` is in effect and foo() propagates the error up, we exit here.


Things I’m still considering


Failures during command substitution


Should the following propagate an error if calculateName fails?  I think yes but have not really thought that much about it.


FOO="$(calculateName)"


Boolean expressions


Should the following propagate an error if deploy fails?  I always thought "yes" because the compound expression has a non-zero result, but Andy pointed out that many people expect if A; then B; fi and A && B to work in the same manner, which is a fair point.


deploy && logSuccessfulDeploy


Pipes


I’m leaning towards failures in the middle of pipes as being undetected by default, because of programs that return errors on SIGPIPE.  Perhaps a new pipe operator that interposes between the two pipes, catches SIGPIPE going from right-to-left, passes the signal to the program on the left and then treats the previous command as having succeeded.  This doesn’t fix the issue of "generate lots of output and then exit with a failure" though.


# This will always exit with _pipefail_ on bash
# worser examples may have race conditions for whether or not the pipe is closed
# before the first program exits
yes|head

# I propose a new pipe operator, here ~| because I need some sigil for it

yes ~| head # ~| will detect the SIGPIPE, pass it to yes, not propagate the error from yes

false ~| head # There is no SIGPIPE passed from head to false, so ~| will propagate the error in false up