Recently, a colleague and I were running manual integration tests for some new features. Instead of the usual trial and error, this time I decided to be diligent and write a test plan beforehand, which would also make collaborating easier.

What I realized afterwards was that the majority of test cases we came up with were for validating correct error handling. Our new features seemed to be 75% errors and 25% happy path!

That got me thinking about managing errors in software in general.

Errors in the Shadow of Functionality?

Isn't it interesting how we always talk about the requirements in terms of how an application should fulfill its task but far less often about how it should handle failure? Maybe it's just me, but when I think about designing an HTTP API, I usually start with the 200/201 responses, not the 400+ ones.

But it's usually the unhandled error cases that cause us the most trouble down the road.

While this can be overlooked in new projects, the importance of handling failure has a tendency to surface over time. Let's look at HTTP status codes again. According to Mozilla, there are 10 successful status codes but 40 error codes (including, of course, the famous 418 I'm a teapot). Yes, there are 11 more (non-deprecated) status codes in the 1xx and 3xx range, which, one could argue, are also non-errors. But even counting those as successes, errors would still outnumber them almost two to one.

Another example: exit codes in POSIX shells. Here, the relation is even more extreme, with 0 indicating a successful program execution and 1-255 being reserved for failed ones. That's a 1:255 ratio. Imagine the manpage for a program that differentiates between 255 error cases!

And to bring another blast from the past to the table: Java's checked exceptions. Sure, I could create a thousand subclasses of Exception and call it a day, but that's not what I mean to say. Rather, look at how the language bends itself in order to represent potential failures of methods in its signature:

                
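```java
import java.io.IOException;

public interface DataBase {
    // The throws clause makes the possible I/O failure part of the signature.
    int getItemCount(String partitionKey) throws IOException;
}
```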

Many other languages offer only two places to put types in a function signature:

  1. Its inputs
  2. Its outputs

But the signature of getItemCount would be lying if it didn't also mention that communicating with the database can fail. Granted, checked exceptions have their problems (although you'd be surprised how fond I am of them), but used correctly, they can be tremendously helpful. Of course, once an IDE's autocompletion wraps them in an unchecked RuntimeException by default, those upsides quickly evaporate.
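To make that last point a bit more tangible, here's a minimal sketch. The interface has the same shape as the one above, but everything else (CheckedExceptionDemo, countWrapped, countOrUnknown) is a name I made up for illustration, and "an empty result is acceptable" is purely an assumption of this example.

```java
import java.io.IOException;
import java.util.OptionalInt;

final class CheckedExceptionDemo {

    // Same shape as the interface above: the failure is part of the signature.
    interface DataBase {
        int getItemCount(String partitionKey) throws IOException;
    }

    // Roughly what an IDE quick-fix produces: it compiles, but the failure has
    // disappeared from the signature, so callers no longer know about it.
    static int countWrapped(DataBase db, String key) {
        try {
            return db.getItemCount(key);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // A deliberate decision instead: this (hypothetical) caller can live with
    // "unknown", and says so in its return type.
    static OptionalInt countOrUnknown(DataBase db, String key) {
        try {
            return OptionalInt.of(db.getItemCount(key));
        } catch (IOException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        DataBase broken = key -> { throw new IOException("connection refused"); };

        System.out.println(countOrUnknown(broken, "users").isPresent()); // false

        try {
            countWrapped(broken, "users");
        } catch (RuntimeException e) {
            // Nothing in the signature of countWrapped warned us about this.
            System.out.println("surprise: " + e.getCause().getMessage());
        }
    }
}
```

Both methods compile, but only the second one forces somebody to think about what a failed database call should mean.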

As you can see, errors, or rather their prioritization, often play an important role in successful, long-living projects.

Errors Are Hard, Accept It

After formulating my thoughts on the topic of error handling, I realized that there is a surprisingly consistent pattern in my preference for tools: how well they let me handle failure.

For example, I feel quite uncomfortable using Python without multiple layers of static analysis like mypy or flake8. But even with these tools in place, there is still uncertainty about where errors may occur and how to handle them.

Something else I feel quite strongly about is magic frameworks that _"handle"_ errors for me, no matter where I throw them (I am looking at you, Spring). Yeah, sure, for simple CRUD APIs that might work out -- if an API is only supposed to authenticate a request and write a value into a database, it's mostly irrelevant at which point in the request lifecycle the error occurs. But as soon as there is I/O with multiple other, potentially interdependent, stateful services, I need full control over error handling. There is no way some shiny global error handler can work out the correct fallback, rollback, and/or response behavior. But it seems like large parts of the "enterprise" ecosystem in Java are built on just that assumption, which makes the experience plain frustrating for someone like me.
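Here's a small sketch of what I mean by needing control at the exact point of failure. All names (OrderDemo, Inventory, Payments, placeOrder, and so on) are hypothetical, and the two "services" are plain interfaces standing in for real, stateful remote systems.

```java
import java.util.ArrayList;
import java.util.List;

final class OrderDemo {

    // Hypothetical collaborators; real ones would talk to remote, stateful services.
    interface Inventory {
        void reserve(String item) throws InventoryException;
        void release(String item);
    }

    interface Payments {
        void charge(String customer, int cents) throws PaymentException;
    }

    static final class InventoryException extends Exception {
        InventoryException(String message) { super(message); }
    }

    static final class PaymentException extends Exception {
        PaymentException(String message) { super(message); }
    }

    enum Outcome { PLACED, OUT_OF_STOCK, PAYMENT_FAILED }

    static Outcome placeOrder(Inventory inventory, Payments payments,
                              String customer, String item, int cents) {
        try {
            inventory.reserve(item);
        } catch (InventoryException e) {
            return Outcome.OUT_OF_STOCK;   // nothing happened yet, nothing to undo
        }
        try {
            payments.charge(customer, cents);
        } catch (PaymentException e) {
            inventory.release(item);       // undo the step that already succeeded
            return Outcome.PAYMENT_FAILED;
        }
        return Outcome.PLACED;
    }

    // Test double that records every call so the rollback becomes visible.
    static final class RecordingInventory implements Inventory {
        final List<String> log = new ArrayList<>();
        @Override public void reserve(String item) { log.add("reserve " + item); }
        @Override public void release(String item) { log.add("release " + item); }
    }

    public static void main(String[] args) {
        RecordingInventory inventory = new RecordingInventory();
        Payments declined = (customer, cents) -> {
            throw new PaymentException("card declined");
        };

        System.out.println(placeOrder(inventory, declined, "chris", "book", 1999)); // PAYMENT_FAILED
        System.out.println(inventory.log); // [reserve book, release book]
    }
}
```

The right reaction depends on which step failed and what had already happened before it. A handler that only sees "some exception reached the top" doesn't have that information anymore.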

Conversely, I very much do like tools that tell me, "Yes, there will be errors; here they are; handle them!". There's a raw honesty about that, which I absolutely appreciate. Initially, I found the cascade of if err != nil checks in my first Go programs to be ugly. But soon I felt reassured by seeing that I actually handled every possible error manually.

The same thing applies to monadic error handling, which is mainly found in functional languages. Representing failure as part of a type like Either[L,R] or Result[T] is another way of adding this "third" group of error types to a function signature. Frequently, functional languages even provide syntactic sugar around these error-carrying types when piping them together.
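To stay with Java for a moment, here is a rough, hand-rolled sketch of the idea. Java's standard library has no such Result type, so Result, Ok, Err, parse and positive are all names I invented for this example, and the error is just a String to keep things short. On newer Java versions, sealed interfaces and records would make this more compact.

```java
import java.util.function.Function;

final class ResultDemo {

    // Hypothetical Result type: either a value (Ok) or an error message (Err).
    interface Result<T> {
        <U> Result<U> flatMap(Function<T, Result<U>> next);
    }

    static final class Ok<T> implements Result<T> {
        final T value;
        Ok(T value) { this.value = value; }
        // Success: hand the value to the next step.
        @Override public <U> Result<U> flatMap(Function<T, Result<U>> next) {
            return next.apply(value);
        }
        @Override public String toString() { return "Ok(" + value + ")"; }
    }

    static final class Err<T> implements Result<T> {
        final String message;
        Err(String message) { this.message = message; }
        // Failure: skip all following steps and carry the error along.
        @Override public <U> Result<U> flatMap(Function<T, Result<U>> next) {
            return new Err<>(message);
        }
        @Override public String toString() { return "Err(" + message + ")"; }
    }

    static Result<Integer> parse(String raw) {
        try {
            return new Ok<>(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return new Err<>("not a number: " + raw);
        }
    }

    static Result<Integer> positive(int n) {
        if (n > 0) {
            return new Ok<>(n);
        }
        return new Err<>("must be positive: " + n);
    }

    public static void main(String[] args) {
        System.out.println(parse("42").flatMap(ResultDemo::positive));  // Ok(42)
        System.out.println(parse("-7").flatMap(ResultDemo::positive));  // Err(must be positive: -7)
        System.out.println(parse("abc").flatMap(ResultDemo::positive)); // Err(not a number: abc)
    }
}
```

The return type of parse and positive tells the caller that failure is a normal outcome, and flatMap plays the role of the "piping" mentioned above: the first Err short-circuits the rest of the chain.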

Examples: monadic error handling in Scala, Rust or Haskell

How Errors Are Handled

At its core, error handling is not too hard. You basically need to choose between four solutions:

  1. Push the error up the callstack
  2. Handle the error in the current context
  3. Convert the error into a valid domain value (e.g., an HTTP 500 response)
  4. Crash the application

The actual complexity is always hidden in the second solution. Here live mechanisms like fallbacks, retries, rollbacks, etc. Everything else is just finding out which case to apply:

Is there not enough information in this context? → Push the error up

Is there no layer above the current context? → Convert the error into a domain value or set fire to the mainboard

You've got enough data but don't want to put the application to death (yet)? → Well, that's too bad, you gotta handle it
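The four options from the list above could look roughly like this in Java. This is only a sketch under my own assumptions: Store, loadProfile, handleRequest and friends are hypothetical names, the "retry three times, then fall back" policy is arbitrary, and the status codes stand in for a real HTTP response.

```java
import java.io.IOException;

final class StrategiesDemo {

    // Hypothetical dependency that may fail while doing I/O.
    interface Store {
        String load(String key) throws IOException;
    }

    // 1. Push up: this layer can't decide, so the failure stays in the signature.
    static String loadProfile(Store store, String user) throws IOException {
        return store.load("profile/" + user);
    }

    // 2. Handle here: retry a few times, then fall back to a default value.
    static String loadProfileOrDefault(Store store, String user, int attempts) {
        for (int i = 0; i < attempts; i++) {
            try {
                return loadProfile(store, user);
            } catch (IOException e) {
                // Assumed to be transient; the loop tries again.
            }
        }
        return "anonymous";
    }

    // 3. Convert: at the outermost layer the error becomes a domain value.
    static int handleRequest(Store store, String user) {
        try {
            loadProfile(store, user);
            return 200;
        } catch (IOException e) {
            return 500;
        }
    }

    // 4. Crash: without its config the application can't do anything useful.
    static String requireConfig(Store store) {
        try {
            return store.load("config");
        } catch (IOException e) {
            throw new IllegalStateException("cannot start without config", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Store flaky = key -> {
            if (++calls[0] < 3) throw new IOException("timeout");
            return "data for " + key;
        };
        Store down = key -> { throw new IOException("connection refused"); };

        System.out.println(loadProfileOrDefault(flaky, "chris", 3)); // data for profile/chris
        System.out.println(loadProfileOrDefault(down, "chris", 3));  // anonymous
        System.out.println(handleRequest(down, "chris"));            // 500
    }
}
```

Options 1, 3 and 4 are a handful of lines each. Everything that needs actual thought (how often to retry, what a sensible fallback is) sits in option 2.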

Fail Safe!

So that's my two cents on errors, how I like to work with them, and what puts me off about tooling that attempts to hide them from me.

I hope I've given you a few things to think about, and maybe you are now more aware of errors, but also less scared of them.

Bye!

Chris