What can studying climbing accidents teach us about coding? It turns out a lot.
As many of you may know, I do a lot of outdoor pursuits - rock climbing, ice climbing, mountaineering, backcountry skiing, trail running. I am a member of the American Alpine Club, and every year they publish "Accidents in North American Climbing" (ANAC). It's a small book that lists all the reported accidents and near misses from the previous year, along with some analysis. The goal is to make the climbing community more aware of what can go wrong, so that we can make better decisions and be safer in the mountains.
What does this have to do with coding?
There are several lessons we can learn from this. The first is that sharing our failures and near misses can be very useful to the community. We can draw insights and trends from them, learn how to protect ourselves better, and prevent similar mistakes in the future. It also helps to normalize the fact that we all make mistakes, which fights the "I'm so smart and take all the precautions, so it couldn't possibly happen to me" syndrome that is prevalent in both climbing and coding.
An attempt to do this shows up regularly on social media, in the frequent "Talk about a time you brought down production." posts that try to normalize the idea that we all make mistakes. What we are lacking in the software world is a central collection point for these stories, along with some analysis, so that we can learn from them. It is challenging because so much software is proprietary and under NDA, and companies don't want to talk about their failures. If we could figure out how to create something like ANAC for coding, it would be a great benefit to the entire community. As a start, if you work in a large company, I encourage you to consider a way to track these things internally. I will caution that you need a culture of safety first; otherwise, it will go off the rails quickly. With the right culture, it could work.
Be Wary During Transitions
Every year ANAC picks out one trend and does a short article on it. The goal of the article is to make climbers more aware and to point out common mitigations and risk management strategies. This year the topic was transitions. It listed a variety of transition points, such as: changing mediums - rock, snow, ice; changing weather and temperatures; changing from ascent to descent; belaying to climbing; going from climbing in a gym to climbing outside; the day to night transition; unroped climbing to roped climbing; and a whole bunch more. The main point of the article was that at each of these transitions we need to slow down and re-evaluate what we are doing, because our underlying assumptions may no longer be valid. We need to account for that in our decision-making.
Common Software Transitions
How does this apply to software? Well, there are lots of transitions in the software world, and the same wisdom applies. As I wrote in a previous article, the techniques that work in one context do not necessarily work in all contexts. They rest on underlying assumptions, and when we go through a transition we have to take the time to revalidate those assumptions. The previous post I linked to talks about the various types of software, but the transitions I'm talking about are even more commonplace and more subtle than going from writing a hit game to designing some IoT device.
It would be an interesting exercise to go through the Twitter post above and see how many of these "mistakes" are tied to a transition of some kind. Here are just 2 responses. Can you note the transition in these posts?
It may be easy to blame individuals, but once you start looking at the bigger picture and notice that all these issues happen around transitions, then it becomes apparent that the individual is not to blame as much as the system - for not recognizing the transition and adapting appropriately.
Here are just a few transition points I came up with off the top of my head. At each of these points, the environment around the project undergoes a step change and we need to make sure we notice and respond appropriately.
- prototype to production
- internal tool to open source tool
- inheriting code
- lose or gain a team member
- adopting a new framework or technique
- changing the underlying tooling, e.g. moving from GitHub to GitLab
- version changes to underlying frameworks or languages
- OS upgrades
- change in user behavior (using the tool in ways you didn't expect)
- product lifecycle - going from adding new features to maintenance mode
- regulatory changes
- company gets bought out, or new management comes in
- switching to a different programming language or paradigm
If you are going through any of these, you need to slow down. Look at your current process and identify all the assumptions you are currently making. Do those hold under the new paradigm? What needs to shift? Make sure that you adjust your process to fit the new paradigm.
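One way to make that review concrete is to write the assumptions down and mark each one as you revisit it. The following is only a minimal sketch of that idea in Python; the class names (`Assumption`, `TransitionReview`), the fields, and the example assumptions are all made up for illustration and are not taken from ANAC or any existing tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Assumption:
    """One thing the current process takes for granted (hypothetical structure)."""
    description: str
    # None means "not reviewed yet"; True/False records the outcome of the review.
    holds_after_transition: Optional[bool] = None
    # What we plan to change if the assumption no longer holds.
    adjustment: str = ""


@dataclass
class TransitionReview:
    """A checklist for a single transition, e.g. 'prototype to production'."""
    transition: str
    assumptions: List[Assumption] = field(default_factory=list)

    def unreviewed(self) -> List[Assumption]:
        # Assumptions nobody has looked at since the transition started.
        return [a for a in self.assumptions if a.holds_after_transition is None]

    def broken(self) -> List[Assumption]:
        # Assumptions that were reviewed and found to no longer hold.
        return [a for a in self.assumptions if a.holds_after_transition is False]

    def ready(self) -> bool:
        # Ready only when everything was reviewed and every broken
        # assumption has a planned adjustment.
        return not self.unreviewed() and all(a.adjustment for a in self.broken())


# Illustrative data only: the assumptions below are invented examples.
review = TransitionReview(
    transition="prototype to production",
    assumptions=[
        Assumption(
            "Only the team runs this code",
            holds_after_transition=False,
            adjustment="Add input validation and access control",
        ),
        Assumption("The data fits in memory", holds_after_transition=True),
        Assumption("Nobody depends on the output format"),  # not reviewed yet
    ],
)

for item in review.unreviewed():
    print(f"Still to review: {item.description}")
print(f"Ready to proceed: {review.ready()}")
```

The point of the sketch is the `None` state: an assumption that nobody has looked at counts against moving ahead, just as an unexamined anchor would on a climb.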