A common frustration with software engineering docs is that they feel like a waste of time because they often go unused.
But there are certain pieces of knowledge that are useful and actionable to your teammates and to your future self.
Actionable docs include:
- Code commands
- Post-mortems
After talking to dozens of engineering teams, we’ve found these four strategies help engineers write actionable docs that people actually use.
Use standardized templates.
Standardized templates make it easy for engineers on one team to dive into the work of engineers on another team. Without a standardized template, each engineer first needs to spend time getting acquainted with how the author organizes information. This wastes time and adds friction to reading.
With standardized templates, the reader knows exactly which section to jump to.
Some templates for getting started:
- Post-Mortem Template by PagerDuty.
- Operations Runbook Template by AWS.
- Architecture Decisions by ThinkRelevance.
Write prominent ready-to-copy commands.
Engineers typically read background context only occasionally but refer to commands frequently. As such, commands should be prominently displayed and easy to copy-paste as plaintext.
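As a rough illustration of "prominent and ready to copy", here is a minimal Python sketch that renders a list of commands as plain text, with each command alone on its own line and no shell prompt in front of it. The `CommandEntry` class, the `render_commands` function, and the two sample commands are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class CommandEntry:
    """One ready-to-copy command plus a one-line note on when to use it."""
    purpose: str
    command: str


def render_commands(title: str, entries: list[CommandEntry]) -> str:
    """Render commands as plain text, each command alone on its own line.

    No shell prompt ("$ ") is prepended, so a reader can copy the whole
    line and paste it straight into a terminal.
    """
    lines = [title, ""]
    for entry in entries:
        lines.append(f"# {entry.purpose}")  # short context, kept to one line
        lines.append(entry.command)         # the command itself, unadorned
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Illustrative entries only; replace them with your team's real commands.
entries = [
    CommandEntry("Show the ten most recent commits", "git log --oneline -n 10"),
    CommandEntry("List running containers", "docker ps"),
]

print(render_commands("Everyday commands", entries))
```

Keeping the context to a single comment line above each command means the command is the first thing the eye lands on, which matches how often each part is actually read.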
Record outage summaries.
The goal of recording outage summaries is to prevent similar outages from happening in the future.
First, analyze the outage.
Many companies follow the Five Whys approach to find the root cause of an outage.
Using the Five Whys approach, teams ask “why” repeatedly until they find the root cause. Here is what this approach might look like for an application outage.
- Users could not create blog posts on our app for five minutes.
- The server could not write to the database.
- The server did not have permission.
- New permissions were applied to the server that were missing write permissions.
- Human error [Root Cause].
Then, make your findings actionable.
Now that we’ve gotten to the root cause of the issue, human error, we want to make these findings actionable.
We should write a record that captures:
1. the root cause.
2. any improvements we plan to make in light of what we’ve learned.
Outage: Permission for server to write to database was removed by human error.
Moving forward, we’ll use a pull-request check for all AWS permission updates.
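One way to keep such records consistent is to give them a fixed shape. The sketch below is a minimal Python example, not a prescribed format: the `OutageSummary` class and its field names are hypothetical, and it assumes by convention that the last answer in the Five Whys chain is the root cause.

```python
from dataclasses import dataclass, field


@dataclass
class OutageSummary:
    """A concise outage record: the Five Whys chain plus planned improvements."""
    whys: list[str]  # answers in order, starting with the user-visible symptom
    action_items: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.whys:
            raise ValueError("an outage summary needs at least one 'why'")

    @property
    def root_cause(self) -> str:
        # Convention assumed in this sketch: the last "why" is the root cause.
        return self.whys[-1]

    def render(self) -> str:
        lines = [
            f"Outage: {self.whys[0]}",
            f"Root cause: {self.root_cause}",
            "Action items:",
        ]
        lines += [f"- {item}" for item in self.action_items]
        return "\n".join(lines)


summary = OutageSummary(
    whys=[
        "Users could not create blog posts on our app for five minutes.",
        "The server could not write to the database.",
        "The server did not have permission.",
        "New permissions were applied to the server that were missing write permissions.",
        "Human error.",
    ],
    action_items=["Use a pull-request check for all AWS permission updates."],
)

print(summary.render())
```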
Outage summaries and action items that are captured concisely are easy to read today and easy to revisit later.
Record key decisions.
Like outages, key decisions should be summarized for posterity.
What makes a decision a “key” decision?
When deciding whether a decision is worth writing up, consider:
- Were alternatives seriously considered when making this decision?
- Is this decision difficult to change?
- Is this decision non-obvious? I.e., do teammates require context to understand why this decision was made?
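The three questions above can be treated as a small checklist. The Python sketch below is one possible encoding; the `DecisionCheck` class is hypothetical, and the rule that a single "yes" is enough to justify a write-up is an assumption of this example rather than something the questions themselves dictate.

```python
from dataclasses import dataclass


@dataclass
class DecisionCheck:
    """Answers to the three 'is this a key decision?' questions."""
    alternatives_seriously_considered: bool
    difficult_to_change: bool
    non_obvious: bool

    def is_key_decision(self) -> bool:
        # Assumption for this sketch: one "yes" is enough to write it up.
        return any(
            (
                self.alternatives_seriously_considered,
                self.difficult_to_change,
                self.non_obvious,
            )
        )


# Choosing a tech stack: alternatives were weighed and it is hard to reverse.
tech_stack = DecisionCheck(True, True, False)
# Renaming a local variable: none of the questions apply.
rename = DecisionCheck(False, False, False)

print(tech_stack.is_key_decision(), rename.is_key_decision())
```

A team could just as reasonably require two "yes" answers; the point is to make the threshold explicit so that everyone applies the same one.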
Recording decisions with the SPADE framework
Decisions are another area where following standardized templates can be helpful.
One example comes from Gokul Rajaram’s SPADE framework for documenting decisions.
SPADE stands for: Setting, People, Alternatives, Decide and Explain.
S — Setting
The context in which the decision is made. It should include any constraints that we are working with.
P — People
List anyone who is involved in making the decision. This includes the main driver of the decision, the approvers, and anyone who is consulted.
A — Alternatives
List all alternatives considered, and their pros and cons.
D — Decide
The substance of the decision.
E — Explain
Analysis that led to this decision.
The SPADE Framework, Applied.
Let’s walk through an example:
Suppose Jane, a software engineer, is deciding which technology to use for a new software service.
S — Setting
We need to choose a tech stack for a new software service. We will be hiring two new engineers to build it. It needs to be production-ready within 6 months.
P — People
Jane is the primary decider.
Ellen is the approver.
A — Alternatives
Java: statically typed and less error-prone, but more challenging to hire for.
Python: more error-prone, but easier to hire for.
D — Decision
We will be building our app in Python.
E — Explain
Given the six-month deadline, and that we accept the risk of shipping minor bugs, we have chosen Python because it is easier to hire for.
Recording the decision in this standardized way keeps the team better informed and provides a reference that can help when making future decisions.
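For teams that keep decision records alongside code, the SPADE sections map naturally onto a small data structure. The following is a minimal Python sketch under that assumption; the `SpadeRecord` and `Alternative` classes, their field names, and the plain-text layout are hypothetical and not part of the SPADE framework itself.

```python
from dataclasses import dataclass


@dataclass
class Alternative:
    name: str
    pros: list[str]
    cons: list[str]


@dataclass
class SpadeRecord:
    """One decision, with a field per SPADE section."""
    setting: str
    people: dict[str, str]  # role -> name
    alternatives: list[Alternative]
    decide: str
    explain: str

    def render(self) -> str:
        lines = ["Setting: " + self.setting, "People:"]
        lines += [f"- {role}: {name}" for role, name in self.people.items()]
        lines.append("Alternatives:")
        for alt in self.alternatives:
            pros = ", ".join(alt.pros)
            cons = ", ".join(alt.cons)
            lines.append(f"- {alt.name} (pros: {pros}; cons: {cons})")
        lines.append("Decide: " + self.decide)
        lines.append("Explain: " + self.explain)
        return "\n".join(lines)


# The tech-stack example from the walkthrough, expressed as a record.
record = SpadeRecord(
    setting="Choose a tech stack for a new service; production-ready within 6 months.",
    people={"Decider": "Jane", "Approver": "Ellen"},
    alternatives=[
        Alternative("Java", ["statically typed", "less error-prone"], ["harder to hire for"]),
        Alternative("Python", ["easier to hire for"], ["more error-prone"]),
    ],
    decide="We will be building our app in Python.",
    explain="Given the deadline and accepted risk of minor bugs, hiring ease wins.",
)

print(record.render())
```

Because every record has the same five fields, a reader who has seen one can skim any other and jump straight to the section they need.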
Writing Actionable Engineering Docs
By applying these four strategies, you can make it easy for your teammates and your future self to quickly dive into your projects.
Looking for an actionable docs tool? Check out Bytebase, the byte-sized knowledge base.
Bytebase makes it easy to create and organize short “bytes” of knowledge like commands, outage summaries, and decisions. Email me with the subject line ‘Actionable Docs’ for early access.