A new way forward for technical documentation? Part 1: the problematic status quo

Luc Perkins    Tuesday, January 21, 2020


I’ve contributed to many very large documentation projects in my day, including Riak, Kubernetes, Prometheus, Pulsar, Heron, BookKeeper, and several others. On all of those projects I worked as some combination of tech writer, web designer, and build system “plumbing” person (i.e. the person in charge of making sure that the documentation inputs are converted into the right outputs and that those outputs are shipped to the right place).

In other words, I’ve seen as much documentation sausage being made as just about anyone in our industry. And I hate to say it, but my experience has led me to the firm conclusion that our documentation tools are fundamentally broken for many common use cases. In this post I’ll explain why I feel this way. In the forthcoming part 2, I’ll point to what I think could be a sounder path.

The current way of doing things

The lion’s share of documentation projects nowadays look something like this:

  • You have a directory full of markup files, typically Markdown these days but sometimes reStructuredText, AsciiDoc, or an analogous format.
  • You use a static site generator (SSG) to convert these markup files into rendered HTML. There’s a dizzying array of options here, including some “pure” SSGs like Jekyll, Hugo, Gatsby, and Gridsome as well as some documentation-specific tools like Sphinx and Docusaurus, with new ones being created every day.
  • In addition to markup files, you might sprinkle in structured data like YAML or JSON to generate, for example, release matrices or OpenAPI docs (a minimal sketch of this step follows the list).
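
To make that last bullet concrete, here’s a minimal sketch of the “structured data in, rendered docs out” step, using PyYAML plus Jinja2 in place of whatever templating your SSG provides. The file layout, field names, and version numbers are all illustrative assumptions, not tied to any real project:

```python
# Minimal sketch: render a release matrix from structured YAML data.
# The data is inlined here to keep the example self-contained; in a
# real docs repo it would live in something like data/releases.yaml.
import yaml
from jinja2 import Template

RELEASES_YAML = """
releases:
  - version: "2.4.0"
    date: "2019-12-16"
    supported: true
  - version: "2.3.1"
    date: "2019-10-24"
    supported: false
"""

# Render the data as a Markdown table, much as an SSG template would.
MATRIX_TEMPLATE = Template(
    "| Version | Released | Supported |\n"
    "|---------|----------|-----------|\n"
    "{% for r in releases %}"
    "| {{ r.version }} | {{ r.date }} | {{ 'yes' if r.supported else 'no' }} |\n"
    "{% endfor %}"
)

data = yaml.safe_load(RELEASES_YAML)
print(MATRIX_TEMPLATE.render(releases=data["releases"]))
```

Note what this implies: the release data lives in one file inside one repo. That’s perfectly fine in isolation, and it’s exactly what becomes a liability once other projects need the same facts.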

A setup like this is usually perfectly adequate if your project stands alone and isn’t deeply interlinked with other projects. But it runs up against stiff limitations as soon as you try to go beyond that.

Where the status quo comes up short

Existing tools are fine if your project is, well, just one project. But the status quo begins to break down when you need to think about information from a supra-project perspective. Here are some example scenarios where existing tools struggle:

  • Dense ecosystems — Think of a project like Apache Kafka. Kafka itself is quite a large project with many moving pieces: multiple APIs, subprojects like Streams and Connect, numerous client libraries, and much more.

    To get an even better sense of the problem, take a look at the ecosystem page linked in the main docs sidebar. You’ll see dozens of related projects. And beyond what’s listed there, there are hundreds if not thousands of Kafka-related projects. On one hand, this is quite encouraging. What a vibrant ecosystem! But if you drill down into these related projects, chances are quite high that you’ll find tons of information that’s stale or redundant. Or stale and redundant!

    Why? Because each project is responsible for maintaining its own sources (i.e. its own directory tree full of doc inputs). If an “upstream” change lands in the main Kafka project (a core API changes, a version gets bumped, an endpoint is deprecated), there is no mechanism in place beyond human vigilance to make that change ripple through the docs across the ecosystem. The sketch after this list shows what that vigilance tends to look like in practice.

  • Large organizations — If you’ve ever worked at $BIG_CORP then you probably already know where I’m going with this. In large orgs, teams and even individuals need to be able to create and maintain their own documentation projects. But as new projects are added, the available information proliferates in a byzantine, rhizomatic fashion. And this happens even though the core mission of the org implicitly ties all the projects together into what should be a unity. Tools like Confluence are good at empowering people to create and update projects (WYSIWYG editing, permissions, and so on), but what happens when you have 250 Confluence spaces? How do you know which one might have the info you need?
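
To make the staleness problem from the Kafka example concrete, here’s a hedged sketch of what “human vigilance” usually amounts to in practice: a one-off script that scans a downstream project’s docs for hardcoded version strings and flags anything that has drifted from an upstream source of truth. The paths, the VERSION file, and the version pattern are all assumptions for illustration, not any real project’s layout:

```python
# Hypothetical staleness check: compare version strings hardcoded in a
# downstream docs tree against the upstream project's current version.
import re
import sys
from pathlib import Path

UPSTREAM_VERSION_FILE = Path("kafka/VERSION")    # hypothetical location
DOWNSTREAM_DOCS = Path("my-kafka-client/docs")   # hypothetical docs tree
VERSION_RE = re.compile(r"\b\d+\.\d+\.\d+\b")    # naively matches any semver-ish string

def main() -> int:
    # The upstream version is the source of truth for this check.
    current = UPSTREAM_VERSION_FILE.read_text().strip()
    stale = []
    for md in DOWNSTREAM_DOCS.rglob("*.md"):
        for lineno, line in enumerate(md.read_text().splitlines(), start=1):
            for found in VERSION_RE.findall(line):
                if found != current:
                    stale.append((md, lineno, found))
    for path, lineno, version in stale:
        print(f"{path}:{lineno}: found {version}, upstream is {current}")
    return 1 if stale else 0

if __name__ == "__main__":
    sys.exit(main())
```

Every downstream project has to write, schedule, and remember some variant of this on its own, which is the point: the propagation mechanism is ad hoc effort, not shared infrastructure.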

The essence of the problem: decentralization

At this point, I hope to have made it clear that the core problem is a lack of centralization in information management. When sources of truth proliferate, disorganization reigns, and the only tool we currently have to curb it is human vigilance. And we all know that building on vigilance alone is building on sand.

We need new tools that are built with the right foundations and the right pain points in mind. In part 2, I’ll provide an initial sketch of what a foundation built on the right principles would look like.