Checking Links in Docs-As-Code Projects

Creating content requires accuracy as well as creativity and the ability to deliver. Working with docs-as-code gives a strong foundation and structure to work within, and means there’s a clear workflow where automation can be added to help us with the easy stuff, such as “do all these links work?”.

I really appreciate having the extra confidence and support of these types of tools, and having implemented them on multiple projects at this point, I thought I’d share my advice for anyone looking to do the same.

Pick a tool

Before you go on a hunt for the best link checker ever, check what your tools already include. For example, Sphinx ships with a linkcheck builder.
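If you're already building with Sphinx, the builder can be invoked directly; the directory names here are common conventions, so adjust them to your project layout:

```shell
# Run Sphinx's built-in link checker.
# "docs" is the source directory and "_build/linkcheck" the output
# directory -- both are assumptions about your project layout.
sphinx-build -b linkcheck docs _build/linkcheck
```

The results land in the output directory as a plain-text report, which makes them easy to inspect locally or archive from CI.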

There are also some decisions to make, where the right answer will depend very much on your situation and context.

  • Should the tool check links to external sites, or only internal links? I prefer to check internal links only, so that other people’s downtime doesn’t make the builds on my own projects fail.
  • Will the link tool operate on the source files (markup) or the output (website)? I’ve used both approaches and both are valid. I usually check the raw format (reStructuredText, Markdown, etc.) if I can, because the links can be checked without waiting for a build to run, and because problems are reported in the file where I need to fix them, rather than in the output file. However, checking links after the build inspects the rendered output, and that can give better results in some setups.
  • If the checks fail, can the user still deploy the site if they want to? I always want to be able to deploy, including if I can see that the build failed. The humans have more context than the machines do.
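To make the source-versus-output distinction concrete, here’s a toy sketch of what a source-level, offline check does: scan Markdown files and confirm that each relative link target actually exists. Real tools such as mlc handle many more cases (anchors, reference-style links, images); the function name and regex here are my own illustration.

```python
import re
from pathlib import Path

def find_broken_links(root: Path) -> list[tuple[Path, str]]:
    """Scan Markdown files under root for relative links whose target
    file does not exist. A toy sketch of an offline, source-level
    link check -- not a replacement for a real tool."""
    # Matches [text](target), capturing the target up to a ')' or '#'.
    link_re = re.compile(r"\[[^\]]*\]\(([^)#]+)[^)]*\)")
    broken = []
    for md in root.rglob("*.md"):
        for target in link_re.findall(md.read_text(encoding="utf-8")):
            if target.startswith(("http://", "https://", "mailto:")):
                continue  # external link: skipped in offline mode
            if not (md.parent / target).exists():
                broken.append((md, target))
    return broken
```

Because this reads the markup directly, it reports the source file that needs fixing and runs without waiting for a site build.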

On my current project, I’m using mlc, a Rust-based tool that checks links in Markdown files. I mostly run it with --offline to check internal links only – our setup makes it easy to move a file and break the links, so it’s nice to have the tool confirm we’re not making mistakes, and it runs very fast in offline mode. Once you know what you’re looking for, you’re ready to find something that works for your setup.
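The local invocation is a single command; the docs path is an assumption about your tree, and mlc’s --help lists the full set of flags:

```shell
# Check internal (relative) links only, skipping external URLs.
# Assumes Markdown sources live under ./docs -- adjust as needed.
mlc --offline ./docs
```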

Configure what you need, and nothing more

It’s tempting to turn on every check available and make sure that your project is the absolute best it can be. However, especially if you’re adding tools to an existing project, it’s better to start small. The link checking should run as part of the continuous integration pipeline, and best practice there is never to ignore errors. If you can safely ignore something, why are you checking it at all? (This is also why I don’t bother with warnings in build pipelines – either it’s an error and you should fix it, or nobody cares and you can turn it off.)

Tune the configuration of the link checker to suit your needs. If there are problems in some areas of the site, exclude them from the checks and work through fixing and re-introducing them one by one. Don’t be afraid to limit the checks in scope or thoroughness: whatever checking you enable will deliver benefits, and over-strict checks risk blocking other people’s work or discouraging them from contributing.
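As a concrete example of scoping checks down, Sphinx’s linkcheck builder can be told to skip URLs matching a list of regular expressions in conf.py; the patterns below are illustrative placeholders, not from any real project:

```python
# conf.py -- the patterns here are illustrative placeholders.
# linkcheck_ignore takes a list of regular expressions; any URL
# matching one of them is skipped by the linkcheck builder.
linkcheck_ignore = [
    r"https://example\.com/flaky/.*",   # known-unreliable external site
    r"http://localhost:\d+/.*",         # local-only links, never reachable in CI
]
```

Starting with a generous ignore list and shrinking it over time is a gentler path than failing every build on day one.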

You may find that your project already has some broken links; that’s fine (in fact, it’s expected if you didn’t already have checkers in place!). Go ahead and make the project pass the checks at the same time as you introduce them.

Make every CI tool available locally

Whatever tools are run in CI should be available to authors when they’re working locally to develop content. Document which tools are used, how to configure them, and how to install and use them locally (or link to the project documentation if appropriate). This way your contributors can get a quicker feedback loop when fixing a problem, and also run the checks as they go along.

Add the tool to the CI setup

Most tools have a configuration file, and I prefer to use that over passing lots of arguments on the command line. Then the tool can be configured the same way for local and CI use very easily. It can also be good to wrap the command to do something in a make target (or whatever your build tool is), and to use the same target names across projects even if the tools vary. The consistent approach means the contributors have less to think about or remember.
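For instance, a thin make target keeps the local and CI invocations identical; the target name and docs path here are my own convention, not a standard:

```make
# Makefile -- target name and docs path are illustrative.
.PHONY: linkcheck
linkcheck:
	mlc --offline ./docs
```

Contributors then run make linkcheck everywhere, and if the underlying tool changes, only the Makefile needs updating.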

Whether you use GitHub Actions, Jenkins, or something else completely, the basic approach to running checks in CI remains consistent:

  1. Check out the code
  2. Set up the tool you are going to use for checking
  3. Run the checks

For a specific example, here’s my GitHub Actions workflow that uses mlc to check some Markdown docs:

name: Documentation tests

on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  check-links:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4
      - name: Markup Link Checker (mlc)
        uses: becheran/[email protected]
        with:
          args: ./docs

In this case, the tool provides a ready-made action that keeps the steps simple. The Sphinx linkcheck builder I mentioned doesn’t have its own action, so the workflow looks more like this:

name: Documentation tests

on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  check-links:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set Up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.12"

      - name: Install Dependencies
        run: |
          pip install --upgrade pip
          pip install --upgrade -r requirements.txt

      - name: Check Documentation URLs
        run: sphinx-build -b linkcheck docs _build/linkcheck

With the workflows in place, you can essentially forget about link checking. In theory, you can run it locally and check the links before you push your branch. In practice, I do this much less often than my good intentions wish I did – with the result that I’m always pleased and grateful when the build catches the thing I should have seen myself! The workflow will test the links, and the humans can get on and focus on the content.

Also published on Medium.
