At this point, I believe all of the interesting problems on this blog come from the MLB LED scoreboard.

My role in maintaining the project is rapidly becoming that of gatekeeper for “how do we make the application more configurable”, while keeping in mind that we now have a huge number of permutations of configuration options across hardware we don’t control.

It is hard to complain, though. It really is an interesting problem to work with. For a little background, mlb-led-scoreboard is a Python-based scoreboard application that renders MLB data to LED matrices. The matrices are driven by an open-source C++ driver library. The users the team deals with are tinkerers, but not necessarily developers, so there is an interesting mix of skillsets at play (a common problem is misunderstanding git workflows). There is a little bit of effort to get a new scoreboard up and running. We don’t provide a binary directly – clone the repo, run the bundled shell scripts, set up a systemd service if you want, and you’re off to the races.

Fundamentally, we want to design components that let the tinkerers build exactly what they want with the least friction.

Current State

We have three main sources of configuration right now:

  • A global configuration
    • Controls things like what your preferred team is, whether you want news tickers or weather to display when games aren’t on, how fast games rotate between each other, etc.
  • Color configurations
    • Controls the color of almost every text or graphics object.
  • Coordinates configurations
    • Controls where an object is placed on screen (if at all). Some objects have additional toggles like abbreviating text if they can.
    • There is exactly one coordinate configuration for each size of matrix we support.

We ship this config as a “schema”. Heavy quotes here because I am NOT talking about JSON schema, but more on that later. A new scoreboard is ready to run immediately with the default configs provided. The configs are exposed to the Python script as a config object, which validates that the values make sense.

User-Editable?

Configuration is useless if you can’t change it. Each schema is defined as *.example.json, and can be overridden by copying that file to a name without the .example part:

# Copy the global config schema to a user config
cp config.example.json config.json

# Copy 32x32 coords schema to a user config
cp coordinates/w32h32.example.json coordinates/w32h32.json

# ... You get the idea

The application will prefer the non-example configs when reading them in.
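As a rough sketch of that lookup (not the project’s actual loader), the preference boils down to “use the user file if it exists, otherwise fall back to the example”. The function name resolve_config_path is made up for illustration:

```python
from pathlib import Path


def resolve_config_path(directory: Path, name: str) -> Path:
    """Return the user-edited config if present, else the shipped example.

    Hypothetical helper: `name` is the file stem, e.g. "config" or "w32h32".
    """
    user_path = directory / f"{name}.json"
    example_path = directory / f"{name}.example.json"
    return user_path if user_path.is_file() else example_path
```

Calling resolve_config_path(Path("coordinates"), "w32h32") would return coordinates/w32h32.json once that copy exists, and coordinates/w32h32.example.json before then.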

Validating a Config

If you’re letting folks change something, you need to validate that what was changed makes sense. So why not JSON schema like I mentioned above?

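Well, it’s really two things: I want to provide a default for every key, and I want it to be super simple for end users that might not be used to working with schemas. As far as I can tell, the former isn’t really supported: JSON Schema does have a default keyword, but it is only an annotation, and validators aren’t required to fill in missing values from it. For the latter, everyone can understand JSON, so I shall respond with a meme: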

Always has been JSON

Ok, so now we can provide a default config and it’s pretty usable. Now let’s add a new key/value pair to a config.

Can we even do that? In our pattern, we would add that to the schema (a *.example.json file). But there’s no mechanism to propagate that to a user config as that’s copied to a different file.

Enter the config validator: validate_config.py. This unassuming file houses a recursive checker that does all sorts of things. Normally I would break this out into code blocks, but I’d basically be doing that for the whole file here, so I encourage you to just read it in full instead.

This file starts by loading any example schemas and their corresponding user-edited configs.

Then, stepwise:

  • Iterate over each key/value pair in the config and compare it to the schema.
  • If the value is a dictionary, recurse – iterate over that dictionary.
    • Anywhere in the recursive call:
      • If the key is in the config but not the schema, mark the key as dirty and removable.
      • If the key is in the schema but not the config, mark the key as dirty and addable.
      • If the key is in the config but not the schema AND the key appears in a list of known renames, mark the key as dirty and renamable.
  • If the values in the config and schema are NOT dictionaries (even if they differ), this key is clean, so continue.

Then, for each dirty key, write that change to (or remove the key from) the config. Once done, repeat the process for each config.
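To make the steps above concrete, here is a heavily simplified sketch of a presence-only diff. It is not the code in validate_config.py; the function name diff_config and the shape of the renames mapping (old dotted key path to new key name in the same dictionary) are assumptions for illustration. It only reports what would change and does not write anything:

```python
def diff_config(config, schema, renames=None, path=""):
    """Compare a user config to the schema by key presence only.

    Returns dotted key paths grouped by action. `renames` is a hypothetical
    mapping of old dotted key path -> new key name within the same dict.
    """
    renames = renames or {}
    changes = {"add": [], "remove": [], "rename": []}
    renamed_here = set()  # new names that a rename at this level will create

    # Keys the user has that the schema does not: rename or remove.
    for key in config:
        if key in schema:
            continue
        keypath = f"{path}.{key}" if path else key
        if keypath in renames:
            changes["rename"].append((keypath, renames[keypath]))
            renamed_here.add(renames[keypath])
        else:
            changes["remove"].append(keypath)

    # Keys the schema has: add if missing, recurse if both sides are dicts.
    for key, schema_value in schema.items():
        keypath = f"{path}.{key}" if path else key
        if key not in config:
            if key not in renamed_here:
                changes["add"].append(keypath)
        elif isinstance(schema_value, dict) and isinstance(config[key], dict):
            nested = diff_config(config[key], schema_value, renames, keypath)
            for action, keypaths in nested.items():
                changes[action].extend(keypaths)
        # Non-dict values are treated as clean, even if they differ.

    return changes
```

Even at this size, the rename case has to coordinate with the add case so a renamed key isn’t also re-added from the schema, which hints at why the real thing gets hard to reason about.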

Holy moly. This was probably maintainable at one point, but no longer. On the plus side, it automatically changes the config to match the schema. On the minus side, it’s incredibly difficult to reason about. I’m looking at you, recursive compare-and-rename from hell.

validate_config.py output

Pictured: renaming a super long key name to a slightly-less long key name…

And we’re not even done! After that, once the app loads, we still need to validate that the fields actually make sense, because the validator only cares about presence!
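For a sense of what that second pass involves, here is a minimal sketch of value validation. The key names preferred_team and rotation_rates, and the rules applied to them, are placeholders rather than the scoreboard’s real config layout:

```python
from numbers import Real


def validate_values(config: dict) -> list:
    """Return human-readable errors for values that are present but invalid.

    The keys checked here are hypothetical examples, not the real schema.
    """
    errors = []

    team = config.get("preferred_team")
    if not isinstance(team, str) or not team.strip():
        errors.append("preferred_team must be a non-empty string")

    rates = config.get("rotation_rates")
    if not isinstance(rates, list) or not rates:
        errors.append("rotation_rates must be a non-empty list")
    else:
        for index, rate in enumerate(rates):
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(rate, bool) or not isinstance(rate, Real) or rate <= 0:
                errors.append(f"rotation_rates[{index}] must be a positive number")

    return errors
```

Collecting every error instead of raising on the first one means a tinkerer can fix a broken config in a single pass.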

Possible Future State?

I think the recursive validator is probably here to stay, as we need to make sure you haven’t broken the config too badly. Or maybe we move that to JSON schema at some point.

The interesting trick is adding/removing/changing a key in that schema. We need a way to perform those changes in a way that’s more easily reasoned about.

Data Migrations

I think the route I want to go is a migration framework, just like you would use in a relational database. Here’s the PR (though at time of writing, it is not yet merged). I’m drawing heavy inspiration from ActiveRecord (Ruby) migrations because I think the CLI and migration files are slick.

Basically, I want a super simple DSL that generates safe, easy-to-understand migrations that we can run against arbitrary user-edited configs.

Here’s an example migration. There’s A LOT going on under the hood… rename_key() is a helper that knows how to move the key at the path to the new key name, and everything is wrapped in a “transaction” that rolls back the changes unless all changes complete and all files are successfully written.

from migrations import ConfigMigration


class KeypathRenameTest(ConfigMigration):
    def up(self):
        self.rename_key("path.to.the.renamed_key", "renamed_key2", self.configs["base"])

    def down(self):
        self.rename_key("path.to.the.renamed_key2", "renamed_key", self.configs["base"])
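To give a feel for the “under the hood” part, here is a sketch of the in-memory half of a dotted-path rename. It is not the helper from the PR: the name rename_key_in_dict and its signature are made up, and it works on an already-loaded dictionary rather than on config files:

```python
def rename_key_in_dict(content: dict, keypath: str, new_name: str) -> bool:
    """Rename the key at a dotted path, keeping its position in the dict.

    Returns False if the path does not exist; raises if the new name would
    clobber an existing key. Hypothetical helper for illustration only.
    """
    *parents, old_name = keypath.split(".")

    # Walk down to the dictionary that holds the key being renamed.
    node = content
    for part in parents:
        node = node.get(part) if isinstance(node, dict) else None
        if node is None:
            return False

    if not isinstance(node, dict) or old_name not in node:
        return False
    if new_name in node:
        raise KeyError(f"{new_name!r} already exists next to {keypath!r}")

    # Rebuild in place so the renamed key stays where the user had it.
    items = [(new_name if key == old_name else key, value) for key, value in node.items()]
    node.clear()
    node.update(items)
    return True
```

Returning False for a missing path lets one migration run safely against many user configs, some of which may never have had the old key.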

Here’s what an explicit transaction looks like. If you’re using the helpers, you don’t need to do this, but some advanced migrations might need it:

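# IrreversibleMigration is assumed to be exported by the migrations package too
from migrations import ConfigMigration, IrreversibleMigration, Transaction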

import json


class TestTransaction(ConfigMigration):
    def up(self):
        with Transaction() as transaction:
            for file in self.configs["base"]:
                with open(file, 'r') as f:
                    content = json.load(f)

                # Convert floats to ints
                if "rotation_rates" in content:
                    content["rotation_rates"] = [int(rate) for rate in content["rotation_rates"]]

                transaction.write(file, content)

    def down(self):
        # Destructive migration!
        raise IrreversibleMigration()
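The all-or-nothing behavior can be approximated with a small context manager. This is a sketch of the idea only, not the Transaction class from the PR: StagedWrites is a hypothetical name, and it assumes every staged value is JSON-serializable and that a backup held in memory is good enough for rollback:

```python
import json
import os
import tempfile


def _atomic_write(path: str, data: bytes) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


class StagedWrites:
    """Hypothetical stand-in for a transaction: stage writes, commit on exit."""

    def __init__(self):
        self._staged = {}  # path -> JSON-serializable content

    def write(self, path, content):
        self._staged[str(path)] = content

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, traceback):
        if exc_type is None:
            self._commit()
        return False  # never swallow exceptions raised inside the with-block

    def _commit(self):
        # Serialize everything first so bad content fails before any file changes.
        payloads = {
            path: json.dumps(content, indent=4).encode("utf-8")
            for path, content in self._staged.items()
        }
        originals = {}  # path -> previous bytes, or None if the file was new
        try:
            for path, data in payloads.items():
                if os.path.exists(path):
                    with open(path, "rb") as handle:
                        originals[path] = handle.read()
                else:
                    originals[path] = None
                _atomic_write(path, data)
        except Exception:
            # Roll back: restore every file touched so far, then re-raise.
            for path, previous in originals.items():
                if previous is not None:
                    _atomic_write(path, previous)
                elif os.path.exists(path):
                    os.remove(path)
            raise
```

Nothing touches disk until the with-block exits cleanly, so an exception halfway through a loop over config files leaves every file as it was.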

The CLI I’m designing is basic, but powerful:

# CLI entrypoint
$ python -m migrations -h
usage: __main__.py [-h] {generate,up,down} ...

Data migration manager for mlb-led-scoreboard configuration objects.

positional arguments:
  {generate,up,down}  Available commands
    generate          Generate a new migration file
    up                Run migrations
    down              Roll back migrations

options:
  -h, --help          show this help message and exit

# Up (forward) migrations
$ python -m migrations up -h
usage: __main__.py up [-h] [--step STEP]

options:
  -h, --help   show this help message and exit
  --step STEP  Number of migrations to process (defaults to all migrations)

This lets developers roll forward and backward (not shown) with ease. It generates ActiveRecord-style migration files, named with a Unix timestamp followed by the migration name so the filesystem sorts them for us, which provides a simple way to see migration history.
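The naming scheme itself is simple enough to sketch. The exact format the generate command produces isn’t shown here, so the underscore separator, the lowercased slug and the function name migration_filename are all assumptions:

```python
import re
import time


def migration_filename(name: str, timestamp=None) -> str:
    """Build '<unix timestamp>_<slug>.py' so files sort in creation order.

    Hypothetical format; the real generator may name files differently.
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # Collapse anything that is not a lowercase letter or digit into "_".
    slug = re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")
    return f"{ts}_{slug}.py"
```

For example, migration_filename("Rename Long Key", 1700000000) gives 1700000000_rename_long_key.py. Plain lexical sorting matches chronological order as long as the timestamps have the same number of digits, which holds for ten-digit Unix timestamps until the year 2286.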

All in all, it seems like a clean way to provide some much needed control over how our configuration works.

That’s all for now. Thanks for reading!