This is a continuation of my previous post about getting a job with Major League Baseball.

Today we’ll be delving into how I refactored the standings API for the mlb-led-scoreboard project when MLB deprecated that functionality.


Background

Major League Baseball has two APIs that power MLB GameDay:

  1. An older XML API that’s been deprecated
  2. A new JSON API called StatsAPI

mlb-led-scoreboard is written for Python 2.7 (there’s a development PR out there for Python 3 support). The API wrapper the scoreboard uses is called mlbgame, and it consumes the old XML API.

The MLB Scoreboard uses mlbgame to fetch all the data from the API that it will need to display all the fun stuff.

It might also help to understand how MLB is structured (as of 2021). There are two leagues (the American League and the National League), and each league has three divisions (East, Central, West), for a total of six. As a drastic oversimplification, a team’s standing within its division determines whether it qualifies for the postseason and, ultimately, contends for the World Series. When I refer to “standings” throughout this post, I mean the overall win/loss record of each team compared to the other teams in its division. Here’s an example of what that might look like, using the projected standings for 2021:

[Image: MLB Divisions]


The Incident

The MLB XML API has been deprecated for a long time, so it was expected to cause issues at some point. That is exactly what happened in July 2020, when the API stopped returning league and division standings.

This was incredibly unfortunate because the MLB scoreboard doesn’t really have the ability to turn off parts of the code it doesn’t use. It always pulls the same data from the API it consumes and passes it to a Renderer, which decides how it gets displayed. That means it always tries to refresh the standings data, and when it can’t, it logs a debug error.

Lots of users noticed a “Failed to refresh standings.” error and the loss of standings info in the normal screen rotation, and raised the issue in the project’s Slack channel. The underlying API problem was discovered and tracked as a bug shortly thereafter.


Resolution

To fully understand the resolution, it will be helpful to understand a little bit about the architecture of the scoreboard. The main modules to note:

  1. data
    • Builds up a data structure that holds all of the things from the MLB API
    • Abstraction layer on top of mlbgame
  2. renderers
    • Consumes the data object along with user configuration
    • Places data elements onto the LED matrix
    • Abstraction layer on top of rpi-rgb-led-matrix
  3. main
    • Program entry point
    • Munges data and user configuration, then passes into the MainRenderer for hardware output over GPIO

Obviously, there’s a little more going on, but this is enough to follow the example.


Consuming mlbgame Data

In the entry point for the scoreboard, an initial load of all the data is performed before passing to the MainRenderer. This is the crux of the issue, as we’ll see very shortly:

# Create a new data object to manage the MLB data
# This will fetch initial data from MLB
data = Data(config)

MainRenderer(matrix, data).render()

The Data class houses a bunch of functions to fetch the data it needs directly from mlbgame. Here’s the relevant one for the standings data. This is where the error first gets picked up:

class Data:
  # Refresh standings data from mlbgame
  def refresh_standings(self):
    try:
      if self.config.demo_date:
        self.standings = mlbgame.standings(datetime(self.year, self.month, self.day, 23, 59, 0, 0))
      else:
        self.standings = mlbgame.standings()
    except:
      debug.error("Failed to refresh standings.")

From there, the MainRenderer tries to render it on the LED matrix by calling into the StandingsRenderer class. Notice it’s in a try / except block, so this error was an additional clue that something had gone wrong with standings.

class MainRenderer:
  # Render the standings screen
  def __render_standings(self):
    try:
      StandingsRenderer(self.matrix, self.canvas, self.data).render()
    except Exception as ex:
      # Out of season off days don't always return standings so fall back on the offday renderer
      debug.error("Could not render standings.  Falling back to off day.")
      debug.error("{}: {}".format(type(ex).__name__, ex.args))
      self.__render_offday()

And finally, the StandingsRenderer. I won’t get into a ton of what this does as it’s not important and most of it is positioning and formatting the text on the screen. A mockup of what’s happening would work best:

class StandingsRenderer:
  def render(self):
    # Walk the standings object: divisions first, then the teams in each one
    for division in self.data.standings.divisions:
      for team in division.teams:
        record = '{team}: ({wins} - {losses}, {games_back}GB)'.format(team=team.team_abbrev,
                                                                      wins=team.w,
                                                                      losses=team.l,
                                                                      games_back=team.gb)

        # Call out to matrix driver to put pixels on the screen
        # (FONT, X_COORD, Y_COORD and STAT_COLOR are placeholders in this mockup)
        graphics.DrawText(self.canvas, FONT, X_COORD, Y_COORD, STAT_COLOR, record)

Again, it’s not exact, but all of this gives us the shape of the data the scoreboard expects. From here, we can build some sort of data structure that holds the correct attributes such that it would require minimal, if any, changes to the renderers.


Probing MLB’s API

I could have used another API wrapper to fetch the data we need, but there are two problems with that:

  • Now we have 2 API wrappers to maintain
  • It could be a large change to rewrite the StandingsRenderer (the example code is heavily simplified)

Instead, I opted to build my own wrapper. From the code above, I knew the shape of the data structure I’d need:

Standings -> 
  [ Division <name> ] -> 
    [ Team <name, abbrev, w, l, gb> ]
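As a rough sketch of that shape (not the project's actual code), the three levels can be modelled with plain namedtuples. The field names follow the Team class shown later in this post; the sample teams and numbers are made up purely for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins for the three levels in the diagram above.
Team = namedtuple('Team', ['name', 'team_abbrev', 'w', 'l', 'gb'])
Division = namedtuple('Division', ['name', 'teams'])
Standings = namedtuple('Standings', ['divisions'])

# Made-up sample values, only to show how the levels nest.
standings = Standings(divisions=[
    Division(name='AL West', teams=[
        Team(name='Seattle Mariners', team_abbrev='SEA', w=9, l=6, gb='-'),
        Team(name='Texas Rangers', team_abbrev='TEX', w=7, l=9, gb='2.5'),
    ]),
])


def format_record(team):
    # Same string layout as the renderer mockup earlier in the post.
    return '{team}: ({wins} - {losses}, {games_back}GB)'.format(
        team=team.team_abbrev, wins=team.w, losses=team.l, games_back=team.gb)


lines = [format_record(team)
         for division in standings.divisions
         for team in division.teams]
```

Anything that exposes these attributes can be handed to the renderer, which is what makes a drop-in replacement feasible.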

Now, since there was no documentation available, I needed to probe the current MLB StatsAPI to figure out how to get the data I needed. I took some clues from another API wrapper (MLBStatsAPI), and managed to find the correct JSON endpoint:

https://statsapi.mlb.com/api/v1/standings

And after some trial and error, found the right parameters!

Standings for April 17, 2021
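For reference, here is a minimal sketch of how a URL with those parameters can be assembled for a given date. The parameter names are the ones used in the code later in this post; `build_standings_url` is a hypothetical helper name, and nothing here makes a network call.

```python
# Endpoint and query parameters as used later in this post.
BASE_URL = 'https://statsapi.mlb.com/api/v1/standings'
LEAGUE_IDS = ['103', '104']  # American League, National League


def build_standings_url(year, month, day):
    # Hypothetical helper: zero-pads month and day into MM/DD/YYYY.
    query = ('season={year}&leagueId={leagues}'
             '&date={month:0>2}/{day:0>2}/{year}&division=all')
    return BASE_URL + '?' + query.format(
        year=year, month=month, day=day, leagues=','.join(LEAGUE_IDS))


url = build_standings_url(2021, 4, 17)
print(url)
```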

Fortunately, this endpoint contains all the data I need, so using it is a no-brainer. One concern remains: future breakages may not have a JSON endpoint that maps so directly onto the data we need, which would make the fix more challenging.


Writing a Standings Fill-In

With all that done, it was simply a matter of writing the code. Standings first: we build the divisions from the records node of the returned JSON.

import requests


class Standings:
    __URL = 'https://statsapi.mlb.com/api/v1/standings?season={year}&leagueId={league_ids}&date={month:0>2}/{day:0>2}/{year}&division=all'
    AL_LEAGUE_ID = '103'
    NL_LEAGUE_ID = '104'
    __LEAGUE_IDS = ','.join([AL_LEAGUE_ID, NL_LEAGUE_ID])

    @classmethod
    def fetch(cls, year, month, day):
        # The URL template zero-pads month and day
        standings_data = requests.get(cls.__URL.format(day=day, month=month, year=year, league_ids=cls.__LEAGUE_IDS))

        if standings_data.status_code == 200:
            return cls(standings_data.json())
        else:
            raise Exception('Could not fetch standings.')

    def __init__(self, data):
        self.__data = data
        self.divisions = self.__fetch_divisions()

    def __fetch_divisions(self):
        # One Division per entry in the top-level 'records' list
        return [Division(division_data) for division_data in self.__data['records']]
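To see what `__fetch_divisions` walks over without hitting the network, here is a minimal sketch against a hand-built dictionary. The keys are the ones the code in this post reads (`records`, `division`, `teamRecords`, `team`). The ids are placeholders and the structure is heavily trimmed, so treat it as an illustration rather than a real response.

```python
# Hand-built, heavily trimmed stand-in for the JSON response.
# Only keys read by the code in this post are included; ids are placeholders.
sample_response = {
    'records': [
        {
            'division': {'id': 1},
            'teamRecords': [
                {'team': {'name': 'Seattle Mariners'}},
                {'team': {'name': 'Texas Rangers'}},
            ],
        },
        {
            'division': {'id': 2},
            'teamRecords': [
                {'team': {'name': 'New York Mets'}},
            ],
        },
    ],
}


def team_names_by_division(response):
    # One entry per element of the top-level 'records' list.
    return {
        record['division']['id']: [t['team']['name'] for t in record['teamRecords']]
        for record in response['records']
    }


names = team_names_by_division(sample_response)
```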

And now divisions, which looks similar to the Standings class. This time, we’ll pull the name of the division (formatting it to what mlbgame would have given us), then get the teamRecords node out and construct instances of each Team.

import re


class Division:
    def __init__(self, data):
        self.__data = data
        self.id = self.__data['division']['id']
        self.name = self.__name()
        self.teams = self.__teams()

    def __name(self):
        # The full division name lives in the first team's division records
        division_records = self.__data['teamRecords'][0]['records']['divisionRecords']
        full_name = [datum['division']['name'] for datum in division_records if datum['division']['id'] == self.id][0]

        # Use some regex to fix the division full name to what the config expects
        return re.sub(r'(ational|merican)\sLeague', 'L', full_name)

    def __teams(self):
        return [Team(team_data, self.id) for team_data in self.__data['teamRecords']]
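The regex in `__name` is easy to check on its own. This small sketch applies the same substitution to two full division names; `short_division_name` is just an illustrative wrapper, not a function from the project.

```python
import re


def short_division_name(full_name):
    # Collapse "American League"/"National League" to "AL"/"NL".
    return re.sub(r'(ational|merican)\sLeague', 'L', full_name)


print(short_division_name('American League West'))     # AL West
print(short_division_name('National League Central'))  # NL Central
```

The pattern keeps the leading "A" or "N" and replaces the rest of the league name with "L", which is how the short names come out.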

The Team class needs a little extra. Because the MLB API doesn’t include the correct team abbreviations, they had to be included here as a constant. Other than that, we’re just stuffing all the relevant data into instance attributes so they can be used later.

class Team:
    __TEAM_ABBREVIATIONS = {
        'Arizona Diamondbacks': 'ARI',
        'Atlanta Braves': 'ATL',
        'Baltimore Orioles': 'BAL',
        'Boston Red Sox': 'BOS',
        'Chicago Cubs': 'CHC',
        'Chicago White Sox': 'CHW',
        'Cincinnati Reds': 'CIN',
        'Cleveland Indians': 'CLE',
        'Colorado Rockies': 'COL',
        'Detroit Tigers': 'DET',
        'Florida Marlins': 'FLA',
        'Houston Astros': 'HOU',
        'Kansas City Royals': 'KAN',
        'Los Angeles Angels': 'LAA',
        'Los Angeles Dodgers': 'LAD',
        'Miami Marlins': 'MIA',
        'Milwaukee Brewers': 'MIL',
        'Minnesota Twins': 'MIN',
        'New York Mets': 'NYM',
        'New York Yankees': 'NYY',
        'Oakland Athletics': 'OAK',
        'Philadelphia Phillies': 'PHI',
        'Pittsburgh Pirates': 'PIT',
        'San Diego Padres': 'SD',
        'San Francisco Giants': 'SF',
        'Seattle Mariners': 'SEA',
        'St. Louis Cardinals': 'STL',
        'Tampa Bay Rays': 'TB',
        'Texas Rangers': 'TEX',
        'Toronto Blue Jays': 'TOR',
        'Washington Nationals': 'WAS',
    }

    def __init__(self, data, division_id):
        self.__data = data
        self.__division_standings = self.__find_division(division_id)
        self.name = self.__name()
        self.team_abbrev = self.__TEAM_ABBREVIATIONS[self.name]
        self.w = self.__parse_wins()
        self.l = self.__parse_losses()
        self.gb = self.__data['divisionGamesBack']

    def __find_division(self, division_id):
        for record in self.__data['records']['divisionRecords']:
            if record['division']['id'] == division_id:
                return record

        raise Exception('Could not find division record.')

    def __name(self):
        return self.__data['team']['name']

    def __parse_wins(self):
        return self.__division_standings['wins']

    def __parse_losses(self):
        return self.__division_standings['losses']
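One thing the dictionary lookup above assumes is that every team name the API returns is a key in the mapping; otherwise the lookup raises a KeyError. As a sketch of one possible safeguard (this is an assumption on my part, not what the project does), a fallback could derive a placeholder abbreviation instead:

```python
# Trimmed copy of the mapping, for illustration only.
TEAM_ABBREVIATIONS = {
    'Seattle Mariners': 'SEA',
    'Texas Rangers': 'TEX',
}


def abbreviation_for(name):
    # Hypothetical fallback: use the first three letters, upper-cased,
    # when the name is missing from the mapping.
    return TEAM_ABBREVIATIONS.get(name, name[:3].upper())


known = abbreviation_for('Seattle Mariners')
unknown = abbreviation_for('Example Team')
```

Whether a guessed abbreviation is better than a loud failure is a judgment call. Failing fast makes a renamed team obvious, while the fallback keeps the standings screen up.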

Usage

As you may have noticed above, we put a lot of things into instance attributes. This was intentional, since mlbgame structures its data the same way. We can traverse the Standings object in exactly the same way we would traverse the equivalent mlbgame standings object:

standings = Standings.fetch(2021, 4, 17)

for division in standings.divisions:

  print('-------')
  print(division.name)
  print('-------')

  for team in division.teams:
    print(team.name)

  print('\n')

Which prints:

-------
AL West
-------
Los Angeles Angels 
Houston Astros 
Seattle Mariners 
Oakland Athletics 
Texas Rangers 

. . . And so on . . .

Because this behaves exactly like an mlbgame standings object, we don’t have to rewrite any of the code that renders the data. All we have to do now is import it into the main Data class and update the refresh_standings() function:

from standings import Standings, Division, Team

class Data:

. . .

  def refresh_standings(self):
    try:
      self.standings = Standings.fetch(self.year, self.month, self.day)
    except:
      debug.error("Failed to refresh standings.")
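To illustrate what that try / except buys us, here is a self-contained sketch with a stubbed fetcher. `FakeStandings` and the `errors` list are hypothetical stand-ins (the real code calls Standings.fetch and logs through debug.error). The point is only that a failed refresh leaves the previously fetched standings in place.

```python
class FakeStandings(object):
    # Hypothetical stand-in for Standings, so no network call is made.
    fail = False

    @classmethod
    def fetch(cls, year, month, day):
        if cls.fail:
            raise Exception('Could not fetch standings.')
        return 'standings for {}-{:02d}-{:02d}'.format(year, month, day)


class Data(object):
    def __init__(self, year, month, day, fetcher):
        self.year, self.month, self.day = year, month, day
        self.fetcher = fetcher
        self.standings = None
        self.errors = []

    def refresh_standings(self):
        try:
            self.standings = self.fetcher.fetch(self.year, self.month, self.day)
        except Exception as ex:
            # The post's version logs via debug.error; a list is used here
            # so the sketch stays self-contained.
            self.errors.append('Failed to refresh standings. ({})'.format(ex))


data = Data(2021, 4, 17, FakeStandings)
data.refresh_standings()
first = data.standings

# Simulate the API breaking on a later refresh.
FakeStandings.fail = True
data.refresh_standings()
```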

And DONE! That’s it – everything else just works!


Summing Up

I’m super happy with how it turned out. It was a clean solution to a dependency issue, and I think it’s the pattern we’ll follow in the future as more things break in the old MLB XML API.

Unfortunately, that’s not the end of the story: it turns out this code has a fairly major bug that either was never caught or was introduced by something changing since last year! Either way, with the new pattern, it’s pretty simple to implement a fix, and hopefully we’ll see a resolution pushed out within the next few days.

If this interests you, I strongly encourage you to read Clean Architecture by Uncle Bob. This is exactly the kind of issue the book covers: structuring software to minimize maintenance time and effort.

Hope you enjoyed, thanks for reading.