cron is a very powerful task scheduling tool in Unix environments. When setting up periodic tasks that need to run on a fixed interval, it’s usually the first tool a developer reaches for.
Rails, unfortunately, does not always play nice when doing this.
A common need is scheduling periodic background jobs in a framework like Sidekiq. You would be tempted to reach for cron in this instance, but you shouldn’t. There’s a latent performance issue awaiting you should you choose this path.
Case Study
Today we’ll look at a scenario in which this type of periodic job scheduling caused issues.
Setup
Let’s say we want to schedule a handful of jobs that run on a couple different intervals shown in the table below. I’ve included the equivalent number of minutes for each interval (important later) as well as an example crontab for the job.
Job # | Interval | Minute Equivalent | Example Crontab |
---|---|---|---|
#1 | every minute | 1 | * * * * * |
#2 | every 5 minutes | 5 | */5 * * * * |
#3 | every 10 minutes | 10 | */10 * * * * |
#4 | every hour | 60 | 0 * * * * |
#5 | every day | 1440 | 0 0 * * * |
In Rails, we can leverage the whenever gem to make building out these crontabs a little easier. It’s very straightforward: it parses the duration you give it and turns it into crontab entries (shown above) that run the jobs you specify.
All we need to do is push the job into a background processing framework (Sidekiq, via perform_async, in this example). We don’t need to concern ourselves with running the job synchronously.
A sample periodic job schedule might look like this:
# config/schedule.rb
# Parentheses around the arguments make each { } block bind to `every`
# rather than to the last argument.
every(1.minute) { runner "PeriodicJob1.perform_async" }
every(5.minutes) { runner "PeriodicJob2.perform_async" }
every(10.minutes) { runner "PeriodicJob3.perform_async" }
every(1.hour) { runner "PeriodicJob4.perform_async" }
every(1.day, at: "12:00AM") { runner "PeriodicJob5.perform_async" }
The Problem
The big thing to notice is that each of these blocks calls runner. This is essentially shorthand for rails runner, which spins up a Rails application and runs whatever code you give it.
There’s a multitude of problems with that:
- It spins up the entire Rails stack every time it’s kicked off, since cron is an OS-level scheduling tool that has no context of application code.
- All perform_async does is throw a job into the Sidekiq queue (a very lightweight operation otherwise). It doesn’t actually do any processing.
- All the schedules overlap at some point.
Problems 1 and 2 are intuitive and go hand-in-hand: it’s very excessive to boot Rails just to insert a record into something like Redis.
Problem 3 is a bit more sinister. It’s unlikely that a single periodic job in your application causes major issues. But as your application grows, so too does the likelihood that you’ll want even more periodic jobs, each of which may need to run at the same time as the others. If you’re not very careful, you’ll run into a thundering herd problem quickly.
Analyzing the Herd
In our case, analysis is easy. cron in most cases can only operate on 1 minute intervals (i.e. every minute) and can’t handle fractional minutes (i.e. seconds).
Notice also that these intervals have a cyclic relationship! Any set of jobs in this type of schedule repeats with a period equal to the least common multiple (LCM) of the jobs’ intervals, and at the start of each period every job in the set fires in the same minute. Let’s draw that out so it’s clear:
Job #s | Minute Equivalents | Period | # Concurrent Jobs |
---|---|---|---|
1, 2 | 1, 5 | 5 minutes | 2 |
1, 2, 3 | 1, 5, 10 | 10 minutes | 3 |
1, 2, 3, 4 | 1, 5, 10, 60 | 60 minutes | 4 |
1, 2, 3, 4, 5 | 1, 5, 10, 60, 1440 | 1440 minutes | 5 |
As you can see, each job we add with a longer interval raises the peak by one: with N jobs in the schedule, N Rails processes boot in the same minute once per period!
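The table can be reproduced with a few lines of Ruby. This is only a sketch of the arithmetic: the names INTERVALS, schedule_period, and jobs_at are made up for illustration, and it assumes every job fires on exact multiples of its interval counted from midnight, as the crontabs in the first table do.

```ruby
# Intervals in minutes for jobs #1 through #5 from the first table.
INTERVALS = [1, 5, 10, 60, 1440].freeze

# The whole schedule repeats with a period equal to the LCM of the intervals.
def schedule_period(intervals)
  intervals.reduce(1) { |acc, interval| acc.lcm(interval) }
end

# Number of jobs cron launches at a given minute offset (minute 0 = midnight).
def jobs_at(minute, intervals)
  intervals.count { |interval| (minute % interval).zero? }
end

period = schedule_period(INTERVALS)
peak = (0...period).map { |minute| jobs_at(minute, INTERVALS) }.max

puts "period: #{period} minutes, peak simultaneous launches: #{peak}"
```

Passing a prefix such as INTERVALS.first(3) to both methods gives the corresponding earlier row of the table.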
Very quickly this can start to become an issue where booting copies of your application to run these jobs can outcompete all the other processes on the machine for resources and cause all sorts of havoc. This is especially true if you spin up the jobs individually rather than combine them in some sort of task.
Here’s a plot of CPU load with respect to time for a system that’s affected by this problem. Notice load peaks every 5 minutes, ~2x load at 10 minutes, and ~3x load every 60 minutes. The baseline load here is actually also sawtoothed like this due to jobs running on the 1 minute interval, but not visible at this scale!
Fixing the Issue
If you must rely on cron to schedule these jobs, then your best bet is to maximize the LCM of the intervals in your schedule that are larger than 1 minute, for example by picking intervals that share no common factors. This will reduce the likelihood of collisions between different intervals, but won’t completely remove it. That’s a lot of work, though, and it means you can no longer choose your intervals freely, which can be a problem if your jobs aren’t tolerant of timing changes. Plus, you’d also need to optimize across every combination of intervals in the set… This problem gets hairy rather quickly.
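To get a rough feel for the effect, here is a small Ruby sketch that counts how many minutes per day see two or more jobs start together. The coprime intervals (7, 11, 13) are hypothetical replacements chosen only for illustration, and the model assumes each job fires on exact multiples of its interval counted from midnight. Real cron step values such as */7 restart at the top of every hour, so actual numbers would differ.

```ruby
MINUTES_PER_DAY = 24 * 60

# Count the minutes in one day where at least two jobs start together.
def colliding_minutes(intervals)
  (0...MINUTES_PER_DAY).count do |minute|
    intervals.count { |interval| (minute % interval).zero? } >= 2
  end
end

original = [5, 10, 60]  # intervals from the schedule, minus the every-minute job
coprime  = [7, 11, 13]  # hypothetical pairwise-coprime replacements

puts "original: #{colliding_minutes(original)} colliding minutes per day"
puts "coprime:  #{colliding_minutes(coprime)} colliding minutes per day"
```

Even in this idealized model the coprime set still collides occasionally, which matches the point above: the overlap shrinks but never disappears.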
In reality, a more useful solution is to simply swap to a different scheduling tool. For Sidekiq, this means paying for the Enterprise edition to get first-party periodic jobs support, or using the sidekiq-cron gem.
Both of these tools implement a polling approach that works within the existing Sidekiq process, avoiding the need to spin up a new Rails process to kick off the next job. Take care to configure the polling parameters such that your jobs still execute in the desired timeframe.
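As a minimal sketch of what the sidekiq-cron route might look like, the schedule from earlier could be expressed as plain data and registered when the Sidekiq server starts. The file name and job names are assumptions made for illustration, the cron strings are the ones from the first table, and the exact loading hook may vary between sidekiq-cron versions, so check the gem’s README before copying this.

```ruby
# config/initializers/sidekiq_cron.rb (hypothetical file name)

# One entry per periodic job; "class" is the Sidekiq job class to enqueue.
PERIODIC_JOBS = [
  { "name" => "periodic_job_1", "cron" => "* * * * *",    "class" => "PeriodicJob1" },
  { "name" => "periodic_job_2", "cron" => "*/5 * * * *",  "class" => "PeriodicJob2" },
  { "name" => "periodic_job_3", "cron" => "*/10 * * * *", "class" => "PeriodicJob3" },
  { "name" => "periodic_job_4", "cron" => "0 * * * *",    "class" => "PeriodicJob4" },
  { "name" => "periodic_job_5", "cron" => "0 0 * * *",    "class" => "PeriodicJob5" }
]

# Register the schedule only when sidekiq-cron is actually loaded, so this
# file stays harmless in environments without the gem.
if defined?(Sidekiq::Cron::Job)
  Sidekiq.configure_server do |config|
    config.on(:startup) do
      Sidekiq::Cron::Job.load_from_array(PERIODIC_JOBS)
    end
  end
end
```

The jobs are enqueued from inside the already-running Sidekiq process, so no extra Rails boot happens per tick. The thundering herd of process startups goes away, although the jobs themselves still land in the queue at the same minute.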
That’s all for now. Thanks for reading!