
Realtime graphs from MongoDB with aggregations

Robert Beekman

In AppSignal, we process a ton of data: between 60 and 100 million entries per day. We aggregate this data with only a few seconds of delay. Let’s see how we did this.

This data consists of performance measurements and application traces. To generate our graphs we use MongoDB’s Map/Reduce feature, which turns millions of documents into only a few hundred per day.

For efficiency we “cascade” our Map/Reduce jobs. First we Map/Reduce or “bucket” the raw data into minutely graphs. Then we reduce the minutely graphs into hourly graphs, and so on. This way each Map/Reduce job has fewer and fewer documents to process.
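
As a rough illustration, one step in such a cascade could look like the sketch below. The collection names (counters_minutely, counters_hourly), the field names and the database.command helper are assumptions for the sake of the example, not our actual implementation.

# Hypothetical example: reduce minutely buckets into hourly buckets.
map_js = <<-JS
  function() {
    // Truncate the timestamp to the hour and re-emit the counters.
    var hour = new Date(this.t);
    hour.setMinutes(0, 0, 0);
    emit({s: this.s, t: hour}, {c: this.c, d: this.d});
  }
JS

reduce_js = <<-JS
  function(key, values) {
    var result = {c: 0, d: 0};
    values.forEach(function(value) {
      result.c += value.c; // throughput
      result.d += value.d; // total duration
    });
    return result;
  }
JS

# Run the mapReduce command against the minutely collection and merge
# the results into an hourly collection.
Counter.collection.database.command(
  mapReduce: 'counters_minutely',
  map:       BSON::Code.new(map_js),
  reduce:    BSON::Code.new(reduce_js),
  out:       { merge: 'counters_hourly' }
)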

The downside is that these jobs take a while to finish, so our graphs are delayed by a few minutes.

Making it even faster

We’ve decided that we wanted the graphs to be as real-time as possible so we either had to speed up the Map/Reduce jobs (which we did to an extent, but we were still a few minutes delayed). Or figure out a way to use our “bucket” data to fill the gap between the last Map/Reduce and “now”.

Luckily for us, MongoDB has an awesome new feature where you can use an “aggregation pipeline” to process data without running a Map/Reduce job.

On to the code

First we have to select the right data. We only want graphs for a specific site, covering the period after the last Map/Reduce run.

# Match only documents for this site that were added after
# the last Map/Reduce run.
selector = {
  's' => site.id,
  't' => {'$gte' => last_mapreduce_run_at}
}

We have to group this data by the minute (our highest resolution is minutely) and sum the throughput and durations.

# Group by site and (minutely) timestamp, summing the counters.
# The documents store the site id in 's', the timestamp in 't',
# the request count in 'c' and the total duration in 'd'.
grouper = {
  '_id' => {'s' => '$s', 't' => '$t'},
  'throughput' => {'$sum' => '$c'},
  'duration' => {'$sum' => '$d'}
}

Let’s also make sure the data is neatly ordered by time:

# Sort the grouped results chronologically.
sorter = {'_id.t' => 1}

Now we can aggregate the data:

# Run the aggregation pipeline: match, then group, then sort.
Counter.collection.aggregate([
  {'$match' => selector},
  {'$group' => grouper},
  {'$sort' => sorter}
])

With this data we can calculate total throughput for a site and average response times.
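
For example, a minimal sketch of that calculation could look like this, assuming the driver returns an enumerable of hashes shaped by the grouper above:

results = Counter.collection.aggregate([
  {'$match' => selector},
  {'$group' => grouper},
  {'$sort' => sorter}
]).to_a

# Sum the per-minute counters to get site-wide totals.
total_throughput = results.inject(0) { |sum, row| sum + row['throughput'] }
total_duration   = results.inject(0) { |sum, row| sum + row['duration'] }

# Average response time over the selected window, guarding against
# an empty result set.
average_duration = total_throughput.zero? ? 0 : total_duration.to_f / total_throughput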

Of course we do a lot of other things in this aggregation, like calculating queue times, percentiles and deviations.

We’ve made sure our “realtime” Class takes exactly the same in- and output. This gave us the chance to swap them out at random and see how the aggregations performed.

Downsides

While aggregations are great for real-time processing, there are a few caveats to keep in mind. The main one follows from the reason we cascade our Map/Reduce jobs in the first place: the aggregation runs over the raw, unaggregated documents every time, so the more data it has to cover, the slower it gets.

But for the few minutes we have to bridge between the last Map/Reduce run and now (usually below five minutes) it works very well. Our customers are now able to spot changes in their app’s behaviour even faster, making our app more valuable.
