Not so long ago, we noticed that our MongoDB servers were running out of disk space at an alarming rate. Because we host our database on SSD-enabled servers at DigitalOcean, scaling up could cost us a lot of money.
Our data model
We receive a lot of log entries from our clients. These sometimes contain a lot of data, like params and instrumentation for very large actions. For some clients the average size of a log entry is around 400KB. Storing 70,000 of those takes roughly 28GB, so it adds up quickly.
Compacting
To make things even more complicated, we "compact" the entries depending on their age. For example, after a week we take all the entries in any one-hour period and compact those into one entry. This means that we remove all the entries for that one-hour period, except for the slowest.
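
The compaction rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the production implementation: the field names `time` and `duration` and the sample entries are hypothetical.

```python
from datetime import datetime


def compact_by_hour(entries):
    """Keep only the slowest entry for each one-hour bucket."""
    slowest = {}
    for entry in entries:
        # Truncate the timestamp to the start of its hour to get the bucket key.
        bucket = entry["time"].replace(minute=0, second=0, microsecond=0)
        current = slowest.get(bucket)
        if current is None or entry["duration"] > current["duration"]:
            slowest[bucket] = entry
    # Return the surviving entries in chronological order.
    return [slowest[bucket] for bucket in sorted(slowest)]


# Hypothetical sample data: two entries in the 10:00 hour, one in the 11:00 hour.
entries = [
    {"time": datetime(2013, 5, 1, 10, 5), "duration": 120},
    {"time": datetime(2013, 5, 1, 10, 40), "duration": 950},
    {"time": datetime(2013, 5, 1, 11, 15), "duration": 300},
]
compacted = compact_by_hour(entries)
```

Every entry that is dropped this way leaves a hole in the data store the size of the removed document.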
This method saves us a lot of disk space, but the downside is that we create fragmentation in the data store, leaving gaps that may or may not be filled with new entries.
The result was that we had 14GB of data but used almost 60GB on disk, more than four times the size of the data itself. That's a lot of overhead.
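
As a rough sketch, the overhead can be expressed as the ratio between the space allocated on disk and the size of the data. The example below assumes a dictionary shaped like the output of MongoDB's `collStats` command, which reports `size` (data) and `storageSize` (allocated storage). The helper name is hypothetical, and the values are the rounded figures from this article, not real command output.

```python
def storage_overhead(stats):
    """Return allocated bytes on disk divided by bytes of actual data."""
    return stats["storageSize"] / stats["size"]


GB = 1024 ** 3

# Rounded figures from the text: 14GB of data, almost 60GB on disk.
example_stats = {"size": 14 * GB, "storageSize": 60 * GB}
ratio = storage_overhead(example_stats)
print(f"{ratio:.1f}x")  # prints "4.3x"
```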
The documents have a wide spread of sizes and MongoDB has a hard time fitting entries into the gaps. This in turn bumps up the padding factor, because MongoDB adds padding after each document so it can grow without having to move the document to a new space on disk.
Compressing
We realized that we have a lot of data that we don't query on and that is only shown in the front-end when a single request is examined, such as parameters, environment and backtraces for errors. If we could compress that data, we could keep the document size down and get a narrower distribution of document sizes.
Zip it!
We started zipping the fields we don't query on by converting them to JSON, compressing them with Zlib, and storing them as Binary BSON. The results are astounding: a collection that was 2GB and had an average document size of 400KB was brought down to just 121MB and 20.4KB after compression.
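
The approach can be sketched in Python with only the standard library. This is an illustration under assumptions, not the production implementation: the field names, helper names and sample entry are hypothetical. The compressed value is a plain `bytes` object here. With a driver such as PyMongo on Python 3, `bytes` values are stored as BSON binary data, so no driver call is shown.

```python
import json
import zlib

# Hypothetical names for the fields that are never queried on.
COMPRESSED_FIELDS = ("params", "environment", "backtrace")


def compress_fields(doc, fields=COMPRESSED_FIELDS):
    """Return a copy of doc with the given fields replaced by zlib-compressed JSON."""
    packed = dict(doc)
    for field in fields:
        if field in packed:
            raw = json.dumps(packed[field]).encode("utf-8")
            packed[field] = zlib.compress(raw)
    return packed


def decompress_fields(doc, fields=COMPRESSED_FIELDS):
    """Reverse compress_fields: inflate and parse each compressed field."""
    unpacked = dict(doc)
    for field in fields:
        if field in unpacked:
            raw = zlib.decompress(unpacked[field])
            unpacked[field] = json.loads(raw.decode("utf-8"))
    return unpacked


# Hypothetical log entry with large, repetitive, non-queried fields.
entry = {
    "action": "PostsController#index",
    "duration": 950,
    "params": {"filters": ["tag"] * 500},
    "backtrace": ["app/models/post.rb:42:in `load'"] * 200,
}
stored = compress_fields(entry)       # what would be written to the database
restored = decompress_fields(stored)  # what the front-end would read back
```

Fields that are queried on, such as `action` and `duration` in this sketch, stay untouched, so indexes and queries keep working as before. Only the code path that displays a single request needs to decompress.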

Not only does this save us a ton of disk space, our internal bandwidth usage has dropped significantly as well.

Robert Beekman
As a co-founder, Robert wrote our very first commit. He's also our support role-model and knows all about the tiny details in code. Travels and photographs (at the same time).

