If you have some experience setting up monitoring for different setups, this post is for you. Since different parts of your architecture have different tasks, they also have different expectations. Today, we’ll take a quick dive into how to deal with that reality and set up monitoring for it.
Warning: In this post, you'll have to bear with our enthusiasm for setting things up perfectly.
Not All Parts Are Created Equal
In a setup with limited complexity, let's say with background jobs and a customer-facing app or web interface, we already observe different expectations. A background job can run for minutes before things are considered slow, but a customer will already have CTRL-W’d you away by then.
Similarly, for throughput, you might have totally different expectations for different URLs that receive requests. For instance, there might be API endpoints where you expect much higher throughput than for a normal web request.
Monitoring for the Differences
The key to being woken up when needed, and being able to sleep when you can, is having the wisdom to see the difference. For a good night’s sleep, we recommend setting up good monitoring. We have an idea of what that should be, but that isn't what this article is about. Just set up any monitoring.
Because we know AppSignal best, here are two ways you would use it to monitor for these differences.
The first way to separate throughput triggers is to set different namespaces, with high-throughput actions in a separate namespace (e.g. an api or integrations namespace). Read how to set custom namespaces in our documentation.
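As a sketch of what that could look like in a Rails app, assuming the appsignal gem is installed and configured (the controller names here are hypothetical; `Appsignal.set_namespace` is the gem's helper for this):

```ruby
# Sketch: report all requests handled by API controllers under the "api"
# namespace, so they get their own triggers and error overview in AppSignal.
# Controller names are hypothetical; adapt to your own controller hierarchy.
class Api::BaseController < ApplicationController
  before_action :set_appsignal_namespace

  private

  def set_appsignal_namespace
    # Assumes the appsignal gem is loaded; this call tags the current
    # transaction with the "api" namespace instead of the default "web".
    Appsignal.set_namespace("api")
  end
end
```

Every controller that inherits from this base controller will then report under the api namespace automatically.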
The second option is to separate triggers using the “custom metrics” form on the page where you create a new Trigger. You can then use wildcards in the tags to match groups of actions. To do this for throughput triggers, use transaction_duration as the metric name and count as the field.
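To illustrate what wildcard matching on action names does, here's a small sketch. AppSignal evaluates the wildcard on its side when the trigger runs; the action names and pattern below are made up for the example, and `File.fnmatch` is just a convenient stand-in for shell-style wildcard semantics:

```ruby
# Hypothetical action names, as AppSignal would report them for a Rails app.
actions = [
  "Api::UsersController#index",
  "Api::PostsController#create",
  "BlogController#show"
]

# A wildcard like "Api::*" groups every action in the Api namespace,
# similar to using a wildcard in a trigger's action tag.
api_actions = actions.select { |action| File.fnmatch("Api::*", action) }
# => ["Api::UsersController#index", "Api::PostsController#create"]
```

A single trigger with such a wildcard then covers the whole group, instead of one trigger per endpoint.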
When to Use Which Method
Either method is valid; we use separate namespaces ourselves. Having these actions in a separate namespace makes it easier to treat not just the triggers but also the errors differently, and keeps them nicely structured in the AppSignal interface. That way, we get a separate overall view on things. API endpoints don't just have more throughput; they also tend to generate more errors (invalid user data, etc.).
It also makes it easier to, for example, treat errors on an API endpoint more as a fact of life. For API endpoints, we're more interested in relative error rates (e.g. spikes), whereas for other actions in the web namespace we'd like zero errors at all times.
Setting things up via Custom Metrics makes more sense if there are only a few actions for which you want different throughput triggers, while keeping their errors in the same overview as the rest of the application.
👋 If you liked this, take a look at other Ruby (on Rails) performance articles in our Ruby performance monitoring checklist.
Back to Curbing Our Enthusiasm
Thanks for bearing with our enthusiasm. As you can tell, it's these kinds of tweaks that still get us really excited when dogfooding our own solution.
If some of that enthusiasm rubbed off on you, you might want to try out AppSignal.