We’ve made a few changes to our push API today and it almost went well. In the old version we relied on a MongoDB connection to check the authentication of the gems pushing data to AppSignal. It basically asked MongoDB if there was an account with the API key and if it was active.
Besides the MongoDB connection, we also keep a connection open to Redis for our queue. We wanted to remove MongoDB as a dependency for the push endpoints, so we implemented a way to look up valid API keys in Redis instead of MongoDB.
Unfortunately, the deploy of this new version went less than perfectly. For a brief period (less than a minute) the push endpoint returned 401 status codes instead of accepting the API key.
The cause was the switch itself: the Redis store wasn’t populated fast enough, so the push endpoint couldn’t fetch the API keys.
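To make the failure mode concrete, here is a minimal sketch of a key lookup against a Redis set. It is an illustration and not our actual endpoint code: the `active_api_keys` key name, the `push_status` helper, and the `FakeRedis` stand-in are all assumptions made for this example. `FakeRedis` mimics only the `sadd` and `sismember` commands so the snippet runs without a Redis server.

```ruby
require "set"

# In-memory stand-in for a Redis client so the sketch runs without a server.
# It mimics only the two set commands used here (SADD and SISMEMBER).
class FakeRedis
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def sadd(key, member)
    @sets[key].add(member)
  end

  def sismember(key, member)
    @sets[key].include?(member)
  end
end

# Hypothetical key name; the real one is not given in this post.
ACTIVE_KEYS = "active_api_keys"

# Returns the HTTP status the push endpoint would answer with.
def push_status(redis, api_key)
  redis.sismember(ACTIVE_KEYS, api_key) ? 200 : 401
end

redis = FakeRedis.new

# Right after the deploy the store is still empty, so a valid key is rejected.
before = push_status(redis, "abc123")

# Once the store has been populated, the same key is accepted.
redis.sadd(ACTIVE_KEYS, "abc123")
after = push_status(redis, "abc123")

puts "before: #{before}, after: #{after}"
```

With a lookup like this, an empty store cannot be told apart from an invalid key, which is why a valid key received a 401 while the store was still being filled.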
The graph above shows the result of this mishap. Our gem is configured to disengage and shut down if it gets a 401 status code, so it doesn’t hammer our API.
When a new Passenger thread is started, the gem tries again, which is why you can see the requests slowly returning to their original level.
If you use a server other than Passenger that has long-lived processes, it is possible that no data is sent to AppSignal until the gem is restarted.
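The behaviour described above can be sketched roughly as follows. This is not the gem's real code: the `Agent` class, its methods, and the lambda standing in for the push endpoint are hypothetical names chosen for this example. It only shows why a process that received a 401 stays silent, while a freshly started one sends data again.

```ruby
# Hypothetical agent; class and method names are illustrative only.
class Agent
  attr_reader :sent

  # endpoint: any object responding to #call(payload) and returning a status.
  def initialize(endpoint)
    @endpoint = endpoint
    @active = true
    @sent = 0
  end

  def active?
    @active
  end

  def transmit(payload)
    # Once disengaged, the agent no longer contacts the API at all.
    return :skipped unless @active

    status = @endpoint.call(payload)
    if status == 401
      @active = false
      return :disengaged
    end

    @sent += 1
    :ok
  end
end

# Simulated endpoint: rejects the first request, accepts the following ones.
statuses = [401, 200, 200]
endpoint = ->(_payload) { statuses.shift }

agent = Agent.new(endpoint)
agent.transmit("a") # => :disengaged
agent.transmit("b") # => :skipped, the endpoint is not called again

# A newly started process builds a fresh agent, which tries again.
fresh = Agent.new(endpoint)
fresh.transmit("c") # => :ok
```

In a long-lived process there is no fresh agent, so nothing is sent until the process is restarted.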
While only a very small number of our customers are affected, that is still too many, and we’ll do the following to prevent this in the future.