This post was updated on 9 August 2023 to include changes to the axios library, removing the install of @types/axios.
In 2009 Node.js opened up a door for front-end developers to dip their toes into the world of servers without having to leave the comfort of their language.
It's almost effortless to get started with Node. You can basically copy-paste an entire HTTP server into existence and then install an ODM and you've got your CRUD app ready to roll!
You could even add a few lines and have your favourite Node.js application monitoring tool up and running in a few minutes. However, if we've learned anything from the amazing Spider-Man, it's that with great power comes great responsibility.
So, in this article, we're going to discuss how you can wield your Node-given powers responsibly, and design servers that don't just work, but are also resilient and adaptive to failures.
Resiliency and Chill
One of the biggest names in the industry when it comes to server resiliency design is Netflix. They are extremely dedicated to designing robust systems that will serve us all seasons of Grey's Anatomy any minute of the day!
But what is this "resiliency" anyway?
Well, resiliency is just a fancy word for the ability of your system to recover from failures and continue operating.
If the power goes out and it continues to work, your system is resilient. If there is an equipment failure and the system keeps on going, it is even more resilient. If you hit it with a baseball bat and the system is still up... you get the idea.
However, in our case, we're more interested in providing API resiliency. So, let's see how we would identify a resilient API. What are some of the core principles of a resilient API?
Well, let's learn from the pros. Let's see what Netflix has to say about it.
Netflix defines the principles of resiliency as follows:
- A failure in a service dependency should not break the user experience.
- The API should automatically take corrective action when one of its service dependencies fails.
- The API should be able to show us what's happening right now, in addition to what was happening 15-30 minutes ago, yesterday, last week, etc.
They are also responsible for fault tolerance libraries like Hystrix: sophisticated tools for dealing with latency and fault tolerance in distributed systems.
To deal with the problem of fault tolerance, most of these solutions use a popular software design pattern called circuit-breaker, which is the exact pattern that we're going to be discussing in detail in the upcoming sections.
The Circuit Breaker Pattern
The Circuit Breaker in software design is named after its equivalent in electrical engineering, where it serves as a switch designed to stop the flow of current in an electric circuit. It is used as a safety measure to protect the circuit from overload or short circuit.
Circuit breakers come in all shapes and sizes: some reset automatically, and some need to be reset manually, but they all essentially do the same thing: open the circuit if there's trouble.
The Circuit Breaker pattern was popularized by Michael Nygard in his book Release It!, where he describes this pattern along with other useful information about architecting resilient and performant software.
So if the electrical circuit breaker manages the flow of current, what does its software equivalent do?
The circuit breaker manages the flow of requests to an upstream resource.
Let's think of the upstream resource as a remote server for the time being, but it is certainly not limited to being that. Circuit breakers can also be used locally to protect one part of your system from failures in another part.
The circuit breaker monitors for failures, and when they reach a certain threshold, it trips, and none of the subsequent calls are forwarded to the upstream resource.
Why Would We Bother Using a Circuit Breaker?
With the rising popularity of microservices, it is common for apps to make remote calls to other apps running on different processes across a network. It is often the case that the system is spread out across multiple machines as well.
Some of these services act as dependencies for others, and it is not unusual to have multiple dependencies upstream.
Even if we forget about microservices altogether, think about how common it is for an application to make remote calls. It is almost unavoidable that it will have integrations and will rely on upstream resources.
Another popular case is an API gateway, where a service's primary purpose is to proxy requests upstream. In this case, the health of the application is very closely tied to the health of the upstream resource.
So, we have all these cases where requests are being passed upstream, but why use a circuit breaker? Why not just let each request fail at its own pace?
Preserve Resources
Wasteful calls pile up on the upstream resource, which might already be struggling to serve previous requests, further escalating the problem.
Wasteful calls can also be a big problem for the service making those calls.
Resources such as threads might be consumed while waiting for the upstream resource to respond, which can lead to resource exhaustion.
This can in turn lead to the service being unable to handle other requests.
So, wasteful calls can bring down services, and the failure can cascade to other services throughout the application.
Fail Fast
Imagine you're throwing a party on a Saturday evening. You're making preparations, sending invitations to all your friends.
Would you prefer them to respond instantly, or would you prefer them to respond the day after the party?
I know, I'd go with option one.
We want responses fast so that we can adapt to them even if it means not getting what we asked for.
This concept in systems design is called failing fast.
Fail Proactively
When upstream resources give us lemons, we make lemonade.
You might not be able to prevent upstream failures, but you can always manage them proactively and make the most out of what you've got.
Here are some common strategies for handling failures gracefully:
- Fallbacks - in certain cases, you might be able to fall back to another service.
- Defaults - in certain cases, the integrity of the data is not crucially important, and defaults serve a good enough purpose until the upstream resource recovers.
- Cache - you can serve cached requests until the upstream resource recovers.
Avoid Polluting the Logs
Your monitoring solution is one of the most important components of your system. Without it, you're completely blind to what happens inside the dark realm of containers and Linux servers.
Metrics and logs are your eyes and ears. And the better the quality of the data you collect, the better you're able to understand what happens with your system.
If requests keep failing and you don't have a system in place that handles the situation gracefully, it will end up pumping ungodly amounts of pollution into your monitoring.
Circuit Breaker States
The circuit breaker has 3 main states which give us a clue about the health of the upstream resource or endpoint that we're targeting.
- Closed - the closed state means that the circuit is closed and everything is running smoothly. Just like in the case of an electrical circuit.
- Open - this state means that there is currently no connection upstream. In the case of an electrical circuit, if it is open, electricity cannot make its way through it.
- Half Open - the half-open state means the circuit has experienced difficulties reaching the upstream resource, but it's now testing the waters with new requests to see if it can stabilize. If it does, it moves to the closed state; if requests fail, it opens the circuit again.
Even though these are the conventional names of circuit breaker states, I prefer not to use them because I find them deceptive and potentially misleading for developers.
When people see Open, they intuitively associate it with OK, while Closed sounds a lot like something went wrong.
What I prefer to use instead are colors, e.g. Red, Yellow, Green, or descriptive names like Failing, Stabilizing, OK.
So, for this demonstration, we're going to use colors to describe states, but remember, this is just personal preference!
Creating Your Own Circuit Breaker
There are plenty of libraries out there that we could use to implement our circuit breaker, but that would defeat the purpose of this article, since our goal is to understand how the circuit breaker pattern is implemented.
So let's reinvent the wheel to learn how the wheel works.
What we are going to code:
- The simplest Express.js server to act as our upstream resource and simulate succeeding and failing requests.
- A configurable Circuit Breaker class that uses the Axios library to make requests, and has basic logging capability.
- A few lines of code where we make use of our Circuit Breaker.
We are going to use TypeScript to implement these features.
So, let's dive in!
The first thing we want to do is navigate to an empty directory of our choice, which will be our work directory, and execute the `npm init` command.
Once we have the `package.json` file, it's time to install our main dependencies. Since we're using TypeScript, we'll also need some dev dependencies, so let's install those as well.
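Assuming Express for the server and Axios for the HTTP client, the installs look something like this (axios ships with its own type definitions these days, so there's no need for `@types/axios`):

```bash
# Main dependencies: Express for the server, Axios for the HTTP client.
npm install express axios

# Dev dependencies for the TypeScript toolchain.
npm install --save-dev typescript @types/express @types/node
```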
Next, we're going to need a `tsconfig.json` file to hold our TypeScript configuration. You can use the one below.
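A minimal configuration along these lines should work; the one hard requirement is that the compiled output lands in a `build` directory, since the npm scripts we add later expect `build/index.js`:

```json
{
  "compilerOptions": {
    "target": "es6",
    "module": "commonjs",
    "outDir": "./build",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true
  }
}
```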
Great, now our work directory should contain a `node_modules` directory and three files: `package.json`, `package-lock.json`, and `tsconfig.json`.
It is time to copy-paste a basic Express server into existence.
Create a file called `index.ts` and paste the following lines of code into it.
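Here's a minimal sketch of such a server; the root route and the 50/50 failure odds are illustrative choices:

```typescript
// index.ts
import express from "express";

const app = express();
const port = 3000;

// Randomly succeed or fail, to give the circuit breaker something to react to.
app.get("/", (req, res) => {
  if (Math.random() > 0.5) {
    res.status(200).send("Success!");
  } else {
    res.status(400).send("Failed!");
  }
});

app.listen(port, () => {
  console.log(`Listening at http://localhost:${port}`);
});
```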
The above code snippet summons a simple Express server that will listen for `GET` requests on localhost:3000, randomly failing with status `400` or responding with status `200`.
We'll be able to use this endpoint to test our Circuit Breaker.
Before we go further with the implementation, let's add a couple of convenience scripts to our `package.json` file, so that we can build and start the server using npm commands.
In the scripts section of your `package.json`, copy and paste the following:
... "scripts": { "build": "tsc", "start-server": "npm run build && node build/index.js" }, ...
This will allow you to start your server with a simple `npm` command.
Once the command is executed, the server should print "Listening at http://localhost:3000" to the console.
So far so good! Let's move on to the meat of the article, which is the Circuit Breaker itself!
Let's create a `circuit-breaker` directory, which will contain all the assets related to the Circuit Breaker.
Now, let's navigate into this directory and start thinking about the components that we'll need to make the circuit breaker a reality.
First, we talked about states, so let's create a file called `BreakerStates.ts` to define our states.
We're going to use an enum and color codes for the states to make it a bit more developer-friendly.
In the `BreakerStates.ts` file, let's declare an enum like so:
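(A sketch; string values keep the logged state human-readable.)

```typescript
// BreakerStates.ts
export enum BreakerState {
  GREEN = "GREEN",
  YELLOW = "YELLOW",
  RED = "RED",
}
```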
Great, now that we have the states, what else do we need?
We'll need some configuration options for our Circuit Breaker that will answer the following questions for us:
- How many failures do we allow before moving to the `RED` state? Let's call this our `failureThreshold`.
- How many successes do we need before moving to the `GREEN` state? Let's call this our `successThreshold`.
- Once we are in the `RED` state, how much time should we wait before we allow a request to pass through? We'll call this our `timeout`.
So, immediately, we can see that we'll need a public class named `BreakerOptions` that can hold these properties. We could also opt for an interface trick here, but let's stick to the conventional class-based approach.
Let's create a file called `BreakerOptions.ts` and define our public class.
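A plain class whose constructor takes the three values is enough here; a minimal sketch:

```typescript
// BreakerOptions.ts
export class BreakerOptions {
  constructor(
    public failureThreshold: number,
    public successThreshold: number,
    public timeout: number
  ) {}
}
```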
Once we have the States and Options defined, we can start planning the CircuitBreaker class implementation. Since the circuit breaker will be making requests, and we're using Axios as our HTTP library, we'll have Axios as our dependency for this class.
Let's think about the properties that we'll have in the class.
- `request` - the request property will contain details about the request that we are going to attempt. Since we integrated with Axios, it would be smart to have this as the Axios request configuration. We can use the `AxiosRequestConfig` type for that.
- `state` - this property will hold our circuit breaker state. We have a `BreakerState` type created for this.
- `failureCount` - we will need something to count the number of failures with, so let's use this property for that purpose.
- `successCount` - same as `failureCount`, but for tracking successes.
- `nextAttempt` - we will need a property to store the timestamp of the next time we're allowed to attempt a request while in the `RED` state.
Let's not forget about the `BreakerOptions` we defined! We'll need to store those inside the class as well. It would also be smart to make them optional and have default values defined for them within the class.
- `failureThreshold` - lets us know when to switch to the `RED` state.
- `successThreshold` - lets us know when to switch to the `GREEN` state.
- `timeout` - lets us know how long to wait before the next attempt (in milliseconds).
That is a handful of properties to be defined. So let's set all this up before we move to the logic implementation.
Let's create a file called `CircuitBreaker.ts`, where we'll define our `CircuitBreaker` class.
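Here's a sketch of the properties and constructor. The default values (3 failures to trip, 2 successes to recover, a 10-second timeout) are assumptions that happen to line up with the sample output later in this post:

```typescript
// CircuitBreaker.ts
import { AxiosRequestConfig } from "axios";
import { BreakerOptions } from "./BreakerOptions";
import { BreakerState } from "./BreakerStates";

export class CircuitBreaker {
  private request: AxiosRequestConfig;
  private state: BreakerState;

  private failureCount: number;
  private successCount: number;
  private nextAttempt: number;

  // Options, with defaults applied when none are provided.
  private failureThreshold: number;
  private successThreshold: number;
  private timeout: number;

  constructor(request: AxiosRequestConfig, options?: BreakerOptions) {
    this.request = request;
    this.state = BreakerState.GREEN;

    this.failureCount = 0;
    this.successCount = 0;
    this.nextAttempt = Date.now();

    if (options) {
      this.failureThreshold = options.failureThreshold;
      this.successThreshold = options.successThreshold;
      this.timeout = options.timeout;
    } else {
      // Assumed defaults: trip after 3 failures, recover after 2 successes,
      // and wait 10 seconds before the next attempt.
      this.failureThreshold = 3;
      this.successThreshold = 2;
      this.timeout = 10000;
    }
  }

  // log, exec, success, and failure are defined in the sections below.
}
```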
Now it's time to think about the methods that we'll need. Let's plan them out and then we can start implementing them one by one.
- log - we'll need a method to log the current state of the Circuit Breaker. We'll be able to use this same method to integrate with our monitoring system as well.
- exec - the execute method will be a public API through which we'll be able to trigger the request attempt. We'll need to make this into an asynchronous function because we'll be waiting for a server response.
- success - this method will handle the successful executions and return the upstream response.
- failure - this method will handle the failed attempts and return the upstream response.
So let's start at the beginning and define our log method as such:
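(A minimal sketch; `console.table` gives us the tabular output shown later in this post.)

```typescript
// Inside the CircuitBreaker class: log the breaker's current state.
private log(result: string): void {
  console.table({
    Result: result,
    Timestamp: Date.now(),
    Successes: this.successCount,
    Failures: this.failureCount,
    State: this.state,
  });
}
```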
All it's responsible for is taking the result and displaying it in a nice tabular format, including other details about the current state of our Circuit Breaker.
Let's move on to the success method and define some logic. Here's what it should do for us.
- Return the successful response.
- Reset the failure count.
- Log the status so that we're aware of what happened.
- If in the `YELLOW` state, increment the success count; if the success count is larger than the threshold defined, reset and move to the `GREEN` state.
Sounds easy enough, let's write the code!
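A sketch of the success handler, following the steps above:

```typescript
// Inside the CircuitBreaker class: handle a successful response.
private success(res: any): any {
  this.failureCount = 0;

  if (this.state === BreakerState.YELLOW) {
    this.successCount++;

    // Enough consecutive successes: close the circuit again.
    if (this.successCount > this.successThreshold) {
      this.successCount = 0;
      this.state = BreakerState.GREEN;
    }
  }

  this.log("Success");

  return res;
}
```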
Great, we have success down — we'll do the same for failure. Here's the gist of it.
- Return the response.
- Increment the failure count.
- Log the status so that we're aware of the failure.
- If the failure count exceeds the threshold, move to the `RED` state, and define when our next attempt should take place.
Here's the code:
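(A sketch along the same lines as the success handler.)

```typescript
// Inside the CircuitBreaker class: handle a failed attempt.
private failure(res: any): any {
  this.failureCount++;

  // Too many failures: open the circuit and schedule the next attempt.
  if (this.failureCount >= this.failureThreshold) {
    this.state = BreakerState.RED;
    this.nextAttempt = Date.now() + this.timeout;
  }

  this.log("Failure");

  return res;
}
```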
And finally, the most important method to define, the exec method! This stands at the core of our mechanism. Let's see what it should do for us.
- Most importantly, if the state is `RED` and the next attempt is scheduled for sometime in the future, throw an Error and abort. We do not allow the request to go upstream.
- If the state is `RED` but the timeout period has expired, we switch the state to `YELLOW` and allow the request to pass.
- If the state is NOT `RED`, we try to make the request, and based on whether the request succeeds or fails, we call the appropriate handler method.
Simple enough, right? Let's see how the implementation looks.
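A sketch of the exec method; note that axios rejects on non-2xx responses by default, so failed requests land in the catch branch:

```typescript
// Inside the CircuitBreaker class: the public entry point for requests.
public async exec(): Promise<void> {
  if (this.state === BreakerState.RED) {
    if (this.nextAttempt <= Date.now()) {
      // The timeout expired: test the waters with this request.
      this.state = BreakerState.YELLOW;
    } else {
      throw new Error("Circuit is open. Request was blocked.");
    }
  }

  try {
    const response = await axios(this.request);
    this.success(response.data);
  } catch (err: any) {
    // axios rejects on non-2xx status codes by default.
    this.failure(err.message);
  }
}
```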
So, now that we have our `CircuitBreaker` class all set up, it's time to see how we can use it to perform requests.
Before anything else though, here's the complete implementation of the class; you can review it to see if it matches yours!
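Assembled from the sketches above (note the `axios` default import, which `exec` relies on):

```typescript
// CircuitBreaker.ts
import axios, { AxiosRequestConfig } from "axios";
import { BreakerOptions } from "./BreakerOptions";
import { BreakerState } from "./BreakerStates";

export class CircuitBreaker {
  private request: AxiosRequestConfig;
  private state: BreakerState;

  private failureCount: number;
  private successCount: number;
  private nextAttempt: number;

  private failureThreshold: number;
  private successThreshold: number;
  private timeout: number;

  constructor(request: AxiosRequestConfig, options?: BreakerOptions) {
    this.request = request;
    this.state = BreakerState.GREEN;

    this.failureCount = 0;
    this.successCount = 0;
    this.nextAttempt = Date.now();

    if (options) {
      this.failureThreshold = options.failureThreshold;
      this.successThreshold = options.successThreshold;
      this.timeout = options.timeout;
    } else {
      // Assumed defaults: 3 failures to trip, 2 successes to recover, 10s timeout.
      this.failureThreshold = 3;
      this.successThreshold = 2;
      this.timeout = 10000;
    }
  }

  private log(result: string): void {
    console.table({
      Result: result,
      Timestamp: Date.now(),
      Successes: this.successCount,
      Failures: this.failureCount,
      State: this.state,
    });
  }

  public async exec(): Promise<void> {
    if (this.state === BreakerState.RED) {
      if (this.nextAttempt <= Date.now()) {
        this.state = BreakerState.YELLOW;
      } else {
        throw new Error("Circuit is open. Request was blocked.");
      }
    }

    try {
      const response = await axios(this.request);
      this.success(response.data);
    } catch (err: any) {
      this.failure(err.message);
    }
  }

  private success(res: any): any {
    this.failureCount = 0;

    if (this.state === BreakerState.YELLOW) {
      this.successCount++;

      if (this.successCount > this.successThreshold) {
        this.successCount = 0;
        this.state = BreakerState.GREEN;
      }
    }

    this.log("Success");

    return res;
  }

  private failure(res: any): any {
    this.failureCount++;

    if (this.failureCount >= this.failureThreshold) {
      this.state = BreakerState.RED;
      this.nextAttempt = Date.now() + this.timeout;
    }

    this.log("Failure");

    return res;
  }
}
```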
Looking good? Great!
Alongside our `index.ts` file, we can create a `test.ts` file as well, which will contain a couple of lines of code for testing our masterpiece.
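A sketch of such a test, pointing the breaker at the local server and calling it once per second:

```typescript
// test.ts
import { CircuitBreaker } from "./circuit-breaker/CircuitBreaker";

// Target the Express server defined in index.ts.
const circuitBreaker = new CircuitBreaker({
  method: "get",
  url: "http://localhost:3000",
});

// Attempt a request every second; surface blocked-circuit errors.
setInterval(() => {
  circuitBreaker.exec().catch((err) => console.error(err.message));
}, 1000);
```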
In the code above, we imported the `CircuitBreaker`, created an instance of it, and started calling the `exec()` method at an interval of 1 second.
Let's add one more script to our `package.json` file to be able to run this test conveniently.
The scripts section should look like this, updated with the `test-breaker` script:
... "scripts": { "build": "tsc", "start-server": "npm run build && node build/index.js", "test-breaker": "npm run build && node build/test.js" }, ...
Now, let's make sure the server is running!
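Using the script we defined earlier:

```bash
npm run start-server
```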
And in a separate terminal window, let's run the circuit breaker test as well.
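Using the script we just added:

```bash
npm run test-breaker
```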
Once executed, here's an example of the log stream that you should be seeing in your terminal.
```
Success!
┌───────────┬───────────────┐
│  (index)  │    Values     │
├───────────┼───────────────┤
│  Result   │   'Failure'   │
│ Timestamp │ 1592222319902 │
│ Successes │       0       │
│ Failures  │       1       │
│   State   │    'GREEN'    │
└───────────┴───────────────┘
Request failed with status code 400
┌───────────┬───────────────┐
│  (index)  │    Values     │
├───────────┼───────────────┤
│  Result   │   'Failure'   │
│ Timestamp │ 1592222320906 │
│ Successes │       0       │
│ Failures  │       2       │
│   State   │    'GREEN'    │
└───────────┴───────────────┘
..............
┌───────────┬───────────────┐
│  (index)  │    Values     │
├───────────┼───────────────┤
│  Result   │   'Failure'   │
│ Timestamp │ 1592222321904 │
│ Successes │       0       │
│ Failures  │       3       │
│   State   │     'RED'     │
└───────────┴───────────────┘
...............
┌───────────┬───────────────┐
│  (index)  │    Values     │
├───────────┼───────────────┤
│  Result   │   'Failure'   │
│ Timestamp │ 1592222331941 │
│ Successes │       2       │
│ Failures  │       1       │
│   State   │   'YELLOW'    │
└───────────┴───────────────┘
...............
```
From this point forward, you can have as much fun with it as you like.
You can start and stop the server while the circuit breaker is running to see what happens, and you can also create different breakers with different `BreakerOptions`, like so:
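For example, a hypothetical, more patient breaker (the numbers are arbitrary: trip after 5 failures, recover after 3 successes, wait 20 seconds between attempts):

```typescript
import { BreakerOptions } from "./circuit-breaker/BreakerOptions";
import { CircuitBreaker } from "./circuit-breaker/CircuitBreaker";

const patientBreaker = new CircuitBreaker(
  { method: "get", url: "http://localhost:3000" },
  new BreakerOptions(5, 3, 20000)
);

setInterval(() => {
  patientBreaker.exec().catch((err) => console.error(err.message));
}, 1000);
```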
Implementation Granularity
Once you have it up and running, the design choices are in your hands. You can choose to make a circuit breaker responsible for an entire upstream service or just target individual endpoints depending on your needs.
Feel free to use different HTTP integrations, experiment with extending the breaker options and define multiple endpoints in your server to test with.
Here are additional feature ideas to consider:
- Create an API for the breaker so that it can be reset or tripped by the operations staff.
- Implement an event system around the Circuit Breaker so you can subscribe different parts of your application to it.
- Integrate the breaker with your favourite Node.js monitoring solution.
- Implement a Queue to automatically retry failed requests. (Warning: don't use this for requests where a downstream caller is waiting for the response.)
- Implement Caching to serve failed requests from the cache.
Parting Words
This sums up our overview of the Circuit Breaker pattern! I hope this article helped you grasp a few resiliency principles and it sparked your imagination to try extending this boilerplate with some creative solutions.
We reinvented the wheel to understand how it works, but custom solutions are not always the best choice. You have to analyze complexity and keep maintenance overhead in sight.
Once you're comfortable with the basics, I would suggest you check out a few npm packages which are designed specifically for this purpose. There are a couple of candidates out there like opossum, hystrixJS, and brakes.
It all depends on your requirements and I trust you to make the right decisions in your journey to improve system resiliency!
P.S. If you liked this post, subscribe to our new JavaScript Sorcery list for a monthly deep dive into more magical JavaScript tips and tricks.
P.P.S. We've touched upon monitoring in this post, so if you are now in the mood to try stuff out, go and check out AppSignal's APM for Node.js.