CPU-bound tasks can grind your Node.js applications to a standstill, frustrating users and driving them away. You must master the art of profiling your application to pinpoint bottlenecks, and implement effective strategies to resolve any issues.
In this guide, we'll explore various techniques to help you identify and fix the root causes of your Node.js performance issues.
Let's get started!
Prerequisites
Before proceeding with the article, ensure that you have the most recent LTS of
Node.js and npm
installed on your computer.
We'll use autocannon to run load tests on various server endpoints, so install it globally on your machine:
npm install -g autocannon
To follow along with the code samples, clone this repository and install the dependencies accordingly:
git clone https://github.com/damilolaolatunji/nodejs-profiling && cd nodejs-profiling npm install
Node.js Architecture: A Quick Overview
Node.js operates on an event-driven architecture, handling multiple client requests concurrently without creating separate threads for each connection. It achieves this by being non-blocking, allowing I/O operations like file operations and database queries to be processed asynchronously. When events occur, registered callbacks are executed, ensuring efficient handling of various tasks.
However, Node.js isn't suitable for CPU-bound tasks like heavy calculations or image processing. These tasks rely on the speed of a computer's CPU, and Node.js, with its single-threaded event loop, can only execute one JavaScript code at a time. As a result, CPU-bound tasks can block the main thread, causing significant performance problems, as we'll explore further in the next section.
Demonstrating a CPU-bound Task in Node.js
To illustrate a CPU-bound task, let's use the classic example of calculating a Fibonacci number from a specific position in the Fibonacci sequence.
This sequence starts with 0 and 1, where each subsequent number is the sum of
the two preceding ones, following the relation F(n) = F(n−1) + F(n−2)
with
seed values F(0) = 0, F(1) = 1
. Here are the first 20 Fibonacci numbers:
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181
Calculating the nth Fibonacci number using a recursive algorithm is straightforward:
// src/fibonacci.js function calcFiboRecursive(n) { if (n <= 1) return n; return calcFiboRecursive(n - 1) + calcFiboRecursive(n - 2); }
However, this algorithm's time complexity is O(2^n)
, indicating exponential
growth for the input size (n) (meaning that as n
increases, the
algorithm becomes significantly slower).
Bear in mind that there are much more efficient algorithms for calculating the Fibonacci number. This example only serves to illustrate the problem with CPU-bound tasks in Node.js.
Testing the CPU-bound Task
In the src/server.js
file, there's a /fibonacci-recursive/:n
route for
calculating the Fibonacci number at position n
:
// src/server.js . . . fastify.get('/fibonacci-recursive/:n', (request, reply) => { const n = parseInt(request.params.n, 10); const result = calcFiboRecursive(n); reply.send({ result }); }); . . .
You can test it by starting the server on port 3000:
npm start
Once the server has started, send a request to find the 30th Fibonacci number.
You can prepend the time
command to find out approximately how long the
request takes:
time curl http://localhost:3000/fibonacci-recursive/30
You should observe the result fairly quickly (on my laptop, it took 10 milliseconds):
{"result":832040} Executed in 10.34 millis fish external usr time 1.22 millis 235.00 micros 0.99 millis sys time 2.03 millis 52.00 micros 1.98 millis
When the number starts to increase, even only slightly, the response time grows by many orders of magnitude. On my machine, finding the 40th and 45th Fibonacci numbers took 810ms and 9s, respectively.
Impact on the Event Loop
Aside from the fact that the results take a progressively long time to compute, each request also completely blocks the event loop, preventing any other task from executing until its completion.
To establish baseline behavior, use autocannon
to send some load to the /
route. Ensure that the server isn't processing any other requests before
running the command below:
autocannon --renderStatusCodes http://localhost:3000
On my machine, I observed a throughput of almost 49k requests per second:
Running 10s test @ http://localhost:3000 10 connections ┌─────────┬──────┬──────┬───────┬──────┬─────────┬─────────┬───────┐ │ Stat │ 2.5% │ 50% │ 97.5% │ 99% │ Avg │ Stdev │ Max │ ├─────────┼──────┼──────┼───────┼──────┼─────────┼─────────┼───────┤ │ Latency │ 0 ms │ 0 ms │ 0 ms │ 0 ms │ 0.01 ms │ 0.06 ms │ 11 ms │ └─────────┴──────┴──────┴───────┴──────┴─────────┴─────────┴───────┘ ┌───────────┬─────────┬─────────┬─────────┬─────────┬──────────┬─────────┬─────────┐ │ Stat │ 1% │ 2.5% │ 50% │ 97.5% │ Avg │ Stdev │ Min │ ├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼─────────┼─────────┤ │ Req/Sec │ 41951 │ 41951 │ 49151 │ 50207 │ 48594.91 │ 2182.35 │ 41931 │ ├───────────┼─────────┼─────────┼─────────┼─────────┼──────────┼─────────┼─────────┤ │ Bytes/Sec │ 8.09 MB │ 8.09 MB │ 9.49 MB │ 9.69 MB │ 9.38 MB │ 422 kB │ 8.09 MB │ └───────────┴─────────┴─────────┴─────────┴─────────┴──────────┴─────────┴─────────┘ ┌──────┬────────┐ │ Code │ Count │ ├──────┼────────┤ │ 200 │ 534515 │ └──────┴────────┘ Req/Bytes counts sampled once per second. # of samples: 10 535k requests in 10.01s, 103 MB read
Now, block the event loop by calculating the 50th Fibonacci number as follows. This request will take a long time to complete (about 123 seconds on my machine):
curl http://localhost:3000/fibonacci-recursive/50
While the result is computing (and the event loop is blocked), in a different
terminal, use autocannon
to send load to the /
once again:
autocannon --renderStatusCodes http://localhost:3000
You will observe that the throughput drops dramatically to zero requests per second due to the ongoing Fibonacci computation. This clearly demonstrates how devastating blocking the event loop can be to Node.js performance.
. . . ┌──────┬───────┐ │ Code │ Count │ └──────┴───────┘ Req/Bytes counts sampled once per second. # of samples: 10 20 requests in 10.02s, 0 B read 10 errors (10 timeouts)
Profiling your Node.js Application for Bottlenecks
When your application's performance starts to lag, pinpointing the exact cause of slowdowns can be a challenge. To effectively troubleshoot such issues, the first step is to profile your code and identify bottlenecks.
In this section, we'll explore three main approaches for profiling Node.js applications:
- Using the built-in Node.js profiler
- Using Chrome DevTools
- Using the Inspector module for on-demand profiling in production
1. Using the Built-in Node.js Profiler
The first approach we'll consider requires no external tools or libraries. It
involves using the built-in sample-based profiler built into Node.js, which is
enabled through the --prof
command-line option. Once the profiler is turned
on, it records stacks of both JavaScript and C/C++ code into a file in the
current working directory.
First, enable the profiler using the --prof
command-line option when starting
your Node.js server:
node --prof src/server.js
Once the server is running, simulate traffic to the /fibonacci-recursive/:n
route through autocannon
as shown below:
autocannon --renderStatusCodes http://localhost:3000/fibonacci-recursive/30
On my machine, the server processed an average of 138.7 requests per second:
. . . ┌──────┬───────┐ │ Code │ Count │ ├──────┼───────┤ │ 200 │ 1387 │ └──────┴───────┘ Req/Bytes counts sampled once per second. # of samples: 10 1k requests in 10.01s, 261 kB read
The profiler records data into a file in your project directory. You will notice
a new tick log file named isolate-0x<number>-v8.log
. This file is not meant to
be consumed by humans, but must be processed further to generate
human-readable data.
To process this log data, use the built-in Node.js tick processor:
node --prof-process isolate-0x<number>-v8.log > processed.txt
This creates a new processed.txt
file with human-readable profiling data in
your project root. You can view its content by opening it up in your text
editor. The file is broken up into several sections, each with its own table.
Let's delve into the processed data to identify bottlenecks, starting with the
[Summary]
section:
// processed.txt [Summary]: ticks total nonlib name 8603 84.0% 84.0% JavaScript 191 1.9% 1.9% C++ 28 0.3% 0.3% GC 7 0.1% Shared libraries 1442 14.1% Unaccounted
This summary indicates that approximately 84% of the samples occurred in JavaScript code, directing your attention to the JavaScript functions for investigation.
With this in mind, navigate to the [Javascript]
section of the file to see
which JavaScript functions are taking up the most time:
// processed.txt [JavaScript]: ticks total nonlib name 8542 83.4% 83.5% JS: *calcFiboRecursive file:///home/ayo/dev/demo/nodejs-profiling/src/fibonacci.js:1:27 9 0.1% 0.1% JS: *resolve node:path:1095:10 4 0.0% 0.0% JS: *readPackageScope node:internal/modules/cjs/loader:377:26 3 0.0% 0.0% JS: *clearIncoming node:_http_server:948:23 2 0.0% 0.0% JS: ^defaultGenReqId /home/ayo/dev/demo/nodejs-profiling/node_modules/fastify/lib/reqIdGenFactory.js:23:28 . . .
In a real-world application, you might do a lot of processing in several functions
to handle user requests. But here, the calcFiboRecursive()
function is
(predictably) where most time is spent, making it the clear target for
optimization.
2. Using Chrome DevTools
A second approach to profiling your Node.js applications is to use Chrome
DevTools to collect performance data. You need to start your server using the
--inspect
flag to get started:
node --inspect src/server.js
Once the server is running, open Google Chrome or any other Chromium-based browser,
and type chrome://inspect
in the address bar.
Find the inspect link for your Node.js script and click on it to bring up a new DevTools window for debugging your Node.js app. Head over to the Performance tab and click on the record button to start profiling your code.
Head back to your terminal, and run autocannon
against the
/fibonacci-recursive/:n
route once more:
autocannon --renderStatusCodes http://localhost:3000/fibonacci-recursive/30
Once the command finishes, return to your DevTools window and click the Stop button to finish the profiling. You will immediately observe a detailed performance profile of your program:
The Main section displays a flame chart representing main thread activity in your program. The horizontal axis represents time, while the vertical axis represents the call stack. You can zoom in to get a more detailed view:
In this view, the events on top cause the events below them, and the width of each event
indicates its duration (the wider it is, the longer the event runs). The frames under
parserOnHeadersComplete
encompass routing logic, parameter processing, and,
finally, lead to the request handler ((anonymous)
). This handler calls the
calcFiboRecursive()
function multiple times due to its recursive nature,
clearly visible in the graph.
Clicking the (anonymous)
frame and switching to the Bottom-Up tab below
lists the various resulting function calls and orders them according to their
duration:
3. Using the Inspector Module for On-Demand Profiling in Production
The two profiling approaches above are useful ways to profile your code in a development environment. However, since performance issues are often first observed in production, it's crucial to set up on-demand profiling in a way that won't affect runtime performance.
The approach we'll discuss here combines the strengths of both methods, using built-in Node.js APIs to capture profiling data and then analyzing the data using Chrome DevTools.
The Inspector API provides the ability to interact with the V8 inspector programmatically. You can use it to collect profiling data from your application like this:
// src/inspector.js import * as inspector from "node:inspector/promises"; import fs from "node:fs"; const session = new inspector.Session(); async function startCPUProfile() { try { session.connect(); await session.post("Profiler.enable"); await session.post("Profiler.start"); } catch (err) { console.error(err); } } async function stopCPUProfile() { try { const { profile } = await session.post("Profiler.stop"); fs.writeFileSync("./profile.cpuprofile", JSON.stringify(profile)); session.disconnect(); } catch (err) { console.error(err); } } process.on("SIGUSR1", startCPUProfile); process.on("SIGUSR2", stopCPUProfile);
This file sets up an on-demand CPU profiling mechanism for the Node.js application through the following two functions:
startCPUProfile()
connects to the inspector session and enables the CPU profiler by sending appropriate commands using the inspector protocol.stopCPUProfile()
stops the CPU profiler, retrieves the profiling data, and writes it to a file namedprofile.cpuprofile
.
The program is also set up to listen for the SIGUSR1
and SIGUSR2
signals.
The former triggers the collection of the CPU profile, while the latter stops
CPU profiling and saves the profiling data to a file for analysis.
You can restart your server with npm start
, then trigger CPU profiling by
sending the SIGUSR1
signal to the process through the kill
command. Ensure
you replace <process_id>
with the pid
value found in the server startup log:
kill -SIGUSR1 <process_id>
If the above command exits successfully, it indicates that profiling has
started. You can now use autocannon
to send some load to the
/fibonacci-recursive/:n
route as before. Once that's done, send the SIGUSR2
signal to stop the profiling and write the data to a file:
kill -SIGUSR2 <process_id>
You should see a new profile.cpuprofile
file in your project directory
which contains the collected profiling information. To analyze the data, open
Chrome DevTools and click the Load profile button in the Performance tab:
Once the file is loaded, you will see the flame graph as before. Analyze it to identify performance bottlenecks, just as described in the previous section.
With this approach, you can trigger CPU profiling on demand, in real-time, without interrupting server operations. In a production environment, consider setting up a different mechanism (e.g., an HTTP endpoint) to trigger profiling and automatically upload the profile data to a secure location for convenient retrieval and analysis.
Wrapping Up
In this article, we explored the challenges associated with managing CPU-bound tasks in Node.js and their substantial influence on runtime efficiency. You also learned several methods to profile your code and identify performance bottlenecks.
In the upcoming and concluding second part, our focus will shift to specific techniques you can employ to enhance the performance of CPU-bound tasks in Node.js.
Until next time, thanks for reading!
P.S. If you liked this post, subscribe to our JavaScript Sorcery list for a monthly deep dive into more magical JavaScript tips and tricks.
P.P.S. If you need an APM for your Node.js app, go and check out the AppSignal APM for Node.js.