Monitoring Your Node.js App Health on Fly.io

Tarun Singh

Your Node.js service has just been containerized and deployed across continents with a single fly deploy command. Everything seems fine, but a week later, a user messages you to say the app is slow.

You run fly logs, scroll through the output, and find nothing out of the ordinary. The Fly.io dashboard says the app is running and healthy, yet something behind the scenes is slowing it down, and you have no idea what. You don’t even know where to start.

It’s a common situation that clearly points to a real gap. Sure, Fly.io is generally good at telling you something is wrong, but it can’t really pinpoint why.

Enter AppSignal.

In this article, you’ll see how to move beyond basic machine-level signals and get deep, actionable visibility into your Node.js app on Fly.io using AppSignal.

What Fly.io Gives You (and What It Doesn’t)

Fly.io is among the best deployment tools out there. It can take your app to production within minutes, and it ships with useful built-in observability, in the form of logs and metrics, that tells you how your app is behaving.

Logging (fly logs) streams your app’s stdout and stderr to your terminal in real time through the Fly.io CLI, which is great for spotting crashes or deployment errors as they happen. This is exactly the kind of feedback you need during quick sanity checks.

For deeper insight, Fly.io also collects machine-level metrics such as CPU usage, I/O, network throughput, and memory, and makes them available through Prometheus. You can explore them in Fly.io’s preconfigured Grafana instance, or expose custom metrics from your app by adding a [metrics] block to fly.toml that points to a /metrics endpoint.
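If you take the custom-metrics route, the endpoint only needs to return Prometheus’s plain-text exposition format. The sketch below serves a few process memory gauges using Node.js built-ins only. The metric names, the port (9091), and the [metrics] values shown in the comment are assumptions for illustration; match them to your own fly.toml.

```javascript
// metrics.cjs: a minimal, dependency-free Prometheus endpoint (sketch).
// Assumed fly.toml block telling Fly.io where to scrape:
//
//   [metrics]
//     port = 9091
//     path = "/metrics"
const http = require("node:http");

// Render gauges in the Prometheus text exposition format:
// a "# HELP" line, a "# TYPE" line, then "name value".
function renderMetrics(mem = process.memoryUsage()) {
  const gauges = [
    ["nodejs_heap_used_bytes", "V8 heap currently in use", mem.heapUsed],
    ["nodejs_heap_total_bytes", "V8 heap allocated", mem.heapTotal],
    ["process_resident_memory_bytes", "Resident set size of the process", mem.rss],
  ];
  return gauges
    .map(([name, help, value]) => `# HELP ${name} ${help}\n# TYPE ${name} gauge\n${name} ${value}\n`)
    .join("");
}

const metricsServer = http.createServer((req, res) => {
  if (req.method === "GET" && req.url === "/metrics") {
    res.writeHead(200, { "Content-Type": "text/plain; version=0.0.4" });
    res.end(renderMetrics());
    return;
  }
  res.writeHead(404);
  res.end();
});

// Log instead of crashing if the port is already taken.
metricsServer.on("error", (err) => console.error("metrics server error:", err.message));
// unref() lets the main app server, not this listener, decide when the process exits.
metricsServer.listen(9091).unref();
```

A separate port keeps the metrics route off your public-facing service, and unref() means this listener never keeps the process alive on its own; your main app server does that.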

This is a solid infrastructure layer. However, neither tool can tell you which endpoint is slow, which error occurred before a machine recovered, or which database query is causing the bottleneck. In short, they can confirm that something is wrong with your app, but not why.

To find the root cause, you need to look inside the process.

What AppSignal Captures in Your Node.js App

AppSignal auto-instruments Node.js, Express, Fastify, Koa, and Next.js API routes; no manual wrapping of routes or handlers is needed. It tracks request times for each route, database queries, external API calls, unhandled errors, and even server metrics like CPU and memory usage. And the best part is that all this is done through a single AppSignal agent.

All this data shows up together, in one place, so you can see how your app is performing, what errors are occurring, and how the server is handling the load on the same timeline. This context makes it much easier to figure out what has gone wrong.

The Node.js setup docs explain the installation steps in detail.

Integrating AppSignal

Integrating AppSignal is quite simple. Start by installing the package:

Shell
npm install @appsignal/nodejs

Initialize the package at the very top of your app.js (or whatever your main file is) before any other imports. This is important because AppSignal hooks into your dependencies at load time, and anything required before it won’t be instrumented.
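A minimal sketch of that setup follows, modeled on the appsignal.cjs pattern AppSignal documents for CommonJS apps. The fallback app name (my-node-app) is a placeholder, and option names and loading instructions can differ between package versions, so treat this as a starting point and confirm the details in the Node.js setup docs.

```javascript
// appsignal.cjs: load this before anything else, for example:
//   node --require ./appsignal.cjs app.js
// or make require("./appsignal.cjs") the very first line of app.js.
const { Appsignal } = require("@appsignal/nodejs");

new Appsignal({
  active: true,
  // FLY_APP_NAME is set by Fly.io on its machines; the fallback name is a placeholder.
  name: process.env.FLY_APP_NAME || "my-node-app",
  // Injected at runtime from a Fly.io secret, never hardcoded.
  pushApiKey: process.env.APPSIGNAL_PUSH_API_KEY,
});
```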

Quick tip: Never hardcode your API key in your Dockerfile or anywhere else. Instead, use Fly’s secret management. It provides a secure way to store sensitive information by injecting it as environment variables at runtime rather than baking it into your Docker image:

Shell
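fly secrets set --stage APPSIGNAL_PUSH_API_KEY=your_key_here
fly deploy

The --stage flag stores the secret without restarting your machines; fly deploy then ships your updated code and restarts the machines with the new secret available as an environment variable. Within minutes, you should see your app sending telemetry data to AppSignal.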

The Signals AppSignal Surfaces on Fly.io

Generic monitoring tips often don’t cover how Fly.io runs or how Node.js tends to break in production. Here are some important insights AppSignal shows you:

Request Performance

The performance dashboard shows response time by route rather than a single average, so you can see exactly where time is being spent on each request.

Below, you’ll find the AppSignal performance dashboard for an actual Node.js application deployed on Fly.io.

Response time spike flagged in AppSignal

The throughput graph shows real-life production traffic, ranging from 10 to 50 requests processed per minute. The response time graph, in particular, tells a clear story: an orange spike at around 09:07 climbed to nearly 5 seconds. However, no Fly.io alert fired, and machine metrics looked normal throughout.
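To see why a per-route view matters, here is a small sketch with hypothetical numbers: one route is consistently slow, but a single overall average blends it with fast traffic. The helper groups request durations by route and reports each route’s count, mean, and 95th percentile using the nearest-rank method; the routes and timings are made up.

```javascript
// Nearest-rank percentile: the value at rank ceil(p/100 * n) in sorted order.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Group { route, ms } samples by route and summarize each group.
function summarizeByRoute(samples) {
  const byRoute = new Map();
  for (const { route, ms } of samples) {
    if (!byRoute.has(route)) byRoute.set(route, []);
    byRoute.get(route).push(ms);
  }
  const summary = {};
  for (const [route, durations] of byRoute) {
    const avg = durations.reduce((sum, d) => sum + d, 0) / durations.length;
    summary[route] = { count: durations.length, avgMs: Math.round(avg), p95Ms: percentile(durations, 95) };
  }
  return summary;
}

// Hypothetical traffic: many fast health checks, a few slow comparison requests.
const samples = [
  ...Array.from({ length: 18 }, () => ({ route: "GET /health", ms: 5 })),
  { route: "GET /api/drivers/compare", ms: 4800 },
  { route: "GET /api/drivers/compare", ms: 4600 },
];

const overallAvg = samples.reduce((sum, s) => sum + s.ms, 0) / samples.length;
console.log("overall average:", Math.round(overallAvg), "ms");
console.log(summarizeByRoute(samples));
```

With this data, the overall average comes out at 475 ms, while GET /api/drivers/compare has a p95 of 4,800 ms. Only the per-route breakdown makes the slow endpoint obvious.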

Logging

AppSignal collects your logs in one place and ties them to the request or error that produced them. Instead of digging through debug or verbose output, you can filter by severity or source and focus on what truly matters: errors and important flows.
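Filtering by severity and tying a line to a request both depend on logs carrying structured fields. The sketch below is not AppSignal’s logging API; it’s a hypothetical, dependency-free logger that writes one JSON object per line with a severity, a request ID, and Fly.io’s FLY_REGION and FLY_MACHINE_ID environment variables, and drops anything below a chosen severity.

```javascript
const { randomUUID } = require("node:crypto");

// Numeric ranks so severities can be compared.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

// Hypothetical logger: one JSON object per line, tagged with the request ID
// and the Fly.io region/machine (FLY_REGION and FLY_MACHINE_ID are set on
// Fly.io machines; elsewhere they are undefined and JSON.stringify omits them).
function createLogger({
  minLevel = "info",
  requestId = randomUUID(),
  write = (line) => process.stdout.write(line + "\n"),
} = {}) {
  const log = (severity, message, attributes = {}) => {
    if (LEVELS[severity] < LEVELS[minLevel]) return; // below threshold: drop
    write(
      JSON.stringify({
        time: new Date().toISOString(),
        severity,
        message,
        requestId,
        region: process.env.FLY_REGION,
        machine: process.env.FLY_MACHINE_ID,
        ...attributes,
      }),
    );
  };
  // Expose logger.debug(), .info(), .warn(), and .error().
  return Object.fromEntries(
    Object.keys(LEVELS).map((level) => [level, (message, attributes) => log(level, message, attributes)]),
  );
}

// Example: one logger per incoming request.
const requestLogger = createLogger({ requestId: "req-42" });
requestLogger.debug("cache miss"); // dropped: below "info"
requestLogger.error("upstream refused connection", { code: "ECONNREFUSED" });
```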

AppSignal log view with source and severity filter

Error Groups

AppSignal groups errors by type, no matter how many machines they come from. This is particularly useful on Fly.io, where traffic is routed across machines and regions, so an error may only show up on one of them. Without grouping, you’d have to grep through logs from every machine just to spot the likely cause.

ECONNREFUSED error with full backtrace

A good example here is the ECONNREFUSED error that you can see in the dashboard, caught on request GET /api/drivers/compare. AppSignal groups it and timestamps it automatically.
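Conceptually, grouping means deriving a stable key from each error and counting occurrences per key, whichever machine reported it. The sketch below is a simplified, hypothetical illustration of that idea, not AppSignal’s actual grouping logic: it keys errors by name, Node.js error code, and route, and records which Fly.io regions reported each group. The sample reports are invented.

```javascript
// Hypothetical grouping key: error type + Node.js error code + route.
function groupKey(err, route) {
  return [err.name, err.code ?? "-", route].join(" | ");
}

// Fold error reports from many machines into one entry per group,
// remembering how often it happened, where, and when.
function groupErrors(reports) {
  const groups = new Map();
  for (const { err, route, region, at } of reports) {
    const key = groupKey(err, route);
    const group = groups.get(key) ?? { count: 0, regions: new Set(), firstSeen: at, lastSeen: at };
    group.count += 1;
    group.regions.add(region);
    group.firstSeen = Math.min(group.firstSeen, at);
    group.lastSeen = Math.max(group.lastSeen, at);
    groups.set(key, group);
  }
  return groups;
}

// Sample reports; the regions and timestamps are made up.
const refused = () =>
  Object.assign(new Error("connect ECONNREFUSED 127.0.0.1:5432"), { code: "ECONNREFUSED" });
const reports = [
  { err: refused(), route: "GET /api/drivers/compare", region: "ams", at: Date.parse("2024-05-01T09:07:00Z") },
  { err: refused(), route: "GET /api/drivers/compare", region: "ams", at: Date.parse("2024-05-01T09:08:00Z") },
  { err: refused(), route: "GET /api/drivers/compare", region: "sin", at: Date.parse("2024-05-01T09:06:00Z") },
  { err: new TypeError("Cannot read properties of undefined"), route: "GET /api/drivers", region: "iad", at: Date.parse("2024-05-01T09:10:00Z") },
];

for (const [key, g] of groupErrors(reports)) {
  console.log(`${key}: ${g.count}x in ${[...g.regions].join(", ")}`);
}
```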

Host metrics

Host metrics, especially memory, are another important signal. Node.js sets its default V8 heap limit based on the memory it detects, so the exact figure depends on your Node.js version and machine size. Fly.io machines, meanwhile, have a hard memory limit that covers everything the process allocates, not just the V8 heap. If memory usage is already close to that limit when a traffic spike hits, the out-of-memory (OOM) killer terminates your app and the machine restarts, with no warning or JavaScript stack trace.

AppSignal shows heap usage and machine memory side by side in the same view, so you can catch a memory squeeze before Fly.io’s OOM killer intervenes.
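You can read both limits from inside the process with Node.js built-ins: v8.getHeapStatistics().heap_size_limit is the V8 heap ceiling, and os.totalmem() is the memory the machine reports. Because Fly.io machines are lightweight VMs, os.totalmem() should reflect the machine’s memory there; in a plain container it may report the host’s instead. The sketch below compares both with the current resident set size (RSS); the 80% warning threshold is an assumption, not an AppSignal or Fly.io default.

```javascript
const os = require("node:os");
const v8 = require("node:v8");

const MB = 1024 * 1024;

// Fraction of machine memory the process is using, plus a simple verdict.
// The 0.8 default threshold is an assumption; tune it for your machine size.
function memoryHeadroom(rssBytes, totalBytes, warnAt = 0.8) {
  const usedFraction = rssBytes / totalBytes;
  return { usedFraction, warning: usedFraction >= warnAt };
}

function memoryReport() {
  const { heap_size_limit: heapLimit } = v8.getHeapStatistics();
  const { rss, heapUsed } = process.memoryUsage();
  const total = os.totalmem();
  return {
    heapUsedMB: Math.round(heapUsed / MB),
    heapLimitMB: Math.round(heapLimit / MB),
    rssMB: Math.round(rss / MB),
    machineMB: Math.round(total / MB),
    // A heap limit above machine memory means the OOM killer, not V8, is likely to act first.
    heapLimitExceedsMachine: heapLimit > total,
    ...memoryHeadroom(rss, total),
  };
}

console.log(memoryReport());
```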

Fly-Specific Things to Watch

Beyond general monitoring, there are a few patterns specific to how Fly.io runs Node.js apps. Knowing what they look like in AppSignal can save you from chasing false alarms.

  • Machine restarts: Fly.io recycles machines more aggressively than traditional hosts, so an occasional restart on its own is usually nothing to worry about. If restarts keep lining up with error spikes in AppSignal, though, that points to a crash loop rather than a graceful recycle, and AppSignal’s error grouping makes the pattern easy to spot.
  • Cold starts: The first request to a Fly.io machine that has been stopped (scaled to zero) can take several seconds to respond while the machine starts and your app boots. If you don’t know what a cold start looks like in your performance data, you might mistake it for a regression.
  • Shared CPU variance: On shared-CPU Fly.io machines, an erratic p95 response time alongside a stable p50 most likely indicates noisy-neighbor CPU contention rather than a problem with your code. AppSignal’s route-level breakdown makes this visible: all routes spike together, and nothing in the traces points to a specific query or external call. That pattern largely rules out your code.
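The shared-CPU reasoning above can be expressed as a rough heuristic: compare each route’s p50 and p95 between a baseline window and the current one. If every route’s p95 jumped while its p50 barely moved, contention is the more likely suspect; if only some routes regressed, their traces are the place to look. This is a hypothetical sketch; the thresholds (p95 up 50%, p50 within 20%) and sample timings are assumptions, not values AppSignal uses.

```javascript
// Nearest-rank percentile over durations in ms.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.max(1, Math.ceil((p / 100) * sorted.length)) - 1];
}

// baseline/current: { route: [durations in ms] } for two time windows.
// Thresholds are illustrative assumptions, not AppSignal defaults.
function diagnose(baseline, current, { p95Jump = 1.5, p50Drift = 1.2 } = {}) {
  const routes = Object.keys(current).filter((route) => route in baseline);
  const regressed = [];
  let tailOnly = 0; // routes whose p95 jumped while p50 stayed roughly flat
  for (const route of routes) {
    const p95Ratio = percentile(current[route], 95) / percentile(baseline[route], 95);
    const p50Ratio = percentile(current[route], 50) / percentile(baseline[route], 50);
    if (p95Ratio >= p95Jump) {
      regressed.push(route);
      if (p50Ratio <= p50Drift) tailOnly += 1;
    }
  }
  if (routes.length > 0 && tailOnly === routes.length) {
    return { verdict: "every route's tail jumped: suspect shared-CPU contention", regressed };
  }
  if (regressed.length > 0) {
    return { verdict: "specific routes regressed: inspect their traces", regressed };
  }
  return { verdict: "no significant change", regressed };
}

// Hypothetical timings: same medians, much worse tails on every route.
const baseline = {
  "GET /api/drivers": [10, 10, 10, 10, 12, 12, 12, 12, 14, 20],
  "GET /api/drivers/compare": [30, 30, 30, 30, 32, 32, 32, 32, 34, 40],
};
const contended = {
  "GET /api/drivers": [10, 10, 10, 10, 12, 12, 12, 12, 14, 90],
  "GET /api/drivers/compare": [30, 30, 30, 30, 32, 32, 32, 32, 34, 200],
};
console.log(diagnose(baseline, contended));
```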

Connecting Deploys to Changes

The most useful habit you can build after setting up AppSignal is adding a deploy marker to your fly deploy step. One line in your deploy script sends a marker to AppSignal, which draws a vertical line on every graph at the exact moment your deployment lands.

Response time spike with deploy marker

When errors or response times spike, the first question is usually whether a deployment caused it. With deploy markers, you don’t have to reconstruct the timeline from memory or work out which happened first. You can see the before and after immediately.
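One way to wire this into a Fly.io deploy, assuming you use AppSignal’s revision-based deploy markers: AppSignal’s revision option can be supplied through the APP_REVISION environment variable, and this sketch assumes AppSignal creates a marker whenever that value changes. The hypothetical deploy.cjs script below passes the current git commit to fly deploy with --env, so each machine boots with APP_REVISION set. Check AppSignal’s deploy marker docs to confirm this matches your integration version.

```javascript
// deploy.cjs: hypothetical helper. Run with: node deploy.cjs --run
const { execFileSync } = require("node:child_process");

// flyctl arguments that set APP_REVISION on the deployed machines.
// Assumption: AppSignal reads APP_REVISION as its `revision` option and
// creates a deploy marker whenever that value changes.
function flyDeployArgs(revision) {
  return ["deploy", "--env", `APP_REVISION=${revision}`];
}

// Short hash of the commit being deployed.
function currentRevision() {
  return execFileSync("git", ["rev-parse", "--short", "HEAD"], { encoding: "utf8" }).trim();
}

// Only deploy when explicitly asked, so requiring this file has no side effects.
if (process.argv.includes("--run")) {
  execFileSync("fly", flyDeployArgs(currentRevision()), { stdio: "inherit" });
}
```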

Alerts Worth Setting

Once everything is visible, the next step is making sure you’re notified when something changes. A few alerts are particularly useful:

  1. Error rate: Create a trigger for new or increasing error types, particularly after a deployment. If a new exception appears, you’ll want to know immediately.
  2. Slow routes: Set response time thresholds on your most important endpoints. Adding a short warm-up window helps avoid alerts from a single temporary slowdown.
  3. Memory threshold: Set this below Fly.io’s hard ceiling so you have time to act before an OOM restart occurs.

AppSignal anomaly detection trigger setup

You can configure all of these in the Anomaly Detection section of your dashboard, without writing any config files or alerting rules.
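The warm-up window mentioned for slow routes is easy to reason about in code: the alert fires only after a threshold has been breached for several consecutive minutes, so a single slow blip stays quiet. The sketch below is a hypothetical model of that behavior, not AppSignal’s trigger implementation; the 1,000 ms threshold, three-minute warm-up, and sample values are examples.

```javascript
// Returns a checker fed one value per minute (e.g. a route's mean response
// time in ms). It fires only after `warmupMinutes` consecutive breaches,
// and fires once per incident until the value drops back under the threshold.
function createWarmupAlert({ thresholdMs, warmupMinutes }) {
  let streak = 0;
  let firing = false;
  return function check(valueMs) {
    if (valueMs <= thresholdMs) {
      streak = 0;
      firing = false;
      return false;
    }
    streak += 1;
    if (!firing && streak >= warmupMinutes) {
      firing = true;
      return true; // notify now
    }
    return false;
  };
}

const check = createWarmupAlert({ thresholdMs: 1000, warmupMinutes: 3 });
// One spike, a recovery, then a sustained slowdown.
const minutes = [300, 4800, 350, 1200, 1500, 1800, 2000, 400];
const alerts = minutes.map((value) => check(value));
console.log(alerts); // → [false, false, false, false, false, true, false, false]
```

The lone 4,800 ms spike never alerts, while the sustained slowdown alerts once, on its third consecutive minute above the threshold.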

Uptime Monitoring

Uptime monitor showing region-based outage

AppSignal’s uptime monitoring alerts you when your application stops responding in any region. It pings the public URL you’ve configured every 30 seconds from multiple regions, and if a check gets no response within that window, the app is considered down.

Wrapping it Up

Fly.io is pretty good at running your machines and deploying your app globally, but if you want to keep a Node.js app healthy in production, you’ll need to know more than whether a machine is up. For instance, you need to see how requests behave, which errors are being repeated, and whether memory or performance trends are moving in the wrong direction.

With AppSignal, you can look under the hood and see request performance, grouped errors, logs, host metrics, and deploy markers all in one place. When something slows down or breaks, you’ll quickly see what’s actually wrong.

If you’re deploying Node.js on Fly.io, it pays to get a clear view of what’s happening inside your application. The AppSignal docs are a great place to start, and you can use a free plan to test it and explore the dashboard with real data, no credit card required.

Tarun Singh

Tarun Singh is a software engineer and technical writer with 5+ years of experience creating developer-focused content on backend systems, APIs, and modern web development. He has published 800+ technical articles across major platforms and frequently writes deep-dive tutorials on developer tools, testing, AI, agentic tools, cloud, and infrastructure. Tarun is passionate about open source, developer education, and building reliable software systems.
