elixir

Debugging Slow Ecto Queries with AppSignal

Samuel Mullen

Samuel Mullen on

Debugging Slow Ecto Queries with AppSignal

A sports car can only be driven as fast as the road it's driven on. If you're stuck behind a tractor on a single-lane road, you're not going anywhere fast. The same idea applies to web performance: your application's throughput is only as fast as it's slowest bottleneck.

For Phoenix applications, that bottleneck is almost always the database. Ecto provides powerful tools that allow us to compose large, readable queries by chaining functions together, but this readability can come at the cost of expensive queries under the hood. Slow queries, unindexed tables, and N+1 requests affect your app's performance.

Finding and resolving these issues can take a lot of work, but as the title suggests, AppSignal can be the roadside assistance you need to get your app back on the road. We'll first look at how AppSignal detects poorly performing queries, then look at understanding why those queries are slow, and then finally fixing the problems.

Finding the Slow Query in AppSignal

It's one thing to know your app has a traffic jam, and quite another to know where that traffic jam exists. Because AppSignal auto-instruments Ecto queries, it's able to both pinpoint where in your app that bottleneck is occurring as well as point out the problematic query itself.

To demonstrate how AppSignal helps identify bottlenecks, we’ll use an example app that mimics a basic social media site, with users, followers and posts. When visiting a user's profile page (/users/:username) the app displays counts of followers and those they follow, who they follow and who follows them, and finally a list of posts from those they follow.

The example app and code used in this article can be found on the GitHub Social app repo. Basic instructions for installing AppSignal in Elixir apps, and a more complete guide are also available.

Social Screenshot
Social app profile page

Although the page appears to be functioning correctly, AppSignal has alerted us to a performance issue that hasn't yet risen to the point of user complaints, and it's done so in three different places under the Performance navigation section:

  • Issue list
  • Actions
  • Slow queries

Issue list, as the name implies, lists issues your app is currently experiencing. In our social app, we can see that our /users/:username endpoint is already listed.

Issues list
AppSignal issues list

The Actions tab provides a list of routes along with their duration (mean and P90) and throughput. Using this page, you can sort routes based on their duration, and then click into the laggard routes to find more information. Clicking into our /users/:username route reveals that there is an open "Performance incident."

Performance incident
AppSignal performance incident

Lastly, we can see which queries are causing the most problems under the Slow queries tab. By default, AppSignal ranks queries by their impact (i.e. frequency * duration) to performance from greatest to least. You'll note that what is listed is the SQL queries themselves, not the Ecto functions or from where in the app they originate.

Query impact
AppSignal slow queries tab ranked by impact

Understanding Why It's Slow

It's not always easy to track performance issues, but the Samples tool in AppSignal simplifies the process by mapping out each operation in the Timeline view. The visualization it provides enables you to pinpoint the specific Ecto calls responsible for bottlenecks.

You can get to this page by first navigating to the Issue list under the Performance navigation section, and clicking on an "action" you want to explore. From there, click on the specific sample you're interested in from the Samples section.

Once you're on the Samples page, scroll down to the Timeline section. You should see something like this:

AppSignal timeline
AppSignal timeline view

What's most noticeable in the timeline is:

  1. Three queries are greater than or equal to 1 ms
  2. One query takes up the majority of the time
  3. There are six queries, but the code (not shown) only calls five functions (indicating a potential N+1 query)

We can get more information on each of the queries by hovering over any of the events. If it's a performance issue, we can then use the explain function common in most relational databases to investigate what's happening.

Samples with SQL
AppSignal timeline with SQL query details

Running explain analyze on the SQL shown above results in the "Query Plan" shown below:

Shell
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Sort (cost=106.72..107.47 rows=300 width=127) (actual time=2.782..2.894 rows=1500 loops=1) Sort Key: p0.inserted_at DESC Sort Method: quicksort Memory: 275kB -> Hash Join (cost=2.17..94.37 rows=300 width=127) (actual time=0.496..1.873 rows=1500 loops=1) Hash Cond: (f1.followed_id = u2.id) -> Hash Join (cost=1.07..91.66 rows=300 width=91) (actual time=0.164..1.102 rows=1500 loops=1) Hash Cond: (p0.user_id = f1.followed_id) -> Seq Scan on posts p0 (cost=0.00..82.00 rows=2000 width=83) (actual time=0.105..0.504 rows=2000 loops=1) -> Hash (cost=1.06..1.06 rows=1 width=8) (actual time=0.027..0.028 rows=3 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on followers f1 (cost=0.00..1.06 rows=1 width=8) (actual time=0.019..0.020 rows=3 loops=1) Filter: (follower_id = 30) Rows Removed by Filter: 3 -> Hash (cost=1.04..1.04 rows=4 width=44) (actual time=0.308..0.309 rows=4 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on users u2 (cost=0.00..1.04 rows=4 width=44) (actual time=0.012..0.013 rows=4 loops=1) Planning Time: 0.315 ms Execution Time: 3.360 ms

Two things stand out in this plan:

  1. We're scanning many more records than necessary (Seq Scan on posts...rows=2000).
  2. The estimated rows (300) vs the actual rows (1,500) are significantly different which can lead to the database choosing the wrong strategy.

A potential third item might be the Filter.

Let's see what we can do to improve the performance.

Common Ecto Fixes

In his article, Tackling Performance Issues in Ecto Applications, Marcos Ramos addresses several issues affecting performance. The two we're most interested in are "Inefficient Queries in Ecto" and "The N+1 Query Problem".

Inefficient Queries in Ecto

Not all performance wins require the same level of effort. When tackling inefficiencies in Ecto queries, consider the following possible optimizations:

  • Ensure the tables are indexed correctly
  • Only ask for what you need
  • Offload the heavy lifting to views or processes that are better able to handle the work

Validating tables are properly indexed is a great place to start. Adding indexes which should have existed from the beginning (e.g. on foreign keys, sortable fields, and commonly filtered fields) will always bring an improvement to performance. In our example, indexes on posts.user_id,followers.follower_id, followers.followed_id, and posts.inserted_at have all been erroneously forgotten.

The fix is simply creating those indexes with an Ecto migration:

Elixir
defmodule Social.Repo.Migrations.AddMissingIndexes do use Ecto.Migration def change do create index(:posts, [:user_id]) create index(:posts, [:inserted_at]) create index(:followers, [:follower_id]) create index(:followers, [:followed_id]) create index(:followers, [:inserted_at]) end end

Note: You would normally create a migration for each index or at least group them by affected table.

After making sure your tables are properly indexed, another great tip is to check for “unbound queries”. As the name suggests, unbound queries are those queries which have no constraints. Examples might include queries that don't limit the number of returned rows, aren't constrained by a where/3 clause, or don't use select/3 to limit the fields returned.

As we saw above, our query plan showed that we were working with 2,000 results. We don't need to see that many posts at once, so it makes sense to limit the number of posts per "page". To that end, we'll add both limit/3 and offset/3 to our query below.

Elixir
def list_feed_posts(user_id, page \\\\ 1, page_size \\\\ 20) do from(p in Post, join: f in Social.Users.Follower, on: f.follower_id == ^user_id and f.followed_id == p.user_id, join: u in assoc(p, :user), order_by: [desc: p.inserted_at], preload: [user: u], limit: ^page_size, offset: ^page_size * (^page - 1) ) |> Repo.all() end

If you are still seeing performance issues after indexing tables and constraining the data returned, you may need to consider moving your more complex queries into materialized views or to a background job. Aggregated stats, dashboard metrics, multi-join filtering, long-running reports, and data-exports are all examples of queries or processes which can take advantage of these types of enhancements.

The N+1 Query Problem

As Marcos explains in his article, N+1 queries occur "when an application loads a parent record and its associated child records in separate queries." In our app that might be by first pulling a list of people followed and then retrieving their posts in separate queries. The 1 represents the initial query, while the N represents each consecutive query.

In our example app, what originally looked like an N+1 query was the result of a single function issuing two count queries, and doesn't need improvement:

Elixir
def get_user_follow_counts(user_id) do follower_count = from(f in Follower, where: f.followed_id == ^user_id, select: count(f.id) ) |> Repo.one() following_count = from(f in Follower, where: f.follower_id == ^user_id, select: count(f.id) ) |> Repo.one() %{followers: follower_count, following: following_count} end

If you want to understand N+1 queries more, Sapan Diwakar has a great tutorial on how to Find and Fix N+1 Queries Using AppSignal.

Verifying Our Changes Worked

Once you have found and eliminated the speed bumps slowing you down, you'll want to verify that your changes actually fixed the problems. You'll first want to verify your changes locally, but once you've deployed, make sure to review your project's AppSignal page to confirm that your changes are having the effect you're expecting. You can do that by navigating to the route from the Actions page.

Note the Deploy markers below the timeline. These are trivial to set up and will help you identify when changes took effect, and which commit.

Deploy markers in performance timeline
AppSignal deploy markers in performance timeline

You can even drill into the latest Samples to see how your changes affected specific queries against production data.

Wrap-Up

Sports cars are loads of fun to drive on long straightaways, around tight curves, and sometimes even on unpoliced country roads. They're less fun to drive in bad weather or through construction areas. It's a similar experience for your app's users. Speed matters and if your app isn't responsive enough, your users may decide to get off at the next exit.

To avoid that, we first looked at how to use AppSignal to detect and find slow queries, primarily focused on the Performance navigation section and the tools under it: Issue list, Actions, and Slow queries. Next, we used the Timeline view within an individual "sample" to isolate potential causes. This gave us a closer look at the request lifecycle, exposing N+1 queries and the underlying SQL responsible for the lag. Finally, we reviewed common fixes (adding indexes, constraining queries, etc.) and how to verify our fixes worked.

Phoenix gives you speed by default. Don't let a bad query or a poorly indexed table turn your sports car into an old clunker.

Wondering what you can do next?

Finished this article? Here are a few more things you can do:

  • Share this article on social media
Samuel Mullen

Samuel Mullen

Engineering leader with over 20 years of success delivering SaaS products, scaling complex APIs, integrations, and architectures, and building high-performance, low-churn teams. Skilled at fostering cross-functional alignment, guiding initiatives from concept to deployment, and rallying stakeholders around long-term strategies.

All articles by Samuel Mullen

Become our next author!

Find out more

AppSignal monitors your apps

AppSignal provides insights for Ruby, Rails, Elixir, Phoenix, Node.js, Express and many other frameworks and libraries. We are located in beautiful Amsterdam. We love stroopwafels. If you do too, let us know. We might send you some!

Discover AppSignal
AppSignal monitors your apps