Ecto is the de facto database wrapper and query generator for Elixir, enabling developers to interact with databases efficiently.
However, inefficient queries or misconfigurations still can (and will) happen. Addressing these issues is crucial for a smooth user experience.
In this article, we'll explore three common performance issues in Ecto applications:
- N+1 query problem: You get excessive redundant queries from retrieving related data.
- Inefficient query execution: Poorly designed queries will strain a database.
- Connection pooling and concurrency issues: Bottlenecks and slow response times are caused by configuration or concurrency problems.
We'll discuss how to detect each issue, as well as their common causes and solutions, so you can optimize your Elixir applications for peak performance.
Let's start!
The N+1 Query Problem
The N+1 query problem occurs when an application loads a parent record and its associated child records in separate queries.
This leads to one query for the parent record (1) and one query for each child record (N), resulting in N+1 queries in total. This issue can cause a significant performance hit due to an excessive number of redundant queries.
Detecting the Problem in Elixir
Suppose we have a `User` schema with a many-to-many relationship to a `Role` schema. We'll call the schema that associates the two `RoleAssignment`:
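(A minimal sketch of how these schemas could be laid out; the `MyApp.Accounts` namespace and the fields are placeholders.)

```elixir
defmodule MyApp.Accounts.User do
  use Ecto.Schema

  schema "users" do
    field :name, :string

    timestamps()
  end
end

defmodule MyApp.Accounts.Role do
  use Ecto.Schema

  schema "roles" do
    field :name, :string

    timestamps()
  end
end

defmodule MyApp.Accounts.RoleAssignment do
  use Ecto.Schema

  schema "role_assignments" do
    # The join schema points at both sides of the many-to-many relationship
    belongs_to :user, MyApp.Accounts.User
    belongs_to :role, MyApp.Accounts.Role

    timestamps()
  end
end
```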
Imagine that we want to show all the users and their roles on an admin page:
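(Something along these lines, assuming a `MyApp.Repo` module and the schemas sketched above.)

```elixir
import Ecto.Query

alias MyApp.Accounts.{Role, RoleAssignment, User}
alias MyApp.Repo

# One query to load all users...
users = Repo.all(User)

# ...and then one extra query per user to load that user's roles.
users_with_roles =
  Enum.map(users, fn user ->
    roles =
      Repo.all(
        from r in Role,
          join: ra in RoleAssignment,
          on: ra.role_id == r.id,
          where: ra.user_id == ^user.id
      )

    {user, roles}
  end)
```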
As you can see here, we generate N queries to load the associated roles for every user.
To detect the N+1 query problem, you can use tools like Telemetry to monitor query counts and identify queries that are executed multiple times with slight variations.
For example, you can wrap the query in a `span` and then attach a handler to the events it emits to detect the problem:
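(A sketch using `:telemetry.span/3`; the `[:my_app, :admin, :users_with_roles]` event prefix is just a name we pick for this operation.)

```elixir
# e.g. in an accounts or admin context module
def list_users_with_roles do
  # Emits [:my_app, :admin, :users_with_roles, :start] when the span begins
  # and [:my_app, :admin, :users_with_roles, :stop] when it finishes.
  :telemetry.span([:my_app, :admin, :users_with_roles], %{}, fn ->
    users = Repo.all(User)

    result =
      Enum.map(users, fn user ->
        roles =
          Repo.all(
            from r in Role,
              join: ra in RoleAssignment,
              on: ra.role_id == r.id,
              where: ra.user_id == ^user.id
          )

        {user, roles}
      end)

    {result, %{user_count: length(users)}}
  end)
end
```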
We receive two events for every function call: one when the span starts and the other when the span has finished.
With the spans in place, we can now attach a handler to listen for any call to them:
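(One way such a handler could look. It counts Ecto query events between a span's start and stop events using the process dictionary, so it assumes the queries run in the same process as the span; the event names and the threshold of 10 queries are illustrative.)

```elixir
defmodule MyApp.NPlusOneDetector do
  require Logger

  def attach do
    :telemetry.attach_many(
      "n-plus-one-detector",
      [
        [:my_app, :admin, :users_with_roles, :start],
        [:my_app, :admin, :users_with_roles, :stop],
        [:my_app, :repo, :query]
      ],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Span started: reset the per-process query counter
  def handle_event([:my_app, :admin, :users_with_roles, :start], _measurements, _metadata, _config) do
    Process.put(:queries_in_span, 0)
  end

  # An Ecto query ran: bump the counter if a span is active
  def handle_event([:my_app, :repo, :query], _measurements, _metadata, _config) do
    if count = Process.get(:queries_in_span) do
      Process.put(:queries_in_span, count + 1)
    end
  end

  # Span finished: report suspiciously high query counts
  def handle_event([:my_app, :admin, :users_with_roles, :stop], _measurements, _metadata, _config) do
    count = Process.get(:queries_in_span) || 0
    Process.delete(:queries_in_span)

    if count > 10 do
      Logger.warning("Possible N+1: #{count} queries executed within one span")
    end
  end
end
```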
New Phoenix applications ship with a telemetry supervisor under `<app_web>/telemetry.ex`. We can add the handler to its `init/1` function:
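(Only the relevant part of the module is shown here; the rest of the Phoenix-generated `init/1` stays untouched.)

```elixir
# lib/my_app_web/telemetry.ex
@impl true
def init(_arg) do
  # Attach our N+1 detector when the telemetry supervisor boots
  MyApp.NPlusOneDetector.attach()

  children = [
    {:telemetry_poller, measurements: periodic_measurements(), period: 10_000}
    # ...the reporters Phoenix generated for you...
  ]

  Supervisor.init(children, strategy: :one_for_one)
end
```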
Here are a couple of strategies to detect N+1:
- Count the total number of queries within a span context to analyze anomalies or spikes.
- Create a metric, publish an event from the span handler, and, from there, plot data into a dashboard.
It is not an easy problem to spot, but when your application traffic starts to increase, you'll see some symptoms. Here are some additional things you can look for that may indicate you're experiencing an N+1 problem:
- High CPU usage on your database systems
- All the pages of your web app are fast except for a few
- When you open those few pages, the same SQL query is executed over and over
In general, the more instrumentation you have in place, the easier it is to detect N+1 queries.
For more information on N+1 queries and how to use AppSignal to detect them, read our post Performance and N+1 Queries: Explained, Spotted, and Solved.
Using Ecto's Preload
Ecto provides the `preload/3` function to load all associated records up front, avoiding the N+1 query problem.
Here's an example of how to use `preload/3`. First, add the proper relationship to the model:
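(Building on the `User` schema from before; the only new piece is the `many_to_many` association.)

```elixir
defmodule MyApp.Accounts.User do
  use Ecto.Schema

  schema "users" do
    field :name, :string

    # Go through the RoleAssignment join schema to reach the roles
    many_to_many :roles, MyApp.Accounts.Role,
      join_through: MyApp.Accounts.RoleAssignment

    timestamps()
  end
end
```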
Now, to fetch all users along with their roles, you can use `preload/3` like this:
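(A sketch, again assuming `MyApp.Repo`.)

```elixir
import Ecto.Query

alias MyApp.Accounts.User
alias MyApp.Repo

users =
  from(u in User, preload: [:roles])
  |> Repo.all()
```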
This will load the users in one query and their roles in another, regardless of how many roles a user might have. Great: we go from N+1 queries down to just 2!
As an example, an N+1 query with 200 users and 200 roles takes 20 seconds to load on my computer. Using the preload, this number is reduced to 600 milliseconds!
But there is room for even more optimization.
Preloading with Joins
Preloading with joins has the advantage of generating only one query to load all associated records from a database.
In addition, you can also filter or sort the associated records! Just use `join/5` along with `preload/3`.
Following the same example from the last section:
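(A possible joined version of the query; the `order_by` clause shows how the associated roles can be sorted along the way.)

```elixir
import Ecto.Query

alias MyApp.Accounts.User
alias MyApp.Repo

users =
  from(u in User,
    left_join: r in assoc(u, :roles),
    order_by: [asc: r.name],
    preload: [roles: r]
  )
  |> Repo.all()
```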
This fetches users and their assigned roles using a single query, avoiding the N+1 query problem. On my computer, this takes about 100 milliseconds to run.
By leveraging Ecto's `preload/3` function and combining it with `join/5` when necessary, you can efficiently load associated records and eliminate the N+1 query problem in your Ecto-based applications.
Inefficient Queries in Ecto
Inefficient query execution can result in slow database performance, as poorly designed queries may place unnecessary strain on the database. Now we'll learn how to detect and fix them.
Example 1: Missing Indexes
One of the most common causes of inefficient queries is a lack of indexes.
Suppose we have a `Post` schema like this one:
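(An illustrative schema; the field names and types are placeholders.)

```elixir
defmodule MyApp.Blog.Post do
  use Ecto.Schema

  schema "posts" do
    field :title, :string
    field :body, :string
    field :published_at, :utc_datetime

    timestamps()
  end
end
```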
Let's fetch all published posts ordered by their publication date:
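(One way to express that query, assuming a post counts as published once `published_at` is set.)

```elixir
import Ecto.Query

alias MyApp.Blog.Post
alias MyApp.Repo

published_posts =
  from(p in Post,
    where: not is_nil(p.published_at),
    order_by: [desc: p.published_at]
  )
  |> Repo.all()
```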
If there is no index on the `published_at` field, this query may become slow when the number of posts in the database grows. The database system will perform a full scan of the table for every query.
To fix this issue, you can add an index to the relevant column using a migration:
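(A migration along these lines, named following the usual `mix ecto.gen.migration` convention.)

```elixir
defmodule MyApp.Repo.Migrations.AddPublishedAtIndexToPosts do
  use Ecto.Migration

  def change do
    create index(:posts, [:published_at])
  end
end
```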
There's really no easy way to detect this from the application — it will only start to be noticeable with queries on large tables. From the point of view of an application, it's only a slow query.
In the next section, we'll learn how to monitor query execution time with telemetry to detect such problems.
Detecting Bad Queries with PostgreSQL and Telemetry
You can use tools like `EXPLAIN ANALYZE` in PostgreSQL (or similar features in other databases) to detect inefficient queries by analyzing the execution plan and identifying possible bottlenecks.
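For instance, recent `ecto_sql` versions expose `Repo.explain/3`, which lets you fetch the plan straight from IEx (a sketch reusing the posts query from earlier):

```elixir
import Ecto.Query

alias MyApp.Blog.Post
alias MyApp.Repo

query =
  from p in Post,
    where: not is_nil(p.published_at),
    order_by: [desc: p.published_at]

# With analyze: true, PostgreSQL actually runs the query and
# reports real timings alongside the plan.
Repo.explain(:all, query, analyze: true) |> IO.puts()
```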
Additionally, you can use telemetry to monitor query execution times and set up alerts if a query takes too long to execute.
For example, you might add a handler to alert you if a query takes longer than 5 seconds:
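(A sketch of such a handler; `[:my_app, :repo, :query]` is the event Ecto emits by default for a repo named `MyApp.Repo`, and the measurements arrive in native time units.)

```elixir
defmodule MyApp.SlowQueryAlert do
  require Logger

  # Five seconds, in milliseconds
  @threshold_ms 5_000

  def handle_event([:my_app, :repo, :query], measurements, metadata, _config) do
    total_ms =
      System.convert_time_unit(measurements.total_time, :native, :millisecond)

    if total_ms > @threshold_ms do
      # Swap the log line for a call to your alerting tool of choice
      Logger.warning("Slow query (#{total_ms}ms): #{metadata.query}")
    end
  end
end
```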
Now we can attach the handler during the telemetry supervisor startup, just like we did for the span handlers:
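(Again, only the relevant part of `init/1` is shown.)

```elixir
# lib/my_app_web/telemetry.ex
@impl true
def init(_arg) do
  :telemetry.attach(
    "slow-query-alert",
    [:my_app, :repo, :query],
    &MyApp.SlowQueryAlert.handle_event/4,
    nil
  )

  # ...the rest of the generated init/1 stays the same...
  children = [
    {:telemetry_poller, measurements: periodic_measurements(), period: 10_000}
  ]

  Supervisor.init(children, strategy: :one_for_one)
end
```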
Frequent query timeouts can also be a bad smell, indicating that inefficient queries are being executed.
Ecto Connection Pooling and Concurrency Issues
Connection pooling problems can occur when an application tries to open more connections to a database than the pool allows. This can lead to a bottleneck, as processes are left waiting in a queue for an available connection.
Once again, telemetry can help us identify this problem. Ecto already ships with Telemetry support and emits several events that we can listen to and react to. The documentation lists the measurements and metadata that Ecto emits for every query, including `queue_time`.
We can modify the previous handler example and also monitor long query waiting times in the connection pool queue:
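(A sketch; the 100-millisecond threshold is arbitrary, and `queue_time` may be absent for queries that didn't have to wait for a connection.)

```elixir
defmodule MyApp.PoolQueueAlert do
  require Logger

  @queue_threshold_ms 100

  def handle_event([:my_app, :repo, :query], measurements, metadata, _config) do
    case measurements[:queue_time] do
      nil ->
        :ok

      queue_time ->
        queue_ms = System.convert_time_unit(queue_time, :native, :millisecond)

        if queue_ms > @queue_threshold_ms do
          Logger.warning(
            "Query waited #{queue_ms}ms for a pool connection: #{metadata.query}"
          )
        end
    end
  end
end
```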
Of course, you can always increase the number of connections in an application's configuration:
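(A sketch of the repo configuration; depending on your setup, this lives in `config/runtime.exs` or `config/prod.exs`.)

```elixir
# config/runtime.exs
config :my_app, MyApp.Repo,
  url: System.get_env("DATABASE_URL"),
  pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")
```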
With the config above, you can control the pool size using the `POOL_SIZE` environment variable, falling back to a default of 10 connections.
PostgreSQL Deadlocks
Locks are a fundamental mechanism that databases use to ensure data integrity. Still, they can struggle when there's a high volume of updates from different sources in the same timeframe.
A deadlock occurs when two or more processes wait for each other to release a resource, causing all processes to be stuck indefinitely.
How Deadlocks Occur
Deadlocks can happen when two or more processes try to acquire locks on multiple resources in an inconsistent order.
Consider the following schemas:
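(Illustrative schemas; the `reactions_count` counter field is an assumption, added so the update example below has something to fight over.)

```elixir
defmodule MyApp.Blog.Post do
  use Ecto.Schema

  schema "posts" do
    field :title, :string
    field :reactions_count, :integer, default: 0

    has_many :reactions, MyApp.Blog.Reaction

    timestamps()
  end
end

defmodule MyApp.Blog.Reaction do
  use Ecto.Schema

  schema "reactions" do
    field :emoji, :string

    belongs_to :post, MyApp.Blog.Post

    timestamps()
  end
end
```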
Now imagine the following scenario using post and reaction schemas:
- Process A acquires a lock on Post X.
- Process B acquires a lock on Post Y.
- Process A tries to acquire a lock on Post Y (but is blocked because Process B holds the lock).
- Process B tries to acquire a lock on Post X (but is blocked because Process A holds the lock).
In this case, both processes wait for each other to release the locks, resulting in a deadlock.
If two processes try to add reactions to different posts and update reaction counts at the same time, we could potentially run into a deadlock:
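(A sketch of deadlock-prone code: each transaction locks the posts in whatever order it receives them, so two concurrent calls that pass the same posts in opposite order can end up blocking each other.)

```elixir
defmodule MyApp.Blog do
  import Ecto.Query

  alias MyApp.Blog.{Post, Reaction}
  alias MyApp.Repo

  def react_to_posts(posts, emoji) do
    Repo.transaction(fn ->
      Enum.each(posts, fn post ->
        # Row-level lock on the post for the rest of the transaction
        locked_post =
          Repo.one!(from p in Post, where: p.id == ^post.id, lock: "FOR UPDATE")

        Repo.insert!(%Reaction{post_id: locked_post.id, emoji: emoji})

        locked_post
        |> Ecto.Changeset.change(reactions_count: locked_post.reactions_count + 1)
        |> Repo.update!()
      end)
    end)
  end
end
```

Process A calling `react_to_posts([post_x, post_y], "like")` while process B calls `react_to_posts([post_y, post_x], "heart")` reproduces the scenario above.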
Most database systems provide mechanisms to detect and automatically resolve deadlocks by rolling back one of the transactions involved.
To prevent deadlocks, you can:
- Acquire locks in a consistent order across all processes. For example, you can enforce ordering by post ID:
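(The same sketch as above with one change: posts are sorted by ID before any locks are taken.)

```elixir
def react_to_posts(posts, emoji) do
  # Always lock posts in ascending ID order, no matter how they were passed in
  posts = Enum.sort_by(posts, & &1.id)

  Repo.transaction(fn ->
    Enum.each(posts, fn post ->
      locked_post =
        Repo.one!(from p in Post, where: p.id == ^post.id, lock: "FOR UPDATE")

      Repo.insert!(%Reaction{post_id: locked_post.id, emoji: emoji})

      locked_post
      |> Ecto.Changeset.change(reactions_count: locked_post.reactions_count + 1)
      |> Repo.update!()
    end)
  end)
end
```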
- Use timeouts when acquiring locks to prevent indefinite waits. In Ecto, you can use the `:timeout` option with the `Repo.transaction/2` function:
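(A sketch; the 5-second value is arbitrary.)

```elixir
Repo.transaction(
  fn ->
    # ...the same locking work as above...
  end,
  timeout: 5_000
)
```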
By setting a timeout, a transaction will be rolled back if it cannot acquire the necessary locks within the specified time (preventing deadlocks from causing an application to hang indefinitely).
Final Thoughts
From the application's point of view, there is only so much we can actually monitor, since the application can only evaluate its own context: query execution time, queue time, memory consumption, and so on.
The best way to know what is happening to your whole stack is to instrument your application with tools like Telemetry or OpenTelemetry, then connect it to your favorite monitoring tool, like AppSignal!
Read more about instrumenting AppSignal for Ecto.
Wrap Up
In this article, we explored common performance issues in Elixir applications using Ecto and provided insights on detecting and addressing these problems.
We started with the N+1 query problem, learning to identify and solve it using Ecto's preload functionality. Then we investigated inefficient query execution, discussing how to optimize queries and use Telemetry for monitoring.
Lastly, we covered connection pooling and concurrency issues, emphasizing the importance of proper configuration, monitoring, and techniques for avoiding deadlocks.
Happy coding!
P.S. If you'd like to read Elixir Alchemy posts as soon as they get off the press, subscribe to our Elixir Alchemy newsletter and never miss a single post!