Under the Hood of Ecto

Sapan Diwakar

Ecto is a toolkit for mapping database objects to Elixir structs and provides a unified interface to manipulate that data.

In this post, we will dive into the internals of Ecto — its major components, their functions, and how they work. In doing so, we'll demystify some of the apparent magic behind Ecto.

Let's get going!

Ecto's Modules

Ecto is made up of four major modules — Repo, Query, Schema, and Changeset.

We'll look at each in turn. Let’s start with the Repo module.

Repo Module

If you use Ecto with a database (like most users out there), Repo is the heart of Ecto. It binds everything together and provides a centralized point of communication between a database and your application. Repo:

  • maintains connections
  • executes queries against a database
  • runs your migrations (written with Ecto.Migration from the ecto_sql package) against the database

Let's get started with Repo. Simply call use Ecto.Repo inside your Repo module. If you use mix phx.new to generate your Elixir project, this is done automatically for you.

```elixir
# lib/my_app/repo.ex
defmodule MyApp.Repo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres
end
```

These few lines of code define the repo. Adding it to the supervision tree in application.ex starts it when your application boots, so the whole set of functions Repo provides can talk to the database. Again, Phoenix generates this code for you:

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Start the Ecto repository
      MyApp.Repo,
      # Other children...
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
```

With the few lines of code above, you get the following:

  • Access to the full Ecto.Repo API included in MyApp.Repo. The most common use cases include fetching records with MyApp.Repo.all/2, inserting new records with MyApp.Repo.insert/2, and updating records with MyApp.Repo.update/2.
  • A supervisor is started to keep track of all the processes Ecto needs to work. The supervision tree initializes the adapter (Ecto.Adapters.Postgres, in this case), which is responsible for all communication with the database. The Postgres adapter, in turn, starts a connection pool to your database using the DBConnection library.
  • An ETS table is created to cache planned queries. Ecto's query planner (Ecto.Query.Planner) plans and normalizes a query and its parameters, and the cache lets it skip that work for queries it has already seen. We will learn more about this when we get to the Query module.

Monitor Queries Sent to Ecto from Your Elixir Application

Ecto also publishes telemetry events automatically. For example, to monitor statistics for all the queries sent through your repo, you can subscribe to the [:my_app, :repo, :query] event with telemetry.

Each time a query runs, Ecto emits this event with measurements (such as the time spent waiting for a connection, executing the query, and decoding the results) and metadata about the query itself (such as the SQL and its parameters).
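A minimal sketch of such a subscription follows. The handler module MyApp.QueryLogger and the handler id are hypothetical, and the event name assumes the [:my_app, :repo, :query] prefix used above. To keep it runnable as a standalone script, it installs only the telemetry package and fires one simulated event by hand with :telemetry.execute/3; in a real application Ecto emits the event after every query.

```elixir
# Standalone script: only the telemetry package is needed for this sketch.
Mix.install([{:telemetry, "~> 1.2"}])

defmodule MyApp.QueryLogger do
  # Hypothetical handler module, not part of Ecto or Phoenix.
  @event [:my_app, :repo, :query]

  def attach do
    :telemetry.attach("my-app-query-logger", @event, &__MODULE__.handle_event/4, nil)
  end

  def handle_event(@event, measurements, metadata, _config) do
    IO.puts(format(measurements, metadata))
  end

  # Ecto reports durations in native time units, so convert before printing.
  # Map.get/3 guards against a measurement being absent.
  def format(measurements, metadata) do
    total_ms =
      measurements
      |> Map.get(:total_time, 0)
      |> System.convert_time_unit(:native, :millisecond)

    "[#{metadata[:source]}] #{metadata[:query]} (#{total_ms} ms)"
  end
end

MyApp.QueryLogger.attach()

# Simulated event; with a running repo, Ecto emits this for you.
:telemetry.execute(
  [:my_app, :repo, :query],
  %{total_time: System.convert_time_unit(3, :millisecond, :native)},
  %{source: "users", query: ~s(SELECT u0."name" FROM "users" AS u0)}
)
```

Keeping the formatting in its own format/2 function makes the handler easy to exercise without a database.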

For more details, see this full list of Ecto telemetry events.

There are many options available to configure the Repo or the adapter as per your needs, but that's out of the scope of this post. Let's just take a very quick look at how you can monitor queries with AppSignal.

Instrumenting Ecto Queries with AppSignal in Your Elixir App

AppSignal automatically instruments Ecto so you can get insights into queries running in your Phoenix or Plug applications. Make sure the :otp_app configuration option matches your app’s OTP app name, and you’re all set!

Here's an example of how an Ecto query will look in AppSignal:

[Screenshot: an Ecto query in AppSignal]

Read more in our Ecto docs.

Check out our AppSignal for Elixir page.

Query Module

The Query module provides a unified API to write database-agnostic queries in Elixir. Note that building database queries with functions provided by the Ecto.Query module does not result in the queries being executed.

These functions return a query in the form of an Ecto.Query struct. Nothing is actually sent to the database until the built %Ecto.Query{} is passed to one of the functions provided by the Repo module.

As an example, let’s build a simple query that selects the names of all users older than 18:

```elixir
import Ecto.Query

age = 18
query = from u in "users", where: u.age > ^age, select: u.name
```

Type this into an IEx console and you will see that it creates a struct like this:

```elixir
#Ecto.Query<from u0 in "users", where: u0.age > 18, select: u0.name>
```
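Because the result is plain data, you can keep refining it before anything runs. A small sketch, assuming the ecto package is available (the Mix.install line is only needed when running it as a standalone script); the extra is_nil filter and the ordering are illustrative additions, not part of the example above:

```elixir
Mix.install([{:ecto, "~> 3.10"}])
import Ecto.Query

age = 18
base = from u in "users", where: u.age > ^age

# Each step returns a new %Ecto.Query{}; nothing is sent to a database,
# and `base` itself is left unchanged.
query =
  base
  |> where([u], not is_nil(u.name))
  |> order_by([u], asc: u.name)
  |> select([u], u.name)

IO.inspect(query)
```

Only when the finished struct is handed to a Repo function does anything reach the database.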

You can also print it as a full map to see everything inside it:

```elixir
iex(9)> IO.inspect(query, structs: false)
%{
  __struct__: Ecto.Query,
  aliases: %{},
  assocs: [],
  combinations: [],
  distinct: nil,
  from: %{
    __struct__: Ecto.Query.FromExpr,
    as: nil,
    file: "iex",
    hints: [],
    line: 40,
    params: [],
    prefix: nil,
    source: {"users", nil}
  },
  group_bys: [],
  havings: [],
  joins: [],
  limit: nil,
  lock: nil,
  offset: nil,
  order_bys: [],
  prefix: nil,
  preloads: [],
  select: %{
    __struct__: Ecto.Query.SelectExpr,
    aliases: %{},
    expr: {{:., [], [{:&, [], [0]}, :name]}, [], []},
    fields: nil,
    file: "iex",
    line: 40,
    params: [],
    subqueries: [],
    take: %{}
  },
  sources: nil,
  updates: [],
  wheres: [
    %{
      __struct__: Ecto.Query.BooleanExpr,
      expr: {:>, [], [{{:., [], [{:&, [], [0]}, :age]}, [], []}, {:^, [], [0]}]},
      file: "iex",
      line: 40,
      op: :and,
      params: [{18, {0, :age}}],
      subqueries: []
    }
  ],
  windows: [],
  with_ctes: nil
}
```

This is much more interesting — it shows exactly how the simple query is represented internally in Ecto. Ecto.Query.FromExpr contains details about the table we are querying (users).
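Since it is an ordinary struct, you can also read these fields directly. A short sketch under the same assumptions (standalone script with the ecto package); the exact shape of these internal fields is an implementation detail and may change between Ecto versions:

```elixir
Mix.install([{:ecto, "~> 3.10"}])
import Ecto.Query

age = 18
query = from u in "users", where: u.age > ^age, select: u.name

# FromExpr: the table name and, for a schemaless query, no schema module.
{table, schema} = query.from.source

# Each where adds one BooleanExpr; pinned values live in :params together
# with the type Ecto inferred for them ({binding_index, field}).
[where_expr] = query.wheres
IO.inspect({table, schema, where_expr.params})
```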

ASTs in the Query

The other two expressions in the query, select and where, look much more complex. If you look closely, though, they are ASTs (abstract syntax trees), which the adapter understands and translates into the final query.

Note: If you are interested in learning more about ASTs, check out An Introduction to Metaprogramming in Elixir.

Let's turn the where expression back into code:

```elixir
iex> Macro.to_string({:>, [], [{{:., [], [{:&, [], [0]}, :age]}, [], []}, {:^, [], [0]}]})
"&0.age() > ^0"
```

This is our where condition, normalized into terms the adapters understand: &0 refers to the first binding in the query (u), and ^0 to the first pinned parameter (age, which is 18).
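Because these expressions are ordinary Elixir ASTs, the standard Macro functions work on them. A minimal sketch (standalone script, ecto package assumed) that walks the where expression with Macro.prewalk/3 and collects every field reference as a {binding_index, field} pair. It only illustrates the shape of the AST; it is not how Ecto's planner or adapters traverse queries:

```elixir
Mix.install([{:ecto, "~> 3.10"}])
import Ecto.Query

age = 18
query = from u in "users", where: u.age > ^age, select: u.name
[%{expr: expr}] = query.wheres

# A field access is a call on {:&, _, [index]}, where index is the position
# of the binding (0 is `u`, the first source in the query).
{_ast, fields} =
  Macro.prewalk(expr, [], fn
    {{:., _, [{:&, _, [index]}, field]}, _, []} = node, acc -> {node, [{index, field} | acc]}
    node, acc -> {node, acc}
  end)

IO.inspect(Macro.to_string(expr))
IO.inspect(fields)
```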

The adapter does the final translation of the query into something the database understands. For SQL databases that is a SQL statement, but adapters are not limited to SQL: some adapters work just as well with NoSQL databases. Query generation and all database communication are clearly separated from Ecto's core.

If you want to explore this further, try building out some complex queries with joins, subqueries, windows, etc., and see how they are represented internally — it is a great way to learn how abstractions are made inside Ecto.

Finally, the adapter converts this query struct into a SQL statement. You can preview the result with Ecto.Adapters.SQL.to_sql/3:

```elixir
iex> Ecto.Adapters.SQL.to_sql(:all, MyApp.Repo, query)
{"SELECT u0.\"name\" FROM \"users\" AS u0 WHERE (u0.\"age\" > $1)", [18]}
```

Back to Erlang's ETS Table

In the section about the Repo module, we saw that starting the repo in our application creates an ETS table.

Now that ETS table comes into play. When a query is executed multiple times, the query is only prepared the first time.

Note: You can learn more about PREPARE in the context of Postgres.

The prepared query is cached in that ETS table and fetched from there on all subsequent calls. To see the caching in action, run the query once and then ask the adapter to prepare it again (notice the :cached tag in the result, which signals that the cached version was used):

```elixir
iex> MyApp.Repo.all(query) # This puts the query in the cache

# Trying to prepare the query again gets a cached version
iex> Ecto.Adapter.Queryable.prepare_query(:all, MyApp.Repo, query)
{{:cached,
  #Function<41.44551318/1 in Ecto.Query.Planner.query_with_cache/8>,
  #Function<42.44551318/1 in Ecto.Query.Planner.query_with_cache/8>,
  {5122,
   %Postgrex.Query{
     ref: #Reference<0.3269063957.2912157697.165169>,
     name: "ecto_5122",
     statement: "SELECT u0.\"name\" FROM \"users\" AS u0 WHERE (u0.\"age\" > $1)",
     param_oids: [20],
     param_formats: [:binary],
     param_types: [Postgrex.Extensions.Int8],
     columns: ["name"],
     result_oids: [1043],
     result_formats: [:binary],
     result_types: [Postgrex.Extensions.Raw],
     types: {Postgrex.DefaultTypes, #Reference<0.3269063957.2912288771.132368>},
     cache: :reference
   }}}, [18]}
```
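To make the idea concrete, here is a toy version of "prepare once, reuse afterwards" built directly on ETS. It is a simplification, not Ecto's implementation: the module name QueryCacheSketch and the cache key are hypothetical, and Ecto's planner computes its own keys and stores adapter-specific prepared data. Note that the key leaves out the parameter values (18 here), which is why one prepared statement can serve every call regardless of the age you pass in:

```elixir
defmodule QueryCacheSketch do
  # Hypothetical module for illustration; only standard library calls are used.
  def new do
    :ets.new(:query_cache_sketch, [:set, :public, read_concurrency: true])
  end

  # Looks the key up first and only runs prepare_fun on a cache miss.
  def fetch_or_prepare(table, key, prepare_fun) do
    case :ets.lookup(table, key) do
      [{^key, prepared}] ->
        {:cached, prepared}

      [] ->
        prepared = prepare_fun.()
        true = :ets.insert(table, {key, prepared})
        {:prepared, prepared}
    end
  end
end

table = QueryCacheSketch.new()
# Hypothetical key: describes the query's shape, not its parameter values.
key = {:all, "users", :name, {:age, :>}}
prepare = fn -> ~s|SELECT u0."name" FROM "users" AS u0 WHERE (u0."age" > $1)| end

{:prepared, sql} = QueryCacheSketch.fetch_or_prepare(table, key, prepare)
{:cached, ^sql} = QueryCacheSketch.fetch_or_prepare(table, key, prepare)
```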

Note that only the prepared statement is cached, not the query's result. Prepared statements can give a large performance advantage, especially for complex queries. From the Postgres docs:

Prepared statements potentially have the largest performance advantage when a single session is being used to execute a large number of similar statements.

The performance difference will be particularly significant if the statements are complex to plan or rewrite, e.g., if the query involves a join of many tables or requires the application of several rules.

The next important module in Ecto is the Schema. Let's take a look at it now.

Schema Module

You can use Ecto without schemas, as we did above when we referenced the users table directly.

The Schema module is responsible for defining and mapping a record's attributes (fields and associations) from a database table to an Elixir struct.

To create a schema, we write use Ecto.Schema at the top of our module and use the schema DSL.

For example:

```elixir
defmodule MyApp.Organization do
  use Ecto.Schema

  schema "organizations" do
    field :name, :string
  end
end

defmodule MyApp.User do
  use Ecto.Schema

  schema "users" do
    field :name, :string
    belongs_to :organization, MyApp.Organization
  end
end
```

  • The use statement includes several utility functions and macros inside the module and sets some default module attributes required for Ecto to gather data from the Schema.
  • The schema macro then updates some of those attributes to mark that this is a persisted schema (there is also another embedded_schema macro to deal with non-persisted schemas) and sets some other defaults, like the primary key.
  • The field and belongs_to inside the schema block then put those fields in the module attributes (for type validation), and add the fields to the struct defined by the module.
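One way to see the difference between the schema and embedded_schema macros is to compare the structs they generate. A small sketch, assuming the ecto package (Mix.install is only for standalone runs); MyApp.Address is a hypothetical embedded schema added for comparison:

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule MyApp.Organization do
  use Ecto.Schema

  schema "organizations" do
    field :name, :string
  end
end

defmodule MyApp.User do
  use Ecto.Schema

  schema "users" do
    field :name, :string
    belongs_to :organization, MyApp.Organization
  end
end

# Hypothetical embedded schema: no table behind it.
defmodule MyApp.Address do
  use Ecto.Schema

  embedded_schema do
    field :city, :string
  end
end

# Persisted schemas get a __meta__ field tracking the source and load state;
# belongs_to adds both the association field and the organization_id key.
IO.inspect(%MyApp.User{})
# Embedded schemas have no table to track, so there is no __meta__ field.
IO.inspect(%MyApp.Address{})
```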

use Ecto.Schema also defines a __schema__ reflection function in the module, which you can call to fetch field details. For example:

```elixir
iex(62)> MyApp.User.__schema__(:source)
"users"
iex(63)> MyApp.User.__schema__(:fields)
[:id, :name, :organization_id]
iex(64)> MyApp.User.__schema__(:primary_key)
[:id]
iex(65)> MyApp.User.__schema__(:associations)
[:organization]
iex(66)> MyApp.User.__schema__(:association, :organization)
%Ecto.Association.BelongsTo{
  field: :organization,
  owner: MyApp.User,
  related: MyApp.Organization,
  owner_key: :organization_id,
  related_key: :id,
  queryable: MyApp.Organization,
  on_cast: nil,
  on_replace: :raise,
  where: [],
  defaults: [],
  cardinality: :one,
  relationship: :parent,
  unique: true,
  ordered: false
}
```

This __schema__ function is also the entry point for other parts of Ecto to reflect on the defined schema and perform operations on it.

For example, when a schema is the source of a query, Ecto uses it to check the fields referenced in the query (such as those in the where clause) and to cast the data returned from the database into Elixir structs. This results in much better feedback when something is wrong.
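For instance, combining __schema__(:fields) with __schema__(:type, field) gives the field-to-type mapping that other parts of Ecto rely on. A minimal sketch with the same two schemas (a standalone run assumes the ecto package); FieldTypes is a hypothetical helper:

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule MyApp.Organization do
  use Ecto.Schema

  schema "organizations" do
    field :name, :string
  end
end

defmodule MyApp.User do
  use Ecto.Schema

  schema "users" do
    field :name, :string
    belongs_to :organization, MyApp.Organization
  end
end

defmodule FieldTypes do
  # Hypothetical helper: builds a keyword list of field => Ecto type using
  # only the reflection function generated by `use Ecto.Schema`.
  def for_schema(schema) do
    for field <- schema.__schema__(:fields), do: {field, schema.__schema__(:type, field)}
  end
end

IO.inspect(FieldTypes.for_schema(MyApp.User))
```

The primary key and the belongs_to foreign key both come out as :id, the default type Ecto assigns to them.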

A Schema Module In Action on an Elixir App

Let's try executing a query that has a typo to see the benefits of using a schema in action:

```elixir
from u in "users", select: u.id, where: u.ages > 19
```

Executing this with Repo.all raises a generic Postgrex.Error, reported by the database:

```
** (Postgrex.Error) ERROR 42703 (undefined_column) column u0.ages does not exist

    query: SELECT u0."id" FROM "users" AS u0 WHERE (u0."ages" > 19)
```

Let's try the same query, but this time with a schema.

```elixir
from u in MyApp.Accounts.User, select: u.id, where: u.ages > 19
```

As expected, this also raises an error, but now it includes the file and line number where the query was built and has a more specific exception type:

```
** (Ecto.QueryError) lib/my_app/accounts.ex:109: field `ages` in `where` does not exist in schema MyApp.Accounts.User in query:

from u0 in MyApp.Accounts.User,
  where: u0.ages > 19,
  select: u0.id
```

This works because the query planner in Ecto can look at the schema's metadata and figure out that this field doesn't exist on the schema even before hitting the database.
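A rough sketch of that kind of pre-flight check, built only on the public reflection function. FieldCheck is hypothetical, the age field on MyApp.Accounts.User is an assumption (the query above suggests it exists), and Ecto's planner does considerably more and raises Ecto.QueryError rather than ArgumentError:

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule MyApp.Accounts.User do
  use Ecto.Schema

  # Assumed fields for this sketch.
  schema "users" do
    field :name, :string
    field :age, :integer
  end
end

defmodule FieldCheck do
  # Hypothetical helper: fail early if a field is not declared on the schema,
  # before any query could be sent to the database.
  def ensure_field!(schema, field) do
    if field in schema.__schema__(:fields) do
      :ok
    else
      raise ArgumentError,
            "field #{inspect(field)} does not exist in schema #{inspect(schema)}"
    end
  end
end

FieldCheck.ensure_field!(MyApp.Accounts.User, :age)
```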

Ecto also does type conversions behind the scenes when using a schema. For example, it lets us run this query even though id is a string:

```elixir
id = "2"
query = from u in MyApp.Accounts.User, where: u.id == ^id
Repo.all(query)
```

On the other hand, if you are not using a schema, a similar query will raise an exception:

```elixir
query = from u in "users", where: u.id == ^id, select: u.id
Repo.all(query)
** (DBConnection.EncodeError) Postgrex expected an integer in -9223372036854775808..9223372036854775807, got "2". Please make sure the value you are passing matches the definition in your table or in your query or convert the value accordingly.
```
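The difference comes from the schema telling Ecto which type the pinned value should have. You can reproduce a similar cast by hand with Ecto.Type.cast/2. This is a sketch of the idea rather than a trace of the planner's exact code path, and a standalone run assumes the ecto package:

```elixir
Mix.install([{:ecto, "~> 3.10"}])

# :id is the type Ecto gives a schema's default primary key, so it is the type
# a schema-based query casts the pinned "2" to.
IO.inspect(Ecto.Type.cast(:id, "2"))
# => {:ok, 2}

# Values that can't be converted are rejected before reaching the database.
IO.inspect(Ecto.Type.cast(:id, "two"))
# => :error
```

Without a schema, Ecto has no type to cast to, so the raw string is passed along and Postgrex rejects it.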

Changeset Module

The final module that we will look at today is Changeset. It provides an interface for validating and transforming data before it is written into a database.

Similarly to Query, Changeset provides a structured way to represent changes to data. It is most commonly used with Ecto schemas, but schemaless changesets are also possible when you don’t need a full-fledged schema.
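A schemaless changeset takes a {data, types} tuple instead of a struct. A minimal sketch, assuming the ecto package (the field names and params are illustrative):

```elixir
Mix.install([{:ecto, "~> 3.10"}])
import Ecto.Changeset

# Initial data plus a map of field => Ecto type stands in for a schema.
types = %{name: :string, age: :integer}
params = %{"name" => "Ada", "age" => "36"}

changeset =
  {%{}, types}
  |> cast(params, Map.keys(types))
  |> validate_required([:name])
  |> validate_number(:age, greater_than: 18)

IO.inspect(changeset.changes)
```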

Ecto.Changeset provides a comprehensive API to work with data.

The cast/4 Function

Let's start by looking at the most commonly used changeset function, cast/4:

```elixir
import Ecto.Changeset

changeset =
  %MyApp.User{name: "some name"}
  |> cast(%{"name" => "", "organization_id" => "1", "foo" => "bar"}, [:name, :organization_id])
  |> validate_required(:name)
```

We pass initial data (the MyApp.User struct in this case) to cast, followed by some parameters and the list of allowed fields. Any parameter not in that list, such as "foo" here, is ignored.

cast figures out the type of each allowed field by looking at the schema metadata. It then casts each parameter value to that type, or adds an error to the changeset if the value can't be cast.

For example, here we passed "1" (a string) as the organization_id. From the schema, cast can figure out that organization_id is of type :id (an integer), so it casts the value to the integer 1 before putting the change inside the changeset.
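When a value can't be cast, cast/4 records an error instead of raising. A short sketch using a schemaless changeset with the same organization_id type (:id), so it runs without the schema modules; a standalone run assumes the ecto package:

```elixir
Mix.install([{:ecto, "~> 3.10"}])
import Ecto.Changeset

types = %{name: :string, organization_id: :id}

ok = cast({%{}, types}, %{"organization_id" => "1"}, [:name, :organization_id])
bad = cast({%{}, types}, %{"organization_id" => "abc"}, [:name, :organization_id])

# "1" becomes the integer 1; "abc" is rejected with a :cast validation error.
IO.inspect(ok.changes)
IO.inspect(bad.errors)
```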

The validate_required/3 Function

The second call in the pipeline, validate_required/3, requires the field to be present (as the name suggests). By default, it trims strings before running the validation and treats an empty string as blank.
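Because of that trimming, a whitespace-only string counts as missing too. A quick sketch with a schemaless changeset (the field name is illustrative; the ecto package is assumed):

```elixir
Mix.install([{:ecto, "~> 3.10"}])
import Ecto.Changeset

types = %{name: :string}

blank =
  {%{}, types}
  |> cast(%{"name" => "   "}, [:name])
  |> validate_required(:name)

present =
  {%{}, types}
  |> cast(%{"name" => "Ada"}, [:name])
  |> validate_required(:name)

IO.inspect(blank.errors)
IO.inspect(present.valid?)
```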

Here's what is printed out when you inspect the Changeset.

```elixir
#Ecto.Changeset<
  action: nil,
  changes: %{organization_id: 1},
  errors: [name: {"can't be blank", [validation: :required]}],
  data: #MyApp.User<>,
  valid?: false
>
```

This small summary already contains most of the information we need about the changeset. It shows what changes were made to the initial data (organization_id was set to 1), that the changeset is invalid, and all the errors.

Let’s go one step further and inspect the full map.

```elixir
iex> IO.inspect(changeset, structs: false)
%{
  __struct__: Ecto.Changeset,
  action: nil,
  changes: %{organization_id: 1},
  constraints: [],
  data: %{
    __meta__: %{
      __struct__: Ecto.Schema.Metadata,
      context: nil,
      prefix: nil,
      schema: MyApp.User,
      source: "users",
      state: :built
    },
    __struct__: MyApp.User,
    id: nil,
    name: "some name",
    organization: %{
      __cardinality__: :one,
      __field__: :organization,
      __owner__: MyApp.User,
      __struct__: Ecto.Association.NotLoaded
    },
    organization_id: nil
  },
  empty_values: [""],
  errors: [name: {"can't be blank", [validation: :required]}],
  filters: %{},
  params: %{"name" => "", "organization_id" => "1", "foo" => "bar"},
  prepare: [],
  repo: nil,
  repo_opts: [],
  required: [:name],
  types: %{
    id: :id,
    name: :string,
    organization: {:assoc,
     %{
       __struct__: Ecto.Association.BelongsTo,
       cardinality: :one,
       defaults: [],
       field: :organization,
       on_cast: nil,
       on_replace: :raise,
       ordered: false,
       owner: MyApp.User,
       owner_key: :organization_id,
       queryable: MyApp.Organization,
       related: MyApp.Organization,
       related_key: :id,
       relationship: :parent,
       unique: true,
       where: []
     }},
    organization_id: :id
  },
  valid?: false,
  validations: []
}
```

This contains much more information now.

The data and params are self-explanatory — the initial data and the params we fed to cast.

types contains the type of each field (including association details) in the schema we are working with, fetched using the __schema__ function we saw in the previous section.

changes and errors are where it gets interesting. cast automatically converts the string organization_id to an integer because types lists organization_id as :id, the default type for a belongs_to foreign key.

It also understands that "" is a blank value, so the validate_required call adds an error to the changeset.
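Errors are stored as {message, opts} tuples, so they are usually turned into readable strings before being shown to a user. Ecto.Changeset.traverse_errors/2 is the usual tool for that; the interpolation function below follows the pattern shown in its documentation, and the schemaless changeset stands in for the MyApp.User one (ecto package assumed):

```elixir
Mix.install([{:ecto, "~> 3.10"}])
import Ecto.Changeset

changeset =
  {%{name: "some name"}, %{name: :string, organization_id: :id}}
  |> cast(%{"name" => "", "organization_id" => "1"}, [:name, :organization_id])
  |> validate_required(:name)

# Replace %{key} placeholders in each message with values from its opts.
messages =
  traverse_errors(changeset, fn {msg, opts} ->
    Regex.replace(~r"%{(\w+)}", msg, fn _, key ->
      opts |> Keyword.get(String.to_existing_atom(key), key) |> to_string()
    end)
  end)

IO.inspect(messages)
```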

Repo functions such as insert/2 and update/2 accept changesets directly. If the changeset is invalid, they return {:error, changeset} without sending anything to the database.
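Repo.insert/2 and friends need a running database, but Ecto.Changeset.apply_action/2 follows the same {:ok, result} / {:error, changeset} contract without one, which makes it handy for trying this out (or for validating data that is never persisted). A sketch, assuming the ecto package; the field name and the validate function are illustrative:

```elixir
Mix.install([{:ecto, "~> 3.10"}])
import Ecto.Changeset

types = %{name: :string}

# Hypothetical helper: build, validate, and "apply" a changeset in one go.
validate = fn params ->
  {%{}, types}
  |> cast(params, [:name])
  |> validate_required(:name)
  |> apply_action(:insert)
end

case validate.(%{"name" => ""}) do
  {:ok, data} -> IO.inspect(data, label: "valid")
  {:error, changeset} -> IO.inspect(changeset.errors, label: "invalid (#{changeset.action})")
end
```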

The Changeset API provides several other functions for validating data and database constraints, as well as dealing with associations.

These functions all read and update the changeset struct we saw above. Passing that changeset to the Repo module's functions then performs the actual database operation.

Wrapping Up

In this post, we looked at the core concepts of the Ecto library:

  • We started with a Schema that defined our business objects mapped to database tables.
  • To get those objects out of the database, we used the Query API.
  • We then used Changesets to change those objects, and inserted or updated them in the database.
  • Tying everything together was the Repo module, which takes inputs from all the other modules and connects to the database to fetch data or update records.

All of these modules work together to provide a structured and safe way to interact with databases.

Check out the official Ecto guide to integrate Ecto into your Elixir application. I have also included links to the source code in parts of this post, so feel free to go back and dig a little deeper.

Until next time — happy digging!

P.S. If you'd like to read Elixir Alchemy posts as soon as they get off the press, subscribe to our Elixir Alchemy newsletter and never miss a single post!

Sapan Diwakar

Our guest author Sapan Diwakar is a full-stack developer. He writes about his interests on his blog and is a big fan of keeping things simple, in life and in code. When he’s not working with technology, he loves to spend time in the garden, hiking around forests, and playing outdoor sports.


AppSignal monitors your apps

AppSignal provides insights for Ruby, Rails, Elixir, Phoenix, Node.js, Express and many other frameworks and libraries. We are located in beautiful Amsterdam. We love stroopwafels. If you do too, let us know. We might send you some!
