
ActiveRecord: Rails' de facto ORM. We use it every day, and it makes life a lot easier; we type less, read less, have complexity hidden from us, and don't have to specify all kinds of boilerplate we'd otherwise have to.
But its most basic methods, easy-to-read symbol notation, and complete query abstraction aren't always up to the task at hand. As our apps grow, more complex data models and relationships follow, and the limitations of basic querying methods become apparent. Simple, short, and readable queries that once worked well become insufficient and need not just modification but rewriting. This progression calls for a deeper understanding and utilization of ActiveRecord's capabilities beyond the basic finders and associations. Often you need a combination of SQL and ActiveRecord to get the job done.
Here, we'll delve into advanced ActiveRecord queries, shining a spotlight on more complex joins, custom SQL, and strategic employment of database-specific features. This will help us better approach our data handling in Rails and meet the needs of our increasingly complex database relationships.
Complex Joins and Associations in ActiveRecord for Rails
What may begin as a simple one-to-one or one-to-many relationship can quickly expand into a labyrinth of interconnected models, necessitating a deeper dive into complex SQL joins to maintain efficient data retrieval and manipulation.
ActiveRecord supports more complex joins out of the box; you can easily join multiple tables together using pure AR and symbol notation:
# Join our single item from an order to its order, the transaction, and the seller profile, as well as the purchaser (user), # finding all purchased items by a particular user from a particular seller PurchasedItem.joins(order: [:user, transaction: :seller_profile]) .where( orders: {users: {id: user_id}, transactions: {seller_profiles: {id: seller_profile_id}}} )
This is great and often suffices. But what happens the moment we realize we need some information from the transaction and the user too? We can't just .select(:id)
and hope it'll somehow magically know we want the id
of the Transaction
and not the PurchasedItem
.
This is where we need ActiveRecord to leverage SQL strings.
PurchaseItems.joins(...) .where(...) .select('desired, purchased_item, columns') .select('transactions.id AS transaction_id, transaction.total AS total_order_amount, users.name AS purchaser_name, users.email AS purchaser_email')
Here we've selected whatever columns we wanted from each PurchasedItem
(which is of course more efficient than simply selecting all of its columns, AR's default behavior). We've fetched the transaction IDs and total amounts associated with each individual purchased item of a given order, as well as the user's name and email.
We also don't always join exactly the way our models might expect, depending on our use case:
# Here we join on the users table in order to retrieve not the purchaser, but our sales member who was responsible for the sale, # as well as joining the order on some special identifier our company uses internally Transaction.joins('JOIN users AS sales_member ON sales_member.id = transactions.sales_member_id') .joins('JOIN orders ON orders.proprietary_internal_company_identifier = transactions.proprietary_internal_company_order_identifier')
However, when discovering our team's sales_members
via the users
table, sometimes a User
is more than just a User
. This leads us to our next topic: self-referential associations.
Self-Referential Associations
Self-referential associations define both sides of a relationship within the same model. This involves creating a join table that links two instances of a model to each other.
For example, in an organizational context, where users might have managers
and sales_members
, the User
model can be configured to reflect these relationships:
class User < ApplicationRecord # Associates sales_members to a user where the user is the manager has_many :sales_members, class_name: "User", foreign_key: "manager_id" # Links a user to their manager, also a User belongs_to :manager, class_name: "User", optional: true end
This allows us to traverse the hierarchy in both directions, allowing for queries that can retrieve a user's manager or their list of sales_members
(e.g., user.manager
or manager.sales_members
). It also greatly simplifies self joins, so:
# return all users who have managers User.joins(:manager)
Becomes:
SELECT users.* FROM users INNER JOIN users ON users.id = users.manager_id
In this relatively simple example, we've simply placed a manager_id
column on the User
, but there's nothing stopping you from using join tables to create more complex many-to-many associations.
That said, keep in mind that, while powerful and often necessary, self-referential joins can lead to more complex queries. This isn't always avoidable, but if you don't actually need them, creating another model might be a better choice. As always, balance needs with the potential cost.
Database-Specific Features in ActiveRecord for Ruby on Rails
If you're on Rails, there's a pretty good chance you're using PostgreSQL. ActiveRecord, together with PostgreSQL, supports column types like JSON, JSONb, and arrays.
Do you have a particular model with many potentially optional columns? Or one that's likely to grow, where you'll need a lot of model-specific columns? This might be anything from user preferences, to video stats, to a literal need to just store pure JSON (like webhook responses).
If so, JSON or JSONb might be a good choice.
JSON vs. JSONb: Which is the right fit?
If you need to query them often, you should probably go with JSONb as it has more advanced querying options, and supports indexing — specifically, GIN indexing. JSONb uses more space and is slightly slower to write, but is much more efficient to retrieve: the "b" in JSONb stands for "binary". JSON is stored as a string, whereas JSONb is stored as binary, meaning it can be queried more efficiently, and doesn't require parsing when retrieved.
Here's an example of setting up and querying JSONb:
# note you will likely want to set a default to an empty hash, so you don't have to check if the value is NULL in order to attempt to access any keys add_column :videos, :video_stats, :jsonb, default: {}
# using some ActiveRecord syntax: Video.where('video_stats @> ?', {filename: "uploaded_video_1080p.mp4"}.to_json) # alternatively, you might find a string more readable, or simply need it for more involved queries: Video.where(" video_stats ->> 'filename' LIKE '%mp4' AND (video_stats ->> 'duration')::float < 6.0 OR (video_stats ->> 'processed')::boolean = false" )
You can see the appeal here: if you're hosting or cataloging video, there are all kinds of stats and metadata (relevant probably only to the videos themselves) that you might want to keep track of, for everything from bitrate, to resolution, to compression rate, video and codec type, upload and processed dates, fidelity, filetype, and so on. JSON can be an attractive choice for a model that demands a lot of columns and that may continue to grow as your app does.
Bulk Updates with update_all
While we might often retrieve many records at once, we often find ourselves updating records one by one. Sometimes, though, a group of records needs to be updated with the same value. Maybe we've just run a job that's processed a bunch of records in some way, perhaps by making API calls to an external payment service for each.
We could update them one at a time as we process them, but this puts unnecessary strain on our workers and database. Instead, we can simply use update_all
, which updates all records it's called on with a given value per column:
# be sure to include `updated_at` if you've got timestamps and their value is important to you, as update_all doesn't do so automatically @processed_records.update_all(processed: true)
This achieves an atomic update of all records involved in a single transaction. Nice!
We also have insert_all
, introduced a few years ago by Rails 6. You will likely not use this as often, but it's still valuable, especially if you need to create a lot of records at once in real time on the main thread.
Maybe you allow your clients to upload their large orders to you. After processing their CSV, you construct their order:
# insert_all accepts an array of hashes. # if you've got timestamps on a model, you need to include them both, or it will fail new_orders = [ {name: "Item 1", completed: false, created_at: Time.now, updated_at: Time.now}, {name: "Item 2", completed: false, created_at: Time.now, updated_at: Time.now}, ..., {name: "Item 15922", completed: false, created_at: Time.now, updated_at: Time.now}, ] Order.insert_all(new_orders)
Always consider whether or not you need callbacks or validations on the items you're updating.
Because these transactions are atomic, there's no way to trigger callbacks. Sometimes that's exactly what we want, which is perfect. If not, though, we might not be able to use update_all
. If you need a callback (or, more rarely for a bulk update, a validation) to do something specific for each individual record, we probably can't.
Let's say you've only got a callback to ensure some other part of our app knows that at least one record of a particular type has been updated (in order to trigger a recount). You can probably do something like update_all
on the first n-1 records, then .update
on the last. This not only allows us to update everything at once, but it also avoids triggering potentially expensive callbacks on every single item.
Bonus: Got a mass-update scenario where you need to insert a lot of records at once? Check out upsert_all
.
ActiveRecord for Ruby on Rails: Words of Caution
It goes without saying, but be mindful of potential SQL injection. Rails does a good job of protecting us from it most of the time, but it does give us enough rope to hang ourselves with. I personally like to often use single quotes for SQL strings to remind myself not to use string interpolation for things like params that users have control over.
Be mindful that JSON/JSONb columns are a single column and not self-documenting like the rest of your schema. The keys you add to any given JSON column are arbitrary, and you will have to document them yourself if you need to keep track of them all without having to hunt around your code for clues, because they're not going to show up in the schema.
Alternatively, you can use ActiveRecord's store_accessor
, like this:
class Video < ApplicationRecord store :video_stats, accessors: [:filename, :duration, :processed] #... end
This will allow you to both explicitly declare/document all the potential keys that exist in your JSON/JSONb column, and also access them directly:
# like this: video.duration # rather than this: video.video_stats["duration"]
Just be sure not to accidentally give any of the keys the same name as an existing column on the same model if you're going with this route! You can have JSON keys with the same names as columns, but if you're using this store_accessor
method, you'll run into name collisions.
Lastly, as mentioned, always make sure that if you're using update_all
, insert_all
, or upsert_all
, you either don't need any callbacks or validations, or that those requirements are met after the fact. In some cases, this could be as simple as firing off a job after you've updated or created these records. In other cases, there might not be an easy path.
Wrapping Up
Leveraging ActiveRecord's advanced features can help make our Rails apps more flexible, efficient, and maintainable. But as with anything, good things are rarely free; keep in mind that all things being equal, simpler is better. If you can accomplish the same task more simply, you probably should. It's always important to balance complexity with simplicity.
There's always more to learn when it comes to Rails, and ActiveRecord is no exception. The more tools you add to your arsenal, the more effective you'll be, and the better your applications will run.
Thanks for reading!
P.S. If you'd like to read Ruby Magic posts as soon as they get off the press, subscribe to our Ruby Magic newsletter and never miss a single post!