Pagination means dividing a big chunk of data into smaller pages. It unlocks performance benefits for the backend, while also improving UX by sending manageable pieces of data to a client.
In this article, we'll explore offset-based and cursor-based server-side pagination approaches, compare them, and implement cursor-based pagination in Node.js.
Let's get started!
Types of Pagination
There are two broad pagination approaches:
- Client-side pagination
- Server-side pagination
The significant difference between them lies in where the pagination process takes place. In client-side pagination, the complete dataset is fetched from a database and partitioned into pages of a given size on the client side. Meanwhile, in server-side pagination, the client only fetches the data required for a given page number. When a user navigates to the next page, the client makes a fresh call to the server for new data.
So, for smaller datasets, client-side pagination works well, but server-side pagination best fits large data use cases.
In the coming sections, we'll learn about two primary ways to perform server-side pagination.
Before we proceed, we need a sample API to work with.
Prerequisites
To follow this tutorial, you'll need:
- A PostgreSQL server installed and running on your local machine (I prefer psql to interact with the database).
- Node.js and npm installed, with Node.js version >= 16.
- A basic understanding of a REST API with the Express.js framework.
This article contains some code snippets, but you can find the complete working code on GitHub.
Installing Our Sample Node.js API
Here are the steps to run the sample API locally.
- Clone the repo:
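```sh
# The repository URL is a placeholder; use the GitHub link mentioned above
git clone https://github.com/<user>/<repo>.git
cd <repo>
```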
- Install the dependencies:
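```sh
npm install
```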
- Source the environment variables.
You can find the env variables in `config.js` at the project's root level. Rename the `.env.example` file to `.env` and fill in your desired values for the environment variables.
If Postgres is not set to trust local connections, provide a correct password for your Postgres user. To turn off password authentication, edit your `pg_hba.conf` file (run `SHOW hba_file;` as a superuser to find its location) to trust all local connections. The relevant lines should look something like this (your file's other entries may differ):
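```
# TYPE  DATABASE        USER            ADDRESS                 METHOD
local   all             all                                     trust
host    all             all             127.0.0.1/32            trust
host    all             all             ::1/128                 trust
```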
Once done, you can load the environment variables with `source .env` on macOS. If `source` isn't available in your shell, try replacing it with `.` (see the 'source command not found in sh shell' StackOverflow thread), or export the variables directly in the terminal.
- Run the server:
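```sh
# The script name is assumed; check the repo's package.json
npm start
```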
- Create a table in your Postgres database. Connect to it locally and run the command below to create a table named "items".
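```sql
-- Column names come from this article; the exact types are assumed and
-- may differ slightly from the repo's schema
CREATE TABLE items (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  description TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```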
- Load sample data in your items table by hitting the API endpoint — POST /item/create.
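For example, with the server running locally (the port below is assumed; use the one your app is configured with):

```sh
curl -X POST http://localhost:3000/item/create
```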
You should now have an API server running locally and connected to a local Postgres database. Feel free to run the queries in this article locally to solidify your understanding.
In the following sections, let's look at offset-based and cursor-based pagination, starting with offset-based.
Understanding Offset-Based Pagination
Offset-based pagination is intuitive and straightforward, which explains why it's been widely adopted. The key parameters behind this pagination type are the `offset` and `limit` clauses.

The `offset` clause helps skip past a set of rows, while `limit` caps the query output.
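For example, here's how a query for the third page (with a page size of ten) can look:

```sql
-- Skip the first 20 rows (pages one and two), return the next 10
SELECT * FROM items
ORDER BY created_at DESC
LIMIT 10 OFFSET 20;
```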
The latency of executing queries on a database with offset-based pagination increases as the data grows. Here's the planning and execution time when we run it on our sample database (you can run the same commands in your local Postgres database):
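```sql
-- Page 2 with a small offset; the article's original timing output isn't
-- reproduced here, so run this locally to see your own numbers
EXPLAIN ANALYZE
SELECT * FROM items
ORDER BY created_at DESC
LIMIT 10 OFFSET 10;
```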
The latency grows by more than 10x when the offset parameter is increased:
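```sql
-- The same query with a much larger offset (the exact value isn't
-- preserved from the original benchmark; adjust to your dataset's size)
EXPLAIN ANALYZE
SELECT * FROM items
ORDER BY created_at DESC
LIMIT 10 OFFSET 500000;
```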
The good part about offset-based pagination is that it can adapt quickly to almost any pagination style. It's straightforward to build, whether you need sequential or random access.
For implementing sequential access (previous/next, infinite scroll, or load-more pagination styles), the offset is simply increased or decreased to get the desired data. For instance, the offset is four times the page size to load the fifth page, and zero times the page size to return to the first page. In code, that mapping is simple:
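```javascript
const pageSize = 10;
const page = 5; // 1-indexed page number requested by the client

// Pages one through four hold the first 40 rows, so skip them
const offset = (page - 1) * pageSize; // 40
```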
Drawbacks of Offset-Based Pagination
It's important to note that our table from the last section stores very minimal data: `name`, `description`, `created_at`, and `id`. Rows are much bigger in real-world scenarios, and latencies can grow even faster than linearly.
If we keep performance-related concerns aside, offset-based pagination has another major drawback — duplicate or missing entries.
Let's understand with an example.
We have ten rows in a table, the page size is five, and the rows are ordered by most recent first. Numbering the rows 1 to 10 from oldest to newest, the first page ideally holds rows 10 to 6, and the second page holds rows 5 to 1.
Imagine a new row (row 11) is inserted into the table while a user is on the first page. Every row shifts down one position, so when the user visits the next page, it contains rows 6 to 2, and row 6, which they already saw on the first page, creeps in as a duplicate.
Similarly, if one of the entries on the first page gets deleted while the user is there, every remaining row shifts up one position. The entry that used to open the second page (row 5 in our example) slides onto the first page, so the user, now on the second page, never gets to see it.
Cursor-based pagination is immune to such issues because it doesn't use the `offset` clause. We'll look into cursor-based pagination in detail in the next section. But from what we have observed so far, offset-based pagination can be summarized as follows:
Pros
- Intuitive and straightforward to implement.
- Works well for datasets that don't change frequently.
- Implementing any pagination logic (sequential and random access) is straightforward.
Cons
- Noticeable performance drops as the dataset grows.
- Unreliable for frequently changing data, as it can show duplicate entries or miss them.
Let's look at how cursor-based pagination offers better potential in terms of performance and how it handles the problem of missing/duplicate entries.
Understanding Cursor-Based Pagination
Cursor-based pagination leverages the `where` clause to get relevant data. Compared to offset-based pagination, this is a better approach in terms of both performance and reliability.
The stark difference in the number of rows the query planner touches is immediately visible when you compare the two plans, and so are the planning and execution times. You can reproduce the comparison locally:
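```sql
-- A cursor-based equivalent of the offset query above; the timestamp
-- literal is a placeholder for the cursor value your client sends.
-- The original plan output isn't reproduced here; run it for your timings.
EXPLAIN ANALYZE
SELECT * FROM items
WHERE created_at < '2023-01-01 00:00:00+00'
ORDER BY created_at DESC
LIMIT 10;
```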
Offset-based pagination is almost 9x slower than cursor-based pagination.
It's important to mention that the cursor field plays a crucial role in the correctness of the result. Any column can work as a cursor if the values follow a strictly increasing or decreasing order. Sometimes, that means going out of the way to add such columns to the table (a timestamp or an incrementing/decrementing ID, for instance).
To emphasize the importance of a strictly increasing/decreasing order, let's analyze a potential issue I faced while working on the API:

- In our items table, `created_at` is a timestamp column (with time zone), and Postgres stores such values with microsecond precision.
- But `new Date().toISOString()` in JavaScript is only precise to the millisecond. As a result, our API stores identical `created_at` values for two rows created close enough together.
This leads to unexpected behavior when fetching a new page from the database. I fixed it by adding a small delay (big enough for JavaScript's millisecond precision to register) between item creations.
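A minimal sketch of that workaround, assuming a hypothetical `createItem` helper that inserts one row:

```javascript
// Inside an async function that seeds the sample data
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

for (const item of sampleItems) {
  await createItem(item); // hypothetical helper that inserts one row
  await sleep(5); // a 5 ms gap guarantees distinct millisecond timestamps
}
```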
Looking at all the aspects of cursor-based pagination, the pros and cons are as follows:
Pros
- Significant performance upside on bigger datasets.
- Works for frequently changing datasets — no duplicates or missing entries problems.
Cons
- Relatively complex to implement — especially with random access pagination style (where the user can jump to the fifth page from the first one).
- Cursor selection plays a crucial role in the correctness of output.
There are merits to both offset- and cursor-based approaches — let's explore which one to pick for the needs of your application.
Choosing the Right Pagination Strategy for Your Node.js App
It largely depends on the scope and impact. If you're working on a smaller dataset, it's not worth the hassle of setting up cursor-based pagination. Going ahead with an intuitive and relatively simple approach will allow you to focus on bigger goals for your app (e.g., bringing in more customers).
However, latency becomes noticeable as your database hits millions of rows. A random access pattern can increase latency and hurt throughput even further. Depending on the use case, embracing complexity might be the only way forward. We'll talk more about UX-imposed challenges later in this article.
Let's see how we can set up cursor-based pagination next.
How to Implement Cursor-Based Pagination
The idea is to track a column's value with a cursor and use that to navigate across different pages. We'll implement sequential access and use the most recent first sorting order. The pagination style can be any one of the following:
- "Next" and "previous" buttons to navigate
- An infinite scroll
- A load more button
Here's an overview of the next/previous pagination style: the client remembers the current page's boundary cursors (its largest and smallest `created_at` values) and requests the entries on either side of them.
The only difference in an infinite scroll or a load-more variant is that the user cannot return to the previous page because there's no previous page. The data just keeps getting appended.
Regardless of the pagination style you're going with, the initial load will fetch a fixed number of entries from the database:
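```sql
-- Initial load: no cursor yet, just grab the most recent page
SELECT * FROM items
ORDER BY created_at DESC
LIMIT 10;
```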
We get back ten entries sorted in the desired order.
When the user visits the second page, the `where` clause helps us fetch the next set of ten entries:
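```sql
-- The timestamp literal is a placeholder for the smallest created_at
-- value from the first page
SELECT * FROM items
WHERE created_at < '2023-01-01 00:00:00+00'
ORDER BY created_at DESC
LIMIT 10;
```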
We've picked the smallest `created_at` value from the first page for the `where` clause. It gives us the next ten entries (the second page).
So far, so good.
The query is similar if a user goes back to the first page:
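```sql
-- The timestamp literal is a placeholder for the largest created_at value
-- on the current page; note the ascending sort order
SELECT * FROM items
WHERE created_at > '2023-01-01 00:00:00+00'
ORDER BY created_at ASC
LIMIT 10;
```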
We've used the largest `created_at` value and fetched the entries greater than it, again with a max cap of ten. But there's a catch: the sorting order has changed in the query. Let's walk through why that's required.
First of all, let's highlight the known pieces of this setup:
- The page size is ten.
- The entries are sorted by most recent first.
- There are 30 entries in the table; think of each row as carrying a timestamp numbered 1 to 30, from oldest to newest.
- We are currently on the last page (rows 10 to 1), so the current largest timestamp is 10 and the smallest is 1.
The first step to move back to the previous page is simple: we want to find all the rows where the timestamp exceeds the current largest timestamp (10). In our case, twenty rows (11 to 30) match this filter.
The second step is to pick ten of the rows obtained in the first step. However, sorting them in descending timestamp order and taking the first ten would give us rows 30 to 21. We want rows 20 to 11, the ones adjacent to the current page. This is why we need to sort the rows in ascending order and take the first ten.
This sudden change in the sorting order can be problematic if the client expects a sorted response from the API. This implementation detail can be ironed out during the design phase.
If a client shares a sorting preference in the request, the server can sort the results before sending them to the client:
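A minimal sketch (assuming `result` comes from a `pg` query and `res` is the Express response):

```javascript
// The "previous page" query returns rows in ascending order, so re-sort
// them most recent first before responding
const sorted = result.rows.sort(
  (a, b) => new Date(b.created_at) - new Date(a.created_at)
);
res.json({ items: sorted });
```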
If the user changes the sorting order after progressing to later pages, the usual behavior is to reset to the first page with the new order. For example, if the user switches from most recent first to oldest first while on page 2, the view should reset to the first page (with entries sorted by oldest first).
If this feels like an information dump, reviewing the complete code on GitHub should give you a good bird's-eye view of the implementation.
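Here's a condensed sketch of what the endpoint can look like, assuming Express and node-postgres (`pg`). The route path, helper names, and response shape are illustrative, not necessarily the repo's exact code:

```javascript
const express = require("express");
const { Pool } = require("pg");

const app = express();
const pool = new Pool(); // reads PG* variables from the environment

const PAGE_SIZE = 10;

app.get("/items", async (req, res) => {
  // `cursor` is a created_at value from the page the user is on;
  // `direction` is "next" or "prev"
  const { cursor, direction } = req.query;

  let result;
  if (!cursor) {
    // Initial load: no cursor yet, fetch the most recent page
    result = await pool.query(
      "SELECT * FROM items ORDER BY created_at DESC LIMIT $1",
      [PAGE_SIZE]
    );
  } else if (direction === "prev") {
    // Previous page: rows newer than the largest visible timestamp,
    // fetched in ascending order (see the explanation above)
    result = await pool.query(
      "SELECT * FROM items WHERE created_at > $1 ORDER BY created_at ASC LIMIT $2",
      [cursor, PAGE_SIZE]
    );
  } else {
    // Next page: rows older than the smallest visible timestamp
    result = await pool.query(
      "SELECT * FROM items WHERE created_at < $1 ORDER BY created_at DESC LIMIT $2",
      [cursor, PAGE_SIZE]
    );
  }

  // Always respond most recent first, regardless of the fetch order
  const items = result.rows.sort(
    (a, b) => new Date(b.created_at) - new Date(a.created_at)
  );

  res.json({
    items,
    // The client sends these back as cursors for next/prev requests
    nextCursor: items.length ? items[items.length - 1].created_at : null,
    prevCursor: items.length ? items[0].created_at : null,
  });
});

app.listen(3000);
```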
Now that we have implemented cursor-based pagination for our Node.js API, it's time to look at how this approach plays out with random access patterns.
UX-Imposed Challenges
There are a couple of ways to implement pagination in the UI:

- Sequential access: an infinite scroll, a "Load more" button, or "prev/next" buttons.
- Random access: numbered pages.
Offset-based pagination works (ignoring performance bottlenecks) with both access patterns. However, when using cursor-based pagination, random access can be problematic. Let's quickly recap how cursor-based pagination works:
- Fetching the first page involves a fetch with a `limit` clause (without any cursor).
- For the second page, the combination of `where` and `limit` is used, e.g., `where created_at < some_timestamp` and `limit = 10`.
If we look closer, fetching the second page (and so forth) requires the smallest/largest timestamp of the first page. To allow random access, i.e., jump to the fifth page from the first one, we need to know the smallest/largest timestamp of the fourth page.
This leads to the following conclusions:
- Enabling random access is only possible when you know each page's smallest/largest timestamp values.
- Precomputing and storing the smallest/largest timestamps that correspond to every page only helps a little if the data keeps changing (with rows being added or deleted).
Even though computing the cursors on every request can still beat offset-based pagination, it is undoubtedly a more complex approach, so it's best avoided for smaller-scale applications.
Wrapping Up
In this post, we've learned why pagination is necessary, the types of pagination, and two primary ways to implement server-side pagination. We also discussed some UX-imposed challenges when working with cursor-based pagination.
Now, you should have sufficient knowledge to implement end-to-end pagination and improve the performance of your Node.js API.
Thanks for reading!
P.S. If you liked this post, subscribe to our JavaScript Sorcery list for a monthly deep dive into more magical JavaScript tips and tricks.
P.P.S. If you need an APM for your Node.js app, go and check out the AppSignal APM for Node.js.