
Build Custom ActiveStorage Analyzers for Ruby on Rails

Julian Rubisch


In this series, we will take a close look at the architecture of ActiveStorage for Rails.

In this first part, we will examine how ActiveStorage treats uploaded data and how to extend this process. The second part will explore how to augment the presentation of uploaded assets.

But first, let's quickly define what ActiveStorage does.

What Is ActiveStorage for Ruby on Rails?

Without recounting the entire ActiveStorage documentation, in a nutshell, ActiveStorage is an adapter to various forms of storing (mostly) user-generated files in your Ruby on Rails application in a straightforward way. The available storage backends can be divided into:

  • Local disk storage
  • Diverse flavors of cloud storage (most prominently Amazon S3 and compatible object storage providers)

Note: For the sake of completeness, there are also adapters to directly store binary data in your database, like active_storage-postgresql.

ActiveStorage allows you to transparently attach files to database records for easy access. You have probably already come across this API:

Ruby
class User < ApplicationRecord
  has_one_attached :avatar
end

class Post < ApplicationRecord
  has_many_attached :images
end

Here, the class methods has_one_attached and has_many_attached are responsible for wiring up your User or Post records with the respective attachments. But how does this magic work?

Under the hood, ActiveStorage uses two database tables generated when you install it. Here are their entries from the database schema definition:

Ruby
create_table "active_storage_attachments", force: :cascade do |t|
  t.string "name", null: false
  t.string "record_type", null: false
  t.bigint "record_id", null: false
  t.bigint "blob_id", null: false
  t.datetime "created_at", null: false
  t.index ["blob_id"], name: "index_active_storage_attachments_on_blob_id"
  t.index ["record_type", "record_id", "name", "blob_id"], name: "index_active_storage_attachments_uniqueness", unique: true
end

create_table "active_storage_blobs", force: :cascade do |t|
  t.string "key", null: false
  t.string "filename", null: false
  t.string "content_type"
  t.text "metadata"
  t.string "service_name", null: false
  t.bigint "byte_size", null: false
  t.string "checksum"
  t.datetime "created_at", null: false
  t.index ["key"], name: "index_active_storage_blobs_on_key", unique: true
end

The first table belongs to the join model ActiveStorage::Attachment. It references a polymorphic record (that's why the methods has_(one|many)_attached can be called from any ActiveRecord model) as well as a Blob. As you might correctly assume, this refers to the second table, which is mapped by the ActiveStorage::Blob model. This acts as a data container for all that is needed to upload or download a file to one of the configured services. Let's take it apart a bit:

  • The key attribute is the name under which the blob is stored at its actual storage location (in most cases, an S3 bucket).
  • filename is the original name of the file under which it was uploaded.
  • content_type is its analyzed MIME type.
  • metadata is a generic text column that can hold any metadata of the file, in JSON format. It's this column that our custom analyzers will make use of.
  • service_name is used to identify the service in config/storage.yml to which this file was uploaded.
  • byte_size and checksum are exactly what you'd expect these attributes to store.
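Conceptually, that metadata column is just a JSON-serialized hash. As a plain-Ruby illustration of the round-trip (not ActiveStorage's actual serialization code, and with made-up example values):

```ruby
require "json"

# Illustration only: ActiveStorage serializes the metadata hash to JSON
# before writing it to the text column, and parses it back on read.
metadata = { "identified" => true, "analyzed" => true, "duration" => 182.4 }

stored = JSON.generate(metadata) # what ends up in the metadata text column
loaded = JSON.parse(stored)      # what the Blob model hands back to you

loaded["duration"] # => 182.4
```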

How Does ActiveStorage Treat Uploaded Data?

With these general concepts out of the way, let's examine the ingest process used by ActiveStorage. In other words, once a file is uploaded, what happens next?

The answer lies in the Attachment model's code. In an after_create_commit callback, it enqueues a job that will later analyze the blob it pertains to.

The code that's used to perform the analysis reads simply enough:

Ruby
def analyze
  update! metadata: metadata.merge(extract_metadata_via_analyzer)
end

# ...

private

def extract_metadata_via_analyzer
  analyzer.metadata.merge(analyzed: true)
end

def analyzer
  analyzer_class.new(self)
end

def analyzer_class
  ActiveStorage.analyzers.detect { |klass| klass.accept?(self) } || ActiveStorage::Analyzer::NullAnalyzer
end

We observe that, as indicated before, the analyze method updates the metadata column of the blob. This metadata is extracted by an analyzer, but there's an important detail hidden in how exactly this analyzer is provided. The analyzer_class is retrieved from a list of analyzers that the ActiveStorage module itself keeps. Let's take a brief look at it in the Rails console:

Shell
(dev)> ActiveStorage.analyzers
=> [ActiveStorage::Analyzer::ImageAnalyzer::Vips,
    ActiveStorage::Analyzer::ImageAnalyzer::ImageMagick,
    ActiveStorage::Analyzer::VideoAnalyzer,
    ActiveStorage::Analyzer::AudioAnalyzer]

Listed here are all the standard analyzers that ship with ActiveStorage: two for images (one each for the Vips and ImageMagick processing backends), one for video, and one for audio data. From this list, the first analyzer that responds to accept?(self) with true is selected; ImageAnalyzer, for example, accepts a blob only if it holds an image. Creating a new analyzer, then, only requires a class that extends Analyzer and is prepended to this list. We'll explore how to do this in the remainder of this article.
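The selection logic itself is ordinary Ruby. Here is a stripped-down sketch of the detect-based dispatch, using toy stand-in classes (the Fake* names are illustrative, not the real analyzers):

```ruby
# Toy stand-ins for the real analyzer classes
class BaseAnalyzer
  def self.accept?(blob)
    false
  end

  def initialize(blob)
    @blob = blob
  end

  def metadata
    {}
  end
end

class FakeImageAnalyzer < BaseAnalyzer
  def self.accept?(blob)
    blob[:content_type].start_with?("image/")
  end

  def metadata
    { width: 1200, height: 800 }
  end
end

class FakeNullAnalyzer < BaseAnalyzer
  def self.accept?(blob)
    true
  end
end

ANALYZERS = [FakeImageAnalyzer]

# Mirrors ActiveStorage's analyzer_class method: the first accepting
# analyzer wins, with a null object as the fallback.
def analyzer_for(blob)
  klass = ANALYZERS.detect { |k| k.accept?(blob) } || FakeNullAnalyzer
  klass.new(blob)
end

analyzer_for(content_type: "image/jpeg").metadata # => { width: 1200, height: 800 }
analyzer_for(content_type: "audio/x-wav").metadata # => {}
```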

Two Use Cases for Custom Analyzers

When would you reach for a custom analyzer? I contend that most use cases revolve around needs for enhanced presentation of assets. To back up that claim, I have prepared two prototypical examples:

  • precomputing audio waveform data
  • calculating image blurhashes

Extracting and Storing Audio Sample Data

Let's assume that we are building a directory of songs. We might use a Song model, which has an attached recording for this:

Ruby
class Song < ApplicationRecord
  has_one_attached :recording
end

If we create a new record of this model and attach an audio file, what happens? In a full-stack scenario, we would use an upload form, but for the sake of investigating the analysis process, let's just create one from the Rails console:

Shell
(dev)> song = Song.create(title: "Ruby Blues in D Flat")
(dev)> song.recording.attach(io: File.open("/path/to/file"), filename: "ruby_blues.wav")

Apart from the usual SQL log, Rails also informs us that it has enqueued a job to process the data:

Shell
Enqueued ActiveStorage::AnalyzeJob (Job ID: 2a923033-b0d2-4bfc-b368-d3eb1344e64b) to Async(default) with arguments: #<GlobalID:0x000000012292f428 @uri=#<URI::GID gid://active-storage-analyzers-previews/ActiveStorage::Blob/1>>

Let's now inspect the respective Attachment and Blob records:

Shell
(dev)> song.recording_attachment
=> #<ActiveStorage::Attachment:0x00000001240c8eb8
 id: 1,
 name: "recording",
 record_type: "Song",
 record_id: 1,
 blob_id: 1,
 created_at: "2025-06-15 15:35:59.470664000 +0000">
(dev)> song.recording_blob
=> #<ActiveStorage::Blob:0x00000001235c6f60
 id: 1,
 key: "oi7xszss3y6kfer601zvxfpl1muz",
 filename: "ruby_blues.wav",
 content_type: "audio/x-wav",
 metadata: {"identified" => true},
 service_name: "local",
 byte_size: 35765766,
 checksum: "ddUPZtkz3hqVI9wUGOwp4g==",
 created_at: "2025-06-15 15:35:59.461367000 +0000">

Aha! The file has been correctly identified as being of type audio/x-wav, but apart from that, there's no other metadata being persisted. We're here to change that.

First, let's write our custom analyzer. We'll put it in the lib/active_storage directory, and call it ActiveStorage::WaveformAnalyzer. To draw upon the existing implementation, we'll inherit from ActiveStorage::Analyzer::AudioAnalyzer.

We saw above that the metadata method is responsible for returning the appropriate data, so we'll override it. We have to be careful to call super and merge any new data into what the parent class already provides.

Ruby
# lib/active_storage/waveform_analyzer.rb
module ActiveStorage
  class WaveformAnalyzer < ActiveStorage::Analyzer::AudioAnalyzer
    def metadata
      super.merge waveform
    end

    def waveform
      rms_values = []

      download_blob_to_tempfile do |file|
        IO.popen([ffmpeg_path, "-i", file.path, "-ac", "1", "-f", "f32le", "-"], "rb") do |io|
          frame_size = 4                # mono, 4 bytes (float32)
          chunk_size = 512 * frame_size # 512 frames

          while chunk = io.read(chunk_size)
            floats = chunk.unpack("e*") # little-endian float32
            next if floats.empty?

            rms = Math.sqrt(floats.sum { _1 ** 2 } / floats.size)
            rms_values << rms
          end
        end
      end

      # optionally store as a Base64-packed string to save space:
      # { waveform: [rms_values.pack("e*")].pack("m0") }
      { waveform: rms_values }
    end

    def ffmpeg_path
      ActiveStorage.paths[:ffmpeg] || "ffmpeg"
    end
  end
end

The real meat, though, is of course the computation of waveform datapoints. Because ActiveStorage depends on it, we can utilize ffmpeg for our purposes. The full call we pass to IO.popen here reads like this:

Shell
ffmpeg -i FILE_TO_ANALYZE -ac 1 -f f32le -

Here, -ac 1 tells ffmpeg to mix the audio down to one channel, while -f f32le specifies "float 32-bit little endian" as the output format. The final dash - instructs ffmpeg to output this to STDOUT, so we can actually pick it up in the block passed to IO.popen.
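On the Ruby side, unpack("e*") is the mirror image of that output format: "e" is Ruby's pack/unpack directive for little-endian 32-bit floats. Here is that step in isolation, with made-up sample values:

```ruby
# "e" is Ruby's pack/unpack directive for little-endian 32-bit floats,
# matching ffmpeg's f32le output. These values are exactly representable
# in float32, so the round-trip is lossless.
samples = [0.5, -0.25, 0.125]

bytes  = samples.pack("e*")  # 12 bytes: 3 samples x 4 bytes each
floats = bytes.unpack("e*")

bytes.bytesize # => 12
floats         # => [0.5, -0.25, 0.125]
```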

There's one important signal processing detail to mention: storing each and every sample as metadata wouldn't be very efficient; in fact, it would amount to storing the entire audio file as a JSON array. Instead, we compress the data by calculating the root mean square (RMS) over each frame of 512 samples (an arbitrary choice, though typically a power of 2). That's just a fancy way of averaging, except that we square each sample before averaging and take the square root afterwards, because audio samples can be both positive and negative, and a plain average would cancel them out.
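The RMS step can be isolated into a few lines of plain Ruby; the example below also shows why a plain average won't do:

```ruby
# Same formula as in the analyzer: square, average, square root.
def rms(samples)
  Math.sqrt(samples.sum { |s| s**2 } / samples.size.to_f)
end

# A plain average cancels out: this frame oscillates around zero...
frame = [0.5, -0.5, 0.5, -0.5]
frame.sum / frame.size.to_f # => 0.0

# ...but its RMS correctly reflects its loudness:
rms(frame) # => 0.5
```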

Our new analyzer isn't yet wired up to be used by the Rails application, so we do that in an initializer:

Ruby
# config/initializers/active_storage.rb
require_relative "../../lib/active_storage/waveform_analyzer.rb"

Rails.application.config.active_storage.analyzers.prepend ActiveStorage::WaveformAnalyzer

It's important to prepend it to the list here, because, as we've observed above, the first analyzer in the list that accepts audio files will be used.

Let's put it to use in the Rails console again:

Shell
(dev)> song = Song.create(title: "Ruby Blues in D Flat")
(dev)> song.recording.attach(io: File.open("/path/to/file"), filename: "ruby_blues.wav")
(dev)> song.reload.recording_attachment.metadata[:waveform]
=> [0.00012394643736996606, 0.00087319937227967, 0.0037783625670793465, 0.005877352246693453, ... etc.]

Now that we have a compressed representation of the audio in our metadata, how can we make use of it? We'll take a look at an idiomatic ActiveStorage method in the second part of this series, but for starters, many JavaScript audio widgets allow you to specify precomputed waveform data, like WaveSurfer does in this example.
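For instance, before handing the RMS values to such a widget, you would typically normalize them so that the loudest frame maps to 1.0. A minimal sketch (the helper below is hypothetical; the exact input shape a given widget expects depends on its API):

```ruby
# Hypothetical helper (not part of ActiveStorage): scale RMS values so
# the loudest frame maps to 1.0, a common input shape for waveform widgets.
def normalize(peaks)
  max = peaks.max
  return peaks if max.nil? || max.zero?

  peaks.map { |value| value / max }
end

normalize([0.1, 0.2, 0.4]) # => [0.25, 0.5, 1.0]
normalize([])              # => []
```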

Calculating Image Blurhashes in Ruby

Blurhashes are compressed representations of images you can use instead of generic placeholders for an enhanced lazy loading experience. The blurhash Ruby gem provides a straightforward way to encode such strings from images. We add it to our application's dependencies like so:

Shell
$ bundle add blurhash

Before we begin writing our custom analyzer, it's important to note that the image processing backend comes in two flavors: Vips or ImageMagick. Since its API is a bit simpler, we'll concentrate on the latter and configure it in config/application.rb:

Ruby
config.active_storage.variant_processor = :mini_magick

We can now begin our implementation by subclassing the ActiveStorage::Analyzer::ImageAnalyzer::ImageMagick base analyzer:

Ruby
# lib/active_storage/blurhash_analyzer.rb
module ActiveStorage
  class BlurhashAnalyzer < ActiveStorage::Analyzer::ImageAnalyzer::ImageMagick
    attr_accessor :thumbnail

    def metadata
      read_image do |image|
        build_thumbnail(image)
        super.merge blurhash
      end
    end

    def blurhash
      {
        blurhash: ::Blurhash.encode(
          thumbnail.width,
          thumbnail.height,
          pixels
        )
      }
    end

    def build_thumbnail(image)
      # we scale down the image for faster blurhash processing
      @thumbnail ||= MiniMagick::Image.open(
        ::ImageProcessing::MiniMagick.source(image.path).resize_to_limit(200, 200).loader(page: 0).call.path
      )
    end

    def pixels = @thumbnail.get_pixels.flatten

    protected

    def processor = "ImageMagick"
  end
end

What's going on here? We encounter an already familiar pattern: we populate the database column of the same name with more data via the metadata method. read_image is a helper method provided by Rails that opens the image as a file, ready for us to use. The build_thumbnail method is technically optional, but very helpful for speeding up the blurhash calculation: we use MiniMagick to scale the image down to fit within 200x200 pixels. The blurhash method then builds the compressed string representation from this downscaled thumbnail.
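resize_to_limit only shrinks images that exceed the given bounds, preserving the aspect ratio; it never upscales. The dimension math it performs is roughly this (a sketch of the calculation, not ImageProcessing's actual code):

```ruby
# Sketch of the fit-within-bounds calculation (the real resizing is done
# by ImageMagick via the image_processing gem, not by this Ruby code).
def fit_within(width, height, max_width, max_height)
  return [width, height] if width <= max_width && height <= max_height

  scale = [max_width.to_f / width, max_height.to_f / height].min
  [(width * scale).round, (height * scale).round]
end

fit_within(1200, 800, 200, 200) # => [200, 133]
fit_within(150, 100, 200, 200)  # => [150, 100] (never upscaled)
```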

Like above, we prepend it to the list of analyzers in our initializer:

Ruby
# config/initializers/active_storage.rb
require_relative "../../lib/active_storage/waveform_analyzer.rb"
require_relative "../../lib/active_storage/blurhash_analyzer.rb"

Rails.application.config.active_storage.analyzers.prepend ActiveStorage::WaveformAnalyzer
Rails.application.config.active_storage.analyzers.prepend ActiveStorage::BlurhashAnalyzer

Now it's time to test it out. For this, we'll use a test image from picsum.photos:

Test image from picsum

Let's open a Rails console and attach this image to a post:

Shell
(dev)> post = Post.create(title: "Active Storage Analyzers")
(dev)> post.images.attach(io: URI.open("https://picsum.photos/id/128/1200/800"), filename: "picsum_128.jpg")
# If we inspect its metadata, we will now find a blurhash representation of the image:
(dev)> post.reload.images.first.metadata["blurhash"]
=> "LWDJS1o#D%kD~qbIIUof%2WARkfP"

For reference, converted to an actual preview image, it looks like this:

Blurhash preview image

To make use of this in our application frontend, the blurhash needs to be decoded and presented. Typically, this involves using the official TypeScript library and a bespoke Stimulus controller. In the second part of this series, we'll take a look at implementing a pure ActiveStorage-generated preview.

That's it for this first part!

Wrap Up

In this article, we've taken a few first steps to customize how ActiveStorage handles and processes media data. We've learned that the ActiveStorage::Blob model is where the data describing an attachment is stored. When writing bespoke analyzers, it's necessary to put any results in its metadata column.

We then looked at two examples explaining when and how to write your own custom ActiveStorage analyzers: extracting waveform data from audio files and calculating image blurhashes. Both implementations demonstrate the diligence that has been put into ActiveStorage's background media processing API: the glue code necessary to plug binaries like ffmpeg or ImageMagick into the analysis pipeline is minimal.

In the second and final part of this series, we will reverse this process and examine ways to implement custom ActiveStorage previewers, providing compact graphic representations of the calculated metadata.

Happy coding!

Julian Rubisch

Our guest author Julian is a freelance Ruby on Rails consultant based in Vienna, specializing in Reactive Rails. Part of the StimulusReflex core team, he has been at the forefront of developing cutting-edge HTML-over-the-wire technology since 2020.
