Safely partitioning Postgres tables in Rails

A production-tested approach to partitioning large Postgres tables with pgslice and a small amount of Ruby, with minimal downtime.

As your Postgres tables grow into the tens or hundreds of millions of rows, you'll start hitting problems, whether that's slow query times or raw storage costs.

For both Bento and Tatami we were storing a lot of data that we simply didn't need to retain for more than a few days.

Running queries like Event.where('occurred_at < ?', 1.day.ago).delete_all would take a long time, use a lot of memory, and, because Postgres only marks deleted rows as dead tuples, never actually shrink the table on disk until we ran a VACUUM FULL.

VACUUM FULL is a heavy operation: it takes an exclusive lock that blocks all reads and writes to the table while it runs. I always want to avoid downtime, especially if I'm the one causing it.
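
You can watch this happen in pg_stat_user_tables: n_dead_tup climbs after a large delete, and a plain VACUUM only makes that space reusable internally rather than returning it to the OS. A quick console check (the 'events' table name here is a placeholder):

ActiveRecord::Base.connection.select_one(
  "SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'events'"
)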

So what's the solution?

Table partitioning!

I spent some time reading up on the different partitioning strategies and tools available and settled on pgslice.

I wanted a Ruby-specific way of handling this and wanted to avoid extensions (pg_partman, pg_pathman, etc.), which I personally felt were a bit flaky and scary.

Prerequisites

  • pgslice gem installed in your production environment
  • PGSLICE_URL or DATABASE_URL set with proper credentials
  • DB role can create tables, functions, and triggers
  • A maintenance window for the swap is recommended but not required
  • pgslice flow at a glance: prep → add_partitions → fill or dual sync → swap
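
Here is that flow end to end from a Rails console, using the PgsliceRunner service defined below (a sketch only; 'events' and 'occurred_at' are placeholders for your own table and timestamp column):

PgsliceRunner.prepare_partitioned_table(table: 'events', column: 'occurred_at')   # prep + add_partitions
PgsliceRunner.run('fill', 'events')                                               # backfill existing rows
PgsliceRunner.enable_mirroring_triggers(table: 'events')                          # dual-write new rows
PgsliceRunner.check_dual_write(table: 'events', sample_seconds: 60)               # verify mirroring
PgsliceRunner.swap_partitioned_table(table: 'events')                             # analyze + swap
PgsliceRunner.check_queries(model: 'Event', table: 'events', column: 'occurred_at')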

When to partition

  • Tables exceed ~10–50M rows and are still growing
  • Your queries are time-range heavy (e.g., last N days/months)
  • Count/analytics queries are slow and memory heavy
  • You have a time-based retention policy and want cheap drops
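
On that last point: with partitions, retention becomes a metadata-level DROP TABLE instead of a row-by-row delete, and the disk space comes back immediately. A sketch, assuming pgslice's default YYYYMMDD partition suffix (the partition name is illustrative):

# Instead of Event.where('occurred_at < ?', 7.days.ago).delete_all:
ActiveRecord::Base.connection.execute('DROP TABLE IF EXISTS events_20240101')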

The PgsliceRunner Service

The PgsliceRunner service wraps the pgslice CLI and adds some helpful Rails-specific methods for mirroring writes and running sanity checks. It's designed to be run from the Rails console, step by step, so you can verify each phase before moving to the next. pgslice infers the partition period from prep; specifying period elsewhere is for call-site clarity. Here's the code:

require 'open3'
require 'cgi'

class PgsliceRunner
  def self.url
    ENV['PGSLICE_URL'].presence || ENV['DATABASE_URL'].presence || begin
      cfg = ActiveRecord::Base.connection_db_config.configuration_hash
      user = CGI.escape(cfg[:username].to_s)
      pass = CGI.escape(cfg[:password].to_s)
      host = cfg[:host] || 'localhost'
      port = (cfg[:port] || 5432).to_s
      db   = cfg[:database]
      if user.blank?
        "postgres://#{host}:#{port}/#{db}"
      else
        "postgres://#{user}:#{pass}@#{host}:#{port}/#{db}"
      end
    end
  end

  def self.run(*args)
    cmd = ['bundle', 'exec', 'pgslice', *args, '--url', url]
    stdout, stderr, status = Open3.capture3(*cmd)
    # Log the subcommand only; the full cmd includes --url with credentials
    Rails.logger.info("[PgsliceRunner] pgslice #{args.join(' ')}\n#{stdout}")
    raise(StandardError, "pgslice failed: #{stderr.presence || stdout}") unless status.success?
    stdout
  end

  # Prepare the partitioned table on an intermediate, creating past/future partitions
  def self.prepare_partitioned_table(table: 'events', column: 'occurred_at', period: 'day', past: 7, future: 7)
    run('prep', table, column, period)
    args = ['add_partitions', table, '--intermediate']
    args += ['--past', past.to_s] if past.to_i > 0
    args += ['--future', future.to_s] if future.to_i > 0
    run(*args)
    true
  end

  # Add upcoming partitions on the active (post-swap) table or intermediate if requested
  # period: 'day' or 'month'
  # Note: pgslice infers period from table prep; exposing period for clarity in call sites
  def self.add_future_partitions(table: 'events', period: 'day', future: 7, intermediate: false)
    args = ['add_partitions', table, '--future', future.to_s]
    args << '--intermediate' if intermediate
    run(*args)
    true
  end

  # Quick sanity checks that normal queries work against the current active table
  def self.check_queries(model: 'Event', table: 'events', column: 'occurred_at')
    m = model.constantize
    latest_id = ActiveRecord::Base.connection.select_value("SELECT id FROM #{ActiveRecord::Base.connection.quote_table_name(table)} ORDER BY id DESC LIMIT 1").to_i
    find_ok = false
    where_ok = false
    begin
      m.find(latest_id) if latest_id.positive?
      find_ok = latest_id.positive?
    rescue StandardError
      find_ok = false
    end
    begin
      where_ok = m.where("#{column} >= ?", 1.day.ago).limit(1).exists?
    rescue StandardError
      where_ok = false
    end
    { latest_id: latest_id, find_ok: find_ok, where_ok: where_ok }
  end

  # Observe new writes over a short window to confirm mirroring to <table>_intermediate
  def self.check_dual_write(table: 'events', sample_seconds: 30)
    orig = table
    inter = "#{table}_intermediate"
    q = ->(t) { ActiveRecord::Base.connection.select_value("SELECT COALESCE(MAX(id), 0) FROM #{ActiveRecord::Base.connection.quote_table_name(t)}").to_i }
    before_orig = q.call(orig)
    before_inter = q.call(inter)
    sleep sample_seconds.to_i
    after_orig = q.call(orig)
    after_inter = q.call(inter)
    inserted_orig = [after_orig - before_orig, 0].max
    inserted_inter = ActiveRecord::Base.connection.select_value(
      "SELECT COUNT(*) FROM #{ActiveRecord::Base.connection.quote_table_name(inter)} WHERE id > #{before_inter}"
    ).to_i
    {
      before_orig: before_orig,
      after_orig: after_orig,
      before_inter: before_inter,
      after_inter: after_inter,
      inserted_orig: inserted_orig,
      inserted_inter: inserted_inter,
      mirroring_observed: inserted_inter >= [inserted_orig, 1].max - 1
    }
  end

  # Perform swap; optional backfill_where for scoped migrations; disables triggers and syncs sequence
  # Note: ensure_sequence is handled by pgslice's swap command
  def self.swap_partitioned_table(table: 'events', backfill_where: nil, disable_triggers: true, ensure_sequence: true, disable_fill: true)
    run('analyze', table)
    run('swap', table)
    if backfill_where.present?
      run('fill', table, '--swapped', '--where', backfill_where)
    else
      run('fill', table, '--swapped') unless disable_fill
    end
    if disable_triggers
      disable_mirroring_triggers(table: "#{table}_retired") rescue nil
      disable_mirroring_triggers(table: table) rescue nil
    end
    true
  end

  # Enable dual-write from original table to intermediate via triggers
  def self.enable_mirroring_triggers(table: 'events')
    intermediate = "#{table}_intermediate"
    function = "#{table}_mirror_to_intermediate"
    trigger  = "#{table}_mirror_trigger"

    sql = <<~SQL
      DO $do$
      BEGIN
        -- create function if not exists
        IF NOT EXISTS (
          SELECT 1 FROM pg_proc p
          JOIN pg_namespace n ON n.oid = p.pronamespace
          WHERE p.proname = '#{function}' AND n.nspname = 'public'
        ) THEN
          EXECUTE $ddl$
            CREATE OR REPLACE FUNCTION public.#{function}()
            RETURNS trigger
            LANGUAGE plpgsql
            AS $fn$
            BEGIN
              IF TG_OP = 'INSERT' THEN
                INSERT INTO public.#{intermediate} SELECT (NEW).*;
              ELSIF TG_OP = 'UPDATE' THEN
                UPDATE public.#{intermediate} i SET
                  (#{mirror_column_list(table)}) = (#{mirror_new_tuple_list(table)})
                WHERE i.id = NEW.id;
              ELSIF TG_OP = 'DELETE' THEN
                DELETE FROM public.#{intermediate} WHERE id = OLD.id;
              END IF;
              RETURN NULL;
            END;
            $fn$;
          $ddl$;
        END IF;

        -- create trigger if not exists
        IF NOT EXISTS (
          SELECT 1 FROM pg_trigger WHERE tgname = '#{trigger}'
        ) THEN
          EXECUTE $ddl$
            CREATE TRIGGER #{trigger}
            AFTER INSERT OR UPDATE OR DELETE ON public.#{table}
            FOR EACH ROW EXECUTE FUNCTION public.#{function}();
          $ddl$;
        END IF;
      END
      $do$;
    SQL

    ActiveRecord::Base.connection.execute(sql)
  end

  # Disable and drop mirroring trigger and function
  def self.disable_mirroring_triggers(table: 'events')
    base_table = table.sub(/_retired\z/, '')
    function_current = "#{table}_mirror_to_intermediate"
    function_base    = "#{base_table}_mirror_to_intermediate"
    trigger_current  = "#{table}_mirror_trigger"
    trigger_base     = "#{base_table}_mirror_trigger"

    sql = <<~SQL
      DO $$
      BEGIN
        -- Drop possible trigger names attached to this table
        IF EXISTS (SELECT 1 FROM pg_trigger WHERE tgname = '#{trigger_current}') THEN
          EXECUTE 'DROP TRIGGER ' || quote_ident('#{trigger_current}') || ' ON public.#{table}';
        END IF;
        IF EXISTS (SELECT 1 FROM pg_trigger WHERE tgname = '#{trigger_base}') THEN
          EXECUTE 'DROP TRIGGER ' || quote_ident('#{trigger_base}') || ' ON public.#{table}';
        END IF;

        -- Drop possible function names
        IF EXISTS (
          SELECT 1 FROM pg_proc p JOIN pg_namespace n ON n.oid = p.pronamespace
          WHERE p.proname = '#{function_current}' AND n.nspname = 'public'
        ) THEN
          EXECUTE 'DROP FUNCTION public.#{function_current}()';
        END IF;
        IF EXISTS (
          SELECT 1 FROM pg_proc p JOIN pg_namespace n ON n.oid = p.pronamespace
          WHERE p.proname = '#{function_base}' AND n.nspname = 'public'
        ) THEN
          EXECUTE 'DROP FUNCTION public.#{function_base}()';
        END IF;
      END$$;
    SQL

    ActiveRecord::Base.connection.execute(sql)
  end

  def self.mirror_column_list(table)
    cols = columns_for(table)
    cols.map { |c| ActiveRecord::Base.connection.quote_column_name(c) }.join(', ')
  end

  def self.mirror_new_tuple_list(table)
    cols = columns_for(table)
    cols.map { |c| "NEW." + ActiveRecord::Base.connection.quote_column_name(c) }.join(', ')
  end

  def self.columns_for(table)
    sql = <<~SQL
      SELECT column_name
      FROM information_schema.columns
      WHERE table_schema = 'public' AND table_name = #{ActiveRecord::Base.connection.quote(table)}
      ORDER BY ordinal_position
    SQL
    ActiveRecord::Base.connection.exec_query(sql).rows.flatten
  end

  # `private` has no effect on `def self.` methods, so mark the helpers explicitly
  private_class_method :mirror_column_list, :mirror_new_tuple_list, :columns_for
end

The Migration Process

The key to safe partitioning is doing it in phases, verifying each step before moving to the next. Here's the recommended flow:

Step 1: Prepare the partitioned table

First, create the intermediate table and initial partitions. For daily partitioning (most common for time-series data):

PgsliceRunner.prepare_partitioned_table(
  table: 'events',
  column: 'occurred_at'
)

This creates an events_intermediate table partitioned by day, with 7 days in the past and 7 days in the future by default. For monthly partitioning with more history:

PgsliceRunner.prepare_partitioned_table(
  table: 'events',
  column: 'occurred_at',
  period: 'month',
  past: 12,
  future: 3
)

At this point you have your normal table plus an events_intermediate table that matches its structure but contains no data yet.
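
You can confirm that from the console before moving on (events_intermediate is the name pgslice derives from your table):

conn = ActiveRecord::Base.connection
conn.table_exists?('events_intermediate')                      # => true
conn.select_value('SELECT COUNT(*) FROM events_intermediate')  # => 0, nothing backfilled yet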

Step 2: Backfill existing data (optional: skip to dual-writes if you're in a hurry)

Now populate the partitions with your existing data. You have two options:

Full backfill (recommended for smaller tables or when you can afford the time):

PgsliceRunner.run('fill', 'events')

Targeted backfill (for very large tables, backfill recent data first):

cutoff_id = [Event.maximum(:id).to_i - 100_000, 0].max
PgsliceRunner.run('fill', 'events', '--where', "id > #{cutoff_id}")

The targeted approach lets you get the migration started faster, then catch up on older data later. Important: Complete the main fill before enabling mirroring triggers.

Tip: If you used a --where for the initial fill, pass the same condition to the swapped fill (Step 4) to catch up the delta quickly.
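
To gauge how far along a backfill is, comparing MAX(id) on both tables is a cheap proxy (COUNT(*) on a huge table is slow; this sketch assumes a monotonically increasing integer id):

conn = ActiveRecord::Base.connection
orig  = conn.select_value('SELECT COALESCE(MAX(id), 0) FROM events').to_i
inter = conn.select_value('SELECT COALESCE(MAX(id), 0) FROM events_intermediate').to_i
puts "intermediate is roughly #{orig - inter} ids behind"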

Step 3: Enable dual-writes (optional but recommended; used here)

To minimize data loss during the swap, enable triggers that mirror all writes to both tables. Every insert, update, and delete then effectively hits both your main table and the intermediate table. For Tatami, we ran dual-writes for 7 days before the swap, so the intermediate table was essentially real-time.

PgsliceRunner.enable_mirroring_triggers(table: 'events')

Let this run for a while (e.g., 30 minutes) while your app continues operating normally. The triggers ensure any writes during this window are captured in both tables.

Verify the mirroring is working:

PgsliceRunner.check_dual_write(table: 'events', sample_seconds: 60)

This returns a hash showing how many inserts happened in each table. The mirroring_observed key tells you if the triggers are working correctly.
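
A healthy run looks something like this (numbers are illustrative):

# => { before_orig: 1_000_000, after_orig: 1_000_042,
#      before_inter: 1_000_000, after_inter: 1_000_042,
#      inserted_orig: 42, inserted_inter: 42,
#      mirroring_observed: true }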

Step 4: Swap to the partitioned table

Once you're confident the data is in sync or that you have enough data in the intermediate table to perform the swap:

PgsliceRunner.swap_partitioned_table(table: 'events', backfill_where: nil)

If you used a --where clause for the initial fill, pass the same condition here to backfill any remaining delta:

cutoff_id = [Event.maximum(:id).to_i - 100_000, 0].max
PgsliceRunner.swap_partitioned_table(
  table: 'events',
  backfill_where: "id > #{cutoff_id}"
)

The swap is atomic: it renames your original table to events_retired and events_intermediate to events. After this point, all queries hit the partitioned table.

Abort path (safe rollback checklist):

  • Disable mirroring triggers:
    PgsliceRunner.disable_mirroring_triggers(table: 'events')
    
  • If already swapped, unswap:
    PgsliceRunner.run('unswap', 'events')
    
  • If you want to remove intermediate + partitions entirely:
    PgsliceRunner.run('unprep', 'events')
    
Step 5: Verify queries still work

Run some sanity checks to make sure everything behaves correctly:

PgsliceRunner.check_queries(model: 'Event', table: 'events', column: 'occurred_at')

This verifies that find and where queries work against the partitioned table. If anything looks off, you can still abort (see troubleshooting below).

Step 6: Clean up (when ready)

After you've verified everything is working for a while, you can drop the old table:

DROP TABLE events_retired;

Don't rush this: keep it around for a few days or weeks as a safety net.
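
If you want one last look before dropping it, a size check on the retired table is cheap and tells you how much disk you're about to reclaim (a console sketch):

ActiveRecord::Base.connection.select_value(
  "SELECT pg_size_pretty(pg_total_relation_size('events_retired'))"
)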

Ongoing Maintenance

Partitioning isn't a one-time thing. You need to:

  1. Add future partitions regularly so new data has somewhere to go
  2. Drop old partitions to manage disk space and retention policies

Here's a background job that handles both. It uses pg_inherits to list true child partitions and supports day/month/year retention:

class PartitioningManagementJob < ApplicationJob
  queue_as :maintenance

  # Manage partitions for the specified period: ensure future partitions exist; drop expired ones.
  # period can be 'day', 'month', or 'year'.
  def perform(future_days: 15, retention_days: 7, table: 'events', period: 'day')
    future_count = future_days.to_i
    retention_count = retention_days.to_i

    # 1) Ensure future partitions exist on the ACTIVE table
    PgsliceRunner.add_future_partitions(table: table, period: period, future: future_count)

    # 2) Determine expired partitions based on the given period
    expired = expired_partitions(table: table, period: period, retention_count: retention_count)

    # 3) Drop only safe, expired partitions
    drop_old_partitions(parent_table: table, period: period, expired_tables: expired)
  end

  def expired_partitions(table: 'events', period: 'day', retention_count: 7)
    cutoff_str, suffix_len = case period
                             when 'day'
                               [(retention_count.days.ago).beginning_of_day.strftime('%Y%m%d'), 8]
                             when 'month'
                               [(retention_count.months.ago).beginning_of_month.strftime('%Y%m'), 6]
                             when 'year'
                               [(retention_count.years.ago).beginning_of_year.strftime('%Y'), 4]
                             else
                               raise ArgumentError, "Unsupported period: #{period}"
                             end

    escaped_table = Regexp.escape(table)
    pattern = "#{escaped_table}_([0-9]{#{suffix_len}})"

    sql = <<~SQL
      SELECT
        child.relname AS partition_name
      FROM
        pg_inherits
      JOIN
        pg_class AS parent ON pg_inherits.inhparent = parent.oid
      JOIN
        pg_class AS child ON pg_inherits.inhrelid = child.oid
      WHERE
        parent.relname = #{ActiveRecord::Base.connection.quote(table)} AND
        child.relname LIKE #{ActiveRecord::Base.connection.quote("#{table}_%" )} AND
        child.relname NOT LIKE #{ActiveRecord::Base.connection.quote("#{table}_intermediate%" )} AND
        SUBSTRING(child.relname FROM '#{pattern}') < #{ActiveRecord::Base.connection.quote(cutoff_str)}
    SQL

    ActiveRecord::Base.connection.execute(sql).map { |row| row['partition_name'] }
  end

  def drop_old_partitions(parent_table:, period: 'day', expired_tables: [])
    today_str, suffix_len = case period
                            when 'day'
                              [Time.zone.now.beginning_of_day.strftime('%Y%m%d'), 8]
                            when 'month'
                              [Time.zone.now.beginning_of_month.strftime('%Y%m'), 6]
                            when 'year'
                              [Time.zone.now.beginning_of_year.strftime('%Y'), 4]
                            else
                              raise ArgumentError, "Unsupported period: #{period}"
                            end

    expired_tables.each do |partition|
      # Guards: never drop the parent or intermediate tables
      next if partition == parent_table
      next if partition.start_with?("#{parent_table}_intermediate")

      match = partition.match(/\A#{Regexp.escape(parent_table)}_(\d{#{suffix_len}})\z/)
      date_part = match && match[1]
      next unless date_part
      next if date_part > today_str

      verify_sql = <<~SQL
        SELECT 1
        FROM pg_inherits
        JOIN pg_class AS parent ON pg_inherits.inhparent = parent.oid
        JOIN pg_class AS child ON pg_inherits.inhrelid = child.oid
        WHERE parent.relname = #{ActiveRecord::Base.connection.quote(parent_table)}
          AND child.relname = #{ActiveRecord::Base.connection.quote(partition)}
      SQL
      is_child = ActiveRecord::Base.connection.select_value(verify_sql).present?
      unless is_child
        Rails.logger.warn "Skipping non-child table #{partition} for parent #{parent_table}"
        next
      end

      drop_sql = "DROP TABLE IF EXISTS #{ActiveRecord::Base.connection.quote_table_name(partition)};"
      Rails.logger.info "Dropping old partition with SQL: #{drop_sql}"
      ActiveRecord::Base.connection.execute(drop_sql)
    end
  end
end

Schedule this job to run daily:

# config/schedule.rb (if using whenever)
every 1.day, at: '2:00 am' do
  runner 'PartitioningManagementJob.perform_later(future_days: 15, retention_days: 90, table: "events", period: "day")'
end

The job includes several safety features:

  • Uses pg_inherits: Instead of querying pg_tables, it uses Postgres's pg_inherits system catalog to find actual child partitions. This is more accurate and prevents accidentally dropping unrelated tables.

  • Pattern matching: It correctly handles different date formats for day (YYYYMMDD), month (YYYYMM), and year (YYYY) partitions.

  • Safety guards: Before dropping any partition, it verifies that the table is actually a child partition of the parent table. It also skips the parent table and intermediate tables entirely.
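
You can also dry-run the retention logic from the console before trusting the job with real drops; expired_partitions only reads the catalog (the output here is illustrative):

job = PartitioningManagementJob.new
job.expired_partitions(table: 'events', period: 'day', retention_count: 7)
# => ["events_20240101", "events_20240102"]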

Troubleshooting

If the swap times out due to locks: Retry during a quieter window. The swap needs an exclusive lock on the table, so high-traffic periods can cause issues.

If you need to abort before the swap: Disable mirroring triggers and you're back to normal:

PgsliceRunner.disable_mirroring_triggers(table: 'events')

If you've already swapped but need to rollback: This is trickier, but pgslice provides an unswap command:

PgsliceRunner.run('unswap', 'events')

If you want to remove the intermediate table and partitions before swapping: Use unprep:

PgsliceRunner.run('unprep', 'events')

Sequence safety: The swap_partitioned_table method automatically syncs your sequence (e.g., events_id_seq) to be at least MAX(id) from the table, so you won't get ID collisions. This is handled by pgslice's swap command.
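
If you want to verify that yourself after the swap, compare the sequence to MAX(id) (a sketch; events_id_seq is the conventional Rails sequence name):

conn = ActiveRecord::Base.connection
seq = conn.select_value('SELECT last_value FROM events_id_seq').to_i
max = conn.select_value('SELECT COALESCE(MAX(id), 0) FROM events').to_i
puts(seq >= max ? 'sequence is safe' : 'sequence is behind MAX(id)!')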

Best Practices

  • Choose your partition key wisely: Use a column that's frequently queried and evenly distributed. Time-based columns (like created_at or occurred_at) are ideal for time-series data.

  • Partition size matters: Aim for partitions with 1-10 million rows each. Too small and you'll have too many partitions; too large and you lose the performance benefits.

  • Monitor partition pruning: Postgres should automatically skip irrelevant partitions when you query with the partition key. Verify this with EXPLAIN:

EXPLAIN SELECT * FROM events WHERE occurred_at >= NOW() - INTERVAL '1 day';

Look for partition elimination in the query plan.

  • Index appropriately: Create indexes on each partition (or use a default template) rather than one giant index on the parent table.

  • Test locally first: Always test the full migration process on a copy of production data before running it live.

Conclusion

Partitioning can feel scary, but with pgslice and a little Ruby it turns out to be mostly painless.
