Safely partitioning Postgres tables in Rails
A production-tested approach to partitioning large tables with pgslice (and a small amount of Ruby), with minimal downtime.
As your Postgres tables grow into the tens or hundreds of millions of rows, you'll start hitting issues, whether that's slow query times or just raw storage costs.
For both Bento and Tatami we were storing a lot of data that we simply didn't need to retain for more than a few days.
Running queries like Event.where('occurred_at < ?', 1.day.ago).delete_all would take a long time and use a lot of memory, and because Postgres only marks deleted rows as dead tuples, we'd never reclaim the space until we ran a VACUUM FULL.
VACUUM FULL is a heavy operation: it takes an exclusive lock that blocks all reads and writes to the table while it rewrites it. I always want to avoid downtime, especially if I'm the one causing it.
So what's the solution?
Table partitioning!
I spent some time reading up on the different partitioning strategies and tools available and settled on pgslice.
I wanted a Ruby-specific way of handling this, and wanted to avoid extensions (pg_partman, pg_pathman, etc.), which I personally felt were a bit flaky and scary.
Prerequisites
- pgslice gem installed in your production environment
- PGSLICE_URL or DATABASE_URL set with proper credentials
- DB role can create tables, functions, and triggers
- A maintenance window for the swap is recommended but not required
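Since the runner below shells out via bundle exec pgslice, the gem has to be in your bundle. One minimal way to install it (pgslice is a standalone CLI, so it doesn't need to be loaded into the app):

# Gemfile
gem 'pgslice', require: false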
pgslice flow at a glance: prep → add_partitions → fill (or dual sync) → swap
When to partition
- Tables exceed ~10–50M rows and are still growing
- Your queries are time-range heavy (e.g., last N days/months)
- Count/analytics queries are slow and memory heavy
- You have a time-based retention policy and want cheap drops
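That last point is the biggest win. With day partitions (pgslice names them with a YYYYMMDD suffix, which the maintenance job later in this post relies on), a retention purge stops being a bulk delete and becomes a metadata-level drop. A rough sketch, using a hypothetical expired partition:

-- Before: slow, memory-hungry, and leaves dead tuples behind
DELETE FROM events WHERE occurred_at < NOW() - INTERVAL '7 days';
-- After: near-instant, and the disk space comes back immediately
DROP TABLE events_20240101;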
The PgsliceRunner Service
The PgsliceRunner service wraps the pgslice CLI and adds some helpful Rails-specific methods for mirroring writes and running sanity checks. It's designed to be run from the Rails console, step by step, so you can verify each phase before moving to the next. Here's the code:
require 'open3'
require 'cgi'

class PgsliceRunner
  def self.url
    ENV['PGSLICE_URL'].presence || ENV['DATABASE_URL'].presence || begin
      cfg = ActiveRecord::Base.connection_db_config.configuration_hash
      user = CGI.escape(cfg[:username].to_s)
      pass = CGI.escape(cfg[:password].to_s)
      host = cfg[:host] || 'localhost'
      port = (cfg[:port] || 5432).to_s
      db = cfg[:database]
      if user.blank?
        "postgres://#{host}:#{port}/#{db}"
      else
        "postgres://#{user}:#{pass}@#{host}:#{port}/#{db}"
      end
    end
  end

  def self.run(*args)
    cmd = ['bundle', 'exec', 'pgslice', *args, '--url', url]
    stdout, stderr, status = Open3.capture3(*cmd)
    # Log only the subcommand, not the full cmd: the URL contains credentials.
    Rails.logger.info("[PgsliceRunner] pgslice #{args.join(' ')}\n#{stdout}")
    raise(StandardError, "pgslice failed: #{stderr.presence || stdout}") unless status.success?

    stdout
  end
  # Prepare the partitioned table on an intermediate, creating past/future partitions
  def self.prepare_partitioned_table(table: 'events', column: 'occurred_at', period: 'day', past: 7, future: 7)
    run('prep', table, column, period)
    args = ['add_partitions', table, '--intermediate']
    args += ['--past', past.to_s] if past.to_i > 0
    args += ['--future', future.to_s] if future.to_i > 0
    run(*args)
    true
  end

  # Add upcoming partitions on the active (post-swap) table, or the intermediate if requested.
  # period: 'day' or 'month'
  # Note: pgslice infers the period from prep; period is exposed here only for call-site clarity.
  def self.add_future_partitions(table: 'events', period: 'day', future: 7, intermediate: false)
    args = ['add_partitions', table, '--future', future.to_s]
    args << '--intermediate' if intermediate
    run(*args)
    true
  end
  # Quick sanity checks that normal queries work against the current active table
  def self.check_queries(model: 'Event', table: 'events', column: 'occurred_at')
    m = model.constantize
    latest_id = ActiveRecord::Base.connection.select_value(
      "SELECT id FROM #{ActiveRecord::Base.connection.quote_table_name(table)} ORDER BY id DESC LIMIT 1"
    ).to_i

    find_ok = false
    where_ok = false

    begin
      m.find(latest_id) if latest_id.positive?
      find_ok = latest_id.positive?
    rescue StandardError
      find_ok = false
    end

    begin
      where_ok = m.where("#{column} >= ?", 1.day.ago).limit(1).exists?
    rescue StandardError
      where_ok = false
    end

    { latest_id: latest_id, find_ok: find_ok, where_ok: where_ok }
  end
  # Observe new writes over a short window to confirm mirroring to <table>_intermediate
  def self.check_dual_write(table: 'events', sample_seconds: 30)
    orig = table
    inter = "#{table}_intermediate"
    q = ->(t) { ActiveRecord::Base.connection.select_value("SELECT COALESCE(MAX(id), 0) FROM #{ActiveRecord::Base.connection.quote_table_name(t)}").to_i }

    before_orig = q.call(orig)
    before_inter = q.call(inter)
    sleep sample_seconds.to_i
    after_orig = q.call(orig)
    after_inter = q.call(inter)

    inserted_orig = [after_orig - before_orig, 0].max
    inserted_inter = ActiveRecord::Base.connection.select_value(
      "SELECT COUNT(*) FROM #{ActiveRecord::Base.connection.quote_table_name(inter)} WHERE id > #{before_inter}"
    ).to_i

    {
      before_orig: before_orig,
      after_orig: after_orig,
      before_inter: before_inter,
      after_inter: after_inter,
      inserted_orig: inserted_orig,
      inserted_inter: inserted_inter,
      mirroring_observed: inserted_inter >= [inserted_orig, 1].max - 1
    }
  end
  # Perform swap; optional backfill_where for scoped migrations; disables triggers and syncs sequence.
  # Note: ensure_sequence is accepted for call-site clarity only; the sequence sync itself is
  # handled by pgslice's swap command. disable_fill defaults to true because the dual-write
  # triggers (used here) already keep the intermediate table current.
  def self.swap_partitioned_table(table: 'events', backfill_where: nil, disable_triggers: true, ensure_sequence: true, disable_fill: true)
    run('analyze', table)
    run('swap', table)

    if backfill_where.present?
      run('fill', table, '--swapped', '--where', backfill_where)
    else
      run('fill', table, '--swapped') unless disable_fill
    end

    if disable_triggers
      disable_mirroring_triggers(table: "#{table}_retired") rescue nil
      disable_mirroring_triggers(table: table) rescue nil
    end

    true
  end
  # Enable dual-write from original table to intermediate via triggers
  def self.enable_mirroring_triggers(table: 'events')
    intermediate = "#{table}_intermediate"
    function = "#{table}_mirror_to_intermediate"
    trigger = "#{table}_mirror_trigger"

    sql = <<~SQL
      DO $do$
      BEGIN
        -- create function if not exists
        IF NOT EXISTS (
          SELECT 1 FROM pg_proc p
          JOIN pg_namespace n ON n.oid = p.pronamespace
          WHERE p.proname = '#{function}' AND n.nspname = 'public'
        ) THEN
          EXECUTE $ddl$
            CREATE OR REPLACE FUNCTION public.#{function}()
            RETURNS trigger
            LANGUAGE plpgsql
            AS $fn$
            BEGIN
              IF TG_OP = 'INSERT' THEN
                INSERT INTO public.#{intermediate} SELECT (NEW).*;
              ELSIF TG_OP = 'UPDATE' THEN
                UPDATE public.#{intermediate} i SET
                  (#{mirror_column_list(table)}) = (#{mirror_new_tuple_list(table)})
                WHERE i.id = NEW.id;
              ELSIF TG_OP = 'DELETE' THEN
                DELETE FROM public.#{intermediate} WHERE id = OLD.id;
              END IF;
              RETURN NULL;
            END;
            $fn$;
          $ddl$;
        END IF;

        -- create trigger if not exists (scoped to this table, not just the name)
        IF NOT EXISTS (
          SELECT 1 FROM pg_trigger
          WHERE tgname = '#{trigger}' AND tgrelid = 'public.#{table}'::regclass
        ) THEN
          EXECUTE $ddl$
            CREATE TRIGGER #{trigger}
            AFTER INSERT OR UPDATE OR DELETE ON public.#{table}
            FOR EACH ROW EXECUTE FUNCTION public.#{function}();
          $ddl$;
        END IF;
      END
      $do$;
    SQL

    ActiveRecord::Base.connection.execute(sql)
  end
  # Disable and drop mirroring trigger and function
  def self.disable_mirroring_triggers(table: 'events')
    base_table = table.sub(/_retired\z/, '')
    function_current = "#{table}_mirror_to_intermediate"
    function_base = "#{base_table}_mirror_to_intermediate"
    trigger_current = "#{table}_mirror_trigger"
    trigger_base = "#{base_table}_mirror_trigger"

    sql = <<~SQL
      DO $$
      BEGIN
        -- Drop possible trigger names attached to this table
        IF EXISTS (SELECT 1 FROM pg_trigger WHERE tgname = '#{trigger_current}') THEN
          EXECUTE 'DROP TRIGGER ' || quote_ident('#{trigger_current}') || ' ON public.#{table}';
        END IF;
        IF EXISTS (SELECT 1 FROM pg_trigger WHERE tgname = '#{trigger_base}') THEN
          EXECUTE 'DROP TRIGGER ' || quote_ident('#{trigger_base}') || ' ON public.#{table}';
        END IF;

        -- Drop possible function names
        IF EXISTS (
          SELECT 1 FROM pg_proc p JOIN pg_namespace n ON n.oid = p.pronamespace
          WHERE p.proname = '#{function_current}' AND n.nspname = 'public'
        ) THEN
          EXECUTE 'DROP FUNCTION public.#{function_current}()';
        END IF;
        IF EXISTS (
          SELECT 1 FROM pg_proc p JOIN pg_namespace n ON n.oid = p.pronamespace
          WHERE p.proname = '#{function_base}' AND n.nspname = 'public'
        ) THEN
          EXECUTE 'DROP FUNCTION public.#{function_base}()';
        END IF;
      END$$;
    SQL

    ActiveRecord::Base.connection.execute(sql)
  end
  def self.mirror_column_list(table)
    cols = columns_for(table)
    cols.map { |c| ActiveRecord::Base.connection.quote_column_name(c) }.join(', ')
  end

  def self.mirror_new_tuple_list(table)
    cols = columns_for(table)
    cols.map { |c| 'NEW.' + ActiveRecord::Base.connection.quote_column_name(c) }.join(', ')
  end

  def self.columns_for(table)
    sql = <<~SQL
      SELECT column_name
      FROM information_schema.columns
      WHERE table_schema = 'public' AND table_name = #{ActiveRecord::Base.connection.quote(table)}
      ORDER BY ordinal_position
    SQL
    ActiveRecord::Base.connection.exec_query(sql).rows.flatten
  end

  # A bare `private` has no effect on methods defined with `def self.`;
  # private_class_method is what actually hides these helpers.
  private_class_method :mirror_column_list, :mirror_new_tuple_list, :columns_for
end
The Migration Process
The key to safe partitioning is doing it in phases, verifying each step before moving to the next. Here's the recommended flow:
Prepare the partitioned table
First, create the intermediate table and initial partitions. For daily partitioning (most common for time-series data):
PgsliceRunner.prepare_partitioned_table(
  table: 'events',
  column: 'occurred_at'
)
This creates an events_intermediate table partitioned by day, with 7 days in the past and 7 days in the future by default. For monthly partitioning with more history:
PgsliceRunner.prepare_partitioned_table(
  table: 'events',
  column: 'occurred_at',
  period: 'month',
  past: 12,
  future: 3
)
At this point you have your original events table, untouched, plus an events_intermediate table with the same structure: partitioned, but holding no data yet.
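If you want to confirm what pgslice created, you can list the intermediate table's children from the console via pg_inherits (the same catalog the maintenance job below uses); the partition names shown are illustrative:

ActiveRecord::Base.connection.select_values(<<~SQL)
  SELECT child.relname
  FROM pg_inherits
  JOIN pg_class AS parent ON pg_inherits.inhparent = parent.oid
  JOIN pg_class AS child ON pg_inherits.inhrelid = child.oid
  WHERE parent.relname = 'events_intermediate'
  ORDER BY child.relname
SQL
# => ["events_20240101", "events_20240102", ...] (day partitions get a YYYYMMDD suffix)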
Backfill existing data (optional: skip ahead to dual-writes if you're in a hurry)
Now populate the partitions with your existing data. You have two options:
Full backfill (recommended for smaller tables or when you can afford the time):
PgsliceRunner.run('fill', 'events')
Targeted backfill (for very large tables, backfill recent data first):
cutoff_id = [Event.maximum(:id).to_i - 100_000, 0].max
PgsliceRunner.run('fill', 'events', '--where', "id > #{cutoff_id}")
The targeted approach lets you get the migration started faster, then catch up on older data later. Important: Complete the main fill before enabling mirroring triggers, otherwise rows the triggers have already mirrored can be copied a second time by the fill.
Tip: If you used a --where for the initial fill, pass the same condition to the swapped fill (see the Swap step below) to catch up the delta quickly.
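A quick way to gauge backfill progress is to compare the highest copied id against the original. A rough check, assuming id is the integer primary key as in the runner above:

conn = ActiveRecord::Base.connection
orig_max  = conn.select_value('SELECT COALESCE(MAX(id), 0) FROM events').to_i
inter_max = conn.select_value('SELECT COALESCE(MAX(id), 0) FROM events_intermediate').to_i
puts "original max id: #{orig_max}, intermediate max id: #{inter_max}, gap: #{orig_max - inter_max}"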
Enable dual-writes (optional, recommended; used here)
To minimize data loss during the swap, enable triggers that mirror all writes to both tables. Every insert, update, and delete then effectively goes to your main table and to the intermediate table. For Tatami, we ran these triggers for 7 days before the swap, so the intermediate table was essentially real-time.
PgsliceRunner.enable_mirroring_triggers(table: 'events')
Let this run for a while (e.g., 30 minutes) while your app continues operating normally. The triggers ensure any writes during this window are captured in both tables.
Verify the mirroring is working:
PgsliceRunner.check_dual_write(table: 'events', sample_seconds: 60)
This returns a hash showing how many inserts happened in each table. The mirroring_observed key tells you if the triggers are working correctly.
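For a sense of what healthy output looks like, here's an illustrative result (the numbers are invented; what matters is that inserted_inter keeps pace with inserted_orig):

PgsliceRunner.check_dual_write(table: 'events', sample_seconds: 60)
# => {
#   before_orig: 1_204_332, after_orig: 1_204_401,
#   before_inter: 1_204_330, after_inter: 1_204_401,
#   inserted_orig: 69, inserted_inter: 71,
#   mirroring_observed: true
# }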
Swap to the partitioned table
Once you're confident the data is in sync or that you have enough data in the intermediate table to perform the swap:
PgsliceRunner.swap_partitioned_table(table: 'events', backfill_where: nil)
If you used a --where clause for the initial fill, pass the same condition here to backfill any remaining delta:
cutoff_id = [Event.maximum(:id).to_i - 100_000, 0].max
PgsliceRunner.swap_partitioned_table(
  table: 'events',
  backfill_where: "id > #{cutoff_id}"
)
The swap is atomic — it renames your original table to events_retired and events_intermediate to events. After this point, all queries hit the partitioned table.
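Under the hood the swap boils down to a pair of renames. A simplified sketch of the equivalent SQL (pgslice's swap also takes care of lock timeouts and the id sequence, per the note in Troubleshooting below, so prefer the real command):

BEGIN;
ALTER TABLE events RENAME TO events_retired;
ALTER TABLE events_intermediate RENAME TO events;
COMMIT;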
Abort path (safe rollback checklist):
- Disable mirroring triggers:
PgsliceRunner.disable_mirroring_triggers(table: 'events')
- If already swapped, unswap:
PgsliceRunner.run('unswap', 'events')
- If you want to remove the intermediate table and partitions entirely:
PgsliceRunner.run('unprep', 'events')
Verify queries still work
Run some sanity checks to make sure everything behaves correctly:
PgsliceRunner.check_queries(model: 'Event', table: 'events', column: 'occurred_at')
This verifies that find and where queries work against the partitioned table. If anything looks off, you can still abort (see troubleshooting below).
Clean up (when ready)
After you've verified everything is working for a while, you can drop the old table:
DROP TABLE events_retired;
Don't rush this — keep it around for a few days or weeks as a safety net.
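When you do decide to drop it, it's worth checking first how much disk you're about to reclaim:

SELECT pg_size_pretty(pg_total_relation_size('events_retired'));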
Ongoing Maintenance
Partitioning isn't a one-time thing. You need to:
- Add future partitions regularly so new data has somewhere to go
- Drop old partitions to manage disk space and retention policies
Here's a background job that handles both. It uses pg_inherits to list true child partitions and supports day/month/year retention:
class PartitioningManagementJob < ApplicationJob
  queue_as :maintenance

  # Manage partitions for the specified period: ensure future partitions exist; drop expired ones.
  # period can be 'day', 'month', or 'year'.
  def perform(future_days: 15, retention_days: 7, table: 'events', period: 'day')
    future_count = future_days.to_i
    retention_count = retention_days.to_i

    # 1) Ensure future partitions exist on the ACTIVE table
    PgsliceRunner.add_future_partitions(table: table, period: period, future: future_count)

    # 2) Determine expired partitions based on the given period
    expired = expired_partitions(table: table, period: period, retention_count: retention_count)

    # 3) Drop only safe, expired partitions
    drop_old_partitions(parent_table: table, period: period, expired_tables: expired)
  end

  def expired_partitions(table: 'events', period: 'day', retention_count: 7)
    cutoff_str, suffix_len =
      case period
      when 'day'
        [retention_count.days.ago.beginning_of_day.strftime('%Y%m%d'), 8]
      when 'month'
        [retention_count.months.ago.beginning_of_month.strftime('%Y%m'), 6]
      when 'year'
        [retention_count.years.ago.beginning_of_year.strftime('%Y'), 4]
      else
        raise ArgumentError, "Unsupported period: #{period}"
      end

    escaped_table = Regexp.escape(table)
    pattern = "#{escaped_table}_([0-9]{#{suffix_len}})"

    sql = <<~SQL
      SELECT
        child.relname AS partition_name
      FROM pg_inherits
      JOIN pg_class AS parent ON pg_inherits.inhparent = parent.oid
      JOIN pg_class AS child ON pg_inherits.inhrelid = child.oid
      WHERE
        parent.relname = #{ActiveRecord::Base.connection.quote(table)} AND
        child.relname LIKE #{ActiveRecord::Base.connection.quote("#{table}_%")} AND
        child.relname NOT LIKE #{ActiveRecord::Base.connection.quote("#{table}_intermediate%")} AND
        SUBSTRING(child.relname FROM '#{pattern}') < #{ActiveRecord::Base.connection.quote(cutoff_str)}
    SQL

    ActiveRecord::Base.connection.execute(sql).map { |row| row['partition_name'] }
  end

  def drop_old_partitions(parent_table:, period: 'day', expired_tables: [])
    today_str, suffix_len =
      case period
      when 'day'
        [Time.zone.now.beginning_of_day.strftime('%Y%m%d'), 8]
      when 'month'
        [Time.zone.now.beginning_of_month.strftime('%Y%m'), 6]
      when 'year'
        [Time.zone.now.beginning_of_year.strftime('%Y'), 4]
      else
        raise ArgumentError, "Unsupported period: #{period}"
      end

    expired_tables.each do |partition|
      # Guards: never drop the parent or intermediate tables
      next if partition == parent_table
      next if partition.start_with?("#{parent_table}_intermediate")

      match = partition.match(/\A#{Regexp.escape(parent_table)}_(\d{#{suffix_len}})\z/)
      date_part = match && match[1]
      next unless date_part
      next if date_part > today_str

      verify_sql = <<~SQL
        SELECT 1
        FROM pg_inherits
        JOIN pg_class AS parent ON pg_inherits.inhparent = parent.oid
        JOIN pg_class AS child ON pg_inherits.inhrelid = child.oid
        WHERE parent.relname = #{ActiveRecord::Base.connection.quote(parent_table)}
          AND child.relname = #{ActiveRecord::Base.connection.quote(partition)}
      SQL
      is_child = ActiveRecord::Base.connection.select_value(verify_sql).present?
      unless is_child
        Rails.logger.warn "Skipping non-child table #{partition} for parent #{parent_table}"
        next
      end

      drop_sql = "DROP TABLE IF EXISTS #{ActiveRecord::Base.connection.quote_table_name(partition)};"
      Rails.logger.info "Dropping old partition with SQL: #{drop_sql}"
      ActiveRecord::Base.connection.execute(drop_sql)
    end
  end
end
Schedule this job to run daily:
# config/schedule.rb (if using whenever)
every 1.day, at: '2:00 am' do
  runner 'PartitioningManagementJob.perform_later(future_days: 15, retention_days: 90, table: "events", period: "day")'
end
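One naming caveat: despite being called future_days and retention_days, the counts are interpreted in units of the given period (the job converts retention_days with .months.ago when period is 'month'). For example, keeping 12 months of a hypothetical monthly-partitioned api_requests table:

PartitioningManagementJob.perform_later(future_days: 3, retention_days: 12, table: 'api_requests', period: 'month')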
The job includes several safety features:
- Uses pg_inherits: Instead of querying pg_tables, it uses Postgres's pg_inherits system catalog to find actual child partitions. This is more accurate and prevents accidentally dropping unrelated tables.
- Pattern matching: It correctly handles the different date formats for day (YYYYMMDD), month (YYYYMM), and year (YYYY) partitions.
- Safety guards: Before dropping any partition, it verifies that the table is actually a child partition of the parent table. It also skips the parent table and intermediate tables entirely.
Troubleshooting
If the swap times out due to locks: Retry during a quieter window. The swap needs an exclusive lock on the table, so high-traffic periods can cause issues.
If you need to abort before the swap: Disable mirroring triggers and you're back to normal:
PgsliceRunner.disable_mirroring_triggers(table: 'events')
If you've already swapped but need to rollback: This is trickier, but pgslice provides an unswap command:
PgsliceRunner.run('unswap', 'events')
If you want to remove the intermediate table and partitions before swapping: Use unprep:
PgsliceRunner.run('unprep', 'events')
Sequence safety: The swap_partitioned_table method automatically syncs your sequence (e.g., events_id_seq) to be at least MAX(id) from the table, so you won't get ID collisions. This is handled by pgslice's swap command.
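If you ever want to verify or re-sync the sequence by hand, the equivalent is a one-liner (assuming the conventional events_id_seq name):

SELECT setval('events_id_seq', GREATEST((SELECT COALESCE(MAX(id), 0) FROM events), 1));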
Best Practices
- Choose your partition key wisely: Use a column that's frequently queried and evenly distributed. Time-based columns (like created_at or occurred_at) are ideal for time-series data.
- Partition size matters: Aim for partitions with 1-10 million rows each. Too small and you'll have too many partitions; too large and you lose the performance benefits.
- Monitor partition pruning: Postgres should automatically skip irrelevant partitions when you query with the partition key. Verify this with EXPLAIN:
EXPLAIN SELECT * FROM events WHERE occurred_at >= NOW() - INTERVAL '1 day';
Look for partition elimination in the query plan.
- Index appropriately: Create indexes on each partition (or use a default template) rather than one giant index on the parent table; see the sketch after this list.
- Test locally first: Always test the full migration process on a copy of production data before running it live.
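On the indexing point: with declarative partitioning on Postgres 11+, an index created on the partitioned parent acts as that template and is propagated to every existing and future partition, so you usually declare it once:

CREATE INDEX events_occurred_at_idx ON events (occurred_at);
-- Postgres creates a matching index on each child partition automatically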
Conclusion
Partitioning can feel scary, but with pgslice and a small amount of Ruby you can make it feel mostly harmless, even easy.