Postgres autovacuum scale factor
Few notes on keep Postgres autovacuum behaving nicely as you grow.
Learned this snippet from @jsoyland
As your Postgres database grows it's worth keeping an eye on how often autovacuum is kicking in and the total amount of dead rows you're accumulating on a daily basis.
There are instances where if you don't monitor this you can end up in a situation where autovacuum just can't keep up with your growth and PG will start to become extremely sluggish (+ you'll be taking up space that you won't be able to claim back without a VACUUM FULL
and some downtime, that is the biggest issue for us at Bento).
If you're on Heroku, the default autovacuum settings are fine for the first TB of data or so especially if you're mostly just adding data and less so updating or deleting it. On Heroku, autovacuum kick in when dead tuples are greater than 20% of your total row count for a given table.
You can check this by downloading https://github.com/heroku/heroku-pg-extras and running heroku pg:autovacuum-stats
.
You'll notice that tables where you're updating or deleting often will be stacking up dead tuples waaaaaaay faster than others.
Not on Heroku? Try this (provided to me from a Heroku engineer):
SELECT psut.relname,
to_char(psut.last_vacuum, 'YYYY-MM-DD HH24:MI') as last_vacuum,
to_char(psut.last_autovacuum, 'YYYY-MM-DD HH24:MI') as last_autovacuum,
to_char(pg_class.reltuples, '9G999G999G999') AS n_tup,
to_char(psut.n_dead_tup, '999G999G999G999') AS dead_tup,
COALESCE((SELECT split_part(x, '=', 2) FROM unnest(reloptions) q (x) WHERE x ~ '^autovacuum_vacuum_scale_factor=' ), current_setting('autovacuum_vacuum_scale_factor'))::float8 AS av_scale_factor,
to_char(CAST(current_setting('autovacuum_vacuum_threshold') AS bigint) + (COALESCE((SELECT split_part(x, '=', 2) FROM unnest(reloptions) q (x) WHERE x ~ '^autovacuum_vacuum_scale_factor=' ), current_setting('autovacuum_vacuum_scale_factor'))::float8 * pg_class.reltuples), '9G999G999G999') AS av_threshold,
CASE
WHEN CAST(current_setting('autovacuum_vacuum_threshold') AS bigint) + (COALESCE((SELECT split_part(x, '=', 2) FROM unnest(reloptions) q (x) WHERE x ~ '^autovacuum_vacuum_scale_factor=' ), current_setting('autovacuum_vacuum_scale_factor'))::float8 * pg_class.reltuples) < psut.n_dead_tup
THEN '*'
ELSE ''
END AS expect_av
FROM pg_stat_user_tables psut
JOIN pg_class
ON psut.relid = pg_class.oid
ORDER BY
CAST(current_setting('autovacuum_vacuum_threshold') AS bigint) + (COALESCE((SELECT split_part(x, '=', 2) FROM unnest(reloptions) q (x) WHERE x ~ '^autovacuum_vacuum_scale_factor=' ), current_setting('autovacuum_vacuum_scale_factor'))::float8 * pg_class.reltuples) DESC;
Once your rows start to creep into the tens of millions you may or may not want to configure autovacuum to kick in earlier than normal.
Luckily, it's a pretty easy thing to tweak.
Jump into your Postgres instance
heroku pg:psql
Alter your autovacuum scale factor setting
ALTER TABLE <tablename> SET autovacuum_vacuum_scale_factor = 0.015;
Run vacuum stats to check it's done!
heroku pg:vacuum-stats
And you're done!
Now, if you're like me you probably need some help calculating what percent to use for each row. As a general rule of thumb, I like the calculation of new dead tuples per day / total row count * 100
. So, for a table accumulating 1,000,000 dead tuples per day out of a total row count of 100M I would set the scale factor to 0.005
to 0.015
.
Make the change, monitor, and make sure it's cleaning up each day.
Subscribe to my personal updates
Get emails from me about building software, marketing, and things I've learned building products on the web. Occasionally, a quiet announcement or two.