Postgres VACUUM FULL without a lock

How to simulate a VACUUM FULL on your Postgres database without locking it down.

Learned this snippet from @jsoyland

Below is a quick guide provided to me by Jesse on the Heroku database team on how to simulate a VACUUM FULL on tables without (much) downtime.

The primary reason for wanting to do this is to reclaim space: a routine autovacuum marks dead rows as reusable but never returns the space to the operating system. For us at Bento, not being able to reclaim space is a big problem, because even if you delete 100M records you still won't be able to downgrade your plan or stop disk growth until a VACUUM FULL is done (which essentially rebuilds the table and then deletes the old one).
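If you want to see how much space is actually tied up before going through all of this, a couple of catalog queries give a rough picture (a sketch, using the distributions table from the walkthrough below):

-- Total on-disk size of the table, including its indexes and TOAST data
SELECT pg_size_pretty(pg_total_relation_size('distributions'));

-- Dead tuples: autovacuum marks these reusable but never returns the space to the OS
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'distributions';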

Everything below the line has been provided by Jesse on the Heroku team, with minor edits by myself just to make things a little clearer. I wanted to share this in case someone out there is stressfully hunting for a fix to a bloated database but doesn't have the funds to just upgrade.

Please try this locally before you run it on a production database, or, if you're on Heroku, fork your production database and try it there instead.

Oh, and note that in this example we're focused on a distributions table. This can be any table you want to clean up.


Using declarative partitioning on PG11 and above

-- Set up the partition master table and the new table
CREATE TABLE distributions_master (LIKE distributions, compacted boolean DEFAULT true) PARTITION BY LIST (compacted);
CREATE TABLE distributions_new (LIKE distributions INCLUDING ALL, compacted boolean DEFAULT true);
ALTER TABLE distributions_master ATTACH PARTITION distributions_new FOR VALUES IN (TRUE);

-- We need to add the partition key column to the existing table and then index it.
-- The index is needed because a lock will be held while making this table a partition
-- of the master, during which the values are checked to make sure they shouldn't
-- actually be in the already existing partition for values where compacted IS TRUE

ALTER TABLE distributions ADD COLUMN compacted boolean DEFAULT NULL;
CREATE INDEX CONCURRENTLY distributions_compacted_idx on distributions (compacted);

-- Now swap the table names and make the old table a partition of the master table

BEGIN;

ALTER TABLE distributions RENAME TO distributions_old;
ALTER TABLE distributions_master RENAME TO distributions;

-- This will hold a lock while the values in distributions_old are checked,
-- but for a much shorter time than without the index or what you'd deal
-- with from a VACUUM FULL
ALTER TABLE distributions ATTACH PARTITION distributions_old DEFAULT;

COMMIT;
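
If you want to double-check the swap before moving on, the pg_inherits catalog (or \d+ distributions in psql) should show both partitions attached to the new master:

-- Should list distributions_new and distributions_old
SELECT inhrelid::regclass AS partition
FROM pg_inherits
WHERE inhparent = 'distributions'::regclass;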

At this point:

  • All of your queries will hit the master partition, which will then scan both partitions.
  • New entries via INSERTs will automatically be routed to the new table/partition by way of the DEFAULT true on the compacted column.
  • DELETE queries are also safe, with one caveat that I'll cover further below.
  • After checking the entry for distributions in pg_stat_user_tables for the last ~24 hours, it doesn't appear that you run any UPDATE queries, so those aren't a concern here (the query sketch after this list shows how to check this yourself).
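
To run the same check against your own table: the counters in pg_stat_user_tables are cumulative since the last statistics reset, so a zero or unchanging n_tup_upd over your observation window is a good sign. A minimal sketch:

-- Cumulative UPDATE (and DELETE) counts for the table since the last stats reset
SELECT n_tup_upd, n_tup_del
FROM pg_stat_user_tables
WHERE relname = 'distributions';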

On PG11 or greater, the partitioned table will automatically move rows between partitions when the partition key changes. However, you definitely don't want to move all of them at once, so you'll need to select them in chunks. Here I'll use a CTE to keep things clean and readable:

WITH to_move AS (
  SELECT dataset_version_id, subject_type, subject_id, bucket_type, bucket_value, question_id -- pkey columns
  FROM distributions_old
  ORDER BY dataset_version_id, subject_type, subject_id, bucket_type, bucket_value, question_id -- pkey columns
  LIMIT 1000 -- this limit determines how many rows each invocation can move
)
UPDATE distributions d
SET compacted = true
WHERE (d.dataset_version_id, d.subject_type, d.subject_id, d.bucket_type, d.bucket_value, d.question_id) IN (SELECT * FROM to_move);
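
You'll want to run that statement repeatedly, from a script or a scheduled job, until everything has been moved. A cheap way to watch progress, assuming nothing else is writing to the old partition:

-- Rows still waiting to be moved; when this reaches zero you're ready to clean up
SELECT count(*) FROM distributions_old;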

Caveat: your app may attempt to delete a row that is currently being moved by the above update query. Since the update started first, the delete will queue behind it, but it will then fail because the row is no longer where it was when the delete started. Luckily, you'll receive an error that you can watch for so you can re-run the delete when this happens:

ERROR: tuple to be locked was already moved to another partition due to concurrent update

Once the old table is empty you can then clean up with the following:

ALTER TABLE distributions DETACH PARTITION distributions_new;
DROP TABLE distributions CASCADE;
ALTER TABLE distributions_new RENAME TO distributions;
ALTER TABLE distributions DROP COLUMN compacted;

Using declarative partitioning on PG10

If you're unable to upgrade your server before doing this, you can still use the above process, but you'll need a different query to move the data from the old table/partition to the new one:

WITH del AS (
  DELETE FROM distributions_old
  WHERE (dataset_version_id, subject_type, subject_id, bucket_type, bucket_value, question_id) IN
    (
      SELECT dataset_version_id, subject_type, subject_id, bucket_type, bucket_value, question_id -- pkey columns
      FROM distributions_old
      ORDER BY dataset_version_id, subject_type, subject_id, bucket_type, bucket_value, question_id -- pkey columns
      LIMIT 1000 -- this limit determines how many rows each invocation can move
    )
  RETURNING dataset_version_id, subject_type, subject_id, bucket_type, bucket_value, bucket_source, question_id, answer_id_to_count -- all columns in table order
)
INSERT INTO distributions
SELECT *, true
FROM del;
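
One fiddly part of this variant is that the RETURNING list has to name every column in table order, since the SELECT *, true relies on that ordering. If you'd rather not type it out by hand, information_schema can generate the list for you; a quick sketch:

-- Columns of distributions_old in physical order, handy for building the RETURNING clause
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'distributions_old'
ORDER BY ordinal_position;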

The caveat here is that if you attempt to delete a row that is being moved you won't receive an error; the delete will simply return with 0 rows affected. That can be an issue with some ORMs, which won't treat 0 affected rows as an error or exceptional case unless you change your code to check for it.

Using table inheritance on any version

Here you don't need the extra column for a partition key, and you only need one more table, since with table inheritance rows inserted into the parent table stay in the parent rather than being routed to a child.

The setup:

BEGIN;
ALTER TABLE distributions RENAME TO distributions_old;
CREATE TABLE distributions (LIKE distributions_old INCLUDING ALL);
ALTER TABLE distributions_old INHERIT distributions;
ALTER SEQUENCE distributions_id_seq OWNED BY distributions.id;
COMMIT;

Moving the data from the old table is pretty much the same DELETE/INSERT query as above; you just don't have to account for the extra column:

WITH del AS (
  DELETE FROM distributions_old
  WHERE (dataset_version_id, subject_type, subject_id, bucket_type, bucket_value, question_id) IN
    (
      SELECT dataset_version_id, subject_type, subject_id, bucket_type, bucket_value, question_id -- pkey columns
      FROM distributions_old
      ORDER BY dataset_version_id, subject_type, subject_id, bucket_type, bucket_value, question_id -- pkey columns
      LIMIT 1000 -- this limit determines how many rows each invocation can move
    )
  RETURNING *
)
INSERT INTO distributions
SELECT *
FROM del;

And after the data is all moved the cleanup is simply:

DROP TABLE distributions_old;
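
Before running that DROP, it's worth confirming the old table really is empty. The ONLY keyword isn't strictly needed here since distributions_old has no children of its own, but it makes the intent explicit:

-- Confirm there's nothing left to move before dropping
SELECT count(*) FROM ONLY distributions_old;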

This has the same caveat: deleting a row that's already in the process of being moved will return 0 rows affected without an error.

Conclusion

If you can get the upgrade done (which is worth it anyway), the first method is best, since trying to delete a row that is in transit will return an error you can catch. If you can't, the table inheritance method is probably easiest, since the setup and cleanup are much simpler.
