May 26, 2025·7 min read

Zero-downtime index changes in PostgreSQL: a safe playbook

Zero-downtime index changes in PostgreSQL using CONCURRENTLY, simple lock checks, and clear rollback steps to keep production traffic flowing.


Why index changes cause downtime (and how to avoid it)

Index work sounds harmless. You are "just" adding a helper structure. In PostgreSQL, though, building, dropping, or swapping an index can take locks that block other sessions. If the table is busy, those waits stack up and the app starts to feel broken.

Downtime rarely looks like a clean outage banner. It often shows up as pages that hang, background jobs that fall behind, and a growing queue of requests stuck waiting on the database. Someone hits "Search" and gets a timeout, while support tools and admin screens suddenly feel sluggish because simple queries cannot get the lock they need.

"Just run it at night" fails for two common reasons. Many systems are never truly quiet (global users, batch jobs, ETL, backups). And index operations can take longer than you expect because they read a lot of data and compete for CPU and disk. If the window closes mid-build, you are stuck choosing between waiting longer or interrupting the work.

Zero-downtime index changes are not magic. They come down to picking the least-blocking operation, setting guardrails (timeouts and disk checks), and watching the database while it runs.

This playbook focuses on practical production habits:

  • Prefer concurrent index builds when reads and writes must keep flowing.
  • Monitor locks and build progress so you can react early.
  • Have a rollback path if the change causes regressions or takes too long.

What it does not cover: deep index design theory, broad query tuning, or schema refactors that rewrite lots of data.

The simple lock model behind index work

PostgreSQL uses locks to keep data correct when many sessions touch the same table. A lock is just a rule that says who is allowed to read or write an object right now, and who must wait.

Most of the time you never notice locks because PostgreSQL can use lightweight modes that allow normal queries to run. DDL is different. When you create or drop an index, PostgreSQL needs enough control over the table to keep the catalog and data consistent. The more control it needs, the more other sessions may be forced to wait.

Building an index vs using an index

Using an index is usually cheap from a locking point of view. SELECT, UPDATE, and DELETE queries can read or maintain indexes while other sessions do the same.

Building an index is different. PostgreSQL must scan the table, sort or hash keys, and write a new structure to disk. That work takes time, and time is what turns "small locks" into "big problems" in production.

What CONCURRENTLY changes (and what it does not)

A normal CREATE INDEX takes a SHARE lock on the table, which blocks inserts, updates, and deletes (but not reads) for the duration of the build. CREATE INDEX CONCURRENTLY is designed to keep normal reads and writes moving while the index is being built.

But "concurrent" does not mean "lock-free." You still get short lock windows at the start and end, and the build can fail or wait if something else holds incompatible locks.

The outcomes that matter most:

  • Non-concurrent builds can block inserts, updates, and deletes on the table.
  • Concurrent builds usually allow reads and writes, but can be slowed or stalled by long transactions.
  • The finishing steps still need brief locks, so very busy systems can see short waits.

Pick the right approach: concurrent or not

You have two main options when changing indexes: build the index normally (fast, but blocking), or build it with CONCURRENTLY (usually non-blocking for application traffic, but slower and more sensitive to long transactions).

When CONCURRENTLY is the right choice

Use CREATE INDEX CONCURRENTLY when the table serves real traffic and you cannot pause writes. It is typically the safer choice when:

  • The table is large enough that a normal build could take minutes or hours.
  • The table has steady writes, not just reads.
  • You cannot schedule a real maintenance window.
  • You need to build first, verify, then drop an old index later.

When a normal index build is acceptable

A normal CREATE INDEX can be fine when the table is small, traffic is low, or you have a controlled window. It often finishes faster and is simpler to run.

Consider the normal approach if the build is consistently quick in staging and you can temporarily stop writes (even briefly).
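If you do go the normal route, a guarded build is still worth the two extra lines. A minimal sketch (the table and index names are examples): a short lock_timeout makes the CREATE INDEX give up quickly instead of queuing behind traffic and blocking every session behind it.

```sql
-- Guarded normal build: fail fast instead of stacking up lock waits
BEGIN;
SET LOCAL lock_timeout = '2s';
CREATE INDEX orders_customer_id_idx ON public.orders (customer_id);
COMMIT;
```

If the lock cannot be acquired within 2 seconds, the statement errors out and the transaction rolls back cleanly; you simply retry in a quieter moment.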

If you need uniqueness, decide early. CREATE UNIQUE INDEX CONCURRENTLY works, but it will fail if duplicate values exist. In many production systems, finding and fixing duplicates is the real project.

Preflight checks before touching production

Most problems happen before the command even starts. A few checks help you avoid two big surprises: unexpected blocking and an index build that runs far longer (or uses more space) than you planned.

1) Make sure you are not inside a transaction. CREATE INDEX CONCURRENTLY will fail if you run it after BEGIN, and some GUI tools quietly wrap statements in a transaction. If you are unsure, open a fresh session and run only the index command there.
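One quick sanity check (a sketch, and psql-specific behavior): issue COMMIT immediately before the build. If the session is not in a transaction, PostgreSQL only emits a harmless warning; if a tool had silently opened one, this closes it so the concurrent build can run.

```sql
-- Outside a transaction this prints:
--   WARNING:  there is no transaction in progress
-- Inside one, it commits whatever the tool opened, freeing the session.
COMMIT;
```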

2) Set expectations for time and disk. Concurrent builds usually take longer than normal builds and need extra working space while they run. Plan for the new index plus temporary overhead, and confirm you have comfortable free disk headroom.
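A rough sizing check before the build helps here (a sketch; the table name is an example). The table size bounds what the scan must read, and an existing index on similar columns hints at the new index's final size.

```sql
-- Table size alone, and table plus all its indexes
SELECT pg_size_pretty(pg_relation_size('public.users'))       AS table_size,
       pg_size_pretty(pg_total_relation_size('public.users')) AS table_plus_indexes;

-- Per-index sizes on the same table, for comparison
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'users';
```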

3) Set timeouts that match your goal. You want the build to fail fast if it cannot get a lock, but you do not want the session to die mid-build because of an aggressive statement timeout.

4) Capture a baseline. You want proof the change helped and a quick way to spot regressions. Record a before snapshot: the slow query timing, a representative EXPLAIN (ANALYZE, BUFFERS), and a quick view of CPU, IO, connections, and free disk.

Safe session settings many teams use as a starting point (adjust to your rules):

-- Run in the same session that will build the index
SET lock_timeout = '2s';
SET statement_timeout = '0';

Step-by-step: create an index with CONCURRENTLY


Use CREATE INDEX CONCURRENTLY when you need application traffic to keep moving and you can afford a longer build time.

First, decide exactly what you are building:

  • Be specific about column order (it matters).
  • Consider whether a partial index is enough. If most queries filter to "active" rows, a partial index can be smaller, faster, and cheaper to maintain.

A safe run looks like this: write down the goal and index name, run the build outside any transaction block, watch it until it completes, then verify the planner can use it before you remove anything else.

-- Example: speed up searches by email for active users
CREATE INDEX CONCURRENTLY idx_users_active_email
ON public.users (email)
WHERE status = 'active';

-- Validate it exists
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users';

-- Check the plan can use it
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM public.users
WHERE status = 'active' AND email = '[email protected]';

For progress notes (useful for audits), record the start time, end time, and any waits you saw. While it runs, you can query pg_stat_progress_create_index from another session.

Validation is not just "the index exists." Confirm the plan can choose it, then watch real query timing after deployment. If the new index is not used, do not rush to drop the old one. Fix the query or the index definition first.

Step-by-step: replace or remove indexes without blocking

The safest pattern is to add first, let traffic benefit from the new index, and only then remove the old one. That way, you keep a working fallback.

Swap an old index for a new one (the safe order)

  1. Create the new index with CREATE INDEX CONCURRENTLY.

  2. Verify it is being used. Check EXPLAIN on the slow queries you care about, and watch index usage over time.

  3. Only after that, drop the old index concurrently. If risk is high, keep both indexes for a full business cycle before removing anything.
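The three steps above, sketched as commands (index names are examples):

```sql
-- 1) Build the replacement without blocking traffic
CREATE INDEX CONCURRENTLY users_email_active_idx
ON public.users (email)
WHERE status = 'active';

-- 2) Watch usage: the new index should start accumulating scans
--    while the old one goes quiet
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'users'
ORDER BY idx_scan DESC;

-- 3) Only then remove the old one, also without blocking
DROP INDEX CONCURRENTLY users_email_old_idx;
```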

Dropping indexes: when CONCURRENTLY works (and when it does not)

For a normal index you created yourself, DROP INDEX CONCURRENTLY is usually the right choice. Two gotchas: it cannot run inside a transaction block, and it still needs brief locks at the start and end, so it can be delayed by long-running transactions.

If the index exists because of a PRIMARY KEY or UNIQUE constraint, you usually cannot drop it directly. You must change the constraint with ALTER TABLE, which can take stronger locks. Treat that as a separate, planned maintenance operation.
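One pattern that reduces the blocking window for unique constraints (a sketch; names are examples, and you should rehearse it in staging): build the replacement unique index concurrently, then attach it to the constraint in a single short ALTER TABLE. The ALTER still takes a brief strong lock, but you avoid a long blocking rebuild.

```sql
-- Build the replacement unique index without blocking writes
CREATE UNIQUE INDEX CONCURRENTLY users_email_key_new
ON public.users (email);

-- Swap the constraint onto the new index in one brief locked step.
-- PostgreSQL renames the index to match the constraint name.
ALTER TABLE public.users
  DROP CONSTRAINT users_email_key,
  ADD CONSTRAINT users_email_key UNIQUE USING INDEX users_email_key_new;
```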

Renaming indexes for clarity

Renaming (ALTER INDEX ... RENAME TO ...) is usually quick, but avoid it if tooling or migrations refer to index names. A safer habit is to pick a clear name from the start.

If the old index is still needed

Sometimes two query patterns need two different indexes. If important queries still rely on the old one, keep it. Consider adjusting the new index (column order, partial condition) rather than forcing a drop.

Monitor locks and progress while the index builds


Even with CREATE INDEX CONCURRENTLY, you should watch what is happening in real time. Most surprise incidents come from one of two things: a blocking session you did not notice, or a long-running transaction that keeps the build stuck waiting.

Spot blocking sessions (who is blocking whom)

Start by finding sessions waiting on locks:

SELECT
  a.pid,
  a.usename,
  a.application_name,
  a.state,
  a.wait_event_type,
  a.wait_event,
  now() - a.xact_start AS xact_age,
  left(a.query, 120) AS query
FROM pg_stat_activity a
WHERE a.wait_event_type = 'Lock'
ORDER BY xact_age DESC;

If you need the exact blocker, follow blocked_pid to blocking_pid:

SELECT
  blocked.pid  AS blocked_pid,
  blocking.pid AS blocking_pid,
  now() - blocked.xact_start AS blocked_xact_age,
  left(blocked.query, 80)  AS blocked_query,
  left(blocking.query, 80) AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';

Watch build progress and "stuck" signals

PostgreSQL exposes index build progress. If you see no movement for a long time, look for a long transaction (often an idle session holding an old snapshot).

SELECT
  pid,
  phase,
  lockers_total,
  lockers_done,
  blocks_total,
  blocks_done,
  tuples_total,
  tuples_done
FROM pg_stat_progress_create_index;
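If progress stalls, look for the oldest open transactions (a sketch). Idle-in-transaction sessions with an old xact_start are the usual culprits holding back a concurrent build.

```sql
-- Oldest open transactions first; idle ones are the prime suspects
SELECT pid,
       state,
       now() - xact_start AS xact_age,
       left(query, 80) AS last_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;
```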

Also keep an eye on system pressure: disk IO, replication lag, and rising query times. Concurrent builds are friendlier to uptime, but they still read a lot of data.

Simple rules that work well in production:

  • Wait if progress is moving and user impact is low.
  • Cancel and reschedule if the build is stuck behind a long transaction you cannot end safely.
  • Pause during peak traffic if IO is hurting customer-facing queries.
  • Terminate only as a last resort, and only after confirming what the session is doing.

For team communication, keep updates short: start time, current phase, what is blocked (if anything), and when you will check again.

Rollback plan: how to back out safely


Index changes only stay low-risk if you plan the exit before you start. The safest rollback is often not a dramatic undo. It is simply stopping the new work and keeping the old index in place.

Common ways index work fails

Most production failures are predictable: the build hits a timeout, someone cancels it during an incident, the server runs low on disk, or the build competes with normal traffic enough that user-facing latency jumps.

With CREATE INDEX CONCURRENTLY, canceling is usually safe for the app because queries keep running. The tradeoff is cleanup: a canceled or failed concurrent build can leave an invalid index behind.

Safe cancel and cleanup rules

Canceling a concurrent build does not roll back like a normal transaction. PostgreSQL may leave an index that exists but is not valid for the planner.

-- Cancel the session building the index (use the PID you identified)
SELECT pg_cancel_backend(<pid>);

-- If the index exists but is invalid, remove it without blocking writes
DROP INDEX CONCURRENTLY IF EXISTS your_index_name;

Before dropping, confirm what you are looking at:

SELECT
  c.relname AS index_name,
  i.indisvalid,
  i.indisready
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE c.relname = 'your_index_name';

If indisvalid = false, the planner is not using it (though writes still pay to maintain it), so it is safe to drop.

Practical rollback checklist when replacing an existing index:

  • Keep the old index until the new one is fully built and valid.
  • If the new build fails or is canceled, drop the invalid new index concurrently.
  • If you already dropped the old index, recreate it with CREATE INDEX CONCURRENTLY to restore the previous state.
  • If disk pressure caused the failure, free space first, then retry.
  • If timeouts caused the failure, schedule a quieter window rather than forcing it.

Example: you start a new index for an admin search, it runs for 20 minutes, then disk alerts fire. Cancel the build, drop the invalid index concurrently, and keep the old index serving traffic. You can try again after freeing space, without a user-visible outage.

Common mistakes that create surprise outages

Most outages around indexes are not caused by PostgreSQL being "slow." They happen because one small detail turns a safe change into a blocking one.

1) Putting a concurrent build inside a transaction

CREATE INDEX CONCURRENTLY cannot run inside a transaction block. Many migration tools wrap every change in a single transaction by default. The result is either a hard error (best case) or a messy deploy with retries.

Before you run the migration, confirm your tool can run a statement without an outer transaction, or split the migration into a special non-transactional step.

2) Kicking it off during peak traffic

Concurrent index builds reduce blocking, but they still add load: extra reads, extra writes, and more pressure on autovacuum. Starting the build during a deploy window when traffic is spiking is a common way to create a slowdown that feels like an outage.

Pick a quiet period and treat it like any other production maintenance.

3) Ignoring long-running transactions

A single long transaction can hold back the cleanup phase of a concurrent build. The index may look like it is progressing, then sit near the end while waiting for old snapshots to disappear.

Build the habit: check for long-running transactions before you start, and again if progress stalls.

4) Dropping the wrong thing (or breaking a constraint)

Teams sometimes drop an index by name from memory, or remove an index that backs a uniqueness rule. If you drop the wrong object, you can lose enforcement (unique constraints) or regress query performance instantly.

Quick safety checklist: verify the index name in the catalog, confirm whether it supports a constraint, double-check the schema and table, and keep "create new" separate from "drop old." Have the rollback command ready before you start.


A common pain point is an admin search that feels instant in staging but crawls in production. Say you have a large tickets table (tens of millions of rows) behind an internal admin panel, and agents often search "open tickets for one customer, newest first."

The query looks like this:

SELECT id, customer_id, subject, created_at
FROM tickets
WHERE customer_id = $1
  AND status = 'open'
ORDER BY created_at DESC
LIMIT 50;

A full index on (customer_id, status, created_at) helps, but it adds write overhead for every ticket update, including closed ones. If most rows are not open, a partial index is often a simpler win:

CREATE INDEX CONCURRENTLY tickets_open_by_customer_created_idx
ON tickets (customer_id, created_at DESC)
WHERE status = 'open';

A safe timeline in production:

  • Preflight: confirm the query shape is stable and the table has enough free disk for a new index build.
  • Build: run CREATE INDEX CONCURRENTLY in a separate session with clear timeout settings.
  • Validate: run ANALYZE tickets; and confirm the planner uses the new index.
  • Cleanup: once you are confident, drop any now-redundant older index with DROP INDEX CONCURRENTLY.
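The validate and cleanup steps, sketched as commands (this assumes an older, now-redundant index named tickets_customer_id_idx; substitute your own):

```sql
-- Refresh planner statistics so the new index is considered
ANALYZE tickets;

-- Confirm the plan: look for an Index Scan using
-- tickets_open_by_customer_created_idx in the output
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, customer_id, subject, created_at
FROM tickets
WHERE customer_id = 42 AND status = 'open'
ORDER BY created_at DESC
LIMIT 50;

-- Only after you are confident, remove the redundant index
DROP INDEX CONCURRENTLY IF EXISTS tickets_customer_id_idx;
```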

What success looks like:

  • The admin search drops from seconds to tens of milliseconds for typical customers.
  • Regular reads and writes keep working during the build.
  • CPU and disk IO rise during the build but stay within your normal safety limits.
  • You can point to clear before/after numbers: query time, rows scanned, and lock history.

Quick checklist and next steps

Index work is safest when you treat it like a small production release: prepare, watch it while it runs, then verify the result before you clean up.

Before you start:

  • Set timeouts so a surprise lock does not hang forever.
  • Confirm enough free disk space for the new index build.
  • Look for long-running transactions that could slow the build.
  • Pick a low-traffic window and define what "done" looks like.
  • Write down your rollback plan now.

While it runs:

  • Watch for blocking and lock wait chains.
  • Track build progress with pg_stat_progress_create_index.
  • Keep an eye on app symptoms: error rate, timeouts, and slow endpoints tied to the table.
  • Be ready to cancel if lock waits rise or user-facing timeouts spike.
  • Log what happened: start time, finish time, and any alerts.

After it finishes, confirm the index is valid, run the one or two key queries to see plan and timing improve, and only then remove old indexes in a non-blocking way.

If you do this more than once, turn it into a repeatable delivery step: a small runbook, a staging rehearsal with production-like data, and a clear owner watching the build.

If you're building internal tools or admin panels with AppMaster (appmaster.io), it helps to treat database changes like index builds as part of the same release checklist as your backend updates: measured, monitored, and with a rollback you can execute quickly.

FAQ

Why can adding or changing an index cause downtime?

Downtime usually shows up as lock waits, not a full outage. A normal CREATE INDEX can block writes for the whole build, so requests that need to insert, update, or delete start waiting and then timing out, which makes pages hang and queues back up.

When should I use CREATE INDEX CONCURRENTLY instead of a normal CREATE INDEX?

Use CREATE INDEX CONCURRENTLY when the table has real traffic and you can’t pause writes. It’s the safer default for large or busy tables, even though it runs slower and can be delayed by long transactions.

Does CONCURRENTLY mean “no locks at all”?

No. It reduces blocking, but it’s not lock-free. You still get short lock windows at the beginning and end, and the build can wait if other sessions hold incompatible locks or if long-running transactions prevent the final steps from completing.

Why does “just run it at night” often fail?

Because production often isn’t quiet, and index builds can take much longer than expected due to table size, CPU, and disk IO. If the build runs past your window, you end up choosing between extending risk during business hours or canceling mid-change.

What should I check before running a concurrent index build in production?

First, make sure you’re not in a transaction, because CREATE INDEX CONCURRENTLY will fail inside one. Next, confirm you have enough free disk for the new index plus temporary overhead, and set a short lock_timeout so you fail fast if you can’t get needed locks.

What timeouts should I set for safe index changes?

A common starting point is SET lock_timeout = '2s'; and SET statement_timeout = '0'; in the same session that will build the index. This helps you avoid waiting forever on locks without killing the build halfway through due to an aggressive statement timeout.

How do I tell if a concurrent index build is stuck, and what do I look at first?

Start with pg_stat_progress_create_index to see the phase and whether blocks and tuples are advancing. If progress stalls, check pg_stat_activity for lock waits and look for long-running transactions, especially idle sessions holding old snapshots.

What’s the safest way to replace an existing index without blocking traffic?

Create the new index concurrently, verify the planner can use it (and that real query timing improves), and only then drop the old index concurrently. This “add first, remove later” order keeps a working fallback if the new index isn’t used or causes regressions.

Can I always drop an index concurrently?

DROP INDEX CONCURRENTLY is usually safe for regular indexes, but it still needs brief locks and cannot run inside a transaction block. If the index backs a PRIMARY KEY or UNIQUE constraint, you typically must change the constraint with ALTER TABLE, which can require stronger locks and more planning.

How do I roll back safely if a concurrent index build fails or gets canceled?

Cancel the build session, then check whether an invalid index was left behind. If indisvalid is false, drop it with DROP INDEX CONCURRENTLY and keep the old index in place; if you already dropped the old one, recreate it concurrently to restore the previous behavior.
