Outbox pattern in PostgreSQL for reliable API integrations
Learn the outbox pattern to store events in PostgreSQL, then deliver them to third-party APIs with retries, ordering, and deduplication.

Why integrations fail even when your app works
It’s common to see a “successful” action in your app while the integration behind it quietly fails. Your database write is fast and reliable. A call to a third-party API isn’t. That creates two different worlds: your system says the change happened, but the external system never heard about it.
A typical example: a customer places an order, your app saves it in PostgreSQL, and then tries to notify a shipping provider. If the provider hangs for 20 seconds and your request gives up, the order is still real, but the shipment is never created.
Users experience this as confusing, inconsistent behavior. Missing events look like “nothing happened.” Duplicate events look like “why did I get charged twice?” Support teams also struggle because it’s hard to tell whether the issue was your app, the network, or the partner.
Retries help, but retries alone don’t guarantee correctness. If you retry after a timeout, you might send the same event twice because you don’t know whether the partner received the first request. If you retry out of order, you might send “Order shipped” before “Order paid.”
These problems usually come from normal concurrency: multiple workers processing in parallel, multiple app servers writing at the same time, and “best effort” queues where timing changes under load. The failure modes are predictable: APIs go down or slow, networks drop requests, processes crash at the wrong moment, and retries create duplicates when nothing enforces idempotency.
The outbox pattern exists because these failures are normal.
What the outbox pattern is in plain terms
The outbox pattern is straightforward: when your app makes an important change (like creating an order), it also writes a small “event to send” record into a database table, in the same transaction. If the database commit succeeds, you know the business data and the event record exist together.
After that, a separate worker reads the outbox table and delivers those events to third-party APIs. If an API is slow, down, or times out, your main user request still succeeds because it isn’t waiting on the external call.
This avoids the awkward states you get when you call an API inside the request handler:
- The order is saved, but the API call fails.
- The API call succeeds, but your app crashes before saving the order.
- The user retries, and you send the same thing twice.
The outbox pattern mainly helps with lost events, partial failures (database ok, external API not ok), accidental double sends, and safer retries (you can try again later without guessing).
It doesn’t fix everything. If your payload is wrong, your business rules are wrong, or the third-party API rejects the data, you still need validation, good error handling, and a way to inspect and correct failed events.
Designing an outbox table in PostgreSQL
A good outbox table is boring on purpose. It should be easy to write to, easy to read from, and hard to misuse.
Here is a practical baseline schema you can adapt:
create table outbox_events (
  id bigserial primary key,
  aggregate_id text not null,                        -- groups related events, e.g. the order ID
  event_type text not null,                          -- e.g. 'order.created'
  payload jsonb not null,                            -- what the receiver needs in order to act
  status text not null default 'pending',            -- pending / processing / sent / failed
  created_at timestamptz not null default now(),
  available_at timestamptz not null default now(),   -- when the event is next eligible to send
  attempts int not null default 0,                   -- delivery attempts so far
  locked_at timestamptz,                             -- set while a worker holds the row
  locked_by text,                                    -- which worker holds it
  meta jsonb not null default '{}'::jsonb            -- routing and debugging context
);
Choosing an ID
Using bigserial (or bigint) keeps ordering simple and indexes fast. UUIDs are great for uniqueness across systems, but they don’t sort in creation order, which can make polling less predictable and indexes heavier.
A common compromise is: keep id as bigint for ordering, and add a separate event_uuid if you need a stable identifier to share across services.
Indexes that matter
Your worker will query the same patterns all day. Most systems need:
- An index like (status, available_at, id) to fetch the next pending events in order.
- An index on (locked_at) if you plan to expire stale locks.
- An index like (aggregate_id, id) if you sometimes deliver per aggregate in order.
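As a minimal sketch, using the column names from the schema above (the index names are illustrative), those definitions could look like this:
create index outbox_pending_idx on outbox_events (status, available_at, id);
create index outbox_locked_at_idx on outbox_events (locked_at);
create index outbox_aggregate_idx on outbox_events (aggregate_id, id);
If most rows end up sent, making the first one a partial index (for example, with where status = 'pending') keeps it small without changing the queries.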
Keep payloads stable
Keep payloads small and predictable. Store what the receiver actually needs, not your entire row. Add an explicit version (for example, in meta) so you can evolve fields safely.
Use meta for routing and debugging context like tenant ID, correlation ID, trace ID, and a dedup key. That extra context pays off later when support needs to answer “what happened to this one order?”
How to store events safely with your business write
The most important rule is simple: write business data and the outbox event in the same database transaction. If the transaction commits, both exist. If it rolls back, neither exists.
Example: a customer places an order. In one transaction you insert the order row, the order items, and one outbox row like order.created. If any step fails, you don’t want a “created” event escaping into the world.
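A minimal sketch of that transaction, assuming an orders table exists alongside the outbox_events schema above (the table, column names, and IDs are illustrative):
begin;

-- the business write
insert into orders (id, customer_id, total_cents, currency)
values ('ord_123', 'cus_42', 4999, 'USD');

-- the event record, written in the same transaction
insert into outbox_events (aggregate_id, event_type, payload, meta)
values (
  'ord_123',
  'order.created',
  '{"order_id": "ord_123", "customer_id": "cus_42", "total_cents": 4999, "currency": "USD"}'::jsonb,
  '{"version": 1, "correlation_id": "req_789"}'::jsonb
);

commit;
If the commit fails, both inserts disappear together; there is no window where the order exists without its event or the event exists without its order.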
One event or many?
Start with one event per business action when you can. It’s easier to reason about and cheaper to process. Split into multiple events only when different consumers truly need different timing or payloads (for example, order.created for fulfillment and payment.requested for billing). Generating many events for one click increases retries, ordering headaches, and duplicate handling.
What payload should you store?
You usually choose between:
- Snapshot: store key fields as they were at the time of the action (order total, currency, customer ID). This avoids extra reads later and keeps the message stable.
- Reference ID: store only the order ID and let the worker load details later. This keeps the outbox small, but adds reads and can change if the order is edited.
A practical middle ground is identifiers plus a small snapshot of critical values. It helps receivers act quickly and helps you debug.
Keep the transaction boundary tight. Don’t call third-party APIs inside the same transaction.
Delivering events to third-party APIs: the worker loop
Once events are in your outbox, you need a worker that reads them and calls the third-party API. This is the part that turns the pattern into a reliable integration.
Polling is usually the simplest option. LISTEN/NOTIFY can reduce latency, but it adds moving parts and still needs a fallback when notifications are missed or the worker restarts. For most teams, steady polling with a small batch is easier to run and debug.
Claiming rows safely
The worker should claim rows so two workers never process the same event at the same time. In PostgreSQL, the common approach is to select a batch using row locks and SKIP LOCKED, then mark them as in progress.
A practical status flow is:
- pending: ready to send
- processing: locked by a worker (use locked_by and locked_at)
- sent: delivered successfully
- failed: stopped after max attempts (or moved aside for manual review)
Keep batches small to be kind to your database. A batch of 10 to 100 rows, running every 1 to 5 seconds, is a common starting point.
When a call succeeds, mark the row sent. When it fails, increment attempts, set available_at to a future time (backoff), clear the lock, and return it to pending.
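A sketch of the claim-and-mark-sent cycle against the schema above (the batch size, worker name, and $1 bind parameter are assumptions):
-- claim a small batch; skip locked lets parallel workers pass over rows another worker holds
with batch as (
  select id
  from outbox_events
  where status = 'pending'
    and available_at <= now()
  order by id
  limit 50
  for update skip locked
)
update outbox_events o
set status = 'processing',
    locked_by = 'worker-1',
    locked_at = now()
from batch
where o.id = batch.id
returning o.id, o.event_type, o.payload;

-- after a successful API call, mark the row sent and release the lock
update outbox_events
set status = 'sent', locked_at = null, locked_by = null
where id = $1;
The returning clause hands the claimed events to the worker in one round trip; the success update then runs per event once the external call comes back.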
Logging that helps (without leaking secrets)
Good logs make failures actionable. Log the outbox id, event type, destination name, attempt count, timing, and HTTP status or error class. Avoid request bodies, auth headers, and full responses. If you need correlation, store a safe request ID or a hash instead of raw payload data.
Ordering rules that work in real systems
Many teams start with “send events in the same order we created them.” The catch is that “the same order” is rarely global. If you force one global queue, a single slow customer or flaky API can hold up everyone else.
A practical rule is: preserve order per group, not for the whole system. Pick a grouping key that matches how the outside world thinks about your data, such as customer_id, account_id, or an aggregate_id like order_id. Then guarantee ordering inside each group while delivering many groups in parallel.
Parallel workers without breaking order
Run multiple workers, but ensure two workers don’t process the same group at the same time. The usual approach is to always deliver the earliest unsent event for a given aggregate_id and allow parallelism across different aggregates.
Keep the claiming rules simple:
- Only deliver the earliest pending event per group.
- Allow parallelism across groups, not within a group.
- Claim one event, send it, update status, then move on.
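One way to express "only the earliest pending event per group" as a claim query, sketched against the schema above (it leans on the (aggregate_id, id) index and is not the only approach):
select id, aggregate_id, event_type, payload
from outbox_events o
where status = 'pending'
  and available_at <= now()
  -- pick the event only if every earlier event for the same aggregate is already sent
  and not exists (
    select 1
    from outbox_events earlier
    where earlier.aggregate_id = o.aggregate_id
      and earlier.id < o.id
      and earlier.status <> 'sent'
  )
order by id
limit 50
for update skip locked;
Rows whose group still has an earlier unsent event are simply skipped this round and become eligible once the blocker is delivered.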
When one event blocks the rest
Sooner or later, one “poison” event will fail for hours (bad payload, revoked token, provider outage). If you strictly enforce per-group order, later events in that group should wait, but other groups should continue.
A workable compromise is to cap retries per event. After that, mark it failed and pause only that group until someone fixes the root cause. That keeps one broken customer from slowing everyone down.
Retries without making things worse
Retries are where a good outbox setup becomes either dependable or noisy. The goal is simple: try again when it will likely work, and stop quickly when it won’t.
Use exponential backoff and a hard cap. For example: 1 minute, 2 minutes, 4 minutes, 8 minutes, then stop (or keep going with a maximum delay like 15 minutes). Always set a maximum number of attempts so one bad event can’t clog the system forever.
Not every failure should be retried. Keep the rules clear:
- Retry: network timeouts, connection resets, DNS hiccups, and HTTP 429 or 5xx responses.
- Don’t retry: HTTP 400 (bad request), 401/403 (auth problems), 404 (wrong endpoint), or validation errors you can detect before sending.
Store retry state on the outbox row. Increment attempts, set available_at for the next attempt, and record a short, safe error summary (status code, error class, trimmed message). Don’t store full payloads or sensitive data in error fields.
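As a sketch, a failure handler might update the row like this; the 1-minute base, 15-minute cap, and $1/$2 bind parameters (event id and error summary) are assumptions:
-- temporary failure: return to pending with exponential backoff
update outbox_events
set attempts = attempts + 1,
    status = 'pending',
    locked_at = null,
    locked_by = null,
    -- 1, 2, 4, 8 minutes... capped at 15 minutes
    available_at = now() + least(interval '15 minutes', interval '1 minute' * power(2, attempts)),
    meta = meta || jsonb_build_object('last_error', $2::text)
where id = $1;

-- attempts exhausted or a permanent error: park the row for review
update outbox_events
set status = 'failed', locked_at = null, locked_by = null
where id = $1;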
Rate limits need special handling. If you get HTTP 429, respect Retry-After when it exists. Otherwise, back off more aggressively to avoid a retry storm.
Deduplication and idempotency basics
If you build reliable API integrations, assume the same event can be sent twice. A worker can crash after the HTTP call but before it records success. A timeout can hide a success. A retry can overlap with a slow first attempt. The outbox pattern reduces missed events, but it doesn’t prevent duplicates by itself.
The safest approach is idempotency: repeated deliveries produce the same result as one delivery. When calling a third-party API, include an idempotency key that stays stable for that event and that destination. Many APIs support a header; if not, put the key in the request body.
A simple key is destination plus event ID. For an event with ID evt_123, always use something like destA:evt_123.
On your side, prevent duplicate sends by keeping an outbound delivery log and enforcing a unique rule like (destination, event_id). Even if two workers race, only one can create the “we’re sending this” record.
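A sketch of such a delivery log, with the unique rule doing the heavy lifting (the table and column names are assumptions; there is deliberately no foreign key, since delivery logs usually outlive outbox rows):
create table outbox_deliveries (
  id bigserial primary key,
  destination text not null,          -- e.g. 'shipping-provider'
  event_id bigint not null,           -- the outbox_events id
  idempotency_key text not null,      -- e.g. 'shipping-provider:evt_123'
  created_at timestamptz not null default now(),
  unique (destination, event_id)      -- only one send record per destination and event
);
A worker inserts here before calling the API; if the insert hits the unique constraint, another worker already owns that send, and this one backs off.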
Webhooks can duplicate too
If you receive webhook callbacks (like “delivery confirmed” or “status updated”), treat them the same way. Providers retry, and you can see the same payload multiple times. Store processed webhook IDs, or compute a stable hash from the provider’s message ID and reject repeats.
How long to keep data
Keep outbox rows until you have recorded success (or a final failure you accept). Keep delivery logs longer, because they’re your audit trail when someone asks, “Did we send it?”
A common approach:
- Outbox rows: delete or archive after success plus a short safety window (days).
- Delivery logs: keep for weeks or months, based on compliance and support needs.
- Idempotency keys: keep at least as long as retries can happen (and longer for webhook duplicates).
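A periodic cleanup job can enforce those windows; as a sketch, with the 7-day window being an assumption rather than a recommendation:
-- archive or delete sent events after a safety window
delete from outbox_events
where status = 'sent'
  and created_at < now() - interval '7 days';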
Step-by-step: implementing the outbox pattern
Decide what you will publish. Keep events small, focused, and easy to replay later. A good rule is one business fact per event, with enough data for the receiver to act.
Build the foundation
Pick clear event names (for example, order.created, order.paid) and version your payload schema (like v1, v2). Versioning lets you add fields later without breaking older consumers.
Create your PostgreSQL outbox table and add indexes for the queries your worker will run most, especially (status, available_at, id).
Update your write flow so the business change and the outbox insert happen in the same database transaction. That’s the core guarantee.
Add delivery and control
A simple implementation plan:
- Define event types and payload versions you can support long-term.
- Create the outbox table and indexes.
- Insert an outbox row alongside the main data change.
- Build a worker that claims rows, sends to the third-party API, then updates status.
- Add retry scheduling with backoff and a failed state when attempts are exhausted.
Add basic metrics so you notice trouble early: lag (oldest unsent event age), send rate, and failure rate.
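Those numbers can come straight from the table; a sketch of a lag-and-status query against the schema above:
select
  status,
  count(*) as events,
  now() - min(created_at) as oldest_event_age
from outbox_events
where status <> 'sent'
group by status;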
A simple example: sending order events to external services
A customer places an order in your app. Two things must happen outside your system: the billing provider must charge the card, and the shipping provider must create a shipment.
With the outbox pattern, you don’t call those APIs inside the checkout request. Instead, you save the order and an outbox event in the same PostgreSQL transaction, so you never end up with “order saved, but no notification” (or the reverse).
A typical outbox row for an order event might include an aggregate_id (the order ID), an event_type like order.created, and a JSONB payload with totals, items, and destination details.
A worker then picks up pending rows and calls the external services (either in a defined order or by emitting separate events like payment.requested and shipment.requested). If one provider is down, the worker records the attempt, schedules the next try by pushing available_at into the future, and moves on. The order still exists, and the event will be retried later without blocking new checkouts.
Ordering is usually “per order” or “per customer.” Enforce that events with the same aggregate_id are processed one at a time so order.paid never arrives before order.created.
Deduplication is what keeps you from charging twice or creating two shipments. Send an idempotency key when the third party supports it, and keep a destination delivery record so a retry after a timeout doesn’t trigger a second action.
Quick checks before you ship
Before you trust an integration to move money, notify customers, or sync data, test the edges: crashes, retries, duplicates, and multiple workers.
Checks that catch the common failures:
- Confirm the outbox row is created in the same transaction as the business change.
- Verify the sender is safe to run in multiple instances. Two workers shouldn’t send the same event at the same time.
- If ordering matters, define the rule in one sentence and enforce it with a stable key.
- For each destination, decide how you prevent duplicates and how you prove “we sent it.”
- Define the exit: after N attempts, move the event to failed, keep the last error summary, and provide a simple reprocess action.
A reality check: Stripe might accept a request but your worker crashes before saving success. Without idempotency, a retry can cause a double action. With idempotency plus a saved delivery record, the retry becomes safe.
Next steps: rolling this out without disrupting your app
Rollout is where outbox projects usually succeed or stall. Keep it small at first so you see real behavior without putting your whole integration layer at risk.
Start with one integration and one event type. For example, only send order.created to a single vendor API while everything else stays as-is. That gives you a clean baseline for throughput, latency, and failure rates.
Make problems visible early. Add dashboards and alerts for outbox lag (how many events are waiting, and how old the oldest one is) and failure rate (how many are stuck in retry). If you can answer “are we behind right now?” in 10 seconds, you’ll catch issues before users do.
Have a safe reprocess plan before the first incident. Decide what “reprocess” means: retry the same payload, rebuild the payload from current data, or send it for manual review. Document which cases are safe to resend and which need a human check.
If you’re building this with a no-code platform like AppMaster (appmaster.io), the same structure still applies: write your business data and an outbox row together in PostgreSQL, then run a separate backend process to deliver, retry, and mark events as sent or failed.
FAQ
When should you use the outbox pattern?
Use the outbox pattern when a user action updates your database and must trigger work in another system. It’s most useful when timeouts, flaky networks, or third-party outages can create “saved in our app, missing in theirs” situations.
Why write the business change and the outbox event in the same transaction?
Writing the business row and the outbox row in the same database transaction gives you one clear guarantee: either both exist or neither exists. That prevents partial failures like “API call succeeded but the order wasn’t saved” or “order saved but the API call never happened.”
What columns does an outbox table need?
A good default is id, aggregate_id, event_type, payload, status, created_at, available_at, attempts, plus lock fields like locked_at and locked_by. This keeps sending, retry scheduling, and safe concurrency simple without overcomplicating the table.
Which indexes matter for the outbox table?
A common baseline is an index on (status, available_at, id) so workers can quickly fetch the next batch of sendable events in order. Add other indexes only when you truly query by those fields, because extra indexes slow down inserts.
Should the worker poll the table or use LISTEN/NOTIFY?
Polling is the simplest and most predictable approach for most teams. Start with small batches and a short interval, then tune based on load and lag; you can add optimizations later, but a simple loop is easier to debug when things go wrong.
How do multiple workers avoid sending the same event twice?
Claim rows using row-level locks so two workers can’t process the same event at the same time, typically with SKIP LOCKED. Then mark the row as processing with a lock timestamp and worker ID, send it, and finally mark it sent or return it to pending with a future available_at.
How should retries be handled?
Use exponential backoff with a hard cap on attempts, and retry only failures that are likely temporary. Timeouts, network errors, and HTTP 429/5xx are good retry candidates; validation errors and most 4xx responses should be treated as final until you fix data or configuration.
How do you deal with duplicate deliveries?
Assume duplicates can still happen, especially if a worker crashes after the HTTP call but before recording success. Use an idempotency key that is stable per destination and per event, and keep a delivery record (with a unique constraint) so even racing workers can’t create two sends.
How strict does event ordering need to be?
Default to preserving order within a group, not globally. Use a grouping key like aggregate_id (order ID) or customer_id, process only one event at a time per group, and allow parallelism across different groups so one slow customer doesn’t block everyone.
What should happen when an event keeps failing?
Mark it as failed after a maximum number of attempts, keep a short safe error summary, and stop processing later events for that same group until someone fixes the root cause. This contains the blast radius and prevents endless retry noise while keeping other groups moving.


