Circuit breaker pattern for third-party APIs in visual workflows
Learn the circuit breaker pattern for third-party APIs in visual workflows: set thresholds, route fallbacks, block noisy retries, and send clear alerts.

Why third-party API outages break more than one feature
A single third-party API often sits in the middle of everyday work: taking payments, checking addresses, syncing inventory, sending messages, verifying identity. When that vendor has trouble, it rarely breaks just one button. It can freeze entire flows that need that response to move forward.
That’s why a circuit breaker matters. It’s not theory. It’s a practical way to keep core operations running even when an integration is unhealthy.
Slow and down hurt differently.
When an API is slow, your workflow still tries to succeed, but every step waits. Users see spinning screens, support teams get “it’s stuck” tickets, and background jobs pile up. Slow is tricky because it can look like your own system is failing.
When an API is down, you get timeouts or hard errors. That’s clearer, but often more dangerous because workflows tend to retry. When many requests retry at once, you create a traffic storm that makes recovery harder and can drag your own system down.
Common symptoms show up fast: timeouts, queues that keep growing, partial updates, and a lot of manual cleanup.
The real damage is the chain reaction. If a shipping-rate provider is slow, order placement slows because the workflow refuses to confirm the order without a quote. If payments are down, support may be blocked from issuing refunds even though everything else is working.
You can’t pretend outages don’t exist. The goal is to design workflows with clear fallback paths, temporary blocking rules, and alerting so the business can keep taking orders, serving customers, and recording work while the integration recovers.
Circuit breaker in plain terms
A circuit breaker is a safety switch for API calls. When a third-party service starts failing, the breaker stops your workflow from hammering it over and over. Instead of turning one outage into slow screens, timeouts, and stuck jobs, you control the blast radius.
A circuit breaker has three simple outcomes:
- Allow the call when the vendor looks healthy.
- Block the call when failures are high, and immediately take a fallback path.
- Try a limited test call after a short pause to see whether the vendor is back.
If you prefer labels, those are “closed,” “open,” and “half-open.” The names aren’t the point. Predictability is. When a vendor is sick, your workflow should behave the same way every time.
This doesn’t hide errors. You still record failures, show a clear status to users or ops, and alert the right people. You’re choosing to fail fast, route work to a safer alternative, or pause briefly before testing again.
Choose which API calls must never stop the business
Circuit breakers work best when you’re selective. Not every vendor call deserves special protection. Start with steps that, if blocked, stop money, orders, or customer access.
A practical method is to follow one user request end-to-end. Where would a timeout force the user to abandon the task, or create a mess your team has to clean up later?
Typical “must not block core work” calls include payments, shipping and fulfillment, login/SSO/MFA, OTP and confirmation messages, and compliance checks tied to approval.
Also split user-facing steps from background jobs. If someone is waiting on a checkout screen, you need a fast decision: succeed, fall back, or stop with a clear message. For background work like syncing tracking numbers, slower retries are fine as long as they never block the main flow.
Start small to avoid scope creep. Protect 1-3 workflows first, then expand.
Define what “safe fallback” means before you build anything. Good fallbacks are specific and testable:
- Payments: save the order as “payment pending” so the cart isn’t lost.
- Shipping: use a cached rate, a flat rate, or confirm the order and delay label purchase.
- Identity: allow password login when SSO is down, or switch to email verification.
- Messaging: queue SMS for later and provide an alternative path when possible.
In AppMaster’s Business Process Editor, this usually becomes a clean branch: the core operation continues, while the vendor-dependent step takes a controlled alternative.
States, thresholds, and timers you can explain
Think of the breaker as a switch with three positions. Most of the time it lets calls through. When the vendor starts failing, it flips to protect your workflow from wasted time and error pileups.
The three states
Closed is normal. You call the API and continue.
If failures cross a line, the breaker goes Open. You stop calling the vendor for a short period and immediately route to a fallback (cached value, queued work, alternate flow).
After a cooldown, the breaker goes Half-open. You allow a small number of test calls. If they succeed, you return to Closed. If they fail, you go back to Open.
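If it helps to see the states as data rather than prose, here is a minimal sketch in Go. AppMaster workflows are visual, so this is an illustration of the logic, not code you would write in the platform, and the type and field names are assumptions.

```go
package breaker

import "time"

// State is the breaker's current mode.
type State int

const (
	Closed   State = iota // normal: calls go through
	Open                  // failing: skip the vendor and use the fallback
	HalfOpen              // probing: allow a few test calls after cooldown
)

// Breaker holds just enough to decide what to do with the next call.
type Breaker struct {
	State        State
	FailureCount int       // failures inside the current rolling window
	WindowStart  time.Time // when the current failure window began
	OpenUntil    time.Time // cooldown deadline while Open
}
```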
What to measure
Use signals that match how the vendor fails:
- Timeouts
- HTTP 5xx errors
- Rising latency (too slow to be useful)
- Connection/DNS errors
- 429 rate limits
In a visual workflow tool, these usually map to simple checks: status code, elapsed time, and specific error outputs.
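Collapsed into one predicate, those checks look like the sketch below. It is a hedged example: the latency budget and the choice to count 429 as a breaker failure are assumptions to tune per vendor.

```go
package breaker

import (
	"net/http"
	"time"
)

// isFailure decides whether one vendor call counts against the breaker.
// resp may be nil when the request never completed (timeout, DNS error,
// connection reset); err reports those cases.
func isFailure(resp *http.Response, err error, elapsed, latencyBudget time.Duration) bool {
	if err != nil {
		return true // timeouts, DNS and connection errors surface here
	}
	if elapsed > latencyBudget {
		return true // "too slow to be useful" is treated as a failure
	}
	if resp.StatusCode >= 500 {
		return true // vendor-side 5xx errors
	}
	if resp.StatusCode == http.StatusTooManyRequests {
		return true // 429 rate limit: also a candidate for a temporary block
	}
	return false
}
```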
Starting thresholds and the two key timers
Start with numbers that are easy to explain, then tune them based on real traffic. Examples:
- Open the breaker if 5-10 calls fail within 30-60 seconds.
- Or open if 20%-40% of calls fail in a rolling window.
- Treat latency as a failure when it exceeds what your process can tolerate (often 2-5 seconds).
Then set two timers:
- Cooldown time (Open state): often 30 seconds to 5 minutes.
- Half-open test window: allow 1-5 test calls, or a short time window like 10-30 seconds.
The goal is straightforward: fail fast when the vendor is unhealthy, recover automatically when it’s back.
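One way to keep those numbers explainable is to hold them in a single config object. A sketch follows, with defaults drawn from the ranges above; treat every value as a starting assumption, not a recommendation.

```go
package breaker

import "time"

// Config keeps the thresholds and timers in one explainable place.
type Config struct {
	FailureThreshold int           // open after this many failures...
	FailureWindow    time.Duration // ...inside this rolling window
	LatencyBudget    time.Duration // slower than this counts as a failure
	Cooldown         time.Duration // how long to stay Open before probing
	HalfOpenProbes   int           // max test calls allowed while Half-open
}

// DefaultConfig mirrors the example starting points above.
func DefaultConfig() Config {
	return Config{
		FailureThreshold: 5,
		FailureWindow:    60 * time.Second,
		LatencyBudget:    3 * time.Second,
		Cooldown:         2 * time.Minute,
		HalfOpenProbes:   3,
	}
}
```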
Step-by-step: build a circuit breaker in a visual workflow
The most important design choice is to make the “should we call the vendor right now?” decision in one place, not scattered across every workflow.
1) Put the vendor call behind one reusable block
Create one sub-process (a reusable workflow block) that every workflow uses when it needs that vendor. In AppMaster, this maps naturally to a Business Process you can call from endpoints or automations. Keep it narrow: inputs in, vendor request out, plus a clear success/fail result.
2) Track failures with time, not just counts
Record each outcome with a timestamp. Store things like last success, last failure, failures within a window, current state, and a cooldown deadline.
Persist these fields in a table so the breaker survives restarts and stays consistent across multiple instances. PostgreSQL via Data Designer fits well for this.
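A minimal shape for that record is sketched below as PostgreSQL DDL inside a Go constant. The table and column names are assumptions; in AppMaster you would model the same fields visually in the Data Designer rather than writing SQL.

```go
package breaker

// One row per vendor (or endpoint), so every workflow instance reads and
// writes the same breaker record. Names are illustrative.
const createBreakerTable = `
CREATE TABLE IF NOT EXISTS api_breaker (
    vendor          text PRIMARY KEY,
    state           text NOT NULL DEFAULT 'closed', -- closed | open | half_open
    failure_count   integer NOT NULL DEFAULT 0,
    window_start    timestamptz,
    last_success_at timestamptz,
    last_failure_at timestamptz,
    open_until      timestamptz,
    last_error      text
);`
```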
3) Define state changes you’ll follow every time
Keep the rules simple. Example: if 5 failures happen within 2 minutes, switch to Open. While Open, skip the vendor call until the cooldown passes. After cooldown, go Half-open and allow one controlled attempt. If it works, close the breaker. If it fails, open it again.
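Written out, those rules are only a few lines. This sketch reuses the Breaker and Config types from the earlier snippets and assumes the same example thresholds.

```go
package breaker

import "time"

// recordFailure applies the rule "N failures within the window opens the breaker".
func recordFailure(b *Breaker, cfg Config, now time.Time) {
	// Start a fresh window once the old one has expired.
	if now.Sub(b.WindowStart) > cfg.FailureWindow {
		b.WindowStart = now
		b.FailureCount = 0
	}
	b.FailureCount++

	// A failed probe while Half-open, or too many failures while Closed,
	// both send the breaker to Open for a full cooldown.
	if b.State == HalfOpen || b.FailureCount >= cfg.FailureThreshold {
		b.State = Open
		b.OpenUntil = now.Add(cfg.Cooldown)
	}
}

// recordSuccess closes the breaker again after a healthy call.
func recordSuccess(b *Breaker, now time.Time) {
	b.State = Closed
	b.FailureCount = 0
	b.WindowStart = now
}
```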
4) Branch the workflow: vendor path vs fallback path
Before the vendor request, check the stored state:
- Closed: call the vendor, then update success or failure.
- Open: skip the call and run the fallback.
- Half-open: allow a limited attempt, then decide whether to close or reopen.
Example: if a shipping label API is down, the fallback can create the order with a “Label pending” status and queue a retry job, instead of blocking checkout or warehouse work.
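The whole branch fits in one small decision function, sketched below. callVendor and runFallback are hypothetical stand-ins for the real workflow blocks, and the probe limit is left out for brevity.

```go
package breaker

import "time"

// decide reads the stored state and returns what the workflow should do
// right now, before the vendor request block runs.
func decide(b *Breaker, now time.Time) State {
	if b.State == Open && now.After(b.OpenUntil) {
		b.State = HalfOpen // cooldown finished: allow a limited probe
	}
	return b.State
}

// handleRequest is the branch itself.
// (A production version would also cap probes while Half-open, e.g. via cfg.HalfOpenProbes.)
func handleRequest(b *Breaker, cfg Config, now time.Time) {
	switch decide(b, now) {
	case Closed, HalfOpen:
		if callVendor() {
			recordSuccess(b, now)
		} else {
			recordFailure(b, cfg, now)
			runFallback() // e.g. create the order with a "Label pending" status
		}
	case Open:
		runFallback() // skip the vendor entirely and keep work moving
	}
}

func callVendor() bool { return false } // stub: the real vendor request
func runFallback()     {}               // stub: the real fallback branch
```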
5) Make it shared across workflows
If you have multiple workflows and servers, they must read and write the same breaker state. Otherwise one instance may keep hammering the vendor while another has already decided to pause.
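If the state sits in PostgreSQL, a row-level lock is one way to keep instances consistent. The sketch below uses Go's database/sql; the table name matches the earlier DDL and is an assumption.

```go
package breaker

import (
	"context"
	"database/sql"
)

// withBreakerRow locks the vendor's breaker row for one decision, so two
// workflow instances can't flip the state at the same time.
func withBreakerRow(ctx context.Context, db *sql.DB, vendor string, fn func(*sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op once Commit has succeeded

	// SELECT ... FOR UPDATE serializes readers that intend to write this row.
	var one int
	if err := tx.QueryRowContext(ctx,
		`SELECT 1 FROM api_breaker WHERE vendor = $1 FOR UPDATE`, vendor).Scan(&one); err != nil {
		return err
	}
	if err := fn(tx); err != nil {
		return err
	}
	return tx.Commit()
}
```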
Fallback paths that keep work moving
A circuit breaker only helps if you decide what happens when the call is blocked. A good fallback keeps the user moving, protects your data, and makes later cleanup predictable.
Pick a fallback that matches the job. If a shipping-rate provider is down, you may not need an exact price to accept the order. In a visual workflow, route the failed API step to a fallback branch that still produces a usable outcome.
In practice, fallbacks usually look like one of these:
- Use a cached last-known value (with a clear freshness window).
- Use a safe default estimate, clearly labeled.
- Route to manual review.
- Queue the work for retry later (async job).
The user experience matters as much as the logic. Don’t show a vague error. Say what happened and what the user can do next: “We couldn’t confirm the rate right now. You can place the order with an estimated shipping cost, or save it for review.”
Also plan for short vs long outages. A short outage (minutes) often means “keep going, retry in the background.” A longer outage (hours) may require stricter behavior like more manual review or approvals.
Finally, track every fallback so reconciliation is easy. At minimum, record the fallback type, original request details, what you returned to the user (and whether it was an estimate), and a status for follow-up.
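A small record type makes that tracking concrete. The field names below are illustrative, not a fixed schema.

```go
package breaker

import "time"

// FallbackRecord is the minimum worth storing every time a fallback runs,
// so reconciliation is a query rather than an investigation.
type FallbackRecord struct {
	Vendor        string    // which integration was bypassed
	FallbackType  string    // "cached_value", "default_estimate", "manual_review", "queued_retry"
	RequestDetail string    // enough of the original request to audit or replay it
	ReturnedValue string    // what was actually shown or used
	IsEstimate    bool      // flag estimates so money can be reconciled later
	Status        string    // e.g. "needs_review", "resolved"
	CreatedAt     time.Time
}
```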
Temporary blocking rules and smarter retries
Uncontrolled retries turn small vendor hiccups into real outages. When many workflows retry at the same time, they create a spike (the “thundering herd” problem). The vendor gets slower, your queues grow, and you burn rate limits.
Retries should be predictable and limited, and they should respect the breaker state. A practical policy (sketched in code after this list) is:
- Keep max retries low (often 2-3).
- Use exponential backoff (for example, 2s, 8s, 30s).
- Add jitter so retries don’t sync up.
- Cap total retry time (for example, 60-90 seconds).
- If the breaker is Open, don’t retry. Go straight to fallback.
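Here is that policy as a single delay function. The base delay, cap, and attempt limit are assumptions, and a total-time cap should wrap whatever calls this.

```go
package breaker

import (
	"math/rand"
	"time"
)

// backoffDelay returns the wait before retry attempt n (0-based) and whether
// another attempt is allowed at all. If the breaker is Open, skip retries
// entirely and go straight to the fallback.
func backoffDelay(attempt int) (time.Duration, bool) {
	const maxAttempts = 3
	if attempt >= maxAttempts {
		return 0, false // give up and hand the work to the fallback path
	}
	base := 2 * time.Second << attempt // 2s, 4s, 8s, ...
	if base > 30*time.Second {
		base = 30 * time.Second // cap any single delay
	}
	jitter := time.Duration(rand.Int63n(int64(base))) // spread retries apart
	return base/2 + jitter/2, true
}
```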
Temporary blocking is related but distinct. It’s for cases where the response tells you “this won’t work right now.” Common rules:
- 429 rate limit: block for the Retry-After period (or a safe fixed window).
- 401/403 auth failure: block until credentials are refreshed, then test once.
- Consistent 5xx: block briefly, then allow a small test.
During a block, decide what happens to work already in motion: queue it, reroute it, or degrade gracefully (for example, accept the order but delay “send SMS”).
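Those rules can be expressed as a simple mapping from response to block duration. In this sketch the fixed windows are assumptions, and Retry-After is only parsed in its seconds form.

```go
package breaker

import (
	"net/http"
	"strconv"
	"time"
)

// blockFor maps "this won't work right now" responses to a temporary block window.
func blockFor(resp *http.Response) (time.Duration, bool) {
	switch resp.StatusCode {
	case http.StatusTooManyRequests: // 429: rate limited
		if s := resp.Header.Get("Retry-After"); s != "" {
			if secs, err := strconv.Atoi(s); err == nil {
				return time.Duration(secs) * time.Second, true
			}
		}
		return time.Minute, true // safe fixed window when the header is missing
	case http.StatusUnauthorized, http.StatusForbidden: // 401 / 403: auth failure
		return 10 * time.Minute, true // block until credentials are refreshed, then test once
	}
	if resp.StatusCode >= 500 {
		return 30 * time.Second, true // brief block, then allow a small test
	}
	return 0, false
}
```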
Alerting that tells you what to do
A circuit breaker is only helpful if people hear about it quickly and know what to do. The goal isn’t noise. It’s one clear message when behavior changes: calls are being blocked, fallbacks are active, or the breaker has stayed open longer than expected.
Good default triggers:
- Alert when the breaker opens.
- Alert if it stays open past a time limit.
- Alert on a sharp rise in errors even before it opens.
Make alerts actionable. Include the vendor and endpoint, current state and when it changed, what users will feel, what the workflow is doing now (blocking, retrying, fallback), and one suggested next step.
Route alerts by severity. A non-critical enrichment API can go to email. Payments, login, or order submission usually deserves a page. In AppMaster this maps cleanly to branches that send email, Telegram, or SMS based on a severity flag.
Track a small set of metrics so you can see whether you’re recovering: blocked calls and fallback usage per vendor are often enough.
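A sketch of the alert payload and the severity split is below. The field names and channel labels are assumptions; in AppMaster the routing itself is a visual branch feeding the messaging modules.

```go
package breaker

// Alert is the payload worth sending when the breaker changes behavior.
type Alert struct {
	Vendor     string // e.g. "shipping-rates"
	Endpoint   string
	State      string // "open", "half_open", "closed"
	ChangedAt  string // when the state changed
	UserImpact string // what users will feel right now
	Behavior   string // "blocking", "retrying", "fallback active"
	NextStep   string // one suggested action for whoever gets notified
	Critical   bool   // payments, login, order submission deserve a page
}

// route picks a channel by severity: a page for critical flows, email otherwise.
func route(a Alert) string {
	if a.Critical {
		return "sms" // or any paging channel
	}
	return "email"
}
```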
Example scenario: vendor outage without stopping orders
A common failure: your shipping rate provider goes down right when customers are checking out. If your workflow insists on live rates during order creation, a single outage can stop orders entirely.
On a normal day, the order is created, then the workflow requests live rates, and the order is saved with the chosen carrier and price.
When the vendor starts failing, calls time out or return 5xx errors. Once your threshold is hit (for example, 5 failures in 2 minutes), the breaker opens.
While Open, the workflow stops calling the shipping provider for a short window (say, 10 minutes). That prevents a failing vendor from dragging down checkout for everyone.
Instead of blocking checkout, the fallback can:
- Apply a flat-rate fee (or a safe estimate).
- Create the order anyway.
- Mark it as “Shipping rate pending” for later recalculation.
- Save vendor error details for follow-up.
In AppMaster, this is a clear branch in the Business Process Editor, backed by Data Designer fields like shipping_rate_status and shipping_rate_source.
Quick checks before you ship
A circuit breaker should behave the same way under stress as it does in a demo. Before release, confirm the basics:
- Thresholds and cooldowns are documented and easy to change.
- Open state blocks calls immediately (no waiting on vendor timeouts).
- Fallback behavior is safe for money and customer promises.
- Half-open probing is limited to a few test calls.
- Logs make it easy to answer questions about timing and impact.
Spend extra time on fallback safety. A fallback that “keeps work moving” can also create financial risk. If the payment provider is down, marking orders as paid is dangerous. A safer approach is “pending payment,” with clear customer messaging.
Test recovery on purpose. Force failures, watch the breaker open, wait through cooldown, and confirm Half-open only sends a small probe. If it succeeds, it should close quickly. If it fails, it should reopen without flooding the vendor.
Your logs should answer, in under a minute: who was affected, when it started, which workflow step triggered the breaker, and what fallback was used.
Next steps: implement the pattern in AppMaster
Pick one integration that can hurt daily operations if it fails (payments, shipping labels, SMS, CRM sync). Build the breaker end-to-end for that single call first. Once the team trusts the behavior, repeat the same template for the next vendor.
In AppMaster, model breaker state in PostgreSQL using the Data Designer. Keep it simple: a record per vendor (or endpoint) with fields like state, failure_count, last_failure_at, open_until, and a short last_error.
Then implement the logic in the Business Process Editor with readable branches. Clarity beats cleverness.
A practical build order:
- Check breaker state and block calls when open_until is in the future.
- Call the vendor API and capture both success and error outputs.
- On success, reset counters and close the breaker.
- On failure, increment counters and open the breaker when thresholds are met.
- Route user-facing flows to a fallback (queue work, use cached data, or allow manual processing).
Document the fallback decision in plain language so support and ops know what users see.
When the breaker opens, notify an owner using AppMaster messaging modules (email, SMS, Telegram). Include what matters: vendor, endpoint, state, and the first recommended action.
If you’re building these workflows in AppMaster, appmaster.io is a practical place to start because the same visual Business Process can power endpoints, background jobs, and alerting with one shared breaker state.
FAQ
What does a circuit breaker do for third-party API calls?
A circuit breaker stops repeated calls to a failing vendor and forces a fast, predictable outcome. Instead of waiting on timeouts and piling up retries, you either proceed normally, take a fallback path immediately, or allow a small test call after a cooldown.
When is a circuit breaker worth the effort?
It’s worth it when a vendor call can block money, orders, or customer access, or when failures create a queue that’s hard to clean up. Start with 1–3 high-impact workflows like checkout payments, shipping rates/labels, login/SSO, or OTP delivery.
Why do slow and down vendors hurt differently?
“Slow” makes your system look broken because users wait, pages spin, and jobs back up even if the vendor eventually responds. “Down” is clearer but can be worse because many systems retry aggressively, causing a traffic storm that delays recovery and can overload your own infrastructure.
What do Closed, Open, and Half-open mean?
Closed means calls are allowed as normal. Open means calls are blocked for a short period and your workflow immediately uses a fallback. Half-open means you allow a small number of test calls after cooldown to check if the vendor is healthy again.
Which signals should count as failures?
Use signals that match real failure: timeouts, HTTP 5xx, connection/DNS errors, rate limits (429), and latency that exceeds what your workflow can tolerate. Treat “too slow to be useful” as a failure so you fail fast instead of making users wait.
What thresholds should I start with?
Start with simple rules you can explain, then tune from traffic. A common setup is opening after 5–10 failures in 30–60 seconds, or when 20%–40% of calls fail in a rolling window, with latency over 2–5 seconds counted as failure for user-facing steps.
How long should the cooldown and Half-open window be?
A safe default is 30 seconds to 5 minutes for the Open cooldown, so you stop hammering the vendor while it’s unhealthy. In Half-open, allow only 1–5 test calls (or a brief window like 10–30 seconds) so you can recover quickly without flooding the vendor.
What makes a good fallback?
Pick a fallback that keeps work moving without lying about the outcome. Examples include saving an order as “payment pending,” using a cached or flat shipping rate with clear labeling, queueing messages for later, or routing the case to manual review.
How should retries be configured?
Keep retries low (often 2–3), use exponential backoff, add jitter, and cap total retry time so you don’t clog queues. If the breaker is Open, don’t retry at all; go straight to fallback so you avoid creating a thundering herd.
When should alerts fire, and what should they include?
Alert when the breaker opens, when it stays open too long, and when errors spike even before it opens. Each alert should say which vendor/endpoint is affected, what users will see, what fallback is active, when the state changed, and the next action your team should take.


