Event-driven workflows vs request-response APIs for long tasks
Compare event-driven workflows vs request-response APIs for long-running processes, focusing on approvals, timers, retries, and audit trails in business apps.

Why long-running processes are tricky in business apps
A process is "long-running" when it can't finish in one quick step. It might take minutes, hours, or days because it depends on people, time, or outside systems. Anything with approvals, handoffs, and waiting falls into this bucket.
That's where simple request-response API thinking starts to break down. An API call is built for a short exchange: send a request, get an answer, move on. Long tasks are more like a story with chapters. You need to pause, remember exactly where you are, and continue later without guessing.
You see this in everyday business apps: purchase approvals that need a manager and finance, employee onboarding that waits on document checks, refunds that depend on a payment provider, or access requests that must be reviewed and then applied.
When teams treat a long process like a single API call, a few predictable problems show up:
- The app loses state after a restart or deploy and can't reliably resume.
- Retries create duplicates: a second payment, a second email, a double approval.
- Ownership gets fuzzy: nobody knows whether the requester, a manager, or a system job should act next.
- Support has no visibility and can't answer "where is it stuck?" without digging through logs.
- Waiting logic (timers, reminders, deadlines) ends up as fragile background scripts.
A concrete scenario: an employee requests software access. The manager approves quickly, but IT needs two days to provision it. If the app can't hold process state, send reminders, and resume safely, you get manual follow-ups, confused users, and extra work.
This is why the choice between event-driven workflows and request-response APIs matters most for long-running business processes.
Two mental models: synchronous calls vs events over time
The simplest comparison comes down to one question: does the work finish while the user waits, or does it continue after they leave?
A request-response API is a single exchange: one call in, one response out. It fits work that completes quickly and predictably, like creating a record, calculating a quote, or checking inventory. The server does the work, returns success or failure, and the interaction is over.
An event-driven workflow is a series of reactions over time. Something happens (an order is created, a manager approves, a timer expires), and the workflow moves to the next step. This model fits work that includes handoffs, waiting, retries, and reminders.
The practical difference is state.
With request-response, state often lives in the current request plus server memory until the response is sent. With event-driven workflows, state has to be stored (for example in PostgreSQL) so the process can resume later.
Failure handling also changes. Request-response usually handles failures by returning an error and asking the client to try again. Workflows record the failure and can retry safely when conditions improve. They can also log each step as an event, which makes the history easier to reconstruct.
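To make the point about stored state concrete, here is a minimal sketch of saving and loading a process state. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the table name and the helpers open_store, save_state, and load_state are hypothetical names chosen for illustration.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open the store and make sure the state table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS process_state ("
        "process_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    return conn

def save_state(conn, process_id, state):
    """Insert or overwrite the current state of one process instance."""
    conn.execute(
        "INSERT OR REPLACE INTO process_state (process_id, state) VALUES (?, ?)",
        (process_id, state),
    )
    conn.commit()

def load_state(conn, process_id):
    """Return the saved state, or None if the process is unknown."""
    row = conn.execute(
        "SELECT state FROM process_state WHERE process_id = ?", (process_id,)
    ).fetchone()
    return row[0] if row else None
```

With a file path instead of ":memory:", the state would still be there after the application restarts, which is what lets a workflow pick up where it stopped.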
A simple example: "Submit expense report" can be synchronous. "Get approval, wait 3 days, remind manager, then pay" isn't.
Approvals: how each approach handles human decisions
Approvals are where long-running work becomes real. A system step finishes in milliseconds, but a person might answer in two minutes or two days. The key design choice is whether you model that wait as a paused process, or as a new message that arrives later.
With request-response APIs, approvals often turn into an awkward shape:
- Blocking (not practical)
- Polling (the client asks "approved yet?" over and over)
- Callbacks/webhooks (the server calls you back later)
All of these can work, but they add plumbing just to bridge "human time" with "API time."
With events, the approval reads like a story. The app records something like "ExpenseSubmitted," then later receives "ExpenseApproved" or "ExpenseRejected." The workflow engine (or your own state machine) moves the record forward only when the next event arrives. This matches how most people already think about business steps: submit, review, decide.
Complexity shows up fast with multiple approvers and escalation rules. You might require both a manager and finance to approve, but also allow a senior manager to override. If you don't model those rules clearly, the process becomes hard to reason about and even harder to audit.
A simple approval model that scales
A practical pattern is to keep one "request" record, then store decisions separately. That way you can support many approvers without rewriting core logic.
Capture a few pieces of data as first-class records:
- The approval request itself: what's being approved and its current status
- Individual decisions: who decided, approve/reject, timestamp, reason
- The required approvers: role or person, and any ordering rules
- Outcome rules: "any one," "majority," "all required," "override allowed"
Whatever implementation you choose, always store who approved what, when, and why as data, not as a log line.
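One way to picture this model is a request object that holds separate decision records and derives its status from an outcome rule. The sketch below is illustrative only: the class names, field names, and rule strings ("all_required", "any_one", "majority") are assumptions, not part of any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Decision:
    approver: str
    approved: bool
    reason: str
    decided_at: datetime

@dataclass
class ApprovalRequest:
    subject: str
    required_approvers: list
    rule: str = "all_required"  # or "any_one", "majority"
    override_approvers: frozenset = frozenset()
    decisions: list = field(default_factory=list)

    def record(self, approver, approved, reason):
        """Store who decided, what, when, and why as data."""
        now = datetime.now(timezone.utc)
        self.decisions.append(Decision(approver, approved, reason, now))

    def status(self):
        """Derive the status from the stored decisions and the outcome rule."""
        # A decision by an override approver settles the request.
        for d in self.decisions:
            if d.approver in self.override_approvers:
                return "approved" if d.approved else "rejected"
        # Keep only the latest decision of each required approver.
        latest = {d.approver: d.approved for d in self.decisions
                  if d.approver in self.required_approvers}
        yes = sum(latest.values())
        no = len(latest) - yes
        n = len(self.required_approvers)
        if self.rule == "any_one":
            if yes >= 1:
                return "approved"
            if no == n:
                return "rejected"
        elif self.rule == "majority":
            if yes * 2 > n:
                return "approved"
            if no * 2 >= n:  # a majority can no longer be reached
                return "rejected"
        else:  # treated as "all_required"
            if no >= 1:
                return "rejected"
            if yes == n:
                return "approved"
        return "pending"
```

Because the status is computed from stored decisions, adding a new rule means changing one function rather than the data already collected.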
Timers and waiting: reminders, deadlines, and escalations
Waiting is where long tasks start to feel messy. People go to lunch, calendars fill up, and "we'll get back to you" turns into "who owns this now?" This is one of the clearest differences between event-driven workflows and request-response APIs.
With request-response APIs, time is awkward. HTTP calls have timeouts, so you can't keep a request open for two days. Teams usually end up with patterns like polling, a separate scheduled job that scans the database, or manual scripts when something is overdue. These can work, but the waiting logic lives outside the process. Corner cases are easy to miss, like what happens when the job runs twice, or when the record changed right before the reminder went out.
Workflows treat time as a normal step. You can say: wait 24 hours, send a reminder, then wait until 48 hours total and escalate to a different approver. The system keeps state, so deadlines aren't hidden in a separate "cron + queries" project.
A simple approval rule might read like this:
After an expense report is submitted, wait 1 day. If the status is still "Pending," message the manager. After 2 days, if it's still pending, reassign to the manager's lead and record the escalation.
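The rule above can be sketched as two stored deadlines plus a function that decides what is due. The names timer_plan, due_action, and the action strings are hypothetical, and the 1-day and 2-day offsets come straight from the example rule.

```python
from datetime import datetime, timedelta, timezone

def timer_plan(submitted_at):
    """Deadlines stored alongside the process, derived from the submit time."""
    return {
        "remind_at": submitted_at + timedelta(days=1),
        "escalate_at": submitted_at + timedelta(days=2),
    }

def due_action(plan, status, now):
    """Return what the timer should do at `now`, or None if nothing is due."""
    if status != "Pending":
        return None  # already decided, so no reminder or escalation
    if now >= plan["escalate_at"]:
        return "escalate_to_lead"
    if now >= plan["remind_at"]:
        return "remind_manager"
    return None
```

Keeping the deadlines as data on the process record means a late or repeated timer run still reaches the same decision.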
The key detail is what you do when the timer fires but the world has changed. A good workflow always re-checks current state before acting:
- Load the latest status
- Confirm it's still pending
- Confirm the assignee is still valid (team changes happen)
- Record what you decided and why
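The checks above might look like this in a timer handler. This is a sketch under assumptions: Approval, on_reminder_timer, and the injected callables load_approval and send_reminder are hypothetical names, and a real handler would load the record from the database.

```python
from dataclasses import dataclass, field

@dataclass
class Approval:
    status: str
    assignee: str
    notes: list = field(default_factory=list)  # what was decided and why

def on_reminder_timer(load_approval, active_users, send_reminder):
    """Handle a fired reminder timer and return the action taken."""
    approval = load_approval()  # always work from the latest saved state
    if approval.status != "Pending":
        approval.notes.append("timer ignored: status is " + approval.status)
        return "skipped"
    if approval.assignee not in active_users:
        approval.notes.append("assignee no longer valid: " + approval.assignee)
        return "needs_reassignment"
    send_reminder(approval.assignee)
    approval.notes.append("reminder sent to " + approval.assignee)
    return "reminded"
```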
Retries and failure recovery without duplicate actions
Retries are what you do when something failed for reasons you can't fully control: a payment gateway times out, an email provider returns a temporary error, or your app saves step A but crashes before step B. The danger is simple: you try again and accidentally do the action twice.
With request-response APIs, the usual pattern is that the client calls an endpoint, waits, and if it doesn't get a clear success, it tries again. To make that safe, the server needs to treat repeated calls as the same intent.
A practical fix is an idempotency key: the client sends a unique token like pay:invoice-583:attempt-1 and reuses that same token whenever it retries the same request. The server stores the outcome for that key and returns the same result for repeats. That prevents double charges, duplicate tickets, or duplicate approvals.
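A minimal sketch of the server side, assuming an in-memory dictionary where a real system would use a durable table. IdempotentPayments and the injected charge function are hypothetical; a production version would also need to handle a crash between charging and storing the result, for example by passing the key on to the payment provider.

```python
class IdempotentPayments:
    def __init__(self, charge):
        self._charge = charge  # hypothetical function performing the real side effect
        self._results = {}     # idempotency key -> stored outcome

    def pay(self, idempotency_key, invoice_id, amount_cents):
        """Charge once per key; repeats get the stored outcome back."""
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = self._charge(invoice_id, amount_cents)
        self._results[idempotency_key] = result
        return result
```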
Event-driven workflows have a different kind of duplication risk. Events are often delivered at-least-once, which means duplicates can appear even when everything is working. Consumers need deduplication: record the event ID (or a business key like invoice_id + step) and ignore repeats. This is a core difference in workflow orchestration patterns: request-response focuses on safe replays of calls, while events focus on safe replays of messages.
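Consumer-side deduplication can be sketched in a few lines. The DedupConsumer class and the event shape (a dictionary with an "id" field) are assumptions for illustration, and the set of seen IDs would live in durable storage in practice.

```python
class DedupConsumer:
    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # IDs of events that were fully handled

    def handle(self, event):
        """Run the handler once per event ID; return False for duplicates."""
        event_id = event["id"]
        if event_id in self._seen:
            return False
        self._handler(event)
        # Mark as seen only after success, so a failed event can be redelivered.
        self._seen.add(event_id)
        return True
```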
A few retry rules work well in either model:
- Use backoff (for example 10s, 30s, 2m).
- Set a max attempts limit.
- Separate temporary errors (retry) from permanent errors (fail fast).
- Route repeated failures into a "needs attention" state.
- Log every attempt so you can explain what happened later.
Retries should be explicit in the process, not hidden behavior. That's how you make failures visible and fixable.
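The retry rules listed above can be sketched as one small function. The exception classes, run_with_retries, and the status strings are hypothetical names. The sketch sleeps in-process to stay short; a durable workflow would more likely store the next retry time and return.

```python
import time

class TemporaryError(Exception):
    """Worth retrying, for example a timeout."""

class PermanentError(Exception):
    """Retrying will not help, for example invalid account details."""

def run_with_retries(action, delays=(10, 30, 120), sleep=time.sleep):
    """Return (status, result, attempts); attempts logs every try."""
    attempts = []
    max_attempts = len(delays) + 1
    for attempt in range(1, max_attempts + 1):
        try:
            result = action()
        except PermanentError as exc:
            attempts.append((attempt, f"permanent: {exc}"))
            return "failed", None, attempts  # fail fast
        except TemporaryError as exc:
            attempts.append((attempt, f"temporary: {exc}"))
            if attempt < max_attempts:
                sleep(delays[attempt - 1])  # back off before the next try
        else:
            attempts.append((attempt, "ok"))
            return "done", result, attempts
    return "needs_attention", None, attempts
```

Passing sleep as a parameter keeps the function testable without real waiting.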
Audit trails: making the process explainable
An audit trail is your "why" file. When someone asks, "Why was this expense rejected?" you should be able to answer without guessing, even months later. This matters for both event-driven workflows and request-response APIs, but the work looks different.
For any long-running process, record the facts that let you replay the story:
- Actor: who did it (user, service, or system timer)
- Time: when it happened (with time zone)
- Input: what was known then (amount, vendor, policy thresholds, approvals)
- Output: what decision or action happened (approved, rejected, paid, retried)
- Rule version: which policy/logic version was used
Event-driven workflows can make auditing easier because each step naturally produces an event like "ManagerApproved" or "PaymentFailed." If you store those events with the payload and actor, you get a clean timeline. The key is to keep events descriptive and store them somewhere you can query by case.
Request-response APIs can still be auditable, but the story is often scattered across services. One endpoint logs "approved," another logs "payment requested," and a third logs "retry succeeded." If each uses different formats or fields, audits turn into detective work.
A simple fix is a shared "case ID" (also called a correlation ID). It's one identifier you attach to every request, event, and database record for the process instance, like "EXP-2026-00173." Then you can trace the whole journey across steps.
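As an illustration, an audit record could carry the case ID together with the actor, time, input, output, and rule version described earlier. AuditEvent, audit, and timeline are hypothetical names, and the list used as a log stands in for a queryable table.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class AuditEvent:
    case_id: str       # shared correlation ID for the process instance
    actor: str         # user, service, or system timer
    action: str
    inputs: dict       # what was known at the time
    outcome: str
    rule_version: str
    occurred_at: str   # ISO 8601 timestamp in UTC

def audit(log, case_id, actor, action, inputs, outcome, rule_version):
    """Append one fact to the log and return it."""
    event = AuditEvent(case_id, actor, action, inputs, outcome, rule_version,
                       datetime.now(timezone.utc).isoformat())
    log.append(event)
    return event

def timeline(log, case_id):
    """All events for one case, in the order they were recorded."""
    return [e for e in log if e.case_id == case_id]
```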
Picking the right approach: strengths and trade-offs
The best choice depends on whether you need an answer right now, or you need the process to keep moving over hours or days.
Request-response works well when the work is short and the rules are simple. A user submits a form, the server validates it, saves data, and returns success or an error. It's also a good fit for clear, single-step actions like create, update, or check permissions.
It starts to hurt when a "single request" quietly turns into many steps: waiting for an approval, calling multiple outside systems, handling timeouts, or branching based on what happens next. You either keep a connection open (fragile), or you push the waiting and retries into background jobs that are hard to reason about.
Event-driven workflows shine when the process is a story over time. Each step reacts to a new event (approved, rejected, timer fired, payment failed) and decides what happens next. This makes it easier to pause, resume, retry, and keep a clear trail of why the system did what it did.
There are real trade-offs:
- Simplicity vs durability: request-response is simpler to start, event-driven is safer for long delays.
- Debugging style: request-response follows a straight line, workflows often require tracing across steps.
- Tooling and habits: events need good logging, correlation IDs, and clear state models.
- Change management: workflows evolve and branch; event-driven designs tend to handle new paths better when modeled well.
A practical example: an expense report that needs manager approval, then finance review, then payment. If payment fails, you want retries without double-paying. That's naturally event-driven. If it's just "submit expense" with quick checks, request-response is often enough.
Step-by-step: designing a long-running process that survives delays
Long-running business processes fail in boring ways: a browser tab closes, a server restarts, an approval sits for three days, or a payment provider times out. Design for those delays from the start, regardless of which model you prefer.
Start by defining a small set of states you can store and resume. If you can't point to the current state in your database, you don't really have a resumable workflow.
A simple design sequence
- Set boundaries: define the start trigger, the end condition, and a few key states (Pending approval, Approved, Rejected, Expired, Completed).
- Name events and decisions: write down what can happen over time (Submitted, Approved, Rejected, TimerFired, RetryScheduled). Keep event names past tense.
- Choose waiting points: identify where the process pauses for a human, an external system, or a deadline.
- Add timer and retry rules per step: decide what happens when time passes or a call fails (backoff, max attempts, escalate, give up).
- Define how the process resumes: on each event or callback, load saved state, verify it's still valid, then move to the next state.
To survive restarts, persist the minimum data you need to continue safely. Store enough to re-run without guessing:
- Process instance ID and current state
- Who can act next (assignee/role) and what they decided
- Deadlines (due_at, remind_at) and escalation level
- Retry metadata (attempt count, last error, next_retry_at)
- Idempotency key or "already done" flags for side effects (sending a message, charging a card)
If you can reconstruct "where we are" and "what is allowed next" from stored data, delays stop being scary.
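One way to express "what is allowed next" is a transition table applied to the stored record. The sketch below reuses the state and event names from the design sequence; the "WorkFinished" event, the record fields, and the function names are assumptions added for illustration.

```python
# (current state, event) -> next state. Anything not listed is ignored.
TRANSITIONS = {
    ("PendingApproval", "Approved"): "Approved",
    ("PendingApproval", "Rejected"): "Rejected",
    ("PendingApproval", "TimerFired"): "Expired",
    ("Approved", "RetryScheduled"): "Approved",
    ("Approved", "WorkFinished"): "Completed",  # hypothetical event name
}

def new_process(process_id, assignee, due_at):
    """The minimum stored data needed to continue safely."""
    return {
        "process_id": process_id,
        "state": "PendingApproval",
        "assignee": assignee,
        "due_at": due_at,
        "attempt_count": 0,
        "history": [],
    }

def resume(record, event):
    """Apply one event to a freshly loaded record; return the updated copy."""
    key = (record["state"], event)
    if key not in TRANSITIONS:
        return record  # event is not valid in this state, so nothing changes
    updated = dict(record)
    updated["state"] = TRANSITIONS[key]
    if event == "RetryScheduled":
        updated["attempt_count"] = record["attempt_count"] + 1
    updated["history"] = record["history"] + [
        (record["state"], event, updated["state"])
    ]
    return updated
```

A late "TimerFired" that arrives after approval simply has no matching entry, so the process does not expire by accident.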
Common mistakes and how to avoid them
Long-running processes often break only after real users show up. An approval takes two days, a retry fires at the wrong time, and you end up with a double payment or a missing audit trail.
Common mistakes:
- Keeping an HTTP request open while waiting for a human approval. It times out, ties up server resources, and gives the user a false sense that "something is happening."
- Retrying calls without idempotency. A network glitch turns into duplicate invoices, duplicate emails, or repeated "Approved" transitions.
- Not storing process state. If state lives in memory, a restart wipes it out. If state lives only in logs, you can't reliably continue.
- Building a fuzzy audit trail. Events have different clocks and formats, so the timeline canât be trusted during an incident or compliance review.
- Mixing async and sync with no single source of truth. One system says "Paid," another says "Pending," and nobody knows which is correct.
A simple example: an expense report is approved in chat, a webhook arrives late, and the payment API gets retried. Without stored state and idempotency, the retry can send the payment twice, and your records won't clearly explain why.
Most fixes come down to being explicit:
- Persist state transitions (Requested, Approved, Rejected, Paid) in a database, with who/what changed them.
- Use idempotency keys for every external side effect (payments, emails, tickets) and store the result.
- Separate "accept the request" from "finish the work": return quickly, then complete the workflow in the background.
- Standardize timestamps (UTC), add correlation IDs, and record both the request and the outcome.
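Separating acceptance from completion can be sketched with a queue and a status map. Intake, accept, and work_once are hypothetical names, and the in-memory queue stands in for a durable job table or message broker.

```python
from collections import deque
import itertools

class Intake:
    def __init__(self):
        self.status = {}       # request ID -> current status
        self.queue = deque()   # work waiting to be finished
        self._ids = itertools.count(1)

    def accept(self, payload):
        """Record the request and return right away with its ID."""
        request_id = f"REQ-{next(self._ids)}"
        self.status[request_id] = "Accepted"
        self.queue.append((request_id, payload))
        return request_id

    def work_once(self, handler):
        """Finish one queued request in the background; return its ID."""
        if not self.queue:
            return None
        request_id, payload = self.queue.popleft()
        try:
            handler(payload)
            self.status[request_id] = "Completed"
        except Exception:
            # Park the request where a person can look at it.
            self.status[request_id] = "NeedsAttention"
        return request_id
```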
Quick checklist before you build
Long-running work is less about one perfect call and more about staying correct after delays, people, and failures.
Write down what "safe to continue" means for your process. If the app restarts mid-way, you should be able to pick up from the last known step without guessing.
A practical checklist:
- Define how the process resumes after a crash or deploy. What state is saved, and what runs next?
- Give every instance a unique process key (like ExpenseRequest-10482) and a clear status model (Submitted, Waiting for Manager, Approved, Paid, Failed).
- Treat approvals as records, not just outcomes: who approved or rejected, when, and the reason or comment.
- Map waiting rules: reminders, deadlines, escalations, expirations. Name an owner for each timer (manager, finance, system).
- Plan failure handling: retries must be limited and safe, and there should be a "needs review" stop where a person can fix data or approve a reattempt.
A sanity test: imagine a payment provider times out after you already charged the card. Your design should prevent charging twice, while still letting the process finish.
Example: expense approval with deadline and payment retry
Scenario: an employee submits a $120 taxi receipt for reimbursement. It needs manager approval within 48 hours. If approved, the system pays out to the employee. If payment fails, it retries safely and leaves a clear record.
Request-response walkthrough
With request-response APIs, the app often behaves like a conversation that has to keep checking back.
The employee taps Submit. The server creates a reimbursement record with status "Pending approval" and returns an ID. The manager gets a notification, but the employee app usually has to poll to see if anything changed, for example: "GET reimbursement status by ID."
To enforce the 48-hour deadline, you either run a scheduled job that scans for overdue requests, or you store a deadline timestamp and check it during polls. If the job is delayed, users can see stale status.
When the manager approves, the server flips the status to "Approved" and calls the payment provider. If Stripe returns a temporary error, the server has to decide whether to retry now, retry later, or fail. Without careful idempotency keys, a retry can create a double payout.
Event-driven walkthrough
In an event-driven model, each change is a recorded fact.
The employee submits, producing an "ExpenseSubmitted" event. A workflow starts and waits for either "ManagerApproved" or a "DeadlineReached" timer event at 48 hours. If the timer fires first, the workflow records an "AutoRejected" outcome and why.
On approval, the workflow records "PayoutRequested" and attempts payment. If Stripe times out, it records "PayoutFailed" with an error code and schedules a retry (for example, in 15 minutes). Each attempt uses the same idempotency key, so "PayoutSucceeded" is recorded only once.
What the user sees stays simple:
- Pending approval (48 hours left)
- Approved, paying out
- Payment retry scheduled
- Paid
The audit trail reads like a timeline: submitted, approved, deadline checked, payout attempted, failed, retried, paid.
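The walkthrough can be sketched as a function that folds the recorded events into the status the user sees. The event names come from the walkthrough above; the function name, the "Not submitted" starting value, and the "Rejected (deadline passed)" label are assumptions for illustration.

```python
PAYING = ("Approved, paying out", "Payment retry scheduled")

def user_status(events):
    """Fold an ordered list of event names into the user-facing status."""
    status = "Not submitted"
    for name in events:
        if name == "ExpenseSubmitted" and status == "Not submitted":
            status = "Pending approval"
        elif name == "ManagerApproved" and status == "Pending approval":
            status = "Approved, paying out"
        elif name == "DeadlineReached" and status == "Pending approval":
            status = "Rejected (deadline passed)"
        elif name == "PayoutFailed" and status in PAYING:
            status = "Payment retry scheduled"
        elif name == "PayoutSucceeded" and status in PAYING:
            status = "Paid"
        # Any other event, or one that arrives too late, leaves status as is.
    return status
```

Because each branch checks the current status, a late approval after the deadline or a duplicate payout event does not change the outcome.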
Next steps: turning the model into a working app
Pick one real process and build it end to end before you generalize. Expense approval, onboarding, and refund handling are good starters because they include human steps, waiting, and failure paths. Keep the goal small: one happy path and the two most common exceptions.
Write the process as states and events, not as screens. For example: "Submitted" -> "ManagerApproved" -> "PaymentRequested" -> "Paid," with branches like "ApprovalRejected" or "PaymentFailed." When you see the waiting points and side effects clearly, the choice between event-driven workflows and request-response APIs becomes practical.
Decide where process state lives. A database can be enough if the flow is simple and you can enforce updates in one place. A workflow engine helps when you need timers, retries, and branching, because it tracks what should happen next.
Add audit fields from day one. Store who did what, when it happened, and why (comment or reason code). When someone asks, "Why was this payment retried?" you want a clear answer without digging through logs.
If you're building this kind of workflow in a no-code platform, AppMaster (appmaster.io) is one option where you can model data in PostgreSQL and build process logic visually, which can make approvals and audit trails easier to keep consistent across web and mobile apps.
FAQ
Use request-response when the work finishes quickly and predictably while the user waits, like creating a record or validating a form. Use an event-driven workflow when the process spans minutes to days, includes human approvals, or needs timers, retries, and safe resume after restarts.
Long tasks don't fit a single HTTP request because connections time out, servers restart, and the work often depends on people or external systems. If you treat it like one call, you usually lose state, create duplicates on retries, and end up with scattered background scripts to handle waiting.
A good default is to persist a clear process state in your database and advance it only through explicit transitions. Store the process instance ID, current status, who can act next, and the key timestamps so you can resume safely after deploys, crashes, or delays.
Model approvals as a paused step that resumes when a decision arrives, rather than blocking or constantly polling. Record each decision as data (who decided, when, approve/reject, and reason) so the workflow can move forward predictably and you can audit it later.
Polling can work for simple cases, but it adds noise and delays because the client has to keep asking "is it done yet?" A better default is to push a notification on change and let the client refresh on demand, while the server remains the source of truth for state.
Treat time as part of the process by storing deadlines and reminder times, then re-checking current state when a timer fires before acting. This avoids sending reminders after something was already approved and keeps escalations consistent even if jobs run late or twice.
Start with idempotency keys for any side effect like charging a card or sending an email, and store the outcome for that key. Then retries become safe because repeating the same intent returns the same result instead of doing the action again.
Assume messages can be delivered more than once and design consumers to deduplicate. A practical approach is to store the event ID (or a business key for the step) and ignore repeats so a replay doesn't trigger the same action twice.
Capture a timeline of facts: actor, timestamp, input at the time, the outcome, and the rule or policy version used. Also assign a single case or correlation ID to everything related to that process so support can answer "where is it stuck?" without digging through unrelated logs.
Keep one request record as the "case," store decisions separately, and drive state changes through persisted transitions that can be replayed. In a no-code tool like AppMaster, you can model the data in PostgreSQL and implement the step logic visually, which helps keep approvals, retries, and audit fields consistent across the app.


