Mar 20, 2025·8 min read

Event-driven workflows vs request-response APIs for long tasks

Compare event-driven workflows vs request-response APIs for long-running processes, focusing on approvals, timers, retries, and audit trails in business apps.

Why long-running processes are tricky in business apps

A process is “long-running” when it can’t finish in one quick step. It might take minutes, hours, or days because it depends on people, time, or outside systems. Anything with approvals, handoffs, and waiting falls into this bucket.

That’s where simple request-response API thinking starts to break down. An API call is built for a short exchange: send a request, get an answer, move on. Long tasks are more like a story with chapters. You need to pause, remember exactly where you are, and continue later without guessing.

You see this in everyday business apps: purchase approvals that need a manager and finance, employee onboarding that waits on document checks, refunds that depend on a payment provider, or access requests that must be reviewed and then applied.

When teams treat a long process like a single API call, a few predictable problems show up:

The app loses state after a restart or deploy and can’t reliably resume.
Retries create duplicates: a second payment, a second email, a double approval.
Ownership gets fuzzy: nobody knows whether the requester, a manager, or a system job should act next.
Support has no visibility and can’t answer “where is it stuck?” without digging through logs.
Waiting logic (timers, reminders, deadlines) ends up as fragile background scripts.

A concrete scenario: an employee requests software access. The manager approves quickly, but IT needs two days to provision it. If the app can’t hold process state, send reminders, and resume safely, you get manual follow-ups, confused users, and extra work.

This is why the choice between event-driven workflows and request-response APIs matters most for long-running business processes.

Two mental models: synchronous calls vs events over time

The simplest comparison comes down to one question: does the work finish while the user waits, or does it continue after they leave?

A request-response API is a single exchange: one call in, one response out. It fits work that completes quickly and predictably, like creating a record, calculating a quote, or checking inventory. The server does the work, returns success or failure, and the interaction is over.

An event-driven workflow is a series of reactions over time. Something happens (an order is created, a manager approves, a timer expires), and the workflow moves to the next step. This model fits work that includes handoffs, waiting, retries, and reminders.

The practical difference is state.

With request-response, state often lives in the current request plus server memory until the response is sent. With event-driven workflows, state has to be stored (for example in PostgreSQL) so the process can resume later.

Failure handling also changes. Request-response usually handles failures by returning an error and asking the client to try again. Workflows record the failure and can retry safely when conditions improve. They can also log each step as an event, which makes the history easier to reconstruct.

A simple example: “Submit expense report” can be synchronous. “Get approval, wait 3 days, remind manager, then pay” isn’t.

Approvals: how each approach handles human decisions

Approvals are where long-running work becomes real. A system step finishes in milliseconds, but a person might answer in two minutes or two days. The key design choice is whether you model that wait as a paused process, or as a new message that arrives later.

With request-response APIs, approvals usually take one of three awkward shapes:

Blocking (holding the request open until someone decides, which is not practical)
Polling (the client asks “approved yet?” over and over)
Callbacks/webhooks (the server calls you back later)

All of these can work, but they add plumbing just to bridge “human time” with “API time.”

With events, the approval reads like a story. The app records something like “ExpenseSubmitted,” then later receives “ExpenseApproved” or “ExpenseRejected.” The workflow engine (or your own state machine) moves the record forward only when the next event arrives. This matches how most people already think about business steps: submit, review, decide.

Complexity shows up fast with multiple approvers and escalation rules. You might require both a manager and finance to approve, but also allow a senior manager to override. If you don’t model those rules clearly, the process becomes hard to reason about and even harder to audit.

A simple approval model that scales

A practical pattern is to keep one “request” record, then store decisions separately. That way you can support many approvers without rewriting core logic.

Capture a few pieces of data as first-class records:

The approval request itself: what’s being approved and its current status
Individual decisions: who decided, approve/reject, timestamp, reason
The required approvers: role or person, and any ordering rules
Outcome rules: “any one,” “majority,” “all required,” “override allowed”

Whatever implementation you choose, always store who approved what, when, and why as data, not as a log line.
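
The pattern above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive design: the names ApprovalRequest, Decision, and Rule are hypothetical, only two of the outcome rules are modeled, and the sketch assumes that any rejection ends the request.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Rule(Enum):
    ANY_ONE = "any_one"
    ALL_REQUIRED = "all_required"


@dataclass(frozen=True)
class Decision:
    """One decision stored as data: who, what, why, and when."""
    approver: str
    approved: bool
    reason: str
    decided_at: datetime


@dataclass
class ApprovalRequest:
    """The single request record; decisions are kept separately in a list."""
    subject: str
    required_approvers: list[str]
    rule: Rule
    decisions: list[Decision] = field(default_factory=list)

    def record(self, approver: str, approved: bool, reason: str) -> None:
        if approver not in self.required_approvers:
            raise ValueError(f"{approver} is not a required approver")
        self.decisions.append(
            Decision(approver, approved, reason, datetime.now(timezone.utc))
        )

    def status(self) -> str:
        # The latest decision per approver wins.
        latest = {d.approver: d for d in self.decisions}
        # Assumption for this sketch: any rejection rejects the request.
        if any(not d.approved for d in latest.values()):
            return "rejected"
        approvals = len(latest)
        if self.rule is Rule.ANY_ONE and approvals >= 1:
            return "approved"
        if self.rule is Rule.ALL_REQUIRED and approvals == len(self.required_approvers):
            return "approved"
        return "pending"
```

Because the status is derived from stored decisions, adding a new outcome rule such as “majority” would mean extending status() rather than changing how decisions are recorded.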

Timers and waiting: reminders, deadlines, and escalations

Waiting is where long tasks start to feel messy. People go to lunch, calendars fill up, and “we’ll get back to you” turns into “who owns this now?” This is one of the clearest differences between event-driven workflows and request-response APIs.

With request-response APIs, time is awkward. HTTP calls have timeouts, so you can’t keep a request open for two days. Teams usually end up with patterns like polling, a separate scheduled job that scans the database, or manual scripts when something is overdue. These can work, but the waiting logic lives outside the process. Corner cases are easy to miss, like what happens when the job runs twice, or when the record changed right before the reminder went out.

Workflows treat time as a normal step. You can say: wait 24 hours, send a reminder, then wait until 48 hours total and escalate to a different approver. The system keeps state, so deadlines aren’t hidden in a separate “cron + queries” project.

A simple approval rule might read like this:

After an expense report is submitted, wait 1 day. If the status is still “Pending,” message the manager. After 2 days, if it’s still pending, reassign to the manager’s lead and record the escalation.
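
As a rough sketch, that rule could be written as a small pure function. The name due_action and the returned labels are hypothetical, and the thresholds are simply the 1-day and 2-day values from the rule above.

```python
from datetime import datetime, timedelta

REMIND_AFTER = timedelta(days=1)
ESCALATE_AFTER = timedelta(days=2)


def due_action(submitted_at: datetime, now: datetime, status: str) -> str:
    """Return which waiting step applies right now for one expense report."""
    if status != "Pending":
        return "none"  # already decided, nothing to remind or escalate
    elapsed = now - submitted_at
    if elapsed >= ESCALATE_AFTER:
        return "escalate"
    if elapsed >= REMIND_AFTER:
        return "remind"
    return "wait"
```

A real scheduler would also need to record that a reminder or escalation was already sent, so the same step does not repeat every time the check runs.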

The key detail is what you do when the timer fires but the world has changed. A good workflow always re-checks current state before acting:

Load the latest status
Confirm it’s still pending
Confirm the assignee is still valid (team changes happen)
Record what you decided and why
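
The four checks above might look like this in code. It is a sketch under assumptions: Expense, on_reminder_timer, and the fallback assignee are hypothetical names, and the in-memory history list stands in for a persisted record.

```python
from dataclasses import dataclass, field


@dataclass
class Expense:
    status: str
    assignee: str
    history: list[str] = field(default_factory=list)


def on_reminder_timer(expense: Expense, active_users: set[str], fallback: str) -> str:
    """Decide what to do when a reminder timer fires, using current state only."""
    if expense.status != "Pending":
        # The world changed while the timer was waiting: do nothing.
        action = "skip: no longer pending"
    elif expense.assignee not in active_users:
        # The original assignee is no longer valid, so hand the task over.
        expense.assignee = fallback
        action = f"reassign: to {fallback}"
    else:
        action = f"remind: {expense.assignee}"
    # Record what was decided so the timeline can be explained later.
    expense.history.append(action)
    return action
```

The caller is expected to pass the latest stored version of the expense, not a copy captured when the timer was scheduled.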

Retries and failure recovery without duplicate actions

Retries are what you do when something failed for reasons you can’t fully control: a payment gateway times out, an email provider returns a temporary error, or your app saves step A but crashes before step B. The danger is simple: you try again and accidentally do the action twice.

With request-response APIs, the usual pattern is that the client calls an endpoint, waits, and if it doesn’t get a clear success, it tries again. To make that safe, the server needs to treat repeated calls as the same intent.

A practical fix is an idempotency key: the client sends a unique token that identifies the intent, like pay:invoice-583, and reuses that same token on every retry. The server stores the outcome for that key and returns the same result for repeats. That prevents double charges, duplicate tickets, or duplicate approvals.
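
A minimal server-side sketch of that idea follows. IdempotentExecutor is a hypothetical name, and the dictionary is an in-memory stand-in for a database table; a production version would also have to handle a crash between running the action and saving its result.

```python
from typing import Callable


class IdempotentExecutor:
    """Runs an action at most once per idempotency key and replays the result."""

    def __init__(self) -> None:
        # Assumption: a real system would persist this mapping.
        self._results: dict[str, str] = {}

    def run(self, key: str, action: Callable[[], str]) -> str:
        if key in self._results:
            # Repeat of the same intent: return the stored outcome.
            return self._results[key]
        result = action()
        self._results[key] = result
        return result
```

Calling run twice with the same key performs the side effect once and returns the same result both times.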

Event-driven workflows have a different kind of duplication risk. Events are often delivered at-least-once, which means duplicates can appear even when everything is working. Consumers need deduplication: record the event ID (or a business key like invoice_id + step) and ignore repeats. This is a core difference in workflow orchestration patterns: request-response focuses on safe replays of calls, while events focus on safe replays of messages.
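
Consumer-side deduplication can be sketched like this. DedupingConsumer is a hypothetical name, and the in-memory set stands in for a stored table of processed event IDs.

```python
class DedupingConsumer:
    """Applies each event once, even if the broker delivers it several times."""

    def __init__(self) -> None:
        self._seen: set[str] = set()   # processed event IDs
        self.applied: list[str] = []   # events that actually took effect

    def handle(self, event_id: str, event_type: str) -> bool:
        """Apply the event once; return False for a duplicate delivery."""
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        self.applied.append(event_type)
        return True
```

In a durable system the ID would be recorded in the same transaction as the state change, so a crash cannot leave one without the other.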

A few retry rules work well in either model:

Use backoff (for example 10s, 30s, 2m).
Set a max attempts limit.
Separate temporary errors (retry) from permanent errors (fail fast).
Route repeated failures into a “needs attention” state.
Log every attempt so you can explain what happened later.

Retries should be explicit in the process, not hidden behavior. That’s how you make failures visible and fixable.
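
Here is one way those rules could fit together, as a sketch rather than a reference implementation. TemporaryError, PermanentError, and run_with_retries are hypothetical names, and the delays are the example backoff values from the list above.

```python
import time
from typing import Callable


class TemporaryError(Exception):
    """Worth retrying, for example a timeout."""


class PermanentError(Exception):
    """Retrying will not help, for example invalid account details."""


def run_with_retries(
    action: Callable[[], str],
    delays: tuple[int, ...] = (10, 30, 120),
    sleep: Callable[[float], None] = time.sleep,
):
    """Return (status, result, attempts); attempts logs every try."""
    attempts: list[tuple[int, str]] = []
    max_attempts = len(delays) + 1
    for attempt in range(1, max_attempts + 1):
        try:
            result = action()
            attempts.append((attempt, "ok"))
            return "done", result, attempts
        except PermanentError as exc:
            attempts.append((attempt, f"permanent: {exc}"))
            return "failed", None, attempts  # fail fast
        except TemporaryError as exc:
            attempts.append((attempt, f"temporary: {exc}"))
            if attempt < max_attempts:
                sleep(delays[attempt - 1])  # back off before the next try
    # Attempts exhausted: hand the case to a person.
    return "needs_attention", None, attempts
```

In a long-running workflow you would normally store the next retry time and return instead of sleeping in place; the sleep parameter is injectable so the loop can be tested without waiting.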

Audit trails: making the process explainable

An audit trail is your “why” file. When someone asks, “Why was this expense rejected?” you should be able to answer without guessing, even months later. This matters for both event-driven workflows and request-response APIs, but the work looks different.

For any long-running process, record the facts that let you replay the story:

Actor: who did it (user, service, or system timer)
Time: when it happened (with time zone)
Input: what was known then (amount, vendor, policy thresholds, approvals)
Output: what decision or action happened (approved, rejected, paid, retried)
Rule version: which policy/logic version was used

Event-driven workflows can make auditing easier because each step naturally produces an event like “ManagerApproved” or “PaymentFailed.” If you store those events with the payload and actor, you get a clean timeline. The key is to keep events descriptive and store them somewhere you can query by case.

Request-response APIs can still be auditable, but the story is often scattered across services. One endpoint logs “approved,” another logs “payment requested,” and a third logs “retry succeeded.” If each uses different formats or fields, audits turn into detective work.

A simple fix is a shared “case ID” (also called a correlation ID). It’s one identifier you attach to every request, event, and database record for the process instance, like “EXP-2026-00173.” Then you can trace the whole journey across steps.
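
To make this concrete, here is a small sketch of audit entries keyed by case ID. AuditEvent and AuditLog are hypothetical names, the fields mirror the facts listed earlier in this section, and the in-memory list stands in for a queryable table.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    case_id: str        # correlation ID shared by every step of the process
    actor: str          # user, service, or system timer
    event: str          # for example "ManagerApproved"
    payload: dict       # what was known at the time
    rule_version: str   # which policy/logic version was used
    at: datetime        # timezone-aware timestamp


class AuditLog:
    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def record(self, case_id: str, actor: str, event: str, payload: dict,
               rule_version: str, at: Optional[datetime] = None) -> AuditEvent:
        entry = AuditEvent(case_id, actor, event, dict(payload), rule_version,
                           at or datetime.now(timezone.utc))
        self._events.append(entry)
        return entry

    def timeline(self, case_id: str) -> list[str]:
        """Event names for one case, oldest first."""
        matching = [e for e in self._events if e.case_id == case_id]
        return [e.event for e in sorted(matching, key=lambda e: e.at)]
```

With every entry carrying the same case ID, answering “where is it stuck?” becomes a single filtered query instead of a search across service logs.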

Picking the right approach: strengths and trade-offs

The best choice depends on whether you need an answer right now, or you need the process to keep moving over hours or days.

Request-response works well when the work is short and the rules are simple. A user submits a form, the server validates it, saves data, and returns success or an error. It’s also a good fit for clear, single-step actions like create, update, or check permissions.

It starts to hurt when a “single request” quietly turns into many steps: waiting for an approval, calling multiple outside systems, handling timeouts, or branching based on what happens next. You either keep a connection open (fragile), or you push the waiting and retries into background jobs that are hard to reason about.

Event-driven workflows shine when the process is a story over time. Each step reacts to a new event (approved, rejected, timer fired, payment failed) and decides what happens next. This makes it easier to pause, resume, retry, and keep a clear trail of why the system did what it did.

There are real trade-offs:

Simplicity vs durability: request-response is simpler to start; event-driven is safer for long delays.
Debugging style: request-response follows a straight line; workflows often require tracing across steps.
Tooling and habits: events need good logging, correlation IDs, and clear state models.
Change management: workflows evolve and branch; event-driven designs tend to handle new paths better when modeled well.

A practical example: an expense report that needs manager approval, then finance review, then payment. If payment fails, you want retries without double-paying. That’s naturally event-driven. If it’s just “submit expense” with quick checks, request-response is often enough.

Step-by-step: designing a long-running process that survives delays

Long-running business processes fail in boring ways: a browser tab closes, a server restarts, an approval sits for three days, or a payment provider times out. Design for those delays from the start, regardless of which model you prefer.

Start by defining a small set of states you can store and resume. If you can’t point to the current state in your database, you don’t really have a resumable workflow.

A simple design sequence

Set boundaries: define the start trigger, the end condition, and a few key states (Pending approval, Approved, Rejected, Expired, Completed).
Name events and decisions: write down what can happen over time (Submitted, Approved, Rejected, TimerFired, RetryScheduled). Keep event names past tense.
Choose waiting points: identify where the process pauses for a human, an external system, or a deadline.
Add timer and retry rules per step: decide what happens when time passes or a call fails (backoff, max attempts, escalate, give up).
Define how the process resumes: on each event or callback, load saved state, verify it’s still valid, then move to the next state.

To survive restarts, persist the minimum data you need to continue safely. Store enough to re-run without guessing:

Process instance ID and current state
Who can act next (assignee/role) and what they decided
Deadlines (due_at, remind_at) and escalation level
Retry metadata (attempt count, last error, next_retry_at)
Idempotency key or “already done” flags for side effects (sending a message, charging a card)

If you can reconstruct “where we are” and “what is allowed next” from stored data, delays stop being scary.
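
One way to express “what is allowed next” is an explicit transition table. The sketch below uses the states from the design sequence above; ProcessInstance and advance are hypothetical names, and the PaymentSucceeded event is an assumption added to reach the Completed state.

```python
from dataclasses import dataclass, field

# (current state, event) -> next state. Anything not listed is not allowed.
TRANSITIONS = {
    ("PendingApproval", "Approved"): "Approved",
    ("PendingApproval", "Rejected"): "Rejected",
    ("PendingApproval", "TimerFired"): "Expired",
    ("Approved", "PaymentSucceeded"): "Completed",
}


@dataclass
class ProcessInstance:
    instance_id: str
    state: str = "PendingApproval"
    applied_events: list[str] = field(default_factory=list)


def advance(instance: ProcessInstance, event: str) -> bool:
    """Apply an event if it is valid in the current state; otherwise ignore it."""
    next_state = TRANSITIONS.get((instance.state, event))
    if next_state is None:
        return False  # late or out-of-order event: state stays unchanged
    instance.state = next_state
    instance.applied_events.append(event)
    return True
```

A timer that fires after the request was already approved is ignored here, because no transition exists from Approved on TimerFired. In a real app the instance would be loaded from and saved back to the database around each call.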

Common mistakes and how to avoid them

Long-running processes often break only after real users show up. An approval takes two days, a retry fires at the wrong time, and you end up with a double payment or a missing audit trail.

Common mistakes:

Keeping an HTTP request open while waiting for a human approval. It times out, ties up server resources, and gives the user a false sense that “something is happening.”
Retrying calls without idempotency. A network glitch turns into duplicate invoices, duplicate emails, or repeated “Approved” transitions.
Not storing process state. If state lives in memory, a restart wipes it out. If state lives only in logs, you can’t reliably continue.
Building a fuzzy audit trail. Events have different clocks and formats, so the timeline can’t be trusted during an incident or compliance review.
Mixing async and sync with no single source of truth. One system says “Paid,” another says “Pending,” and nobody knows which is correct.

A simple example: an expense report is approved in chat, a webhook arrives late, and the payment API gets retried. Without stored state and idempotency, the retry can send the payment twice, and your records won’t clearly explain why.

Most fixes come down to being explicit:

Persist state transitions (Requested, Approved, Rejected, Paid) in a database, with who/what changed them.
Use idempotency keys for every external side effect (payments, emails, tickets) and store the result.
Separate “accept the request” from “finish the work”: return quickly, then complete the workflow in the background.
Standardize timestamps (UTC), add correlation IDs, and record both the request and the outcome.

Quick checklist before you build

Long-running work is less about one perfect call and more about staying correct after delays, people, and failures.

Write down what “safe to continue” means for your process. If the app restarts mid-way, you should be able to pick up from the last known step without guessing.

A practical checklist:

Define how the process resumes after a crash or deploy. What state is saved, and what runs next?
Give every instance a unique process key (like ExpenseRequest-10482) and a clear status model (Submitted, Waiting for Manager, Approved, Paid, Failed).
Treat approvals as records, not just outcomes: who approved or rejected, when, and the reason or comment.
Map waiting rules: reminders, deadlines, escalations, expirations. Name an owner for each timer (manager, finance, system).
Plan failure handling: retries must be limited and safe, and there should be a “needs review” stop where a person can fix data or approve a reattempt.

A sanity test: imagine a payment provider times out after you already charged the card. Your design should prevent charging twice, while still letting the process finish.

Example: expense approval with deadline and payment retry

Scenario: an employee submits a $120 taxi receipt for reimbursement. It needs manager approval within 48 hours. If approved, the system pays out to the employee. If payment fails, it retries safely and leaves a clear record.

Request-response walkthrough

With request-response APIs, the app often behaves like a conversation that has to keep checking back.

The employee taps Submit. The server creates a reimbursement record with status “Pending approval” and returns an ID. The manager gets a notification, but the employee app usually has to poll to see if anything changed, for example: “GET reimbursement status by ID.”

To enforce the 48-hour deadline, you either run a scheduled job that scans for overdue requests, or you store a deadline timestamp and check it during polls. If the job is delayed, users can see stale status.

When the manager approves, the server flips the status to “Approved” and calls the payment provider. If the provider (Stripe, for example) returns a temporary error, the server has to decide whether to retry now, retry later, or fail. Without an idempotency key that stays the same across attempts, a retry can create a double payout.

Event-driven walkthrough

In an event-driven model, each change is a recorded fact.

The employee submits, producing an “ExpenseSubmitted” event. A workflow starts and waits for either “ManagerApproved” or a “DeadlineReached” timer event at 48 hours. If the timer fires first, the workflow records an “AutoRejected” outcome and why.

On approval, the workflow records “PayoutRequested” and attempts payment. If the provider times out, the workflow records “PayoutFailed” with an error code and schedules a retry (for example, in 15 minutes). Every attempt reuses the same idempotency key, so the payout happens at most once and “PayoutSucceeded” is recorded only once.

What the user sees stays simple:

Pending approval (48 hours left)
Approved, paying out
Payment retry scheduled
Paid

The audit trail reads like a timeline: submitted, approved, deadline checked, payout attempted, failed, retried, paid.
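
The link between recorded events and what the user sees can be sketched as a simple fold over the event list. The event names come from the walkthrough above; visible_status, the “Not submitted” starting label, and the “Auto-rejected” label are assumptions made for illustration.

```python
# What the user sees after each recorded event.
STATUS_AFTER = {
    "ExpenseSubmitted": "Pending approval",
    "ManagerApproved": "Approved, paying out",
    "DeadlineReached": "Auto-rejected",
    "PayoutRequested": "Approved, paying out",
    "PayoutFailed": "Payment retry scheduled",
    "PayoutSucceeded": "Paid",
}

# Once a case reaches one of these, later events no longer change the status.
TERMINAL = {"Paid", "Auto-rejected"}


def visible_status(events: list[str]) -> str:
    """Derive the user-facing status by replaying the events in order."""
    status = "Not submitted"
    for name in events:
        if status in TERMINAL:
            break
        status = STATUS_AFTER.get(name, status)
    return status
```

Because the status is computed from the stored events, the screen and the audit trail cannot drift apart: both are views of the same timeline.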

Next steps: turning the model into a working app

Pick one real process and build it end to end before you generalize. Expense approval, onboarding, and refund handling are good starters because they include human steps, waiting, and failure paths. Keep the goal small: one happy path and the two most common exceptions.

Write the process as states and events, not as screens. For example: “Submitted” -> “ManagerApproved” -> “PaymentRequested” -> “Paid,” with branches like “ApprovalRejected” or “PaymentFailed.” When you see the waiting points and side effects clearly, the choice between event-driven workflows and request-response APIs becomes practical.

Decide where process state lives. A database can be enough if the flow is simple and you can enforce updates in one place. A workflow engine helps when you need timers, retries, and branching, because it tracks what should happen next.

Add audit fields from day one. Store who did what, when it happened, and why (comment or reason code). When someone asks, “Why was this payment retried?” you want a clear answer without digging through logs.

If you’re building this kind of workflow in a no-code platform, AppMaster (appmaster.io) is one option where you can model data in PostgreSQL and build process logic visually, which can make approvals and audit trails easier to keep consistent across web and mobile apps.

FAQ

Use request-response when the work finishes quickly and predictably while the user waits, like creating a record or validating a form. Use an event-driven workflow when the process spans minutes to days, includes human approvals, or needs timers, retries, and safe resume after restarts.

Long tasks don’t fit a single HTTP request because connections time out, servers restart, and the work often depends on people or external systems. If you treat it like one call, you usually lose state, create duplicates on retries, and end up with scattered background scripts to handle waiting.

A good default is to persist a clear process state in your database and advance it only through explicit transitions. Store the process instance ID, current status, who can act next, and the key timestamps so you can resume safely after deploys, crashes, or delays.

Model approvals as a paused step that resumes when a decision arrives, rather than blocking or constantly polling. Record each decision as data (who decided, when, approve/reject, and reason) so the workflow can move forward predictably and you can audit it later.

Polling can work for simple cases, but it adds noise and delays because the client has to keep asking “is it done yet?” A better default is to push a notification on change and let the client refresh on demand, while the server remains the source of truth for state.

Treat time as part of the process by storing deadlines and reminder times, then re-checking current state when a timer fires before acting. This avoids sending reminders after something was already approved and keeps escalations consistent even if jobs run late or twice.

Start with idempotency keys for any side effect like charging a card or sending an email, and store the outcome for that key. Then retries become safe because repeating the same intent returns the same result instead of doing the action again.

Assume messages can be delivered more than once and design consumers to deduplicate. A practical approach is to store the event ID (or a business key for the step) and ignore repeats so a replay doesn’t trigger the same action twice.

Capture a timeline of facts: actor, timestamp, input at the time, the outcome, and the rule or policy version used. Also assign a single case or correlation ID to everything related to that process so support can answer “where is it stuck?” without digging through unrelated logs.

Keep one request record as the “case,” store decisions separately, and drive state changes through persisted transitions that can be replayed. In a no-code tool like AppMaster, you can model the data in PostgreSQL and implement the step logic visually, which helps keep approvals, retries, and audit fields consistent across the app.