Error taxonomy for business apps: consistent UI and monitoring
An error taxonomy for business apps helps you classify validation, auth, rate limits, and dependency failures so alerts and UI responses stay consistent.

What an error taxonomy solves in real business apps
An error taxonomy is a shared way to name and group errors so everyone handles them the same way. Instead of every screen and API inventing its own messages, you define a small set of categories (like validation or auth) and rules for how they show up to users and in monitoring.
Without that shared structure, the same problem appears in different forms. A missing required field might show as "Bad Request" on mobile, "Something went wrong" on web, and a stack trace in logs. Users don't know what to do next, and on-call teams waste time guessing whether it's user error, an attack, or an outage.
The goal is consistency: the same type of error leads to the same UI behavior and the same alerting behavior. Validation issues should point to the exact field. Permission issues should stop the action and explain what access is missing. Dependency failures should offer a safe retry, while monitoring raises the right alarm.
A realistic example: a sales rep tries to create a customer record, but the payment service is down. If your app returns a generic 500, they'll retry and may create duplicates later. With a clear dependency-failure category, the UI can say the service is temporarily unavailable, prevent duplicate submissions, and monitoring can page the right team.
This kind of alignment matters most when one backend powers multiple clients. If the API, web app, mobile app, and internal tools all rely on the same categories and codes, failures stop feeling random.
A simple model: category, code, message, details
Taxonomies stay easier to maintain when you separate four things that often get mixed together: the category (what kind of problem it is), the code (a stable identifier), the message (human text), and the details (structured context). HTTP status still matters, but it shouldn't be the whole story.
Category answers: "How should the UI and monitoring behave?" A 403 might mean "auth" in one place, while another 403 could be "policy" if you later add rules. Category is about behavior, not transport.
Code answers: "What exactly happened?" Codes should be stable and boring. If you rename a button or refactor a service, the code shouldn't change. Dashboards, alerts, and support scripts depend on this.
Message answers: "What do we tell a person?" Decide who the message is for. A user-facing message should be short and kind. A support message can include next steps. Logs can be more technical.
Details answers: "What do we need to fix it?" Keep details structured so the UI can react. For a form error, that might be field names. For a dependency issue, that might be an upstream service name and a retry-after value.
Here's a compact shape many teams use:
{
  "category": "validation",
  "code": "CUSTOMER_EMAIL_INVALID",
  "message": "Enter a valid email address.",
  "details": { "field": "email", "rule": "email" }
}
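On the client side, this shape might be represented in Python as a small dataclass; the field names mirror the JSON above, and `parse_error` with its defensive defaults is a hypothetical helper, not a fixed API:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class AppError:
    """One error in the category/code/message/details shape."""
    category: str                   # drives UI and monitoring behavior
    code: str                       # stable identifier for dashboards and alerts
    message: str                    # user-safe text
    details: dict[str, Any] = field(default_factory=dict)

def parse_error(payload: dict) -> AppError:
    # Defensive defaults: an unknown payload degrades to a generic internal error.
    return AppError(
        category=payload.get("category", "internal"),
        code=payload.get("code", "UNKNOWN"),
        message=payload.get("message", "Something went wrong."),
        details=payload.get("details", {}),
    )

err = parse_error({
    "category": "validation",
    "code": "CUSTOMER_EMAIL_INVALID",
    "message": "Enter a valid email address.",
    "details": {"field": "email", "rule": "email"},
})
```

Parsing into one typed object keeps category-based dispatch in a single place instead of scattered string checks.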
As features change, keep categories small and stable, and add new codes instead of reusing old ones. That keeps UI behavior, monitoring trends, and support playbooks reliable as the product evolves.
Core categories: validation, auth, rate limits, dependencies
Most business apps can start with four categories that show up everywhere. If you name and treat them the same way across backend, web, and mobile, your UI can respond consistently and your monitoring becomes readable.
Validation (expected)
Validation errors happen when user input or a business rule check fails. These are normal and should be easy to fix: missing required fields, invalid formats, or rules like "discount can't exceed 20%" or "order total must be > $0". The UI should highlight the exact field or rule, not show a generic alert.
Authentication vs authorization (expected)
Auth errors usually split into two cases: not authenticated (not logged in, session expired, token missing) and not authorized (logged in, but lacks permission). Treat them differently. "Please sign in again" fits the first case. For the second, avoid revealing sensitive details, but still be clear: "You don't have access to approve invoices."
Rate limits (expected, but time-based)
Rate limiting means "too many requests, try again later." It often appears during imports, busy dashboards, or repeated retries. Include a retry-after hint (even if it's just "wait 30 seconds"), and have the UI back off instead of hammering the server.
Dependency failures (often unexpected)
Dependency failures come from upstream services, timeouts, or outages: payment providers, email/SMS, databases, or internal services. Users can't fix these, so the UI should offer a safe fallback (save a draft, try later, contact support).
The key difference is behavior: expected errors are part of normal flow and deserve precise feedback; unexpected errors signal instability and should trigger alerts, correlation IDs, and careful logging.
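One way to encode that expected/unexpected split is a small lookup table that monitoring consults before paging; the category names and flags below are illustrative choices, not a standard:

```python
# Illustrative per-category defaults; one possible scheme, not a fixed contract.
CATEGORY_DEFAULTS = {
    "validation": {"expected": True,  "page_oncall": False},
    "auth":       {"expected": True,  "page_oncall": False},
    "rate_limit": {"expected": True,  "page_oncall": False},
    "dependency": {"expected": False, "page_oncall": True},
}

def should_page(category: str) -> bool:
    # Unknown categories are treated as unexpected: safer to alert than to hide.
    return CATEGORY_DEFAULTS.get(category, {"page_oncall": True})["page_oncall"]
```

Making the unknown-category fallback "page" means a forgotten mapping fails loud instead of silent.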
Step by step: build your taxonomy in one workshop
A taxonomy should be small enough to remember, but strict enough that two teams label the same problem the same way.
1) Timebox and pick a small set
Start with a 60 to 90 minute workshop. List the errors you see most (bad input, login problems, too many requests, third-party outages, unexpected bugs), then collapse them into 6 to 12 categories that everyone can say out loud without checking a doc.
2) Agree on a stable code scheme
Pick a naming pattern that stays readable in logs and tickets. Keep it short, avoid version numbers, and treat codes as permanent once released. A common pattern is a category prefix plus a clear slug, like AUTH_INVALID_TOKEN or DEP_PAYMENT_TIMEOUT.
Before you leave the room, decide what every error must include: category, code, safe message, structured details, and a trace or request ID.
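A tiny guard like the following can enforce the code scheme in CI or code review; the prefixes and regex are hypothetical examples of such a convention:

```python
import re

# Hypothetical convention: a known category prefix plus a SCREAMING_SNAKE slug.
KNOWN_PREFIXES = {"VAL", "AUTH", "RATE", "DEP"}
CODE_PATTERN = re.compile(r"^[A-Z]+(_[A-Z0-9]+)+$")

def is_valid_code(code: str) -> bool:
    """Accept codes like AUTH_INVALID_TOKEN; reject lowercase or unknown prefixes."""
    if not CODE_PATTERN.match(code):
        return False
    return code.split("_", 1)[0] in KNOWN_PREFIXES
```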
3) Write one rule for category vs code
Teams get stuck when categories become a dumping ground. A simple rule helps: category answers "How should the UI and monitoring react?", code answers "What exactly happened?". If two failures need different UI behavior, they shouldn't share a category.
4) Set default UI behavior per category
Decide what users see by default. Validation highlights fields. Auth sends users to sign-in or shows an access message. Rate limits show "try again in X seconds". Dependency failures show a calm retry screen. Once these defaults exist, new features can follow them instead of inventing one-off handling.
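These defaults can live in one shared map so every client resolves the same behavior; the action names below are placeholders for whatever your UI layer understands:

```python
# Placeholder action names; the point is one shared lookup, not these strings.
UI_DEFAULTS = {
    "validation": "highlight_fields",
    "auth": "prompt_sign_in_or_access_message",
    "rate_limit": "show_countdown_then_retry",
    "dependency": "calm_retry_screen",
}

def ui_behavior(category: str) -> str:
    # New or unknown categories fall back to one safe generic screen.
    return UI_DEFAULTS.get(category, "generic_error_screen")
```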
5) Test with real scenarios
Run five common flows (signup, checkout, search, admin edit, file upload) and label every failure. If the group argues, you usually need one clearer rule, not twenty more codes.
Validation errors: make them actionable for users
Validation is the one type of failure you usually want to show immediately. It should be predictable: it tells the user what to fix, and it never triggers a retry loop.
Field-level and form-level validation are different problems. Field-level errors map to one input (email, phone, amount). Form-level errors are about the combination of inputs (start date must be before end date) or missing prerequisites (no shipping method selected). Your API response should make that difference clear so the UI can react correctly.
A common business rule failure is "credit limit exceeded." The user may have entered a valid number, but the action isn't allowed based on account status. Treat this as a form-level validation error with a clear reason and a safe hint, like "Your available limit is $500. Reduce the amount or request an increase." Avoid exposing internal names like database fields, scoring models, or rule engine steps.
An actionable response usually includes a stable code (not just an English sentence), a user-friendly message, optional field pointers for field-level issues, and small safe hints (format examples, allowed ranges). If you need a rule name for engineers, put it in logs, not in UI.
Log validation failures differently from system errors. You want enough context to debug patterns without storing sensitive data. Record user ID, request ID, the rule name or code, and which fields failed. For values, log only what you need (often "present/missing" or length) and mask anything sensitive.
In the UI, focus on fixing, not retrying. Highlight fields, keep what the user typed, scroll to the first error, and disable automatic retries. Validation errors aren't temporary, so "try again" wastes time.
Auth and permission errors: keep security and clarity
Authentication and authorization failures look similar to users, but they mean different things for security, UI flow, and monitoring. Separating them makes behavior consistent across web, mobile, and API clients.
Unauthenticated means the app can't prove who the user is. Typical causes are missing credentials, an invalid token, or an expired session. Forbidden means the user is known, but not allowed to do the action.
Session expired is the most common edge case. If you support refresh tokens, try a silent refresh once, then retry the original request. If refresh fails, return an unauthenticated error and send the user to sign in again. Avoid loops: after one refresh attempt, stop and surface a clear next step.
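The refresh-once rule can be sketched as a wrapper that retries exactly one time; `request` and `refresh_session` are hypothetical callables standing in for a real HTTP layer, and the "unauthenticated" string is a stand-in sentinel:

```python
class Unauthenticated(Exception):
    """Raised when the user must sign in again."""

def call_with_refresh(request, refresh_session):
    result = request()
    if result != "unauthenticated":
        return result
    if not refresh_session():
        # Refresh failed: stop here, no retry loops.
        raise Unauthenticated("Please sign in again.")
    result = request()  # exactly one retry with the refreshed session
    if result == "unauthenticated":
        raise Unauthenticated("Please sign in again.")
    return result

# Simulated flow: first call hits an expired session, refresh succeeds.
calls = {"n": 0}
def fake_request():
    calls["n"] += 1
    return "unauthenticated" if calls["n"] == 1 else "ok"

result = call_with_refresh(fake_request, refresh_session=lambda: True)
```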
UI behavior should stay predictable:
- Unauthenticated: prompt sign-in and preserve what the user was trying to do
- Forbidden: stay on the page and show an access message, plus a safe action like "request access"
- Account disabled or revoked: sign out and show a short message saying support can help
For auditing, log enough to answer "who tried what and why was it blocked" without exposing secrets. A useful record includes user ID (if known), tenant or workspace, action name, resource identifier, timestamp, request ID, and the policy check result (allowed/denied). Keep raw tokens and passwords out of logs.
In user-facing messages, don't reveal role names, permission rules, or internal policy structure. "You don't have access to approve invoices" is safer than "Only FinanceAdmin can approve invoices."
Rate limit errors: predictable behavior under load
Rate limits aren't bugs. They're a safety rail. Treat them as a first-class category so the UI, logs, and alerts react consistently when traffic jumps.
Rate limits usually show up in a few shapes: per user (one person clicking too fast), per IP (many users behind one office network), or per API key (a single integration job running wild). The cause matters because the fix is different.
What a good rate-limit response includes
Clients need two things: that they're limited, and when to try again. Return HTTP 429 plus a clear wait time (for example, Retry-After: 30). Also include a stable error code (like RATE_LIMITED) so dashboards can group events.
Keep the message calm and specific. "Too many requests" is technically true but not helpful. "Please wait 30 seconds and try again" sets expectations and reduces repeated clicks.
On the UI side, prevent rapid retries. A simple pattern is disabling the action for the wait period, showing a short countdown, then offering one safe retry when the timer ends. Avoid wording that makes users think data was lost.
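A sketch of that pattern: read the wait from the response headers (handling only the delta-seconds form of Retry-After, with a fallback), then hand the UI a disabled state and a countdown:

```python
def retry_after_seconds(headers: dict, default: int = 30) -> int:
    """Read Retry-After in its delta-seconds form; fall back to a default.
    (The HTTP-date form of Retry-After is not handled in this sketch.)"""
    try:
        return max(0, int(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return default

def handle_429(headers: dict) -> dict:
    wait = retry_after_seconds(headers)
    # Illustrative UI state: disable the action and show a countdown.
    return {"action_enabled": False, "countdown": wait,
            "message": f"Please wait {wait} seconds and try again."}
```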
Monitoring is where teams often overreact. Don't page someone for every 429. Track rates and alert on unusual spikes: a sudden jump for one endpoint, tenant, or API key is actionable.
Backend behavior should also be predictable. Use exponential backoff for automatic retries, and make retries idempotent. A "Create invoice" action shouldn't create two invoices if the first request actually succeeded.
Dependency failures: handle outages without chaos
Dependency failures are the ones users can't fix with better input. A user did everything right, but a payment gateway timed out, a database connection dropped, or an upstream service returned a 5xx. Treat these as a separate category so both the UI and monitoring behave predictably.
Start by naming the common shapes of failure: timeout, connection error (DNS, TLS, refused), and upstream 5xx (bad gateway, service unavailable). Even if you can't know the root cause, you can capture what happened and respond consistently.
Retry vs fail fast
Retries help for short hiccups, but they can also make an outage worse. Use simple rules so every team makes the same call.
- Retry when the error is likely temporary: timeouts, connection resets, 502/503
- Fail fast for user-caused or permanent cases: 4xx from the dependency, invalid credentials, missing resource
- Cap retries (for example 2 to 3 attempts) and add a small backoff
- Never retry non-idempotent actions unless you have an idempotency key
UI behavior and safe fallbacks
When a dependency fails, say what the user can do next without blaming them: "Temporary issue. Please try again." If there's a safe fallback, offer it. Example: if Stripe is down, let the user save the order as "Pending payment" and send an email confirmation instead of losing the cart.
Also protect users from double submits. If the user taps "Pay" twice during a slow response, your system should detect it. Use idempotency keys for create-and-charge flows, or state checks like "order already paid" before running the action again.
For monitoring, log fields that answer one question fast: "Which dependency is failing, and how bad is it?" Capture dependency name, endpoint or operation, duration, and the final outcome (timeout, connect, upstream 5xx). This makes alerts and dashboards meaningful instead of noisy.
Make monitoring and UI consistent across channels
Taxonomies only work when every channel speaks the same language: the API, the web UI, the mobile app, and your logs. Otherwise, the same problem shows up as five different messages, and nobody knows whether it's user error or a real outage.
Treat HTTP status codes as a secondary layer. They help with proxies and basic client behavior, but your category and code should carry the meaning. A dependency timeout might still be a 503, but the category tells the UI to offer "Try again" and tells monitoring to page the on-call.
Make every API return one standard error shape, even when the source is different (database, auth module, third-party API). A simple shape like this keeps UI handling and dashboards consistent:
{
  "category": "dependency",
  "code": "PAYMENTS_TIMEOUT",
  "message": "Payment service is not responding.",
  "details": { "provider": "stripe" },
  "correlation_id": "9f2c2c3a-6a2b-4a0a-9e9d-0b0c0c8b2b10"
}
Correlation IDs are the bridge between "a user saw an error" and "we can trace it." Show the correlation_id in the UI (a copy button helps), and always log it on the backend so you can follow one request across services.
Agree on what's safe to show in UI vs only in logs. A practical split is: UI gets category, a clear message, and a next step; logs get technical error details and request context; both share correlation_id and the stable error code.
Quick checklist for a consistent error system
Consistency is boring in the best way: every channel behaves the same, and monitoring tells the truth.
Check the backend first, including background jobs and webhooks. If any field is optional, people will skip it and consistency will break.
- Every error includes a category, a stable code, a user-safe message, and a trace ID.
- Validation problems are expected, so they don't trigger paging alerts.
- Auth and permission issues are tracked for security patterns, but not treated like outages.
- Rate limit responses include a retry hint (for example, seconds to wait) and don't spam alerts.
- Dependency failures include the dependency name plus timeout or status details.
Then check UI rules. Each category should map to one predictable screen behavior so users don't have to guess what to do next: validation highlights fields, auth prompts sign-in or shows access, rate limits show a calm wait, dependency failures offer retry and a fallback when possible.
A simple test is to trigger one error from each category in staging and verify you get the same result in the web app, mobile app, and admin panel.
Common mistakes and practical next steps
The fastest way to break an error system is to treat it as an afterthought. Different teams end up using different words, different codes, and different UI behavior for the same problem. Taxonomy work pays off when it stays consistent.
Common failure patterns:
- Leaking internal exception text to users. It confuses people and can expose sensitive details.
- Labeling every 4xx as "validation." Missing permission isn't the same as a missing field.
- Inventing new codes per feature without review. You end up with 200 codes that mean the same 5 things.
- Retrying the wrong failures. Retrying a permission error or a bad email address just creates noise.
A simple example: a sales rep submits a "Create customer" form and gets a 403. If the UI treats all 4xx as validation, it will highlight random fields and ask them to "fix inputs" instead of telling them they need access. Monitoring then shows a spike in "validation issues" when the real issue is roles.
Practical next steps that fit in one short workshop: write a one-page taxonomy doc (categories, when to use them, 5 to 10 canonical codes), define message rules (what users see vs what goes into logs), add a lightweight review gate for new codes, set retry rules by category, then implement end-to-end (backend response, UI mapping, and monitoring dashboards).
If you're building with AppMaster (appmaster.io), it helps to centralize these rules in one place so the same category and code behavior carries across the backend, web app, and native mobile apps.
FAQ
When is it worth building an error taxonomy?
Start when the same backend serves more than one client (web, mobile, internal tools), or when support and on-call keep asking, "Is this user error or a system issue?" A taxonomy pays off quickly once you have repeated flows like signup, checkout, imports, or admin edits where consistent handling matters.
How many categories do we need?
A good default is 6 to 12 categories that people can remember without checking docs. Keep categories stable and broad (like validation, auth, rate_limit, dependency, conflict, internal), and express the specific situation with a code, not a new category.
What's the difference between a category and a code?
Category drives behavior, code identifies the exact situation. The category tells the UI and monitoring what to do (highlight fields, prompt sign-in, back off, offer retry), while the code stays stable for dashboards, alerts, and support scripts even if the UI text changes.
Can clients or alerts key off the message text?
Treat messages as content, not identifiers. Return a short user-safe message for the UI, and rely on the stable code for grouping and automation. If you need more technical wording, keep it in logs and tie it to the same correlation ID.
What should every error response include?
Include a category, a stable code, a user-safe message, structured details, and a correlation or request ID. Details should be shaped for the client to act on, like which field failed or how long to wait, without dumping raw exception text.
How should validation errors point at fields?
Return field-level pointers when possible, so the UI can highlight the exact input and keep what the user typed. Use a separate form-level error when the issue is about a combination of inputs or a business rule, like date ranges or credit limits, so the UI doesn't guess the wrong field.
How do unauthenticated and forbidden differ?
Unauthenticated means the user isn't logged in or the session/token is invalid, so the UI should send them to sign in and preserve their task. Forbidden means they are logged in but lack permission, so the UI should stay put and show an access message without revealing sensitive role or policy details.
What should a rate-limit response return?
Return an explicit wait time (for example, a retry-after value) and keep the code stable so clients can implement backoff consistently. In the UI, disable repeated clicks and show a clear next step, because automatic rapid retries usually make rate limiting worse.
When is automatic retry safe?
Retry only when the failure is likely temporary (timeouts, connection resets, upstream 502/503) and cap retries with a small backoff. For non-idempotent actions, require an idempotency key or a state check, otherwise a retry can create duplicates when the first attempt actually succeeded.
How should correlation IDs be used?
Show the correlation ID to the user (so support can ask for it) and always log it server-side with the code and key details. This lets you trace one failure across services and clients; in AppMaster projects, centralizing this shape in one place helps keep backend, web, and native mobile behavior aligned.


