Go worker pools vs goroutine-per-task for background jobs
Go worker pools vs goroutine-per-task: learn how each model affects throughput, memory use, and backpressure for background processing and long-running workflows.

What problem are we solving?
Most Go services do more than answer HTTP requests. They also run background work: send emails, resize images, generate invoices, sync data, process events, or rebuild a search index. Some jobs are quick and independent. Others form long workflows where each step depends on the last (charge a card, wait for confirmation, then notify the customer and update reporting).
When people compare "Go worker pools vs goroutine-per-task", they’re usually trying to solve one production problem: how to run a lot of background work without making the service slow, expensive, or unstable.
You feel the impact in a few places:
- Latency: background work steals CPU, memory, database connections, and network bandwidth from user-facing requests.
- Cost: uncontrolled concurrency pushes you toward bigger machines, more database capacity, or higher queue and API bills.
- Stability: bursts (imports, marketing sends, retry storms) can trigger timeouts, OOM crashes, or cascading failures.
The real tradeoff is simplicity vs control. Spawning a goroutine per task is easy to write and often fine when volume is low or naturally limited. A worker pool adds structure: fixed concurrency, clearer limits, and a natural place to put timeouts, retries, and metrics. The cost is extra code and a decision about what happens when the system is busy (do tasks wait, get rejected, or get stored elsewhere?).
This is about day-to-day background processing: throughput, memory, and backpressure (how you prevent overload). It doesn’t try to cover every queue technology, distributed workflow engines, or exactly-once semantics.
If you’re building full apps with background logic using a platform like AppMaster, the same questions show up quickly. Your business processes and integrations still need limits around databases, external APIs, and email/SMS providers so one busy workflow doesn’t slow everything else down.
Two common patterns in plain terms
Goroutine-per-task
This is the simplest approach: whenever a job arrives, start a goroutine to handle it. The “queue” is often whatever triggers the work, such as a channel receiver or a direct call from an HTTP handler.
A typical shape is: receive a job, then go handle(job). Sometimes a channel is still involved, but only as a handoff point, not a limiter.
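Here's a minimal sketch of that shape; the Job type, the incoming channel, and the handle function are placeholder names for whatever your system actually uses.

type Job struct {
    ID      string
    Payload []byte
}

func handle(job Job) {
    // Real work goes here: call an API, write to the database, send an email.
}

// consume starts one goroutine per job. Nothing caps how many run at once,
// which is the whole tradeoff: simple to write, unbounded under bursts.
func consume(incoming <-chan Job) {
    for job := range incoming {
        go handle(job)
    }
}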
It tends to work well when jobs mostly wait on I/O (HTTP calls, database queries, uploads), job volume is modest, and bursts are small or predictable.
The downside is that concurrency can grow without a clear cap. That can spike memory, open too many connections, or overload a downstream service.
Worker pool
A worker pool starts a fixed number of worker goroutines and feeds them jobs from a queue, usually an in-memory buffered channel. Each worker loops: take a job, process it, repeat.
The key difference is control. The number of workers is a hard concurrency limit. If jobs arrive faster than workers can finish them, jobs wait in the queue (or get rejected if the queue is full).
Worker pools are a good fit when work is CPU-heavy (image processing, report generation), when you need predictable resource usage, or when you must protect a database or third-party API from bursts.
Where the queue lives
Both patterns can use an in-memory channel, which is fast but disappears on restart. For “must not lose” jobs or long workflows, the queue often moves outside the process (a database table, Redis, or a message broker). In that setup, you still choose between goroutine-per-task and worker pools, but now they run as consumers of the external queue.
As a simple example, if the system suddenly needs to send 10,000 emails, goroutine-per-task can try to fire them all at once. A pool can send 50 at a time and keep the rest waiting in a controlled way.
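A minimal sketch of the "50 at a time" idea without a full pool uses a buffered channel as a counting semaphore; sendEmail and the limit value are assumptions for illustration.

import "sync"

func sendEmail(to string) {
    // Call the email provider here.
}

// sendAll starts a goroutine per email but never lets more than limit run at once.
// Acquiring a slot blocks when the limit is reached, so a burst drains at a
// controlled pace instead of firing all at once.
func sendAll(recipients []string, limit int) {
    sem := make(chan struct{}, limit) // counting semaphore
    var wg sync.WaitGroup
    for _, to := range recipients {
        sem <- struct{}{} // acquire a slot; blocks while limit sends are in flight
        wg.Add(1)
        go func(to string) {
            defer wg.Done()
            defer func() { <-sem }() // release the slot
            sendEmail(to)
        }(to)
    }
    wg.Wait()
}

Called as sendAll(recipients, 50), this sends 50 at a time and keeps the rest waiting at the acquire line instead of in memory-hungry goroutines.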
Throughput: what changes and what does not
It’s common to expect a big throughput difference between worker pools and goroutine-per-task. Most of the time, raw throughput is limited by something else, not by how you start goroutines.
Throughput usually hits a ceiling at the slowest shared resource: database or external API limits, disk or network bandwidth, CPU-heavy work (JSON/PDF/image resizing), locks and shared state, or downstream services that slow under load.
If a shared resource is the bottleneck, launching more goroutines doesn’t finish the work faster. It mostly creates more waiting at the same choke point.
Goroutine-per-task can win when tasks are short, mostly I/O bound, and don’t contend on shared limits. Goroutine startup is cheap, and Go schedules large numbers of them well. In a “fetch, parse, write one row” style loop, this can keep CPUs busy and hide network latency.
Worker pools win when you need to bound expensive resources. If each job holds a DB connection, opens files, allocates large buffers, or hits an API quota, fixed concurrency keeps the service stable while still reaching the maximum safe throughput.
Latency (especially p99) is where the difference often shows up. Goroutine-per-task can look great at low load, then fall off a cliff when too many tasks pile up. Pools introduce queueing delay (jobs waiting for a free worker), but the behavior is steadier because you avoid a thundering herd fighting over the same limit.
A simple mental model:
- If work is cheap and independent, more concurrency can raise throughput.
- If work is gated by a shared limit, more concurrency mostly raises waiting.
- If you care about p99, measure queue time separately from processing time.
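The last point is cheap to implement if you stamp each job when it's enqueued; a minimal sketch, assuming you control the Job struct:

import (
    "log"
    "time"
)

type Job struct {
    Payload    []byte
    EnqueuedAt time.Time // set when the job is created or submitted
}

func work(jobs <-chan Job, process func(Job)) {
    for j := range jobs {
        waited := time.Since(j.EnqueuedAt) // queue time: how long the job sat waiting
        start := time.Now()
        process(j)
        // Two separate numbers: a saturated pool shows up in waited, slow jobs in processing.
        log.Printf("queue=%s processing=%s", waited, time.Since(start))
    }
}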
Memory and resource use
A lot of the worker-pool vs goroutine-per-task debate is really about memory. CPU can often be scaled up or out. Memory failures are more sudden and can take the whole service down.
A goroutine is cheap, but not free. Each one starts with a small stack that grows as it calls deeper functions or holds large local variables. There’s also scheduler and runtime bookkeeping. Ten thousand goroutines can be fine. A hundred thousand can be a surprise if each one keeps references to large job data.
The bigger hidden cost is often not the goroutine itself, but what it keeps alive. If tasks arrive faster than they finish, goroutine-per-task creates an unbounded backlog. The “queue” might be implicit (goroutines waiting on locks or I/O) or explicit (a buffered channel, a slice, an in-memory batch). Either way, memory grows with the backlog.
Worker pools help because they force a cap. With fixed workers and a bounded queue, you get a real memory limit and a clear failure mode: once the queue is full, you block, shed load, or push back upstream.
A quick back-of-the-envelope check:
- Peak goroutines = workers + in-flight jobs + “waiting” jobs you created
- Memory per job = payload (bytes) + metadata + anything referenced (requests, decoded JSON, DB rows)
- Peak backlog memory ~= waiting jobs * memory per job
Example: if each job holds a 200 KB payload (or references a 200 KB object graph) and you allow 5,000 jobs to pile up, that’s about 1 GB just for payloads. Even if goroutines were magically free, the backlog isn’t.
Backpressure: keeping the system from melting down
Backpressure is simple: when work arrives faster than you can finish it, the system pushes back in a controlled way instead of quietly piling up. Without it, you don’t just get slower. You get timeouts, memory growth, and failures that are hard to reproduce.
You usually notice missing backpressure when a burst (imports, emails, exports) triggers patterns like memory climbing and not dropping, queue time growing while CPU stays busy, latency spikes for unrelated requests, retries piling up, or errors like “too many open files” and connection pool exhaustion.
A practical tool is a bounded channel: cap how many jobs can wait. Producers block when the channel is full, which slows job creation at the source.
Blocking isn’t always the right choice. For optional work, choose an explicit policy so overload is predictable:
- Drop low-value tasks (for example, duplicate notifications)
- Batch many small tasks into one write or one API call
- Delay work with jitter to avoid retry spikes
- Defer to a persistent queue and return quickly
- Shed load with a clear error when already overloaded
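The shed-load policy from the list above is only a few lines with a bounded channel; a minimal sketch, where ErrBusy and the Job type are placeholders:

import "errors"

type Job struct{ Payload []byte }

var ErrBusy = errors.New("queue full, try again later")

// TrySubmit enqueues the job if there is room and rejects it immediately otherwise,
// instead of blocking the caller or letting a backlog grow without limit.
func TrySubmit(jobs chan<- Job, j Job) error {
    select {
    case jobs <- j:
        return nil
    default:
        return ErrBusy // the caller decides: retry later, drop, or persist elsewhere
    }
}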
Rate limiting and timeouts are also backpressure tools. Rate limiting caps how fast you hit a dependency (email provider, database, third-party API). Timeouts cap how long a worker can be stuck. Together, they stop a slow dependency from turning into a full outage.
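A sketch of both tools together, using golang.org/x/time/rate for the limiter; the 10-per-second rate and the 10-second timeout are placeholder numbers, not recommendations.

import (
    "context"
    "time"

    "golang.org/x/time/rate"
)

// Allow at most 10 calls per second to the provider, with short bursts of 5.
var providerLimiter = rate.NewLimiter(rate.Limit(10), 5)

func callProvider(ctx context.Context, send func(context.Context) error) error {
    // Wait for a token; this respects ctx, so shutdown still works while throttled.
    if err := providerLimiter.Wait(ctx); err != nil {
        return err
    }
    // Cap how long one call can hold a worker.
    ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
    defer cancel()
    return send(ctx)
}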
Example: month-end statement generation. If 10,000 requests hit at once, unlimited goroutines can trigger 10,000 PDF renders and uploads. With a bounded queue and fixed workers, you render and retry at a safe pace.
How to build a worker pool step by step
A worker pool caps concurrency by running a fixed number of workers and feeding them jobs from a queue.
1) Pick a safe concurrency limit
Start with what your jobs spend time on.
- For CPU-heavy work, keep workers close to your CPU core count.
- For I/O-heavy work (DB, HTTP, storage), you can go higher, but stop when dependencies start timing out or throttling.
- For mixed work, measure and adjust. A reasonable starting range is often 2x to 10x CPU cores, then tune.
- Respect shared limits. If the DB pool is 20 connections, 200 workers will just fight over those 20.
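A small sketch that turns those rules of thumb into a starting number; the 4x multiplier is an assumption to tune, not a fixed answer.

import "runtime"

// startingWorkers picks a first guess at pool size from the workload type,
// then caps it at the most constrained shared resource (here, the DB pool).
func startingWorkers(cpuBound bool, dbPoolSize int) int {
    n := runtime.NumCPU()
    if !cpuBound {
        n *= 4 // I/O-heavy work tolerates more concurrency than cores
    }
    if dbPoolSize > 0 && n > dbPoolSize {
        n = dbPoolSize // more workers than connections just means more waiting
    }
    return n
}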
2) Choose the queue and set its size
A buffered channel is common because it’s built in and easy to reason about. The buffer is your shock absorber for bursts.
Small buffers surface overload quickly (senders block sooner). Larger buffers smooth spikes but can hide trouble and increase memory and latency. Size the buffer on purpose and decide what happens when it fills.
3) Make every task cancelable
Pass a context.Context into each job and make sure the job code uses it (DB, HTTP). This is how you stop cleanly on deploys, shutdowns, and timeouts.
func StartPool(ctx context.Context, workers, queueSize int, handle func(context.Context, Job) error) chan<- Job {
    jobs := make(chan Job, queueSize)
    for i := 0; i < workers; i++ {
        go func() {
            for {
                select {
                case <-ctx.Done():
                    return // shutdown or timeout: stop picking up new jobs
                case j, ok := <-jobs:
                    if !ok {
                        return // channel closed by the producer: no more work
                    }
                    // In real code, log or record the error instead of discarding it.
                    _ = handle(ctx, j)
                }
            }
        }()
    }
    return jobs
}
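Wiring it into a service might look like the sketch below, assuming the StartPool function above and whatever Job type your system defines; signal.NotifyContext provides the context that stops workers on shutdown.

import (
    "context"
    "os"
    "os/signal"
    "syscall"
)

func main() {
    // Cancelled on Ctrl+C or SIGTERM, which tells every worker to stop.
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
    defer stop()

    jobs := StartPool(ctx, 50, 500, func(ctx context.Context, j Job) error {
        // Real work goes here and should pass ctx to DB and HTTP calls.
        return nil
    })

    // Producers just send. They block when the 500-slot queue is full,
    // which is the backpressure.
    jobs <- Job{ /* ... */ }

    <-ctx.Done() // a real service would also drain or persist pending jobs here
}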
4) Add the metrics you will actually use
If you only track a few numbers, make it these:
- Queue depth (how far behind you are)
- Worker busy time (how saturated the pool is)
- Task duration (p50, p95, p99)
- Error rate (and retry counts if you retry)
That’s enough to tune worker count and queue size based on evidence, not guesses.
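A minimal sketch of collecting those numbers without committing to a metrics library: wrap the handler so every task reports duration and errors, and read queue depth straight off the channel. Swap the log line for your real metrics client to get the percentiles.

import (
    "context"
    "log"
    "sync/atomic"
    "time"
)

type Job struct{ Payload []byte }

var (
    processed atomic.Int64
    failed    atomic.Int64
)

// instrument wraps a job handler so each run records duration and errors.
// Queue depth for a channel-fed pool is simply len(jobs) on the jobs channel.
func instrument(handle func(context.Context, Job) error) func(context.Context, Job) error {
    return func(ctx context.Context, j Job) error {
        start := time.Now()
        err := handle(ctx, j)
        processed.Add(1)
        if err != nil {
            failed.Add(1)
        }
        log.Printf("task duration=%s err=%v", time.Since(start), err)
        return err
    }
}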
Common mistakes and traps
Most teams don’t get hurt by choosing the “wrong” pattern. They get hurt by small defaults that turn into outages when traffic spikes.
When goroutines multiply
The classic trap is spawning one goroutine per job under a burst. A few hundred is fine. A few hundred thousand can flood the scheduler, heap, logs, and network sockets. Even if each goroutine is small, the total cost adds up, and recovery takes time because the work is already in flight.
Another mistake is treating a huge buffered channel as “backpressure.” A large buffer is just a hidden queue. It can buy time, but it also hides problems until you hit a memory wall. If you need a queue, size it deliberately and decide what happens when it’s full (block, drop, retry later, or persist to storage).
Hidden bottlenecks
Many background jobs aren’t CPU bound. They’re limited by something downstream. If you ignore those limits, a fast producer overwhelms a slow consumer.
Common traps:
- No cancellation or timeout, so workers can block forever on an API request or DB query
- Worker counts chosen without checking real limits like DB connections, disk I/O, or third-party rate caps
- Retries that amplify load (immediate retries across 1,000 failed jobs)
- One shared lock or single transaction that serializes everything, so “more workers” only adds overhead
- Missing visibility: no metrics for queue depth, job age, retry count, and worker utilization
Example: a nightly export triggers 20,000 “send notification” tasks. If each task hits your database and an email provider, it’s easy to exceed connection pools or quotas. A pool of 50 workers with per-task timeouts and a small queue makes the limit obvious. One goroutine per task plus a giant buffer makes the system look fine until it suddenly isn’t.
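The per-task timeout is a small wrapper around the handler; a sketch, where the 30-second deadline is an assumption:

import (
    "context"
    "time"
)

type Job struct{ Payload []byte }

// withTimeout wraps a handler so no single task can run past the deadline.
// It only helps if the handler passes ctx into its DB and HTTP calls.
func withTimeout(d time.Duration, handle func(context.Context, Job) error) func(context.Context, Job) error {
    return func(ctx context.Context, j Job) error {
        ctx, cancel := context.WithTimeout(ctx, d)
        defer cancel()
        return handle(ctx, j)
    }
}

Wrapping the handler as withTimeout(30*time.Second, handle) before passing it to the pool gives every task the same deadline.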
Example: bursty exports and notifications
Picture a support team that needs data for an audit. One person clicks an "Export" button, then a few teammates do the same, and suddenly you have 5,000 export jobs created within a minute. Each export reads from the database, formats a CSV, stores a file, and sends a notification (email or Telegram) when it’s ready.
With a goroutine-per-task approach, the system feels great for a moment. All 5,000 jobs start almost instantly, and it looks like the queue is draining fast. Then the costs show up: thousands of concurrent database queries compete for connections, memory climbs as jobs hold buffers at the same time, and timeouts become common. Jobs that could have finished quickly get stuck behind retries and slow queries.
With a worker pool, the start is slower but the overall run is calmer. With 50 workers, only 50 exports do heavy work at once. Database usage stays in a range you can predict, buffers get reused more often, and latency is steadier. Total completion time is easier to estimate too: roughly (jobs / workers) * average job duration, plus some overhead.
The key difference isn’t that pools are magically faster. It’s that they stop the system from hurting itself during bursts. A controlled 50-at-a-time run often finishes sooner than 5,000 jobs fighting each other.
Where you apply backpressure depends on what you want to protect:
- At the API layer, reject or delay new export requests when the system is busy.
- At the queue, accept requests but enqueue jobs and drain them at a safe rate.
- In the worker pool, cap concurrency for the expensive parts (DB reads, file generation, notification sending).
- Per resource, split into separate limits (for example, 40 workers for exports but only 10 for notifications).
- On external calls, rate-limit email/SMS/Telegram so you don’t get blocked.
Quick checklist before shipping
Before you run background jobs in production, do a pass on limits, visibility, and failure handling. Most incidents aren’t caused by “slow code.” They come from missing guardrails when load spikes or a dependency gets flaky.
- Set hard max concurrency per dependency. Don’t pick one global number and hope it fits everything. Cap DB writes, outbound HTTP calls, and CPU-heavy work separately.
- Make the queue bounded and observable. Put a real limit on pending jobs and expose a few metrics: queue depth, age of the oldest job, and processing rate.
- Add retries with jitter and a dead-letter path. Retry selectively, spread retries out, and after N failures move the job to a dead-letter queue or “failed” table with enough detail to review and replay (a small sketch follows this checklist).
- Verify shutdown behavior: drain, cancel, resume safely. Decide what happens on deploy or crash. Make jobs idempotent so reprocessing is safe, and store progress for long workflows.
- Protect the system with timeouts and circuit breakers. Every external call needs a timeout. If a dependency is down, fail fast (or pause intake) instead of stacking work.
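A minimal sketch of the retry item above; maxAttempts and the deadLetter function are placeholders you would back with a real table or queue.

import (
    "context"
    "math/rand"
    "time"
)

type Job struct{ Payload []byte }

// runWithRetry retries a failing job with exponential backoff plus jitter,
// then hands it to a dead-letter store after maxAttempts.
func runWithRetry(ctx context.Context, j Job, maxAttempts int,
    handle func(context.Context, Job) error, deadLetter func(Job, error)) {

    backoff := time.Second
    var err error
    for attempt := 1; attempt <= maxAttempts; attempt++ {
        if err = handle(ctx, j); err == nil {
            return
        }
        if attempt == maxAttempts {
            break // no point sleeping before giving up
        }
        // Backoff plus random jitter so 1,000 failed jobs don't retry in lockstep.
        jitter := time.Duration(rand.Int63n(int64(backoff)))
        select {
        case <-time.After(backoff + jitter):
        case <-ctx.Done():
            deadLetter(j, ctx.Err())
            return
        }
        backoff *= 2
    }
    deadLetter(j, err) // keep enough detail to review and replay later
}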
Practical next steps
Choose the pattern that matches what your system looks like on a normal day, not a perfect day. If work arrives in bursts (uploads, exports, email blasts), a fixed worker pool with a bounded queue is usually the safer default. If work is steady and each task is small, goroutine-per-task can be fine, as long as you still enforce limits somewhere.
The winning choice is usually the one that makes failure boring. Pools make limits obvious. Goroutine-per-task makes it easy to forget limits until the first real spike.
Start simple, then add bounds and visibility
Start with something straightforward, but add two controls early: a bound on concurrency and a way to see queueing and failures.
A practical rollout plan:
- Define your workload shape: bursty, steady, or mixed (and what “peak” looks like).
- Put a hard cap on in-flight work (pool size, semaphore, or bounded channel); one option is sketched after this list.
- Decide what happens when the cap is hit: block, drop, or return a clear error.
- Add basic metrics: queue depth, time-in-queue, processing time, retries, and dead letters.
- Load test with a burst that’s 5x your expected peak and watch memory and latency.
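If you'd rather not hand-roll the cap, golang.org/x/sync/errgroup can bundle the limit and the waiting; a sketch, where the Job type and handle are placeholders:

import (
    "context"

    "golang.org/x/sync/errgroup"
)

type Job struct{ Payload []byte }

// processAll runs handle for every job but never more than limit at once.
// g.Go blocks once the limit is reached, which acts as the backpressure.
func processAll(ctx context.Context, jobs []Job, limit int, handle func(context.Context, Job) error) error {
    g, ctx := errgroup.WithContext(ctx)
    g.SetLimit(limit)
    for _, j := range jobs {
        j := j // copy the loop variable (needed on Go versions before 1.22)
        g.Go(func() error {
            return handle(ctx, j)
        })
    }
    return g.Wait() // the first error cancels ctx for the remaining jobs
}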
When a pool is not enough
If workflows can run for minutes to days, a simple pool can struggle because work isn’t just “do it once.” You need state, retries, and resumability. That usually means persisting progress, using idempotent steps, and applying backoff. It can also mean splitting one big job into smaller steps so you can resume safely after a crash.
If you want to ship a full backend with workflows faster, AppMaster (appmaster.io) can be a practical option: you model data and business logic visually, and it generates real Go code for the backend so you can keep the same discipline around concurrency limits, queueing, and backpressure without wiring everything by hand.
FAQ
When should I use a worker pool instead of goroutine-per-task?
Default to a worker pool when jobs can arrive in bursts or touch shared limits like DB connections, CPU, or external API quotas. Use goroutine-per-task when volume is modest, tasks are short, and you still have a clear limit somewhere (like a semaphore or rate limiter).
What is the main tradeoff between the two patterns?
Starting a goroutine per task is fast to write and can have great throughput at low load, but it can create an unbounded backlog under spikes. A worker pool adds a hard concurrency cap and a clear place to apply timeouts, retries, and metrics, which usually makes production behavior more predictable.
How much does the choice affect throughput?
Usually not much. In most systems, throughput is capped by a shared bottleneck such as the database, an external API, disk I/O, or CPU-heavy steps. More goroutines won’t beat that limit; they mostly increase waiting and contention.
Which pattern has better latency?
Goroutine-per-task often has better latency at low load, then can get much worse at high load because everything competes at once. A pool can add queueing delay, but it tends to keep p99 steadier by preventing a thundering herd on the same dependencies.
How much memory do goroutines really cost?
The goroutine itself is usually not the biggest cost; the backlog is. If tasks pile up and each task holds onto job payloads or large objects, memory can climb quickly. A worker pool plus a bounded queue turns that into a defined memory ceiling and a predictable overload behavior.
What is backpressure, in practical terms?
Backpressure means you slow down or stop accepting new work when the system is already busy, instead of letting work pile up invisibly. A bounded queue is a simple form: when full, producers block or you return an error, which prevents runaway memory and connection exhaustion.
How do I choose the number of workers?
Start from the real limit. For CPU-heavy jobs, begin near the number of CPU cores. For I/O-heavy jobs, you can go higher, but stop increasing when your database, network, or third-party APIs start timing out or throttling, and make sure you respect connection pool sizes.
How big should the queue buffer be?
Pick a size that absorbs normal bursts but doesn’t hide trouble for minutes. Small buffers expose overload quickly; large buffers can increase memory usage and make users wait longer before failures show up. Decide upfront what happens when the queue is full: block, reject, drop, or persist elsewhere.
How do I handle cancellation and shutdown?
Use context.Context per job and ensure database and HTTP calls respect it. Set timeouts on external calls, and make shutdown behavior explicit so workers can stop cleanly without leaving hung goroutines or half-finished work.
Which metrics matter most for background jobs?
Track queue depth, time spent waiting in the queue, task duration (p50/p95/p99), and error/retry counts. These metrics tell you whether you need more workers, a smaller queue, tighter timeouts, or stronger rate limiting against a dependency.


