Apr 20, 2025·8 min read

Kotlin networking for slow connections: timeouts and safe retries

Practical Kotlin networking for slow connections: set timeouts, cache safely, retry without duplicates, and protect critical actions on flaky mobile networks.

What breaks on slow and flaky connections

On mobile, “slow” usually doesn’t mean “no internet.” It often means a connection that works, but only in short bursts. A request might take 8 to 20 seconds, stall halfway, then finish. Or it might succeed one moment and fail the next because the phone switched from Wi-Fi to LTE, entered a low-signal area, or the OS put the app in the background.

“Flaky” is worse. Packets drop, DNS lookups time out, TLS handshakes fail, and connections reset at random. You can do everything “right” in code and still see failures in the field because the network is changing under you.

This is where default settings tend to break. Many apps rely on library defaults for timeouts, retries, and caching without deciding what “good enough” looks like for real people. Defaults are often tuned for stable Wi-Fi and quick APIs, not for a commuter train, an elevator ride, or a busy coffee shop.

Users don’t describe “socket timeouts” or “HTTP 503.” They notice symptoms: endless spinners, sudden errors after a long wait (then it works on the next try), duplicate actions (two bookings, two orders, double charges), lost updates, and mixed states where the UI says “failed” but the server actually succeeded.

Slow networks turn small design gaps into money and trust problems. If the app doesn’t clearly separate “still sending” from “failed” from “done,” users tap again. If the client retries blindly, it can create duplicates. If the server doesn’t support idempotency, one shaky connection can produce multiple “successful” writes.

“Critical actions” are anything that must happen at most once and must be correct: payments, checkout submissions, booking a slot, transferring points, changing a password, saving a shipping address, submitting a claim, or sending an approval.

A realistic example: someone submits checkout on weak LTE. The app sends the request, then the connection drops before the response arrives. The user sees an error, taps “Pay” again, and now two requests reach the server. Without clear rules, the app can’t tell whether it should retry, wait, or stop. The user can’t tell whether they should try again.

Decide your rules before tuning code

When connections are slow or flaky, most bugs come from unclear rules, not from the HTTP client. Before you touch timeouts, caching, or retries, write down what “correct” means for your app.

Start with actions that must never run twice. These are usually money and account actions: place order, charge card, submit payout, change password, delete account. If a user taps twice or the app retries, the server should still treat it as one request. If you can’t guarantee that yet, treat those endpoints as “no auto-retry” until you can.

Next, decide what each screen is allowed to do when the network is bad. Some screens can still be useful offline (last known profile, previous orders). Others should go read-only or show a clear “try again” state (inventory counts, live pricing). Mixing these expectations leads to confusing UI and risky caching.

Set acceptable wait time per action based on how users think, not what feels neat in code. Login can tolerate a short wait. File upload needs longer. Checkout should feel fast but also safe. A 30-second timeout might be “reliable” on paper and still feel broken.

Finally, decide what you will store on the device and for how long. Cached data helps, but stale data can lead to wrong choices (old prices, expired eligibility).

Write the rules somewhere everyone can find them (a README is fine). Keep it simple:

Which endpoints are “must not duplicate” and require idempotency handling?
Which screens must work offline, and which are read-only when offline?
What’s the maximum wait time per action (login, feed refresh, upload, checkout)?
What can be cached on-device, and what’s the expiry time?
After failure, do you show an error, queue for later, or require manual retry?

Once these rules are clear, your timeout values, caching headers, retry policy, and UI states are much easier to implement and test.
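One way to keep these rules reviewable is to write them down as data next to the code. The sketch below is only an illustration: the action names, the field names, and every value are assumptions, not recommendations from a library or a standard.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.minutes
import kotlin.time.Duration.Companion.seconds

// What the app does once a call has finally failed.
enum class OnFailure { SHOW_ERROR, QUEUE_FOR_LATER, MANUAL_RETRY }

// One row of the written policy. All names and values here are illustrative.
data class EndpointPolicy(
    val mustNotDuplicate: Boolean, // needs idempotency handling
    val autoRetry: Boolean,        // may the client retry on its own?
    val maxWait: Duration,         // longest wait the user should see
    val cacheTtl: Duration?,       // null = never cache on device
    val onFailure: OnFailure,
)

enum class Action { LOGIN, FEED_REFRESH, UPLOAD, CHECKOUT }

val policies: Map<Action, EndpointPolicy> = mapOf(
    Action.LOGIN to EndpointPolicy(false, false, 10.seconds, null, OnFailure.SHOW_ERROR),
    Action.FEED_REFRESH to EndpointPolicy(false, true, 20.seconds, 5.minutes, OnFailure.SHOW_ERROR),
    Action.UPLOAD to EndpointPolicy(false, true, 2.minutes, null, OnFailure.QUEUE_FOR_LATER),
    Action.CHECKOUT to EndpointPolicy(true, false, 30.seconds, null, OnFailure.MANUAL_RETRY),
)

// Rule from the text: must-not-duplicate endpoints get no auto-retry
// until the server can guarantee idempotency.
fun violations(serverIsIdempotent: Boolean): List<Action> =
    policies
        .filterValues { it.mustNotDuplicate && it.autoRetry && !serverIsIdempotent }
        .keys
        .toList()
```

A check like `violations(serverIsIdempotent = false)` can run in a unit test, so a policy change that breaks the "no auto-retry" rule fails the build instead of reaching users.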

Timeouts that match real user expectations

Slow networks fail in different ways. A good timeout setup doesn’t just “pick a number.” It matches what the user is trying to do and fails fast enough that the app can recover.

The three timeouts, in plain terms:

Connect timeout: how long you wait to establish a connection to the server. If this fails, the request never really started. In OkHttp, connectTimeout applies to connecting the TCP socket, so a slow DNS lookup is bounded only by callTimeout.
Write timeout: how long a single write may stall while sending the request body (uploads, large JSON, slow uplink).
Read timeout: how long you wait for the next bytes from the server after the request is sent. It limits gaps between data, not the total download time, which is why stalls on spotty mobile networks often show up here.

Timeouts should reflect the screen and the stakes. A feed can be slower without real harm. A critical action should either complete or fail clearly so the user can decide what to do next.

A practical starting point (adjust after measuring):

List loading (low risk): connect 5-10s, read 20-30s, write 10-15s.
Search-as-you-type: connect 3-5s, read 5-10s, write 5-10s.
Critical actions (high risk, like “Pay” or “Submit order”): connect 5-10s, read 30-60s, write 15-30s.
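As a rough sketch, these profiles can be applied by deriving per-profile clients from one base OkHttpClient. The profile names and the exact numbers are assumptions picked from inside the ranges above; measure before settling on them.

```kotlin
import java.util.concurrent.TimeUnit
import okhttp3.OkHttpClient

// Illustrative timeout profile; values are in seconds.
data class TimeoutProfile(val connectSec: Long, val readSec: Long, val writeSec: Long)

object TimeoutProfiles {
    val listLoading = TimeoutProfile(connectSec = 10, readSec = 20, writeSec = 10)
    val search = TimeoutProfile(connectSec = 5, readSec = 5, writeSec = 5)
    val critical = TimeoutProfile(connectSec = 10, readSec = 30, writeSec = 15)
}

// newBuilder() copies the base configuration and shares its connection pool
// and dispatcher, so a client per profile stays cheap.
fun OkHttpClient.withProfile(profile: TimeoutProfile): OkHttpClient =
    newBuilder()
        .connectTimeout(profile.connectSec, TimeUnit.SECONDS)
        .readTimeout(profile.readSec, TimeUnit.SECONDS)
        .writeTimeout(profile.writeSec, TimeUnit.SECONDS)
        .build()
```

Building the derived clients once at startup (for example `base.withProfile(TimeoutProfiles.critical)`) keeps timeout decisions in one file instead of scattered across call sites.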

Consistency matters more than perfection. If the user taps “Submit” and sees a spinner for two minutes, they’ll tap again.

Avoid “infinite loading” by adding a clear upper bound in the UI, too. Show progress immediately, allow cancel, and after (say) 20-30 seconds show “Still trying…” with options to retry or check connection. That keeps the experience honest even if the network library is still waiting.

When a timeout happens, log enough to debug patterns later, without logging secrets. Useful fields include the URL path (not full query), HTTP method, status (if any), a timing breakdown (connect vs write vs read if available), network type (Wi-Fi, cellular, airplane mode), approximate request/response size, and a request ID so you can match client logs with server logs.
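A minimal sketch of such a log record follows. The field names are assumptions rather than a standard schema; the one firm rule it illustrates is logging the path without the query string.

```kotlin
import java.net.URI

// Fields mirror the list above; names are illustrative.
data class TimeoutLogEntry(
    val method: String,
    val path: String,        // path only, never the query string
    val status: Int?,        // null when no response arrived
    val phase: String,       // "connect", "write", or "read", if known
    val elapsedMs: Long,
    val networkType: String,
    val requestBytes: Long,
    val requestId: String,   // same ID the server logs
)

// Drops scheme, host, query, and fragment so tokens passed as query
// parameters never reach the logs. Throws URISyntaxException on malformed input.
fun safePath(url: String): String = URI(url).path.orEmpty().ifEmpty { "/" }
```

For instance, `safePath("https://api.example.com/v1/orders?token=secret")` yields `/v1/orders`.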

A simple, consistent Kotlin networking setup

When connections are slow, small inconsistencies in client setup turn into big problems. A clean baseline helps you debug faster and gives every request the same rules.

One client, one policy

Start with a single place where you build your HTTP client (often one OkHttpClient used by Retrofit). Put the basics there so every request behaves the same: default headers (app version, locale, auth token) and a clear User-Agent, timeouts set once (not sprinkled across calls), logging you can enable for debugging, and one retry policy decision (even if it’s “no automatic retries”).

Here is a small example that keeps configuration in one file:

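```kotlin
import java.util.concurrent.TimeUnit
import okhttp3.OkHttpClient
import retrofit2.Retrofit
import retrofit2.converter.moshi.MoshiConverterFactory

// One shared client: every request gets the same timeouts and headers.
// BuildConfig and BASE_URL are expected to come from your app module.
val okHttp = OkHttpClient.Builder()
  .connectTimeout(10, TimeUnit.SECONDS)
  .readTimeout(20, TimeUnit.SECONDS)
  .writeTimeout(20, TimeUnit.SECONDS)
  .callTimeout(30, TimeUnit.SECONDS) // hard cap on the whole call
  .addInterceptor { chain ->
    // Default headers added to every outgoing request.
    val request = chain.request().newBuilder()
      .header("User-Agent", "MyApp/${BuildConfig.VERSION_NAME}")
      .header("Accept", "application/json")
      .build()
    chain.proceed(request)
  }
  .build()

// Retrofit reuses the shared client instead of creating its own.
val retrofit = Retrofit.Builder()
  .baseUrl(BASE_URL)
  .client(okHttp)
  .addConverterFactory(MoshiConverterFactory.create())
  .build()
```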

Central error handling that maps to user messages

Network errors aren’t just “an exception.” If each screen handles them differently, users get random messages.

Create one mapper that converts failures into a small set of user-friendly outcomes: no connection/airplane mode, timeout, server error (5xx), validation or auth error (4xx), and an unknown fallback.

This keeps UI copy consistent (“No connection” vs “Try again”) without leaking technical details.
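A mapper along these lines could look like the sketch below. `HttpFailure` is a hypothetical wrapper for non-2xx responses (with Retrofit you would more likely read the code from its own HTTP exception type), and the message strings are placeholders.

```kotlin
import java.io.IOException
import java.io.InterruptedIOException
import java.net.ConnectException
import java.net.UnknownHostException

enum class UserError { NO_CONNECTION, TIMEOUT, SERVER_ERROR, REQUEST_REJECTED, UNKNOWN }

// Hypothetical exception carrying the HTTP status of a failed response.
class HttpFailure(val code: Int) : IOException("HTTP $code")

fun toUserError(t: Throwable): UserError = when (t) {
    is HttpFailure -> when (t.code) {
        in 500..599 -> UserError.SERVER_ERROR
        in 400..499 -> UserError.REQUEST_REJECTED // validation or auth
        else -> UserError.UNKNOWN
    }
    // SocketTimeoutException is a subclass of InterruptedIOException.
    is InterruptedIOException -> UserError.TIMEOUT
    is UnknownHostException, is ConnectException -> UserError.NO_CONNECTION
    else -> UserError.UNKNOWN
}

// One place for user-facing copy; the wording is illustrative.
fun UserError.message(): String = when (this) {
    UserError.NO_CONNECTION -> "No connection"
    UserError.TIMEOUT -> "This is taking too long. Try again"
    UserError.SERVER_ERROR -> "Something went wrong on our side"
    UserError.REQUEST_REJECTED -> "Please check your details"
    UserError.UNKNOWN -> "Something went wrong"
}
```

Because every screen calls the same `toUserError`, changing the copy or adding a category happens in one file.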

Tag and cancel requests when screens close

On flaky networks, calls can finish late and update a screen that’s already gone. Make cancellation a standard rule: when a screen closes, its work stops.

With Retrofit and Kotlin coroutines, canceling the coroutine scope (for example in a ViewModel) cancels the underlying HTTP call. For non-coroutine calls, keep a reference to the Call and call cancel(). You can also tag requests and cancel groups of calls when a feature is exited.

Background work shouldn’t depend on the UI

Anything important that must complete (sending a report, syncing a queue, finishing a submission) should run in a scheduler designed for it. On Android, WorkManager is the usual choice because it can retry later and survive app restarts. Keep UI actions lightweight, and hand off longer work to background jobs when it makes sense.

Caching rules that are safe on mobile

Caching can be a big win on slow connections because it cuts repeat downloads and makes screens feel instant. It can also be a problem if it shows stale data at the wrong time, like an old account balance or an outdated delivery address.

A safe approach is to cache only what a user can tolerate being slightly old, and force fresh checks for anything that affects money, security, or a final decision.

Cache-Control basics you can rely on

Most rules come down to a few Cache-Control directives:

max-age=60: you can reuse the cached response for 60 seconds without asking the server.
no-store: don’t save this response at all (best for tokens and sensitive screens).
must-revalidate: if it’s expired, you must check with the server before using it again.

On mobile, must-revalidate prevents “quietly wrong” data after a temporary offline period. If the user opens the app after a subway ride, you want a fast screen, but you also want the app to confirm what’s still true.
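To make the interaction of these directives concrete, here is a deliberately simplified decision function. It is a teaching model, not how any particular HTTP cache is implemented: it ignores other directives and the Age header, and the `STALE_FALLBACK` outcome is an app-level choice assumed for illustration.

```kotlin
enum class CacheDecision { USE_CACHED, REVALIDATE, STALE_FALLBACK, UNAVAILABLE, DO_NOT_STORE }

// cacheControl: the header value stored with the response.
// ageSeconds: how long ago the response was received.
fun decide(cacheControl: String, ageSeconds: Long, online: Boolean): CacheDecision {
    val directives = cacheControl.split(",").map { it.trim().lowercase() }
    if ("no-store" in directives) return CacheDecision.DO_NOT_STORE

    val maxAge = directives
        .firstOrNull { it.startsWith("max-age=") }
        ?.substringAfter("=")
        ?.toLongOrNull()
    val fresh = maxAge != null && ageSeconds < maxAge

    return when {
        fresh -> CacheDecision.USE_CACHED
        online -> CacheDecision.REVALIDATE
        // Expired and offline: must-revalidate forbids showing the old copy.
        "must-revalidate" in directives -> CacheDecision.UNAVAILABLE
        // Expired and offline without must-revalidate: show old data, clearly labeled.
        else -> CacheDecision.STALE_FALLBACK
    }
}
```

The subway case from above is `decide("max-age=60, must-revalidate", ageSeconds = 600, online = false)`, which returns `UNAVAILABLE` rather than quietly showing ten-minute-old data.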

ETag refreshes: fast, cheap, and reliable

For read endpoints, ETag-based validation is often better than long max-age values. The server sends an ETag with the response. Next time, the app sends If-None-Match with that value. If nothing changed, the server replies 304 Not Modified, which is tiny and fast on weak networks.

This works well for product lists, profile details, and settings screens.
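The exchange can be modeled in a few lines. `FakeServer` and `CachingClient` below are illustrative in-memory stand-ins, not real networking code; an HTTP client with a configured cache typically performs this validation for you.

```kotlin
// Minimal in-memory model of ETag validation.
data class EtagResponse(val status: Int, val etag: String, val body: String?)

class FakeServer(private var body: String) {
    private fun etag() = "\"${body.hashCode().toString(16)}\""

    fun update(newBody: String) { body = newBody }

    // 304 with no body when the client's copy is current, otherwise 200 with the body.
    fun get(ifNoneMatch: String? = null): EtagResponse =
        if (ifNoneMatch == etag()) EtagResponse(304, etag(), null)
        else EtagResponse(200, etag(), body)
}

class CachingClient(private val server: FakeServer) {
    private var cachedEtag: String? = null
    private var cachedBody: String? = null
    var fullDownloads = 0
        private set

    fun load(): String {
        val response = server.get(ifNoneMatch = cachedEtag)
        if (response.status == 200) {
            fullDownloads++
            cachedEtag = response.etag
            cachedBody = response.body
        }
        // On 304 the cached copy is still valid and is reused as-is.
        return checkNotNull(cachedBody)
    }
}
```

Loading twice without a server change costs one full download and one small 304 round trip; only an actual change triggers a second download.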

A simple rule of thumb:

Cache “read” endpoints with short max-age plus must-revalidate, and support ETag where you can.
Don’t cache “write” endpoints (POST/PUT/PATCH/DELETE). Treat them as always network-bound.
Use no-store for anything sensitive (auth responses, payment steps, private messages).
Cache static assets (icons, public config) longer, because the risk of staleness is low.

Keep caching decisions consistent across the app. Users notice mismatches more than small delays.

Safe retries without making things worse

Retries feel like an easy fix, but they can backfire. Retry the wrong requests and you create extra load, drain battery, and make the app feel stuck.

Start by retrying only failures that are likely temporary. A dropped connection, a read timeout, or a short server outage can succeed on the next try. A bad password, a missing field, or a 404 won’t.

A practical rule set:

Retry timeouts and connection failures.
Retry 502, 503, and sometimes 504.
Don’t retry 4xx (except 408 or 429, if you have a clear wait rule).
Don’t retry requests that may already have reached the server and might be processing, unless they are idempotent or carry an idempotency key.
Keep retries low (often 1 to 3 attempts).
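The rule set can be collapsed into one decision function. This is a sketch under assumptions: the `Failure` types are hypothetical, `safeToRepeat` stands for "idempotent or protected by an idempotency key", and treating a Retry-After value as the "clear wait rule" for 408 and 429 is one possible reading.

```kotlin
sealed interface Failure {
    object ConnectFailed : Failure  // never reached the server
    object TimedOut : Failure       // may have reached the server
    data class Http(val code: Int, val retryAfterSeconds: Long? = null) : Failure
}

// attemptsMade counts tries already performed, including the one that just failed.
fun shouldRetry(
    failure: Failure,
    attemptsMade: Int,
    safeToRepeat: Boolean,
    maxAttempts: Int = 3,
): Boolean {
    if (attemptsMade >= maxAttempts) return false
    return when (failure) {
        Failure.ConnectFailed -> true
        Failure.TimedOut -> safeToRepeat
        is Failure.Http -> when (failure.code) {
            502, 503, 504 -> safeToRepeat
            408, 429 -> failure.retryAfterSeconds != null // only with a clear wait rule
            else -> false                                  // other 4xx and 5xx
        }
    }
}
```

Keeping the decision in a pure function makes it easy to unit test every row of the rule set without touching the network.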

Backoff + jitter: fewer retry storms

If many users hit the same outage, instant retries can create a wave of traffic that slows recovery. Use exponential backoff (wait longer each time) and add jitter (a small random delay) so devices don’t retry in sync.

For example: wait about 0.5 seconds, then 1 second, then 2 seconds, with a random +/- 20% each time.
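That schedule can be computed as below. The function name and defaults are assumptions chosen to match the numbers in the example (500 ms base, 20% jitter).

```kotlin
import kotlin.random.Random

// Delay before retry number `retry` (0-based): base * 2^retry, then +/- jitter.
fun backoffDelayMs(
    retry: Int,
    baseMs: Long = 500,
    jitterRatio: Double = 0.2,
    random: Random = Random.Default,
): Long {
    require(retry in 0..20) { "retry out of range: $retry" }
    val exponential = baseMs shl retry // 500, 1000, 2000, ...
    // nextDouble() is in [0, 1), so factor is in [0.8, 1.2) with the defaults.
    val factor = 1.0 + (random.nextDouble() * 2 - 1) * jitterRatio
    return (exponential * factor).toLong()
}
```

Passing a seeded `Random` makes the delays reproducible in tests, while production code keeps the default source so devices drift apart.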

Put a cap on total retry time

Without limits, retries can trap users in a spinner for minutes. Pick a maximum total time for the whole operation, including all waits. Many apps aim for 10 to 20 seconds before they stop and show a clear option to try again.
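A minimal sketch of a retry loop with both an attempt limit and a total time budget follows. It is blocking code for simplicity, and the clock and sleep functions are injectable parameters (an assumption made for testability); a coroutine version would also need to rethrow cancellation.

```kotlin
// Runs `block` until it succeeds, attempts run out, or the time budget is spent.
fun <T> retryWithBudget(
    maxAttempts: Int = 3,
    budgetMs: Long = 15_000,
    delayForRetryMs: (Int) -> Long = { retry -> 500L shl retry },
    nowMs: () -> Long = System::currentTimeMillis,
    sleepMs: (Long) -> Unit = { Thread.sleep(it) },
    block: () -> T,
): Result<T> {
    val deadline = nowMs() + budgetMs
    var last: Throwable? = null
    for (attempt in 0 until maxAttempts) {
        try {
            return Result.success(block())
        } catch (e: Exception) {
            last = e
        }
        val wait = delayForRetryMs(attempt)
        // Stop after the last attempt, or if the next wait would exceed the budget.
        if (attempt == maxAttempts - 1 || nowMs() + wait > deadline) break
        sleepMs(wait)
    }
    return Result.failure(last ?: IllegalStateException("no attempts made"))
}
```

The budget check happens before sleeping, so the user never waits through a delay that could not be followed by another attempt.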

Also match the context. If someone is submitting a form, they want an answer quickly. If a background sync fails, you can retry later.

Never auto-retry non-idempotent actions (like placing an order or sending a payment) unless you have protection such as an idempotency key or a server-side duplicate check. If you can’t guarantee safety, fail clearly and let the user decide what to do next.

Duplicate-prevention for critical actions

On a slow or flaky connection, users tap twice. The HTTP client may quietly retry a failed connection. Your app may resend after a timeout. If the action is “create something” (place an order, send money, change a password), duplicates can hurt.

Idempotency means that repeating the same request has the same effect as sending it once. If the request is repeated, the server shouldn’t create a second order. It should return the first result again or say “already done.”

Use an idempotency key for each critical attempt

For critical actions, generate a unique idempotency key when the user starts the attempt, and send it with the request (often as a header like Idempotency-Key, or a field in the body).

A practical flow:

Create a UUID idempotency key when the user taps “Pay”.
Save it locally with a tiny record: status = pending, createdAt, request payload hash.
Send the request with the key.
When you get a success response, mark status = done and store the server’s result ID.
If you need to retry, reuse the same key, not a new one.

That “reuse the same key” rule is what stops accidental double charges.
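The flow above can be sketched with a small attempt record. `AttemptStore` here is an in-memory stand-in for local storage, and using `hashCode()` as the payload hash is a shortcut for illustration; a real app would persist the record and use a stronger hash.

```kotlin
import java.util.UUID

enum class AttemptStatus { PENDING, DONE }

data class PaymentAttempt(
    val idempotencyKey: String,
    val payloadHash: Int,
    val createdAtMs: Long,
    val status: AttemptStatus = AttemptStatus.PENDING,
    val serverResultId: String? = null,
)

// In-memory stand-in for a small database table.
class AttemptStore {
    private val byPayload = mutableMapOf<Int, PaymentAttempt>()

    // Returns the pending attempt for this payload, or starts a new one,
    // so retries and double taps reuse the same key.
    fun startOrResume(payload: String, nowMs: Long = System.currentTimeMillis()): PaymentAttempt {
        val hash = payload.hashCode()
        val existing = byPayload[hash]
        if (existing != null && existing.status == AttemptStatus.PENDING) return existing
        return PaymentAttempt(UUID.randomUUID().toString(), hash, nowMs)
            .also { byPayload[hash] = it }
    }

    // Called on success: stores the server's result ID and closes the attempt.
    fun markDone(attempt: PaymentAttempt, serverResultId: String) {
        byPayload[attempt.payloadHash] =
            attempt.copy(status = AttemptStatus.DONE, serverResultId = serverResultId)
    }
}
```

A second tap on “Pay” while the first attempt is pending gets the same key back; only after the attempt is marked done does an identical payload count as a new purchase with a new key.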

Handle app restarts and offline gaps

If the app is killed mid-request, the next launch still needs to be safe. Store the idempotency key and request state in local storage (for example, a small database row). On restart, either retry with the same key or call a “check status” endpoint using the saved key or server result ID.

On the server side, the contract should be clear: when it receives a duplicate key, it should reject the second attempt or return the original response (same order ID, same receipt). If the server can’t do that yet, client-side duplicate-prevention will never be fully reliable, because the app can’t see what happened after it sent the request.
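On the server, the “return the original response” contract can be as small as a keyed lookup. The service below is an illustrative in-memory model; a real backend would persist keys, expire them, and usually verify that the repeated payload matches the first one.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

data class OrderReceipt(val orderId: String, val amountCents: Long)

// Illustrative in-memory order service keyed by idempotency key.
class OrderService {
    private val byKey = ConcurrentHashMap<String, OrderReceipt>()
    private val nextId = AtomicInteger(1)

    // computeIfAbsent creates the order at most once per key, so a repeated
    // key gets the original receipt instead of a second order.
    fun placeOrder(idempotencyKey: String, amountCents: Long): OrderReceipt =
        byKey.computeIfAbsent(idempotencyKey) {
            OrderReceipt("order-${nextId.getAndIncrement()}", amountCents)
        }

    fun orderCount(): Int = byKey.size
}
```

Two calls with the same key return the same receipt and leave the order count at one, which is exactly what a client retry after a lost response needs.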

A good user-facing touch: if an attempt is pending, show “Payment in progress” and disable the button until you get a final result.

UI patterns that reduce accidental resubmits

Slow connections don’t just break requests. They change how people tap. When the screen freezes for two seconds, many users assume nothing happened and hit the button again. Your UI has to make “one tap” feel reliable even when the network isn’t.

Optimistic UI is safest when the action is reversible or low risk, like starring an item, saving a draft, or marking a message as read. Confirmed UI is better for money, inventory, irreversible deletes, and anything that could create duplicates.

A good default for critical actions is a clear pending state. After the first tap, immediately switch the primary button into a “Submitting…” state, disable it, and show a short line that explains what’s happening.

Patterns that work well on flaky networks:

Disable the primary action after tap and keep it disabled until you have a final result.
Show a visible “Pending” status with details (amount, recipient, item count).
Add a simple “Recent activity” view so users can confirm what they already sent.
If the app is backgrounded, keep the pending state when they return.
Prefer one clear primary button over multiple tap targets on the same screen.

Sometimes the request succeeds but the response is lost. Treat this as a normal outcome, not an error that invites repeated taps. Instead of “Failed, try again,” show “We’re not sure yet” and offer a safe next step like “Check status.” If you can’t check status, keep the pending record locally and tell the user you’ll update when the connection returns.

Make “Try again” explicit and safe. Only show it when you can repeat the request using the same client-side request ID or idempotency key.
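These UI rules fit a small state machine. The state and event names below are assumptions made for illustration; the point is that extra taps are ignored while submitting, a lost response becomes its own state, and “Try again” reuses the original key.

```kotlin
sealed interface SubmitState {
    object Idle : SubmitState
    data class Submitting(val attemptKey: String) : SubmitState
    data class Unknown(val attemptKey: String) : SubmitState // sent, response lost
    data class Failed(val attemptKey: String) : SubmitState  // never reached the server
    data class Done(val resultId: String) : SubmitState
}

sealed interface SubmitEvent {
    data class Tap(val newKey: String) : SubmitEvent
    data class Succeeded(val resultId: String) : SubmitEvent // response or status check
    object NotSent : SubmitEvent
    object ResponseLost : SubmitEvent
}

fun reduce(state: SubmitState, event: SubmitEvent): SubmitState = when (state) {
    SubmitState.Idle ->
        if (event is SubmitEvent.Tap) SubmitState.Submitting(event.newKey) else state
    is SubmitState.Submitting -> when (event) {
        is SubmitEvent.Tap -> state // button is disabled; ignore extra taps
        is SubmitEvent.Succeeded -> SubmitState.Done(event.resultId)
        SubmitEvent.NotSent -> SubmitState.Failed(state.attemptKey)
        SubmitEvent.ResponseLost -> SubmitState.Unknown(state.attemptKey)
    }
    is SubmitState.Unknown ->
        if (event is SubmitEvent.Succeeded) SubmitState.Done(event.resultId) else state
    is SubmitState.Failed ->
        // "Try again" reuses the original key, never the new one.
        if (event is SubmitEvent.Tap) SubmitState.Submitting(state.attemptKey) else state
    is SubmitState.Done -> state
}

fun primaryButtonEnabled(state: SubmitState): Boolean =
    state is SubmitState.Idle || state is SubmitState.Failed
```

In the `Unknown` state the primary button stays disabled, so the only path forward is a status check that resolves to `Done`.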

Realistic example: a flaky checkout submission

A customer is on a train with spotty signal. They add items to the cart and tap Pay. The app has to be patient, but it also must not create two orders.

A safe sequence looks like this:

The app creates a client-side attempt ID and sends the checkout request with an idempotency key (for example, a UUID stored with the cart).
The request uses a short connect timeout and a longer read timeout. The train goes into a tunnel, and the call times out.
The app retries once, but only after a short delay and only if it never received a server response.
The server receives the second request and sees the same idempotency key, so it returns the original result instead of creating a new order.
The app shows a final confirmation screen when it gets the success response, even if it came from the retry.

Caching follows strict rules. Product lists, delivery options, and tax tables can be cached for a short time (GET requests). The checkout submission (POST) is never cached. Even if you use an HTTP cache, treat it as read-only help for browsing, not something that can “remember” a payment.

Duplicate-prevention is a mix of network and UI choices. When the user taps Pay, the button is disabled and the screen shows “Submitting order...” with a single Cancel option. If the app loses network, it switches to “Still trying” and keeps the same attempt ID. If the user force-closes and reopens, the app can resume by checking order status using that ID, instead of asking them to pay again.

Quick checklist and next steps

If your app feels “mostly fine” on office Wi-Fi but falls apart on trains, elevators, or rural areas, treat this as a release gate. This work is less about clever code and more about clear rules you can repeat.

Checklist before you ship:

Set timeouts per endpoint type (login, feed, upload, checkout) and test on throttled and high-latency networks.
Retry only where it’s truly safe, and cap it with backoff (a couple of tries for reads, usually none for writes).
Add an idempotency key for every critical write (payments, orders, form submissions) so a retry or double tap can’t create duplicates.
Make caching rules explicit: what can be served stale, what must be fresh, and what should never be cached.
Make states visible: pending, failed, and completed should look different, and the app should remember completed actions after a restart.

If one of these is “we’ll decide later,” you’ll end up with random behavior across screens.

Next steps to make it stick

Write a one-page networking policy: endpoint categories, timeout targets, retry rules, and caching expectations. Enforce it in one place (interceptors, a shared client factory, or a small wrapper) so every team member gets the same behavior by default.

Then do a short duplicate drill. Pick one critical action (like checkout), simulate a frozen spinner, force-close the app, toggle airplane mode, and press the button again. If you can’t prove it’s safe, users will eventually find a way to break it.

If you want to implement the same rules across backend and clients without hand-wiring everything, AppMaster (appmaster.io) can help by generating production-ready backend and native mobile source code. Even then, the key is the policy: define idempotency, retries, caching, and UI states once, and apply them consistently across the whole flow.

FAQ

Start by defining what “correct” looks like for each screen and action, especially anything that must happen at most once like payments or orders. Once the rules are clear, set timeouts, retries, caching, and UI states to match those rules instead of relying on library defaults.

Users usually see endless spinners, errors after a long wait, actions that work on the second try, or duplicate results like two orders or double charges. These are often caused by unclear retry and “pending vs failed” rules, not just bad signal.

Use connect timeout for how long you’ll wait to establish a connection, write timeout for sending the request body (uploads), and read timeout for waiting on the response after sending. A reasonable default is shorter timeouts for low-risk reads and longer read/write timeouts for critical submissions, with a clear UI limit so users aren’t stuck waiting forever.

If you only set one timeout in OkHttp, use callTimeout to cap the whole operation end-to-end so you avoid “infinite” waiting. Then layer connect/read/write timeouts as needed for better control, especially for uploads and slow response bodies.

Start by retrying only temporary failures like connection drops, DNS issues, and timeouts, and sometimes 502/503/504 responses. Avoid retrying 4xx errors and avoid auto-retrying writes unless you have idempotency protection, because retries can create duplicates.

Use a small number of retries (often 1–3) with exponential backoff and a bit of random jitter so many devices don’t retry at the same time. Also cap the total time spent retrying so the user gets a clear outcome instead of a spinner that lasts minutes.

Idempotency means repeating the same request won’t create a second result, so a double tap or retry won’t double-charge or double-book. For critical actions, send an idempotency key per attempt and reuse it for retries so the server can return the original result instead of creating a new one.

Generate a unique key when the user starts the action, store it locally with a small “pending” record, and send it with the request. If you retry or the app restarts, reuse the same key and either retry safely or check status, so you never turn one user intent into two server writes.

Cache only data that can safely be a bit old, and force fresh checks for money, security, and final decisions. For reads, prefer short freshness plus revalidation and consider ETags; for writes, don’t cache at all, and use no-store for sensitive responses.

Disable the primary button after the first tap, show an immediate “Submitting…” state, and keep a visible pending status that survives backgrounding or restarts. If the response might be lost, don’t push users into repeated taps; instead show uncertainty (“We’re not sure yet”) and offer a safe next step like checking status.