Kotlin networking for slow connections: timeouts and safe retries
Practical Kotlin networking for slow connections: set timeouts, cache safely, retry without duplicates, and protect critical actions on flaky mobile networks.

What breaks on slow and flaky connections
On mobile, "slow" usually doesn't mean "no internet." It often means a connection that works, but only in short bursts. A request might take 8 to 20 seconds, stall halfway, then finish. Or it might succeed one moment and fail the next because the phone switched from Wi-Fi to LTE, entered a low-signal area, or the OS put the app in the background.
"Flaky" is worse. Packets drop, DNS lookups time out, TLS handshakes fail, and connections reset at random. You can do everything "right" in code and still see failures in the field because the network is changing under you.
This is where default settings tend to break. Many apps rely on library defaults for timeouts, retries, and caching without deciding what "good enough" looks like for real people. Defaults are often tuned for stable Wi-Fi and quick APIs, not for a commuter train, an elevator ride, or a busy coffee shop.
Users don't describe "socket timeouts" or "HTTP 503." They notice symptoms: endless spinners, sudden errors after a long wait (then it works on the next try), duplicate actions (two bookings, two orders, double charges), lost updates, and mixed states where the UI says "failed" but the server actually succeeded.
Slow networks turn small design gaps into money and trust problems. If the app doesn't clearly separate "still sending" from "failed" from "done," users tap again. If the client retries blindly, it can create duplicates. If the server doesn't support idempotency, one shaky connection can produce multiple "successful" writes.
"Critical actions" are anything that must happen at most once and must be correct: payments, checkout submissions, booking a slot, transferring points, changing a password, saving a shipping address, submitting a claim, or sending an approval.
A realistic example: someone submits checkout on weak LTE. The app sends the request, then the connection drops before the response arrives. The user sees an error, taps "Pay" again, and now two requests reach the server. Without clear rules, the app can't tell whether it should retry, wait, or stop. The user can't tell whether they should try again.
Decide your rules before tuning code
When connections are slow or flaky, most bugs come from unclear rules, not from the HTTP client. Before you touch timeouts, caching, or retries, write down what "correct" means for your app.
Start with actions that must never run twice. These are usually money and account actions: place order, charge card, submit payout, change password, delete account. If a user taps twice or the app retries, the server should still treat it as one request. If you can't guarantee that yet, treat those endpoints as "no auto-retry" until you can.
Next, decide what each screen is allowed to do when the network is bad. Some screens can still be useful offline (last known profile, previous orders). Others should go read-only or show a clear "try again" state (inventory counts, live pricing). Mixing these expectations leads to confusing UI and risky caching.
Set acceptable wait time per action based on how users think, not what feels neat in code. Login can tolerate a short wait. File upload needs longer. Checkout should feel fast but also safe. A 30-second timeout might be "reliable" on paper and still feel broken.
Finally, decide what you will store on the device and for how long. Cached data helps, but stale data can lead to wrong choices (old prices, expired eligibility).
Write the rules somewhere everyone can find them (a README is fine). Keep it simple:
- Which endpoints are "must not duplicate" and require idempotency handling?
- Which screens must work offline, and which are read-only when offline?
- What's the maximum wait time per action (login, feed refresh, upload, checkout)?
- What can be cached on-device, and what's the expiry time?
- After failure, do you show an error, queue for later, or require manual retry?
Once these rules are clear, your timeout values, caching headers, retry policy, and UI states are much easier to implement and test.
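One way to keep those answers from drifting is to store them as data next to the networking code. The sketch below is only an illustration: the action names, the `ActionRule` fields, and every value in the table are assumptions to replace with your own decisions.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.minutes
import kotlin.time.Duration.Companion.seconds

// What the app does after a request finally fails.
enum class OnFailure { SHOW_ERROR, QUEUE_FOR_LATER, MANUAL_RETRY }

// One row of the written-down policy. Names and values are illustrative.
data class ActionRule(
    val mustNotDuplicate: Boolean, // requires idempotency handling
    val usableOffline: Boolean,    // screen may show last known data
    val maxWait: Duration,         // longest wait the user should experience
    val cacheTtl: Duration?,       // null means "never cache on device"
    val onFailure: OnFailure,
)

val networkRules: Map<String, ActionRule> = mapOf(
    "login" to ActionRule(
        mustNotDuplicate = false, usableOffline = false,
        maxWait = 10.seconds, cacheTtl = null, onFailure = OnFailure.SHOW_ERROR,
    ),
    "feedRefresh" to ActionRule(
        mustNotDuplicate = false, usableOffline = true,
        maxWait = 20.seconds, cacheTtl = 5.minutes, onFailure = OnFailure.SHOW_ERROR,
    ),
    "upload" to ActionRule(
        mustNotDuplicate = false, usableOffline = false,
        maxWait = 2.minutes, cacheTtl = null, onFailure = OnFailure.QUEUE_FOR_LATER,
    ),
    "checkout" to ActionRule(
        mustNotDuplicate = true, usableOffline = false,
        maxWait = 30.seconds, cacheTtl = null, onFailure = OnFailure.MANUAL_RETRY,
    ),
)

// Fails loudly when someone adds an action without deciding its rules first.
fun ruleFor(action: String): ActionRule =
    networkRules[action] ?: error("No networking rule written down for '$action'")
```

Failing on an unknown action is a deliberate choice here: it forces the rule to be written before the endpoint ships.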
Timeouts that match real user expectations
Slow networks fail in different ways. A good timeout setup doesn't just "pick a number." It matches what the user is trying to do and fails fast enough that the app can recover.
The three timeouts, in plain terms:
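- Connect timeout: how long you wait to establish a connection to the server. If this fails, the request never really started. Which phases count (DNS lookup, TCP, TLS) depends on the client: in OkHttp, connectTimeout applies to the TCP connect, and DNS resolution is only bounded by callTimeout.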
- Write timeout: how long you wait while sending the request body (uploads, large JSON, slow uplink).
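- Read timeout: how long you wait for the server to send data back after the request is sent. In OkHttp it applies to each individual read, so it limits silence between chunks of the response rather than the total download time. This often shows up on spotty mobile networks.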
Timeouts should reflect the screen and the stakes. A feed can be slower without real harm. A critical action should either complete or fail clearly so the user can decide what to do next.
A practical starting point (adjust after measuring):
- List loading (low risk): connect 5-10s, read 20-30s, write 10-15s.
- Search-as-you-type: connect 3-5s, read 5-10s, write 5-10s.
- Critical actions (high risk, like "Pay" or "Submit order"): connect 5-10s, read 30-60s, write 15-30s.
Consistency matters more than perfection. If the user taps "Submit" and sees a spinner for two minutes, they'll tap again.
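The starting points above can be captured as named profiles so each call site picks a category instead of inventing numbers. This is a minimal sketch: `RequestKind`, `TimeoutProfile`, and `timeoutsFor` are hypothetical names, and the values simply take the upper end of each range listed above.

```kotlin
import kotlin.time.Duration
import kotlin.time.Duration.Companion.seconds

enum class RequestKind { LIST_LOADING, SEARCH_AS_YOU_TYPE, CRITICAL_ACTION }

data class TimeoutProfile(val connect: Duration, val read: Duration, val write: Duration)

// Upper end of each suggested range; tune after measuring real traffic.
fun timeoutsFor(kind: RequestKind): TimeoutProfile = when (kind) {
    RequestKind.LIST_LOADING ->
        TimeoutProfile(connect = 10.seconds, read = 30.seconds, write = 15.seconds)
    RequestKind.SEARCH_AS_YOU_TYPE ->
        TimeoutProfile(connect = 5.seconds, read = 10.seconds, write = 10.seconds)
    RequestKind.CRITICAL_ACTION ->
        TimeoutProfile(connect = 10.seconds, read = 60.seconds, write = 30.seconds)
}
```

With OkHttp, one plausible way to apply a profile is inside an interceptor through `chain.withConnectTimeout(...)`, `chain.withReadTimeout(...)`, and `chain.withWriteTimeout(...)`, or by deriving a client per category with `okHttp.newBuilder()`.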
Avoid "infinite loading" by adding a clear upper bound in the UI, too. Show progress immediately, allow cancel, and after (say) 20-30 seconds show "Still trying..." with options to retry or check connection. That keeps the experience honest even if the network library is still waiting.
When a timeout happens, log enough to debug patterns later, without logging secrets. Useful fields include the URL path (not full query), HTTP method, status (if any), a timing breakdown (connect vs write vs read if available), network type (Wi-Fi, cellular, airplane mode), approximate request/response size, and a request ID so you can match client logs with server logs.
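A small record type makes it harder to log the wrong thing by accident. The sketch below assumes a hypothetical `TimeoutLogEntry` shape with the fields listed above, and uses `java.net.URI` to drop the query string before anything is written.

```kotlin
import java.net.URI

// Fields mirror the list above; the names are illustrative.
data class TimeoutLogEntry(
    val method: String,
    val path: String,         // path only, never the query string
    val status: Int?,         // null when no response arrived
    val phase: String,        // "connect", "write", or "read" when known
    val elapsedMs: Long,
    val networkType: String,  // e.g. "wifi" or "cellular"
    val requestBytes: Long?,
    val requestId: String,    // also sent to the server so logs can be matched
)

// Keeps only the path, so tokens or search terms in the query are not logged.
// Note: URI(...) throws URISyntaxException for malformed input.
fun safePath(url: String): String = URI(url).path.orEmpty().ifEmpty { "/" }

fun TimeoutLogEntry.toLogLine(): String =
    "timeout method=$method path=$path status=${status ?: "-"} phase=$phase " +
        "elapsedMs=$elapsedMs net=$networkType reqBytes=${requestBytes ?: "-"} requestId=$requestId"
```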
A simple, consistent Kotlin networking setup
When connections are slow, small inconsistencies in client setup turn into big problems. A clean baseline helps you debug faster and gives every request the same rules.
One client, one policy
Start with a single place where you build your HTTP client (often one OkHttpClient used by Retrofit). Put the basics there so every request behaves the same: default headers (app version, locale, auth token) and a clear User-Agent, timeouts set once (not sprinkled across calls), logging you can enable for debugging, and one retry policy decision (even if it's "no automatic retries").
Here is a small example that keeps configuration in one file:
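```kotlin
import java.util.concurrent.TimeUnit
import okhttp3.OkHttpClient
import retrofit2.Retrofit
import retrofit2.converter.moshi.MoshiConverterFactory

// BuildConfig and BASE_URL are expected to come from your app module.
val okHttp = OkHttpClient.Builder()
    .connectTimeout(10, TimeUnit.SECONDS)
    .readTimeout(20, TimeUnit.SECONDS)
    .writeTimeout(20, TimeUnit.SECONDS)
    // Upper bound for the whole call: DNS, connect, write, and read together.
    .callTimeout(30, TimeUnit.SECONDS)
    .addInterceptor { chain ->
        // Default headers applied to every request.
        val request = chain.request().newBuilder()
            .header("User-Agent", "MyApp/${BuildConfig.VERSION_NAME}")
            .header("Accept", "application/json")
            .build()
        chain.proceed(request)
    }
    .build()

val retrofit = Retrofit.Builder()
    .baseUrl(BASE_URL)
    .client(okHttp)
    .addConverterFactory(MoshiConverterFactory.create())
    .build()
```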
Central error handling that maps to user messages
Network errors aren't just "an exception." If each screen handles them differently, users get random messages.
Create one mapper that converts failures into a small set of user-friendly outcomes: no connection/airplane mode, timeout, server error (5xx), validation or auth error (4xx), and an unknown fallback.
This keeps UI copy consistent ("No connection" vs "Try again") without leaking technical details.
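A mapper along these lines can be a single pure function. The sketch below is an assumption about how you might structure it: `UserOutcome`, `mapFailure`, and the message strings are illustrative, and treating every other `IOException` as a connectivity problem is a simplification you may want to refine.

```kotlin
import java.io.IOException
import java.io.InterruptedIOException
import java.net.ConnectException
import java.net.UnknownHostException

enum class UserOutcome { NO_CONNECTION, TIMEOUT, SERVER_ERROR, REQUEST_REJECTED, UNKNOWN }

// Pass the thrown exception when the call failed, or the HTTP status when a response arrived.
fun mapFailure(error: Throwable? = null, httpStatus: Int? = null): UserOutcome = when {
    error is UnknownHostException || error is ConnectException -> UserOutcome.NO_CONNECTION
    // SocketTimeoutException is a subclass of InterruptedIOException.
    error is InterruptedIOException -> UserOutcome.TIMEOUT
    // Assumption: remaining I/O failures (resets, broken pipes) are shown as connectivity issues.
    error is IOException -> UserOutcome.NO_CONNECTION
    httpStatus != null && httpStatus in 500..599 -> UserOutcome.SERVER_ERROR
    httpStatus != null && httpStatus in 400..499 -> UserOutcome.REQUEST_REJECTED
    else -> UserOutcome.UNKNOWN
}

// One place for user-facing copy, so every screen says the same thing.
fun messageFor(outcome: UserOutcome): String = when (outcome) {
    UserOutcome.NO_CONNECTION -> "No connection"
    UserOutcome.TIMEOUT -> "This is taking too long. Try again"
    UserOutcome.SERVER_ERROR -> "Something went wrong on our side. Try again"
    UserOutcome.REQUEST_REJECTED -> "Please check your details and sign-in"
    UserOutcome.UNKNOWN -> "Something went wrong"
}
```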
Tag and cancel requests when screens close
On flaky networks, calls can finish late and update a screen that's already gone. Make cancellation a standard rule: when a screen closes, its work stops.
With Retrofit and Kotlin coroutines, canceling the coroutine scope (for example in a ViewModel) cancels the underlying HTTP call. For non-coroutine calls, keep a reference to the Call and call cancel(). You can also tag requests and cancel groups of calls when a feature is exited.
Background work shouldn't depend on the UI
Anything important that must complete (sending a report, syncing a queue, finishing a submission) should run in a scheduler designed for it. On Android, WorkManager is the usual choice because it can retry later and survive app restarts. Keep UI actions lightweight, and hand off longer work to background jobs when it makes sense.
Caching rules that are safe on mobile
Caching can be a big win on slow connections because it cuts repeat downloads and makes screens feel instant. It can also be a problem if it shows stale data at the wrong time, like an old account balance or an outdated delivery address.
A safe approach is to cache only what a user can tolerate being slightly old, and force fresh checks for anything that affects money, security, or a final decision.
Cache-Control basics you can rely on
Most rules come down to a few Cache-Control directives:
- max-age=60: you can reuse the cached response for 60 seconds without asking the server.
- no-store: don't save this response at all (best for tokens and sensitive screens).
- must-revalidate: if it's expired, you must check with the server before using it again.
On mobile, must-revalidate prevents "quietly wrong" data after a temporary offline period. If the user opens the app after a subway ride, you want a fast screen, but you also want the app to confirm what's still true.
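To make the interaction between these directives concrete, here is a simplified decision function. It is a sketch, not a full HTTP cache: `CachePolicy`, `CacheDecision`, and `decide` are hypothetical names, and only the three directives above are handled.

```kotlin
data class CachePolicy(val noStore: Boolean, val maxAgeSeconds: Long?, val mustRevalidate: Boolean)

enum class CacheDecision { DO_NOT_STORE, USE_CACHED, REVALIDATE, SERVE_STALE, UNAVAILABLE }

// Parses only the directives discussed above; everything else is ignored.
fun parseCacheControl(header: String): CachePolicy {
    val directives = header.split(",").map { it.trim().lowercase() }.filter { it.isNotEmpty() }
    val maxAge = directives.firstOrNull { it.startsWith("max-age=") }
        ?.substringAfter("=")
        ?.toLongOrNull()
    return CachePolicy(
        noStore = "no-store" in directives,
        maxAgeSeconds = maxAge,
        mustRevalidate = "must-revalidate" in directives,
    )
}

fun decide(policy: CachePolicy, ageSeconds: Long, online: Boolean): CacheDecision {
    if (policy.noStore) return CacheDecision.DO_NOT_STORE
    val maxAge = policy.maxAgeSeconds
    return when {
        maxAge != null && ageSeconds < maxAge -> CacheDecision.USE_CACHED // still fresh
        online -> CacheDecision.REVALIDATE                                // expired: ask the server
        policy.mustRevalidate -> CacheDecision.UNAVAILABLE                // expired, offline, not allowed
        else -> CacheDecision.SERVE_STALE                                 // expired, offline, tolerated
    }
}
```

The last two branches are the "subway ride" case: without must-revalidate a disconnected cache may show the old copy, while with it the screen has to say it cannot confirm the data yet.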
ETag refreshes: fast, cheap, and reliable
For read endpoints, ETag-based validation is often better than long max-age values. The server sends an ETag with the response. Next time, the app sends If-None-Match with that value. If nothing changed, the server replies 304 Not Modified, which is tiny and fast on weak networks.
This works well for product lists, profile details, and settings screens.
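The exchange can be sketched with an in-memory cache and an injected fetch function standing in for the network. `EtagCache`, `ServerReply`, and `CachedEntry` are hypothetical types for illustration only; when OkHttp is configured with a cache it performs this kind of conditional request for you.

```kotlin
data class CachedEntry(val etag: String, val body: String)

// Simplified server reply: status, optional ETag header, optional body.
data class ServerReply(val status: Int, val etag: String?, val body: String?)

class EtagCache(private val fetch: (path: String, ifNoneMatch: String?) -> ServerReply) {
    private val entries = mutableMapOf<String, CachedEntry>()

    fun get(path: String): String {
        val cached = entries[path]
        // Send If-None-Match only when we hold a previous copy.
        val reply = fetch(path, cached?.etag)
        val body = reply.body
        return when {
            // 304 Not Modified: tiny response, reuse the stored body.
            reply.status == 304 && cached != null -> cached.body
            reply.status == 200 && body != null -> {
                val etag = reply.etag
                if (etag != null) {
                    entries[path] = CachedEntry(etag, body)
                } else {
                    entries.remove(path) // nothing to validate against next time
                }
                body
            }
            else -> error("Unexpected reply ${reply.status} for $path")
        }
    }
}
```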
A simple rule of thumb:
- Cache "read" endpoints with short max-age plus must-revalidate, and support ETag where you can.
- Don't cache "write" endpoints (POST/PUT/PATCH/DELETE). Treat them as always network-bound.
- Use no-store for anything sensitive (auth responses, payment steps, private messages).
- Cache static assets (icons, public config) longer, because the risk of staleness is low.
Keep caching decisions consistent across the app. Users notice mismatches more than small delays.
Safe retries without making things worse
Retries feel like an easy fix, but they can backfire. Retry the wrong requests and you create extra load, drain battery, and make the app feel stuck.
Start by retrying only failures that are likely temporary. A dropped connection, a read timeout, or a short server outage can succeed on the next try. A bad password, a missing field, or a 404 won't.
A practical rule set:
- Retry timeouts and connection failures.
- Retry 502, 503, and sometimes 504.
- Don't retry 4xx (except 408 or 429, if you have a clear wait rule).
- Don't retry writes that may have already reached the server and might be processing, unless the request is idempotent or carries an idempotency key.
- Keep retries low (often 1 to 3 attempts).
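The rule set above fits in one decision function. This is a sketch under assumptions: `Failure` and `shouldRetry` are hypothetical names, `attemptsMade` counts attempts already sent, and 408 and 429 are left out because they need a wait rule of their own.

```kotlin
import java.io.IOException

sealed interface Failure {
    // Timeout, dropped connection, DNS failure: no HTTP response arrived.
    data class Network(val cause: IOException) : Failure
    // The server answered with an error status.
    data class Http(val status: Int) : Failure
}

private val retryableStatuses = setOf(502, 503, 504)

fun shouldRetry(
    failure: Failure,
    attemptsMade: Int,
    safeToRepeat: Boolean, // e.g. a GET, or a write carrying an idempotency key
    maxAttempts: Int = 3,
): Boolean {
    if (attemptsMade >= maxAttempts) return false
    // A write may already be processing on the server; never repeat it blindly.
    if (!safeToRepeat) return false
    return when (failure) {
        is Failure.Network -> true
        is Failure.Http -> failure.status in retryableStatuses
    }
}
```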
Backoff + jitter: fewer retry storms
If many users hit the same outage, instant retries can create a wave of traffic that slows recovery. Use exponential backoff (wait longer each time) and add jitter (a small random delay) so devices don't retry in sync.
For example: wait about 0.5 seconds, then 1 second, then 2 seconds, with a random +/- 20% each time.
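That schedule can be computed with a few lines of Kotlin. The function name and parameters below are illustrative; the defaults reproduce the example (500 ms base, doubling, plus or minus 20%).

```kotlin
import kotlin.math.pow
import kotlin.random.Random

// retryIndex 0 is the wait before the first retry, 1 before the second, and so on.
fun backoffDelayMs(
    retryIndex: Int,
    baseMs: Long = 500,
    jitterRatio: Double = 0.2,
    random: Random = Random.Default,
): Long {
    require(retryIndex >= 0) { "retryIndex must not be negative" }
    require(jitterRatio in 0.0..1.0) { "jitterRatio must be between 0 and 1" }
    val exponential = baseMs * 2.0.pow(retryIndex) // 500, 1000, 2000, ...
    // Random factor in [1 - jitterRatio, 1 + jitterRatio) so devices spread out.
    val factor = if (jitterRatio == 0.0) 1.0 else 1.0 + random.nextDouble(-jitterRatio, jitterRatio)
    return (exponential * factor).toLong()
}
```

Passing a seeded `Random` keeps the delays reproducible in tests.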
Put a cap on total retry time
Without limits, retries can trap users in a spinner for minutes. Pick a maximum total time for the whole operation, including all waits. Many apps aim for 10 to 20 seconds before they stop and show a clear option to try again.
Also match the context. If someone is submitting a form, they want an answer quickly. If a background sync fails, you can retry later.
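A total budget can be enforced by checking a deadline before every wait. The sketch below is a blocking helper with an injectable clock and sleep so it can be tested without real delays; `retryWithinBudget` and its parameters are hypothetical, and coroutine code would use `delay` instead of `Thread.sleep`.

```kotlin
import java.io.IOException

fun <T> retryWithinBudget(
    maxAttempts: Int,
    totalBudgetMs: Long,
    delayForRetryMs: (retryIndex: Int) -> Long,
    nowMs: () -> Long = { System.currentTimeMillis() },
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    block: () -> T,
): Result<T> {
    val deadline = nowMs() + totalBudgetMs
    var lastError: Throwable? = null
    for (attempt in 0 until maxAttempts) {
        try {
            return Result.success(block())
        } catch (e: IOException) {
            // Only I/O failures are treated as temporary; anything else propagates.
            lastError = e
        }
        val wait = delayForRetryMs(attempt)
        // Stop when attempts are used up or the next wait would exceed the budget.
        if (attempt == maxAttempts - 1 || nowMs() + wait >= deadline) break
        sleep(wait)
    }
    return Result.failure(lastError ?: IllegalStateException("No attempts were made"))
}
```

Returning a `Result` instead of throwing makes it easier for the caller to switch the UI to a clear "try again" state once the budget is spent.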
Never auto-retry non-idempotent actions (like placing an order or sending a payment) unless you have protection such as an idempotency key or a server-side duplicate check. If you can't guarantee safety, fail clearly and let the user decide what to do next.
Duplicate-prevention for critical actions
On a slow or flaky connection, users tap twice. The OS may retry in the background. Your app may resend after a timeout. If the action creates or changes something (place an order, send money, change a password), duplicates can hurt.
Idempotency means that repeating a request has the same effect as sending it once. If the request is repeated, the server shouldn't create a second order. It should return the first result again or say "already done."
Use an idempotency key for each critical attempt
For critical actions, generate a unique idempotency key when the user starts the attempt, and send it with the request (often as a header like Idempotency-Key, or a field in the body).
A practical flow:
- Create a UUID idempotency key when the user taps "Pay".
- Save it locally with a tiny record: status = pending, createdAt, request payload hash.
- Send the request with the key.
- When you get a success response, mark status = done and store the server's result ID.
- If you need to retry, reuse the same key, not a new one.
That "reuse the same key" rule is what stops accidental double charges.
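The flow above can be sketched as a small store that hands back the existing key while an attempt is still pending. Everything here is illustrative: `Attempt`, `AttemptStore`, and `startOrResume` are hypothetical names, and the in-memory map stands in for whatever persistent storage your app uses.

```kotlin
import java.security.MessageDigest
import java.util.UUID

enum class AttemptStatus { PENDING, DONE }

data class Attempt(
    val idempotencyKey: String,
    val payloadHash: String,
    val createdAtMs: Long,
    val status: AttemptStatus = AttemptStatus.PENDING,
    val serverResultId: String? = null,
)

fun sha256(text: String): String =
    MessageDigest.getInstance("SHA-256").digest(text.toByteArray())
        .joinToString("") { "%02x".format(it) }

// In-memory stand-in; a real app would persist these rows so they survive restarts.
class AttemptStore {
    private val byPayloadHash = mutableMapOf<String, Attempt>()

    // Returns the pending attempt for this payload if one exists, otherwise creates a new key.
    fun startOrResume(payload: String, nowMs: Long = System.currentTimeMillis()): Attempt {
        val hash = sha256(payload)
        val existing = byPayloadHash[hash]
        if (existing != null && existing.status == AttemptStatus.PENDING) return existing
        return Attempt(UUID.randomUUID().toString(), hash, nowMs)
            .also { byPayloadHash[hash] = it }
    }

    fun markDone(idempotencyKey: String, serverResultId: String) {
        val attempt = byPayloadHash.values.firstOrNull { it.idempotencyKey == idempotencyKey } ?: return
        byPayloadHash[attempt.payloadHash] =
            attempt.copy(status = AttemptStatus.DONE, serverResultId = serverResultId)
    }
}
```

Once an attempt is marked done, the same payload gets a fresh key, which treats a later identical purchase as a new, deliberate action.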
Handle app restarts and offline gaps
If the app is killed mid-request, the next launch still needs to be safe. Store the idempotency key and request state in local storage (for example, a small database row). On restart, either retry with the same key or call a "check status" endpoint using the saved key or server result ID.
On the server side, the contract should be clear: when it receives a duplicate key, it should reject the second attempt or return the original response (same order ID, same receipt). If the server can't do that yet, client-side duplicate-prevention will never be fully reliable, because the app can't see what happened after it sent the request.
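The server half of that contract can be illustrated with a handler that remembers the first result per key. This is a toy sketch, written in Kotlin to match the rest of the article: `IdempotentOrderHandler` and `OrderReceipt` are hypothetical, and a real backend would store keys in its database (typically behind a unique constraint) rather than in memory.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

data class OrderReceipt(val orderId: String, val amountCents: Long)

class IdempotentOrderHandler {
    private val receiptsByKey = ConcurrentHashMap<String, OrderReceipt>()
    private val nextOrderNumber = AtomicInteger(1)

    // The first request with a key creates the order; repeats get the original receipt back.
    fun placeOrder(idempotencyKey: String, amountCents: Long): OrderReceipt =
        receiptsByKey.computeIfAbsent(idempotencyKey) {
            OrderReceipt("order-${nextOrderNumber.getAndIncrement()}", amountCents)
        }

    // Backs a "check status" endpoint: null means the key was never seen.
    fun statusFor(idempotencyKey: String): OrderReceipt? = receiptsByKey[idempotencyKey]
}
```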
A good user-facing touch: if an attempt is pending, show "Payment in progress" and disable the button until you get a final result.
UI patterns that reduce accidental resubmits
Slow connections don't just break requests. They change how people tap. When the screen freezes for two seconds, many users assume nothing happened and hit the button again. Your UI has to make "one tap" feel reliable even when the network isn't.
Optimistic UI is safest when the action is reversible or low risk, like starring an item, saving a draft, or marking a message as read. Confirmed UI is better for money, inventory, irreversible deletes, and anything that could create duplicates.
A good default for critical actions is a clear pending state. After the first tap, immediately switch the primary button into a "Submitting..." state, disable it, and show a short line that explains what's happening.
Patterns that work well on flaky networks:
- Disable the primary action after tap and keep it disabled until you have a final result.
- Show a visible "Pending" status with details (amount, recipient, item count).
- Add a simple "Recent activity" view so users can confirm what they already sent.
- If the app is backgrounded, keep the pending state when they return.
- Prefer one clear primary button over multiple tap targets on the same screen.
Sometimes the request succeeds but the response is lost. Treat this as a normal outcome, not an error that invites repeated taps. Instead of "Failed, try again," show "We're not sure yet" and offer a safe next step like "Check status." If you can't check status, keep the pending record locally and tell the user you'll update when the connection returns.
Make "Try again" explicit and safe. Only show it when you can repeat the request using the same client-side request ID or idempotency key.
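These UI rules are easier to keep consistent when the screen is driven by one explicit state type. The sketch below is an assumption about how that might look: `SubmitState` and the helper functions are hypothetical, and the copy strings are placeholders.

```kotlin
sealed interface SubmitState {
    object Idle : SubmitState
    data class Submitting(val idempotencyKey: String) : SubmitState
    // The request may have succeeded, but no response arrived.
    data class Unknown(val idempotencyKey: String) : SubmitState
    data class Done(val resultId: String) : SubmitState
    // Keeps the key so "Try again" repeats the same attempt instead of creating a new one.
    data class Failed(val idempotencyKey: String) : SubmitState
}

// The primary button is tappable only when no attempt is in flight or unresolved.
fun primaryButtonEnabled(state: SubmitState): Boolean =
    state is SubmitState.Idle || state is SubmitState.Failed

fun statusText(state: SubmitState): String = when (state) {
    SubmitState.Idle -> ""
    is SubmitState.Submitting -> "Submitting..."
    is SubmitState.Unknown -> "We're not sure yet. Check status"
    is SubmitState.Done -> "Done"
    is SubmitState.Failed -> "Failed. Try again"
}

// A lost response moves to Unknown, never to Failed, so the UI does not invite a blind resubmit.
fun onResponseLost(state: SubmitState): SubmitState = when (state) {
    is SubmitState.Submitting -> SubmitState.Unknown(state.idempotencyKey)
    else -> state
}
```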
Realistic example: a flaky checkout submission
A customer is on a train with spotty signal. They add items to the cart and tap Pay. The app has to be patient, but it also must not create two orders.
A safe sequence looks like this:
- The app creates a client-side attempt ID and sends the checkout request with an idempotency key (for example, a UUID stored with the cart).
- The request waits for a clear connect timeout, then a longer read timeout. The train goes into a tunnel, and the call times out.
- The app retries once, but only after a short delay and only if it never received a server response.
- The server receives the second request and sees the same idempotency key, so it returns the original result instead of creating a new order.
- The app shows a final confirmation screen when it gets the success response, even if it came from the retry.
Caching follows strict rules. Product lists, delivery options, and tax tables can be cached for a short time (GET requests). The checkout submission (POST) is never cached. Even if you use an HTTP cache, treat it as read-only help for browsing, not something that can "remember" a payment.
Duplicate-prevention is a mix of network and UI choices. When the user taps Pay, the button is disabled and the screen shows "Submitting order..." with a single Cancel option. If the app loses network, it switches to "Still trying" and keeps the same attempt ID. If the user force-closes and reopens, the app can resume by checking order status using that ID, instead of asking them to pay again.
Quick checklist and next steps
If your app feels "mostly fine" on office Wi-Fi but falls apart on trains, elevators, or rural areas, treat this as a release gate. This work is less about clever code and more about clear rules you can repeat.
Checklist before you ship:
- Set timeouts per endpoint type (login, feed, upload, checkout) and test on throttled and high-latency networks.
- Retry only where it's truly safe, and cap it with backoff (a couple of tries for reads, usually none for writes).
- Add an idempotency key for every critical write (payments, orders, form submissions) so a retry or double tap can't create duplicates.
- Make caching rules explicit: what can be served stale, what must be fresh, and what should never be cached.
- Make states visible: pending, failed, and completed should look different, and the app should remember completed actions after a restart.
If one of these is "we'll decide later," you'll end up with random behavior across screens.
Next steps to make it stick
Write a one-page networking policy: endpoint categories, timeout targets, retry rules, and caching expectations. Enforce it in one place (interceptors, a shared client factory, or a small wrapper) so every team member gets the same behavior by default.
Then do a short duplicate drill. Pick one critical action (like checkout), simulate a frozen spinner, force-close the app, toggle airplane mode, and press the button again. If you can't prove it's safe, users will eventually find a way to break it.
If you want to implement the same rules across backend and clients without hand-wiring everything, AppMaster (appmaster.io) can help by generating production-ready backend and native mobile source code. Even then, the key is the policy: define idempotency, retries, caching, and UI states once, and apply them consistently across the whole flow.
FAQ
Start by defining what "correct" looks like for each screen and action, especially anything that must happen at most once like payments or orders. Once the rules are clear, set timeouts, retries, caching, and UI states to match those rules instead of relying on library defaults.
Users usually see endless spinners, errors after a long wait, actions that work on the second try, or duplicate results like two orders or double charges. These are often caused by unclear retry and "pending vs failed" rules, not just bad signal.
Use connect timeout for how long you'll wait to establish a connection, write timeout for sending the request body (uploads), and read timeout for waiting on the response after sending. A reasonable default is shorter timeouts for low-risk reads and longer read/write timeouts for critical submissions, with a clear UI limit so users aren't stuck waiting forever.
Yes, if you only set one, use callTimeout to cap the whole operation end-to-end so you avoid "infinite" waiting. Then layer connect/read/write timeouts as needed for better control, especially for uploads and slow response bodies.
Start by retrying only temporary failures like connection drops, DNS issues, and timeouts, and sometimes 502/503/504 responses. Avoid retrying 4xx errors and avoid auto-retrying writes unless you have idempotency protection, because retries can create duplicates.
Use a small number of retries (often 1 to 3) with exponential backoff and a bit of random jitter so many devices don't retry at the same time. Also cap the total time spent retrying so the user gets a clear outcome instead of a spinner that lasts minutes.
Idempotency means repeating the same request won't create a second result, so a double tap or retry won't double-charge or double-book. For critical actions, send an idempotency key per attempt and reuse it for retries so the server can return the original result instead of creating a new one.
Generate a unique key when the user starts the action, store it locally with a small "pending" record, and send it with the request. If you retry or the app restarts, reuse the same key and either retry safely or check status, so you never turn one user intent into two server writes.
Cache only data that can safely be a bit old, and force fresh checks for money, security, and final decisions. For reads, prefer short freshness plus revalidation and consider ETags; for writes, don't cache at all, and use no-store for sensitive responses.
Disable the primary button after the first tap, show an immediate "Submitting..." state, and keep a visible pending status that survives backgrounding or restarts. If the response might be lost, don't push users into repeated taps; instead show uncertainty ("We're not sure yet") and offer a safe next step like checking status.


