Apr 20, 2025·8 min read

Incremental data sync with checkpoints: align systems safely

Incremental data sync with checkpoints keeps systems aligned using cursors, hashes, and resume tokens, so a failed run can pick up where it left off instead of reimporting everything.

Why reimporting everything keeps causing problems

Full reimports feel safe because they look simple: delete, reload, done. In practice, they are one of the easiest ways to create slow syncs, higher bills, and messy data.

The first issue is time and cost. Pulling the entire dataset every run means you re-download the same records again and again. If you are syncing 500,000 customers nightly, you pay for compute, API calls, and database writes even when only 200 records changed.

The second issue is correctness. Full reimports often create duplicates (because matching rules are imperfect), or they overwrite newer edits with older data that happened to be in the export. Many teams also see totals drift over time because “delete and reload” silently fails halfway through.

Typical symptoms look like this:

  • Counts do not match between systems after a run
  • Records appear twice with small differences (email casing, phone formatting)
  • Recently updated fields flip back to an older value
  • The sync sometimes “finishes” but misses a chunk of data
  • Support tickets spike after every import window

A checkpoint is just a small saved marker that says, “I processed up to here.” Next time, you continue from that marker instead of starting over. The marker could be a timestamp, a record ID, a version number, or a token returned by an API.

If your real goal is to keep two systems aligned over time, incremental data sync with checkpoints is usually the better target. It is especially useful when data changes frequently, exports are large, APIs have rate limits, or you need the sync to resume safely after a crash (for example, when a job fails midway in an internal tool you built on a platform like AppMaster).

Define the sync goal before you choose a method

Incremental data sync with checkpoints only works well when you are clear about what “correct” looks like. If you skip this and jump straight to cursors or hashes, you usually end up rebuilding the sync later because the rules were never written down.

Start by naming the systems and deciding who owns the truth. For example, your CRM might be the source of truth for customer names and phone numbers, while your billing tool is the source of truth for subscription status. If both systems can edit the same field, you do not have one source of truth, and you must plan for conflicts.

Next, define what “aligned” means. Do you need an exact match at all times, or is it fine if updates show up within a few minutes? Exact match often implies stricter ordering, stronger guarantees around checkpoints, and more careful handling of deletes. Eventual consistency is usually cheaper and more tolerant of temporary failures.

Decide the direction of syncing. One-way sync is simpler: System A feeds System B. Two-way sync is harder because every update can be a conflict, and you must avoid endless loops where each side keeps “fixing” the other.

Questions to answer before building

Write down simple rules that everyone agrees on:

  • Which system is the source of truth for each field (or each object type)?
  • What lag is acceptable (seconds, minutes, hours)?
  • Is this one-way or two-way, and which events flow in each direction?
  • How are deletes handled (hard delete, soft delete, tombstones)?
  • What happens when both sides changed the same record?

A practical conflict rule set can be as simple as “billing wins for subscription fields, CRM wins for contact fields, otherwise newest update wins.” If you are building the integration in a tool like AppMaster, capture these rules in your Business Process logic so they stay visible and testable, not buried in someone’s memory.
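
As a rough illustration, those rules can be written as a tiny decision function. The Go sketch below is not AppMaster Business Process logic; the Change struct, the field names, and the ownership map are assumptions made up for the example.

    package sync

    import "time"

    // Field-level conflict rule: billing wins for subscription fields,
    // CRM wins for contact fields, otherwise the newest update wins.
    type Change struct {
        Source    string    // "crm" or "billing"
        Field     string    // e.g. "email", "subscription_status"
        Value     string
        UpdatedAt time.Time
    }

    // Hypothetical ownership map; list every field you actually sync.
    var ownedBy = map[string]string{
        "subscription_status": "billing",
        "plan":                "billing",
        "email":               "crm",
        "phone":               "crm",
    }

    // resolve returns the change that should win for one field, assuming
    // a and b describe the same field on the same record.
    func resolve(a, b Change) Change {
        if owner, ok := ownedBy[a.Field]; ok {
            if a.Source == owner {
                return a
            }
            if b.Source == owner {
                return b
            }
        }
        // No explicit owner: last write wins.
        if b.UpdatedAt.After(a.UpdatedAt) {
            return b
        }
        return a
    }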

Cursors, hashes, and resume tokens: the building blocks

Incremental data sync with checkpoints usually relies on one of three “positions” you can safely store and reuse. The right choice depends on what the source system can guarantee, and what failures you need to survive.

A cursor checkpoint is the simplest. You store “the last thing I processed” such as a last ID, a last updated_at timestamp, or a sequence number. On the next run, you request records after that point. This works well when the source sorts consistently and IDs or timestamps move forward in a reliable way. It breaks when updates arrive late, clocks differ, or records can be inserted “in the past” (for example, backfilled data).
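
If the source is a database you can query directly, a cursor maps naturally to keyset pagination. The Go sketch below assumes a PostgreSQL source with a customers table that has updated_at and id columns; the row comparison (updated_at, id) > (last_updated_at, last_id) keeps the order stable even when two rows share a timestamp.

    package sync

    import (
        "context"
        "database/sql"
        "time"
    )

    // Keyset cursor: fetch rows strictly after the stored checkpoint,
    // ordered by (updated_at, id) so ties on the timestamp stay stable.
    const nextPageSQL = `
    SELECT id, email, updated_at
    FROM customers
    WHERE (updated_at, id) > ($1, $2)
    ORDER BY updated_at, id
    LIMIT $3`

    // fetchPage returns the next page after the (lastUpdatedAt, lastID)
    // cursor; the caller is responsible for closing the rows.
    func fetchPage(ctx context.Context, db *sql.DB, lastUpdatedAt time.Time, lastID int64, pageSize int) (*sql.Rows, error) {
        return db.QueryContext(ctx, nextPageSQL, lastUpdatedAt, lastID, pageSize)
    }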

Hashes help you detect change when a cursor alone is not enough. You can hash each record (based on the fields you care about) and sync only when the hash changes. Or you can hash a whole batch to quickly spot drift and then zoom in. Per-record hashes are accurate but add storage and compute. Batch hashes are cheaper but can hide which item changed.
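
A per-record hash can stay small: hash only the fields you care about, in a fixed order, and version the method so a later change to the field list does not silently mark every record as changed. The field list in the Go sketch below is a made-up example.

    package sync

    import (
        "crypto/sha256"
        "encoding/hex"
        "strings"
    )

    // hashVersion is stored next to each hash; bump it when the field list
    // or separator changes so old hashes are not compared against new ones.
    const hashVersion = "v1"

    // recordHash hashes only the fields that matter for the sync, in a fixed
    // order, joined with a separator that cannot appear inside the values.
    func recordHash(email, phone, status string) string {
        payload := strings.Join([]string{hashVersion, email, phone, status}, "\x1f")
        sum := sha256.Sum256([]byte(payload))
        return hex.EncodeToString(sum[:])
    }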

Resume tokens are opaque values the source issues, often for pagination or event streams. You do not interpret them, you just store them and pass them back to continue. Tokens are great when the API is complex, but they can expire, become invalid after retention windows, or behave differently across environments.
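
The token-driven loop below is a Go sketch of the idea; the client interface and response fields are assumptions, since every API names them differently. The rule is the same everywhere: persist the token only after the page is handled, and never try to interpret it.

    package sync

    import "context"

    // Minimal shapes for a source that issues opaque resume tokens; a real
    // client defines its own request and response structs.
    type ChangePage struct {
        Items     []map[string]any
        NextToken string
    }

    type ChangeClient interface {
        ListChanges(ctx context.Context, token string, limit int) (ChangePage, error)
    }

    // drain walks the stream until the source reports nothing left, saving
    // the opaque token only after each page has been processed.
    func drain(ctx context.Context, c ChangeClient, token string,
        handle func([]map[string]any) error, save func(string) error) (string, error) {
        for {
            page, err := c.ListChanges(ctx, token, 200)
            if err != nil {
                return token, err // the stored token is still valid; retry later
            }
            if err := handle(page.Items); err != nil {
                return token, err
            }
            if err := save(page.NextToken); err != nil {
                return token, err
            }
            token = page.NextToken
            if token == "" || len(page.Items) == 0 {
                return token, nil
            }
        }
    }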

What to use, and what can go wrong

  • Cursor: fast and simple, but watch for out-of-order updates.
  • Per-record hash: precise change detection, but higher cost.
  • Batch hash: cheap drift signal, but not very specific.
  • Resume token: safest pagination, but may expire or be one-time use.
  • Hybrid (cursor + hash): common when “updated_at” is not fully trustworthy.

If you are building a sync in a tool like AppMaster, these checkpoints usually live in a small “sync state” table, so every run can resume without guessing.

Designing your checkpoint storage

Checkpoint storage is the small piece that makes incremental data sync with checkpoints reliable. If it is hard to read, easy to overwrite, or not tied to a specific job, your sync will look fine until it fails once; after that, you are guessing.

First, pick where checkpoints live. A database table is usually the safest because it supports transactions, auditing, and simple queries. A key-value store can work if you already use one and it supports atomic updates. A config file is only reasonable for single-user, low-risk syncs, because it is hard to lock and easy to lose.

What to store (and why)

A checkpoint is more than a cursor. Save enough context to debug, resume, and detect drift:

  • Job identity: job name, tenant or account id, object type (for example, customers)
  • Progress: cursor value or resume token, plus a cursor type (time, id, token)
  • Health signals: last run time, status, records read and written
  • Safety: last successful cursor (not just last attempted), and a short error message for the latest failure

If you use change detection hashes, store the hash method version too. Otherwise you can change the hash later and accidentally treat everything as “changed.”

Versioning and many sync jobs

When your data model changes, version your checkpoints. The easiest approach is to add a schema_version field and create new rows for a new version, instead of mutating old data. Keep old rows for a while so you can roll back.

For multiple sync jobs, namespace everything. A good key is (tenant_id, integration_id, object_name, job_version). That avoids the classic bug where two jobs share one cursor and quietly skip data.

Concrete example: if you build the sync as an internal tool in AppMaster, store checkpoints in PostgreSQL with one row per tenant and object, and update it only after a successful batch commit.
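
For reference, a checkpoint row along those lines could look like the PostgreSQL table below, shown here as a Go constant. The column names are a suggestion, not a required schema; the important parts are the composite key and the separate last-successful cursor.

    package sync

    // One row per (tenant, integration, object, job version), so two jobs
    // can never share a cursor by accident.
    const createCheckpointTable = `
    CREATE TABLE IF NOT EXISTS sync_checkpoint (
        tenant_id              text NOT NULL,
        integration_id         text NOT NULL,
        object_name            text NOT NULL,
        job_version            int  NOT NULL DEFAULT 1,
        cursor_type            text NOT NULL,        -- 'time', 'id', or 'token'
        last_successful_cursor text,                 -- advance only after commit
        last_attempted_cursor  text,
        hash_version           text,
        last_run_at            timestamptz,
        last_status            text,
        records_read           bigint NOT NULL DEFAULT 0,
        records_written        bigint NOT NULL DEFAULT 0,
        last_error             text,
        PRIMARY KEY (tenant_id, integration_id, object_name, job_version)
    );`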

Step-by-step: implement an incremental sync loop

An incremental data sync with checkpoints works best when your loop is boring and predictable. The goal is simple: read changes in a stable order, write them safely, then move the checkpoint forward only when you know the write is done.

A simple loop you can trust

First, pick an ordering that never changes for the same record. Timestamps can work, but only if you also include a tie-breaker (like an ID) so two updates with the same timestamp keep a stable order.

Then run the loop like this:

  • Decide your cursor (for example: last_updated + id) and page size.
  • Fetch the next page of records newer than the stored checkpoint.
  • Upsert each record into the target (create if missing, update if present) and capture failures.
  • Commit the successful writes, then persist the new checkpoint from the last processed record.
  • Repeat. If the page is empty, sleep, then try again.

Keep the checkpoint update separate from the fetch. If you save the checkpoint too early, a crash can silently skip data.
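
Put together, the loop can stay as small as the Go sketch below. The fetch, upsert, and save functions are placeholders for your own source API, target writes, and checkpoint table rather than a fixed interface; the point is the order of operations: write first, advance the cursor last.

    package sync

    import (
        "context"
        "time"
    )

    // Cursor pairs the timestamp with an ID tie-breaker for a stable order.
    type Cursor struct {
        UpdatedAt time.Time
        ID        int64
    }

    // Record is a generic shape for a synced row; a real job would use a
    // concrete struct for its object type.
    type Record struct {
        ID        int64
        UpdatedAt time.Time
        Fields    map[string]any
    }

    // runLoop reads pages after the checkpoint, writes them, and only then
    // advances the cursor.
    func runLoop(ctx context.Context, start Cursor, pageSize int,
        fetch func(context.Context, Cursor, int) ([]Record, error),
        upsert func(context.Context, []Record) error,
        save func(context.Context, Cursor) error) error {

        cur := start
        for {
            page, err := fetch(ctx, cur, pageSize)
            if err != nil {
                return err // resume later from the last saved checkpoint
            }
            if len(page) == 0 {
                return nil // nothing new; the caller sleeps, then runs again
            }
            if err := upsert(ctx, page); err != nil {
                return err // checkpoint not advanced, so nothing is skipped
            }
            last := page[len(page)-1]
            cur = Cursor{UpdatedAt: last.UpdatedAt, ID: last.ID}
            if err := save(ctx, cur); err != nil { // persist only after the write
                return err
            }
        }
    }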

Backoff and retries without duplicates

Assume calls will fail. When a fetch or write fails, retry with a short backoff (for example: 1s, 2s, 5s) and a max retry count. Make retries safe by using upserts and by making your writes idempotent (same input, same result).

A small, practical example: if you are syncing customer updates every minute, you might fetch 200 changes at a time, upsert them, and only then store the last customer’s (updated_at, id) as your new cursor.
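
A retry helper along these lines keeps the backoff bounded; it is only safe because the wrapped operation is an idempotent upsert. The delays and function shape below are assumptions, not a prescribed API.

    package sync

    import (
        "context"
        "time"
    )

    // withRetry runs op with a short, bounded backoff (1s, 2s, 5s). This is
    // only safe when op is idempotent, such as an upsert keyed on a stable ID.
    func withRetry(ctx context.Context, op func(context.Context) error) error {
        delays := []time.Duration{time.Second, 2 * time.Second, 5 * time.Second}
        var err error
        for attempt := 0; ; attempt++ {
            if err = op(ctx); err == nil {
                return nil
            }
            if attempt >= len(delays) {
                return err // give up; the checkpoint has not moved
            }
            select {
            case <-ctx.Done():
                return ctx.Err()
            case <-time.After(delays[attempt]):
            }
        }
    }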

If you build this in AppMaster, you can model the checkpoint in a simple table (Data Designer) and run the loop in a Business Process that fetches, upserts, and updates the checkpoint in one controlled flow.

Make resumes safe: idempotency and atomic checkpoints

If your sync can resume, it will resume at the worst possible time: after a timeout, a crash, or a partial deploy. The goal is simple: rerunning the same batch should not create duplicates or lose updates.

Idempotency is the safety net. You get it by writing in a way that can be repeated without changing the final result. In practice that usually means upserts, not inserts: write the record using a stable key (like customer_id), and update existing rows when they already exist.

A good “write key” is something you can trust across retries. Common options are a natural ID from the source system, or a synthetic key you store the first time you see the record. Back it with a unique constraint so the database enforces your rule even when two workers race.

Atomic checkpoints matter just as much. If you advance the checkpoint before the data is committed, a crash can make you skip records forever. Treat the checkpoint update as part of the same unit of work as your writes.

Here’s a simple pattern for incremental data sync with checkpoints:

  • Read changes since the last checkpoint (cursor or token).
  • Upsert each record using a deduplication key.
  • Commit the transaction.
  • Only then persist the new checkpoint.

Out-of-order updates and late-arriving data are the other common trap. A record might be updated at 10:01 but arrive after one from 10:02, or an API might deliver older changes on retry. Protect yourself by storing a source “last_modified” and applying a “last write wins” rule: only overwrite when the incoming record is newer than what you already have.
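
In PostgreSQL, that last-write-wins rule can live directly in the upsert statement, as in the sketch below (table and column names are assumptions). The WHERE clause refuses to overwrite a newer row, so retries and overlap re-reads become harmless no-ops.

    package sync

    // Idempotent, "newest wins" upsert: replaying the same input leaves the
    // row unchanged, which is what makes resumes and retries safe.
    const upsertCustomerSQL = `
    INSERT INTO customers (customer_id, email, status, source_updated_at)
    VALUES ($1, $2, $3, $4)
    ON CONFLICT (customer_id) DO UPDATE
    SET email             = EXCLUDED.email,
        status            = EXCLUDED.status,
        source_updated_at = EXCLUDED.source_updated_at
    WHERE EXCLUDED.source_updated_at > customers.source_updated_at;`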

If you need stronger protection, keep a small overlap window (for example, re-read the last few minutes of changes) and rely on idempotent upserts to ignore repeats. This adds a little extra work, but it makes resumes boring, which is exactly what you want.

In AppMaster, the same idea maps cleanly to a Business Process flow: do the upsert logic first, commit, then store the cursor or resume token as the final step.

Common mistakes that break incremental syncing

Most sync bugs are not about code. They come from a few assumptions that feel safe until real data shows up. If you want incremental data sync with checkpoints to stay reliable, watch for these traps early.

The usual failure points

A common mistake is trusting updated_at too much. Some systems rewrite timestamps during backfills, timezone fixes, bulk edits, or even read-repairs. If your cursor is just a timestamp, you can miss records (timestamp jumps backward) or reprocess huge ranges (timestamp jumps forward).

Another trap is assuming IDs are continuous or strictly increasing. Imports, sharding, UUIDs, and deleted rows break that idea. If you use “last seen ID” as a checkpoint, gaps and out-of-order writes can leave records behind.

The most damaging bug is advancing the checkpoint on partial success. For example, you fetch 1,000 records, write 700, then crash, but still store the “next cursor” from the fetch. On resume, the remaining 300 never get retried.

Deletes are also easy to ignore. A source might soft-delete (flag), hard-delete (row removed), or “unpublish” (status change). If you only upsert active records, the target slowly drifts.

Finally, schema changes can invalidate old hashes. If your change detection hashes were built from a set of fields, adding or renaming a field can make “no change” look like “changed” (or the opposite) unless you version your hash logic.

Here are safer defaults:

  • Prefer a monotonic cursor (event ID, log position) over raw timestamps when possible.
  • Treat checkpoint writes as part of the same success boundary as your data writes.
  • Track deletes explicitly (tombstones, status transitions, or periodic reconcile).
  • Version your hash inputs and keep old versions readable.
  • Add a small overlap window (re-read last N items) if the source can reorder updates.

If you build this in AppMaster, model the checkpoint as its own table in the Data Designer and keep the “write data + write checkpoint” step together in a single business process run, so retries do not skip work.

Monitoring and drift detection without getting noisy

Good monitoring for incremental data sync with checkpoints is less about “more logs” and more about a few numbers you can trust on every run. If you can answer “what did we process, how long did it take, and where will we resume?”, you can debug most issues in minutes.

Start by writing one compact run record each time the sync executes. Keep it consistent so you can compare runs and spot trends.

  • Start cursor (or resume token) and end cursor
  • Records fetched, records written, records skipped
  • Run duration and average time per record (or per page)
  • Error count with the top error reason
  • Checkpoint write status (success/failure)
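
If you record runs from code, a compact struct like the Go sketch below (field names are a suggestion) is usually enough; one row per run makes trends easy to query later.

    package sync

    import "time"

    // SyncRunRecord holds one compact entry per run, mirroring the list above.
    type SyncRunRecord struct {
        JobName         string
        StartedAt       time.Time
        Duration        time.Duration
        StartCursor     string
        EndCursor       string
        Fetched         int
        Written         int
        Skipped         int
        Errors          int
        TopErrorReason  string
        CheckpointSaved bool
    }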

Drift detection is the next layer: it tells you when systems are “both working” but slowly diverging. Totals alone can be misleading, so combine a lightweight total check with small spot checks. For example, once per day compare total active customers in both systems, then sample 20 random customer IDs and confirm a few fields match (status, updated_at, email). If totals differ but samples match, you may be missing deletes or filters. If samples differ, your change detection hashes or field mapping is likely wrong.
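
A drift check does not need to be elaborate. The Go sketch below compares totals and then a small sample; the fetch functions and the Snapshot fields are assumptions standing in for queries against each system.

    package sync

    import "fmt"

    // Snapshot holds only the fields you compare between the two systems.
    type Snapshot struct {
        Status string
        Email  string
    }

    // checkDrift reports a totals mismatch and any sampled record that is
    // missing on one side or differs between the two.
    func checkDrift(countA, countB int, sampleIDs []string,
        fetchA, fetchB func(id string) (Snapshot, bool)) []string {

        var findings []string
        if countA != countB {
            findings = append(findings, fmt.Sprintf("totals differ: %d vs %d", countA, countB))
        }
        for _, id := range sampleIDs {
            a, okA := fetchA(id)
            b, okB := fetchB(id)
            switch {
            case okA != okB:
                findings = append(findings, "missing record: "+id)
            case a != b:
                findings = append(findings, "field mismatch: "+id)
            }
        }
        return findings
    }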

Alerts should be rare and actionable. A simple rule: only alert when a human must act now.

  • Cursor stuck (end cursor does not move for N runs)
  • Error rate rising (for example, 1% -> 5% over an hour)
  • Runs getting slower (duration above your normal ceiling)
  • Backlog growing (new changes arrive faster than you sync)
  • Drift confirmed (totals mismatch for two checks in a row)

After a failure, re-run without manual cleanup by replaying safely. The easiest approach is to resume from the last committed checkpoint, not the last “seen” record. If you use a small overlap window (re-read the last page), make writes idempotent: upsert by stable ID, and only advance the checkpoint after the write succeeds. In AppMaster, teams often implement these checks in a Business Process flow and send alerts via email/SMS or Telegram modules so failures are visible without constant dashboard watching.

Quick checklist before you ship the sync

Before you turn on an incremental data sync with checkpoints in production, do a quick pass on the few details that usually cause late surprises. These checks take minutes, but they prevent days of “why did we miss records?” debugging.

Here’s a practical pre-ship checklist:

  • Make sure the field you use for ordering (timestamp, sequence, ID) is truly stable and has an index on the source side. If it can change after the fact, your cursor will drift.
  • Confirm your upsert key is guaranteed unique, and that both systems treat it the same way (case sensitivity, trimming, formatting). If one system stores "ABC" and the other stores "abc", you will get duplicates.
  • Store checkpoints separately for each job and each dataset. A “global last cursor” sounds simple, but it breaks as soon as you sync two tables, two tenants, or two filters.
  • If the source is eventually consistent, add a small overlap window. For example, when resuming from “last_updated = 10:00:00”, restart from 09:59:30 and rely on idempotent upserts to ignore repeats.
  • Plan a light reconciliation: on a schedule, pick a small sample set (like 100 random records) and compare key fields to catch quiet drift.

A quick reality test: pause the sync mid-run, restart it, and verify you end up with the same results. If restarting changes counts or creates extra rows, fix that before launch.

If you build the sync in a tool like AppMaster, keep each integration flow’s checkpoint data tied to the specific process and dataset, not shared across unrelated automations.

Example: syncing customer records between two apps

Picture a simple setup: your CRM is the source of truth for contacts, and you want the same people to exist in a support tool (so tickets map to real customers) or in a customer portal (so users can log in and see their account).

On the first run, do a one-time import. Pull contacts in a stable order, for example by updated_at plus id as a tiebreaker. After you write each batch into the destination, save a checkpoint like: last_updated_at and last_id. That checkpoint is your starting line for every future run.

For ongoing runs, fetch only records newer than the checkpoint. Updates are straightforward: if the CRM contact already exists, update the destination record; if not, create it. Merges are the tricky part. CRMs often merge duplicates and keep one “winning” contact. Treat that as an update that also “retires” the losing contact by marking it inactive (or mapping it to the winner) so you do not end up with two portal users for the same person.

Deletions rarely show up in normal “updated since” queries, so plan for them. Common options are a soft-delete flag in the source, a separate “deleted contacts” feed, or a periodic lightweight reconciliation that checks for missing IDs.

Now the failure case: the sync crashes halfway through. If you only store a checkpoint at the end, you will reprocess a huge chunk. Instead, use a resume token per batch.

  • Start a run and generate a run_id (your resume token)
  • Process a batch, write destination changes, then atomically save the checkpoint tied to run_id
  • On restart, detect the last saved checkpoint for that run_id and continue from there

Success looks boring: counts stay stable day to day, runtimes are predictable, and re-running the same window produces zero unexpected changes.

Next steps: choose a pattern and build it with less rework

Once your first incremental loop works, the fastest way to avoid rework is to write down the rules of the sync. Keep it short: what records are in scope, what fields win on conflicts, and what “done” looks like after each run.

Start small. Pick one dataset (like customers) and run it end to end: initial import, incremental updates, deletes, and a resume after an intentional failure. It is easier to fix assumptions now than after you add five more tables.

A full rebuild is still sometimes the right call. Do it when the checkpoint state is corrupted, when you change identifiers, or when a schema change breaks your change detection (for example, you used a hash and the meaning of fields changed). If you rebuild, treat it as a controlled operation, not an emergency button.

Here’s a safe way to do a reimport without downtime:

  • Import into a shadow table or a parallel dataset, leaving the current one live.
  • Validate counts and spot-check samples, including edge cases (nulls, merged records).
  • Backfill relationships, then switch readers to the new dataset in one planned cutover.
  • Keep the old dataset for a short rollback window, then clean up.

If you want to build this without writing code, AppMaster can help you keep the pieces in one place: model the data in PostgreSQL with the Data Designer, define the sync rules in the Business Process Editor, and run scheduled jobs that pull, transform, and upsert records. Because AppMaster regenerates clean code when requirements change, it also makes “we need to add one more field” less risky.

Before you expand to more datasets, document your sync contract, pick one pattern (cursor, resume token, or hash), and get one sync fully reliable. Then repeat the same structure for the next dataset. If you want to try it quickly, create an application in AppMaster and run a small scheduled sync job first.
