Google Sheet to relational schema: a step-by-step modeling plan
Google Sheet to relational schema, explained in plain steps: spot repeating groups, choose keys, map relationships, and prevent messy data later.

Why spreadsheets turn messy when they become a database
A spreadsheet is great for a small list. You can change columns on the fly, add notes anywhere, and fix problems by eye. That freedom starts to break down when the file becomes a shared source of truth.
As data grows, the same issues show up again and again. You see duplicates because thereâs no single place to store a customer or product. You get conflicting values because two rows disagree about the same thing, like a phone number. Filtering and reporting become frustrating because some columns hide lists ("Tags", "Products", "Attendees") or mix formats ("$1,200", "1200", "1.2k").
Moving from a Google Sheet to a relational schema is about safety. A database forces clearer structure so you can query, validate, and update data without creating new contradictions.
A useful mental model: one row should represent one real thing. If a row represents a deal, a customer, and a list of products, updating any one of those later will be painful.
A quick test: does a single row ever need two values for the same field?
- One order has multiple products
- One project has multiple team members
- One customer has multiple addresses
If the answer is yes, itâs not a âwide rowâ problem. Itâs a âseparate tableâ problem. Once you model it cleanly, you can build forms and validation on top of it instead of relying on fragile manual edits.
Start by defining what the sheet actually means
A spreadsheet can look organized and still mean different things to different people. Before you convert a Google Sheet to a relational schema, agree on what the sheet is tracking.
Start with outcomes, not columns. What decisions should the data support: a weekly revenue report, a list of overdue tickets, a workflow that assigns follow-ups, or quick lookup during a customer call? If you canât name a decision, that field often doesnât belong in the database.
Next, pull out the nouns hiding in the headers and notes. These usually become your future tables: customers, orders, products, invoices, tickets, agents, locations. If a column mixes two nouns (like âCustomer + Companyâ), youâre storing multiple things in one place.
Agree on definitions early
Small differences in meaning turn into big cleanups later. Get clear on basics:
- What counts as an âorderâ (a quote, a paid purchase, or both)?
- What is a âcustomerâ (person, company, or either)?
- Can one order have multiple products?
- Can one email belong to multiple customers?
- What is âstatusâ meant to show (current state or history)?
Example: if your sheet has one row per âOrderâ but the âProductsâ cell contains a comma-separated list, decide whether that row represents a checkout, a shipment, or an invoice. Each choice leads to a different schema.
Freeze a copy of the original sheet as read-only. Youâll use it to validate that the new tables still answer the same questions.
Clean the sheet so the structure is visible
Before converting a Google Sheet to a relational schema, make the sheet look like data, not a report. Databases need consistent rows and columns. Decorative layout hides patterns you need to model.
Remove layout tricks like merged cells, multiple header rows, and subtotals inside the data range. Keep one header row and then rows of records only. If you need totals, put them on a separate summary tab so they donât get mixed into the real records.
Then make formats consistent across each column. A database canât guess that â1/2/24â, â2024-02-01â, and âFeb 1â are the same date. The same applies to phone numbers, currency, and names. Pick one format and use it everywhere, even if it feels strict.
A short cleanup pass that usually pays off:
- Make sure each row represents one thing (one order, one customer, one ticket).
- Remove blank spacer rows and columns.
- Replace âN/Aâ, â-â, and empty strings with one rule youâll keep.
- Mark which columns are calculated vs typed by a person.
Finally, flag any cell that contains multiple values, like âred, blue, greenâ in one column. Donât fix the schema yet. Just mark those columns so you remember theyâll become separate rows later.
Identify repeating groups and fields that hide lists
The biggest warning sign in spreadsheet data modeling is repetition. Sheets often squeeze âmore than one thingâ into a single row by repeating columns or packing multiple values into one cell. That works for quick tracking, then breaks when you need filtering, reporting, or consistent updates.
Patterns that usually mean âthis should be another tableâ
Scan for these shapes:
- Numbered columns like
Item 1,Item 2,Item 3orPhone 1,Phone 2. - Repeated blocks like address fields duplicated for âHomeâ and âWorkâ.
- Cells with commas, line breaks, or âandâ that combine values (for example, âMouse, Keyboard, Monitorâ).
- One column that mixes two concepts, like âApproved 2025-01-10â or âAlex (Manager)â.
- A row that represents two levels at once, like an Order row that also tries to store all Order Items.
Example: if your sales tracker uses Order ID, Customer, Product 1, Qty 1, Product 2, Qty 2, youâll hit a wall. Some orders have 1 item, some have 8. The sheet either grows sideways forever or starts losing data. In a relational model, âOrdersâ becomes one table, and âOrder Itemsâ becomes another table with one row per product on the order.
For âlists in a cellâ, treat each value as its own record. A cell that says âEmail, SMSâ usually means you need a separate table (or a join table) to track channels cleanly.
Mixed columns are quieter but just as risky. Split them early so each field stores one clear fact.
Create tables from the entities you found
Once you can name the real-world things in the sheet, turn each one into its own table. Your spreadsheet stops being one big grid and becomes a set of smaller, purposeful lists.
If a row mixes details about two different things, it probably needs two tables. A sales tracker row might include customer info (name, phone), order info (date, status), and product info (SKU, price). Customers donât change every time an order changes, and products donât depend on a single order. Splitting them prevents duplicate edits and mismatched values.
Before you finalize anything, write a one-sentence purpose for each table. If you canât describe what a table represents without saying âand alsoâ, itâs usually too broad.
A few practical rules:
- Keep attributes that describe the same thing and share the same lifecycle together (customer name and customer email).
- Move anything that can appear multiple times into its own table (multiple order items, multiple addresses).
- If a cell contains a list (comma-separated values, repeated columns), thatâs a separate table.
- If two sets of fields change for different reasons, separate them (order status vs customer contact info).
Then name columns clearly and consistently. Prefer simple nouns and avoid vague labels like âInfoâ or âDetailsâ.
Choose keys that stay stable over time
Pick a primary key for every table early. A good key is boring: it never changes, itâs always present, and it identifies one row and only one row.
Natural keys (real-world values) can work, but only if theyâre truly stable. An SKU is often a good natural key because itâs meant to be permanent. Email addresses sound stable, but people change emails, share inboxes, and create duplicates like âjohn@â and âjohn.work@â. Names, phone numbers, and addresses change and arenât guaranteed unique.
A safe default is an auto-generated ID (like customer_id, order_id). Keep the natural identifier as a normal field, and add a uniqueness rule when it fits your business rules. If an email changes, the customer_id stays the same and related orders still point to the right customer.
Simple key rules:
- Use an auto ID when the real-world identifier might change, be missing, or be reused.
- Use a natural key only when you control it and itâs designed to be permanent (for example, SKU).
- Mark fields as unique only when duplicates would be wrong.
- Allow NULL only when âunknownâ is a valid state; otherwise require a value.
- Write down what âuniqueâ means (unique per table, per company, or per time period).
Example: in a Contacts table, use contact_id as the primary key. Keep email unique only if your rule is one contact equals one email. Allow phone to be empty because not everyone shares it.
Map relationships without guessing
Most hard mistakes come from guessing how things relate. Use a simple rule: if one row âownsâ many of something, thatâs one-to-many. Put the foreign key on the âmanyâ side.
Example: one Customer can have many Orders. The Orders table should store customer_id. If you keep a comma-separated list of order numbers inside Customers, duplicates and missing data show up quickly.
Many-to-many is the common spreadsheet trap. If one Order can include many Products and one Product can appear in many Orders, you need a join table (often called line items). It typically includes order_id, product_id, plus fields like quantity and the price at the time of purchase.
One-to-one relationships are rare. They make sense when the extra data is optional or kept separate for privacy or performance (for example, User and UserProfile). Theyâre a warning sign when you split a table just because the sheet had two tabs.
History needs its own structure. If values can change over time (status, price, address), avoid overwriting a single column. Store changes as rows in a history table so you can answer âwhat was true on that date?â
Normalize enough to prevent contradictions
A simple rule: store one fact in one place. If a customerâs phone number appears in five rows, someone will update four of them and miss the fifth.
Normalization in plain terms:
1NF, 2NF, 3NF in practical terms
First normal form (1NF) means each cell holds one value. If a column contains âred, blue, greenâ or âSKU1|SKU2|SKU3â, thatâs a hidden list. Break it into rows in a related table.
Second normal form (2NF) shows up most in line items. If you have OrderItems and the key is (OrderID, ProductID), then fields like CustomerName donât belong there. They depend on the order, not the product.
Third normal form (3NF) means non-key fields shouldnât depend on other non-key fields. Example: if you store ZipCode and City, and City is determined by ZipCode, you risk mismatches.
A quick self-check:
- Could the same value be edited in more than one place?
- Would one change force you to update many other rows?
- Are you storing labels that can be derived from an ID?
- Are totals stored next to the raw rows that produce them?
When denormalizing is OK
Denormalize mainly for read-heavy reporting, and do it safely: treat the report table as a copy you can rebuild. Keep normalized tables as the source of truth.
For derived values like totals, balances, and status, donât duplicate them unless you have a clear rule for recalculating. A practical approach is: store raw transactions, compute totals in queries, and only cache totals when performance demands it.
Common modeling traps that create future cleanups
Most âit worked in the sheetâ problems come from meaning, not tools. The goal is to make every row say one clear thing, the same way, every time.
Common traps:
- Using names as IDs. âJohn Smithâ isnât a unique identifier, and names change. Use a generated ID (or a verified email or phone), and treat display names as labels.
- Packing lists into one cell. It looks simple, but it breaks searching, validation, and reporting. Lists belong in a related table.
- Mixing current state with history. A single Status column canât tell you both the latest status and how it changed. If timing matters, store status changes as events with timestamps.
- Overloading one table to mean multiple things. A Contacts sheet that includes customers, vendors, and employees usually ends up with fields that only apply to some rows. Split by role, or keep a shared Person table and add role-specific tables.
- Ignoring required vs optional fields. If key fields can be blank, youâll get rows that canât join cleanly. Decide whatâs required and enforce it early.
If your Orders table has columns like Item 1, Item 2, Item 3, youâre looking at a repeating group. Plan for an Orders table plus an OrderItems table.
Quick checklist before you commit to the schema
Before you lock the schema, do a final pass for clarity. Most database pain later comes from small shortcuts that felt harmless early on.
Ask whether each table answers one simple question. âCustomersâ should mean customers, not customers plus their latest order plus call notes. If you canât describe a table in one short sentence, itâs mixing things.
Final checks:
- Can you point to the column (or set of columns) that uniquely identifies each row, even if names change?
- Do any cells contain more than one value (comma-separated tags, multiple emails, Item1/Item2 columns)? If yes, split into a child table.
- For each relationship, is it stored as an intentional foreign key? For many-to-many, do you have a join table?
- Do important fields have rules (required where missing data breaks the process, unique where duplicates would be harmful)?
- Can you update a fact (customer address, product price, employee role) in exactly one place?
Reality test: imagine someone enters the same customer twice with a slightly different spelling. If your schema makes that easy, add a better key or a uniqueness rule.
Example: turning a sales tracker sheet into clean tables
Picture a sales tracker where each row is a deal. It has columns like Customer Name, Customer Email, Deal Amount, Stage, Close Date, Products (a comma-separated list), and Notes (sometimes multiple notes in one cell).
That single row hides two repeating groups: products (one deal can include many products) and notes (one deal can have many notes). This is where conversions often go wrong, because lists inside cells are hard to query and easy to contradict.
A clean âafterâ model that matches how the work actually behaves:
- Customers (CustomerId, Name, Email)
- Deals (DealId, CustomerId, Amount, Stage, CloseDate)
- Products (ProductId, Name, SKU)
- DealProducts (DealId, ProductId, Quantity, UnitPrice)
- DealNotes (NoteId, DealId, NoteText, CreatedAt)
CustomerId, DealId, and ProductId are stable identifiers. DealProducts solves the many-to-many relationship: one deal can include many products, and one product can appear in many deals. DealNotes keeps notes separate, so you donât end up with âNote 1, Note 2, Note 3â columns.
Before modeling, a report like ârevenue by productâ means splitting strings and hoping people typed names consistently. After modeling, it becomes a straightforward query over DealProducts joined to Deals and Products.
Next steps: move from schema to a working app
Once your schema looks right on paper, move it into a real database and test with real data. Donât import everything at once. Load a small batch first, fix what breaks, then repeat.
A practical order that keeps risk low:
- Create the tables and relationships.
- Import 50 to 200 rows, verify totals, and spot-check records.
- Fix mapping issues (wrong columns, missing IDs, duplicates), then re-import.
- When itâs stable, load the rest.
Add validation rules early so messy spreadsheet habits donât return. Make required fields truly required, limit allowed values (like status), validate formats (dates and emails), and use foreign keys so you canât create an order for a customer that doesnât exist.
Then stop using the sheet for updates. Protecting your data gets much easier when people have simple forms and clear workflows.
If you want to turn the schema into a working internal tool without writing code, AppMaster (appmaster.io) can help: you model tables and relationships visually, then generate a production-ready backend, web app, and native mobile apps from the same model.
FAQ
Start when the sheet is acting like a shared source of truth and youâre seeing duplicates, conflicting values, or painful reporting. If youâre fighting comma-separated lists, Item 1/Item 2 columns, or constant copy/paste fixes, a relational schema will save time fast.
If one row needs multiple values for the same field, you have a repeating group. Examples are multiple products on one order, multiple addresses for one customer, or multiple attendees for one event. Those should become child tables (or join tables), not extra columns or lists in a cell.
Freeze a read-only copy of the original sheet, then remove merged cells, multiple header rows, and subtotal rows from the data range. Make each column consistent (one date format, one currency format, one way to represent blanks) so you can see the real structure before you model it.
Use an auto-generated ID as the default for each table because itâs stable and wonât change when people update emails, names, or phone numbers. Keep real-world identifiers (like email or SKU) as normal fields and add a uniqueness rule only when duplicates are truly wrong for your business.
Map it by ownership: if one customer can have many orders, put customer_id on the Orders table. If itâs many-to-many (orders and products), add a join table like OrderItems with order_id, product_id, plus quantity and price at that time.
Because it prevents contradictions. Storing one fact in one place means you update it once and everything stays consistent. You donât need âperfect normalization,â but you do want to eliminate duplicates like the same customer phone number scattered across many rows.
Split it into proper rows. A cell like âEmail, SMSâ is hard to filter and validate, and it breaks reporting. Create a related table (or join table) where each selected value becomes its own record tied back to the parent row.
Separate âcurrent stateâ from âhistory.â Keep a current status field if you need it, but store changes as rows in a history/events table with timestamps when timing matters. This lets you answer questions like âwhat was the status last month?â without guessing.
Import a small batch first (around 50â200 rows), then reconcile totals and spot-check records against the frozen sheet. Fix mappings, missing IDs, and duplicates, then re-import. Only load everything once the process is repeatable and predictable.
A no-code tool can help when you want the schema to become a working app with forms, validation, and workflows, not just tables. With AppMaster, you can model tables and relationships visually and generate a production-ready backend, web app, and native mobile apps from the same model.


