Schema Design - Practical Rules of Thumb
Good schema design keeps data consistent, queryable, and maintainable. This article gives practical rules: primary keys, normalization vs denormalization, naming, indexes, and when to split or merge tables. It does not replace a full data-modeling course but gives a checklist you can apply in backend systems.
Overview
- Primary key: Every table should have a stable, unique primary key (single column or composite). Prefer surrogate keys (e.g. auto-increment or UUID) when natural keys are bulky or change; use natural keys when they are small and immutable.
- Normalization: Normalize to reduce redundancy and update anomalies (e.g. 3NF). Denormalize only when you have a proven read-hot path and accept write complexity (e.g. redundant counters, copied columns).
- Naming: Use consistent, descriptive names (snake_case common in SQL). Table names plural or singular by convention; column names clear (e.g.
created_at,user_id). Avoid reserved words and abbreviations unless standard (e.g.id,url). - Indexes: Add indexes for WHERE, JOIN, ORDER BY, and GROUP BY columns; avoid over-indexing writes. Prefer composite indexes for common query patterns; keep key length reasonable.
Example
Example 1: Surrogate PK and foreign key
SQLCREATE TABLE users ( id BIGINT PRIMARY KEY AUTO_INCREMENT, email VARCHAR(255) NOT NULL UNIQUE, created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ); CREATE TABLE orders ( id BIGINT PRIMARY KEY AUTO_INCREMENT, user_id BIGINT NOT NULL, total DECIMAL(10,2), FOREIGN KEY (user_id) REFERENCES users(id), KEY idx_user (user_id) );
- Surrogate
idfororderskeeps joins and indexes small.user_idis a clear foreign key with an index for lookups.
Example 2: When to denormalize
SQL-- Normalized: count requires aggregation SELECT user_id, COUNT(*) FROM orders GROUP BY user_id; -- Denormalized: read one row (write must maintain count) ALTER TABLE users ADD COLUMN order_count INT DEFAULT 0; -- On insert/delete order: UPDATE users SET order_count = order_count + 1 WHERE id = ?;
- Denormalize only when the read pattern is hot and you can maintain the redundant value correctly on every write (and handle failures).
Example 3: Naming and types
- Tables:
orders,order_items,user_profiles(consistent plural or singular). - Columns:
created_at,updated_at,user_id,status(clear and consistent). - Avoid:
tbl_orders,data,col1; use appropriate types (e.g. DECIMAL for money, DATETIME for time, INT/BIGINT for counts/IDs).
Core Mechanism / Behavior
- Normalization: Splits data so each fact is stored once; updates are in one place. More joins on read.
- Denormalization: Duplicates or pre-aggregates data for faster reads; every write path must keep the duplicate in sync. Use for hot reads and measurable gain.
- Indexes: Speed up filters and joins; each index slows down inserts/updates and uses space. Balance by profiling actual queries.
| Rule | What to do |
|---|---|
| PK | One stable unique key per table; prefer surrogate if natural key is large or mutable |
| FKs | Declare FKs for integrity; index FK columns for JOINs and cascades |
| Names | Consistent snake_case; table/column names self-explanatory |
| Index | Index columns used in WHERE/JOIN/ORDER BY; avoid redundant or unused indexes |
| Denorm | Only when read load justifies it and you can maintain consistency on write |
Key Rules
- Don’t skip the primary key; it drives the clustered index in InnoDB and is the natural join key. Prefer a single, stable column (e.g.
id) when possible. - Normalize first; denormalize only after measuring and for specific hot paths. Document and centralize logic that maintains denormalized data.
- Use consistent naming and appropriate types (no string for numbers or dates); set NOT NULL and defaults where the business rule is clear.
What's Next
See Index Fundamentals for B-Tree and covering indexes. See Slow Query Diagnosis for tuning after schema is in place. See Pagination for design that works well with list queries.