Schema Design - Practical Rules of Thumb

Good schema design keeps data consistent, queryable, and maintainable. This article gives practical rules: primary keys, normalization vs denormalization, naming, indexes, and when to split or merge tables. It does not replace a full data-modeling course but gives a checklist you can apply in backend systems.

Overview

  • Primary key: Every table should have a stable, unique primary key (single column or composite). Prefer surrogate keys (e.g. auto-increment or UUID) when natural keys are bulky or change; use natural keys when they are small and immutable.
  • Normalization: Normalize to reduce redundancy and update anomalies (e.g. 3NF). Denormalize only when you have a proven read-hot path and accept write complexity (e.g. redundant counters, copied columns).
  • Naming: Use consistent, descriptive names (snake_case common in SQL). Table names plural or singular by convention; column names clear (e.g. created_at, user_id). Avoid reserved words and abbreviations unless standard (e.g. id, url).
  • Indexes: Add indexes for WHERE, JOIN, ORDER BY, and GROUP BY columns; avoid over-indexing writes. Prefer composite indexes for common query patterns; keep key length reasonable.

Example

Example 1: Surrogate PK and foreign key

SQL
CREATE TABLE users (
    id         BIGINT PRIMARY KEY AUTO_INCREMENT,
    email      VARCHAR(255) NOT NULL UNIQUE,
    created_at DATETIME     NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
    id         BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id    BIGINT NOT NULL,
    total      DECIMAL(10,2),
    FOREIGN KEY (user_id) REFERENCES users(id),
    KEY idx_user (user_id)
);
  • Surrogate id for orders keeps joins and indexes small. user_id is a clear foreign key with an index for lookups.

Example 2: When to denormalize

SQL
-- Normalized: count requires aggregation
SELECT user_id, COUNT(*) FROM orders GROUP BY user_id;

-- Denormalized: read one row (write must maintain count)
ALTER TABLE users ADD COLUMN order_count INT DEFAULT 0;
-- On insert/delete order: UPDATE users SET order_count = order_count + 1 WHERE id = ?;
  • Denormalize only when the read pattern is hot and you can maintain the redundant value correctly on every write (and handle failures).

Example 3: Naming and types

  • Tables: orders, order_items, user_profiles (consistent plural or singular).
  • Columns: created_at, updated_at, user_id, status (clear and consistent).
  • Avoid: tbl_orders, data, col1; use appropriate types (e.g. DECIMAL for money, DATETIME for time, INT/BIGINT for counts/IDs).

Core Mechanism / Behavior

  • Normalization: Splits data so each fact is stored once; updates are in one place. More joins on read.
  • Denormalization: Duplicates or pre-aggregates data for faster reads; every write path must keep the duplicate in sync. Use for hot reads and measurable gain.
  • Indexes: Speed up filters and joins; each index slows down inserts/updates and uses space. Balance by profiling actual queries.
RuleWhat to do
PKOne stable unique key per table; prefer surrogate if natural key is large or mutable
FKsDeclare FKs for integrity; index FK columns for JOINs and cascades
NamesConsistent snake_case; table/column names self-explanatory
IndexIndex columns used in WHERE/JOIN/ORDER BY; avoid redundant or unused indexes
DenormOnly when read load justifies it and you can maintain consistency on write

Key Rules

  • Don’t skip the primary key; it drives the clustered index in InnoDB and is the natural join key. Prefer a single, stable column (e.g. id) when possible.
  • Normalize first; denormalize only after measuring and for specific hot paths. Document and centralize logic that maintains denormalized data.
  • Use consistent naming and appropriate types (no string for numbers or dates); set NOT NULL and defaults where the business rule is clear.

What's Next

See Index Fundamentals for B-Tree and covering indexes. See Slow Query Diagnosis for tuning after schema is in place. See Pagination for design that works well with list queries.