All courses > Technology and Programming > Databases ::

Primary Keys and Identifiers: Stable Uniqueness in a Schema

Capítulo 5

Estimated reading time: 9 minutes

1) What a Primary Key Guarantees (and What It Does Not)

A primary key (PK) is the column (or set of columns) chosen to uniquely identify each row in a table. Think of it as the row’s official identifier: every other table that needs to point to that row will reference the PK.

What a primary key guarantees

Uniqueness: No two rows can have the same primary key value.
Not-null: Every row must have a primary key value; it cannot be missing.
Stable reference target: Foreign keys can safely reference it, and joins can reliably match rows.

What a primary key does not guarantee

Correctness of other attributes: A PK does not ensure that names, prices, or dates are valid; that requires other constraints (CHECK, NOT NULL, foreign keys) and application rules.
Business uniqueness beyond the chosen key: If you choose a surrogate PK, the database will still allow duplicate business identifiers unless you add a UNIQUE constraint on those business fields.
Immutability: A PK can technically be updated in many databases, but updating PKs is risky because it can cascade to many referencing rows. You should choose a PK that you can treat as effectively unchanging.

Step-by-step: sanity-check a candidate primary key

Step 1: Can it be missing? If yes, it cannot be the PK.
Step 2: Can duplicates exist (now or later)? If yes, it cannot be the PK.
Step 3: Can it change? If it can change due to business processes (renumbering, rebranding, mergers), it is a risky PK.
Step 4: Is it short and efficient? Very long keys bloat indexes and slow joins.
Step 5: Does it expose sensitive data? If the identifier is meaningful (like an email), using it as a PK can leak information and complicate privacy requirements.

Example DDL: primary key plus business uniqueness

It is common to use a surrogate PK while still enforcing business rules with UNIQUE constraints.

CREATE TABLE Customer (  customer_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  email VARCHAR(320) NOT NULL,  full_name VARCHAR(200) NOT NULL,  CONSTRAINT uq_customer_email UNIQUE (email));

2) Natural Keys vs. Surrogate Keys: Choosing with Clear Criteria

Natural keys

A natural key is a real-world identifier that already exists in the business domain (for example, a government-issued tax ID, an ISBN for books, or a globally unique product code). The table’s PK is meaningful outside the database.

Surrogate keys

A surrogate key is an artificial identifier created solely for the database (for example, an auto-increment integer or a UUID). It has no business meaning.

Decision criteria

Use these criteria to decide between natural and surrogate keys. In practice, many designs use a surrogate PK and keep natural identifiers as UNIQUE alternate keys.

Continue in our app.

Listen to the audio with the screen off.
Earn a certificate upon completion.
Over 5000 courses for you to explore!

Or continue reading below...

Download the app

Stability: Prefer identifiers that never change. If an identifier can be corrected, reissued, or reformatted, it is unstable.
Length and performance: Short numeric keys are efficient for indexes and joins. Long strings (emails, composite codes) increase storage and can slow joins.
Privacy and exposure: If the identifier is sensitive (email, phone, national ID), using it as a PK increases the chance it appears in logs, URLs, exports, or error messages.
Change frequency: If the business identifier changes more often than “never,” treat it as a regular attribute with a UNIQUE constraint, not as the PK.
Scope of uniqueness: Some identifiers are only unique within a context (unique per store, per vendor, per year). Those are not good single-column PKs unless you include the context (which may lead to a composite key).

Practical patterns

Surrogate PK + UNIQUE natural identifier: Best default when you want stable references and still enforce business uniqueness.
Natural PK: Works well when the identifier is guaranteed stable, short, non-sensitive, and truly unique (for example, a standardized code that never changes).
UUID surrogate PK: Useful for distributed systems and offline creation, but larger than integers and can affect index locality depending on database and UUID version.

Example: product identifiers

Suppose products have a SKU that marketing sometimes changes. That is a strong hint to avoid using SKU as the PK.

CREATE TABLE Product (  product_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  sku VARCHAR(50) NOT NULL,  name VARCHAR(200) NOT NULL,  CONSTRAINT uq_product_sku UNIQUE (sku));

Now you can change a SKU (if business allows) without breaking foreign keys, while still preventing duplicates.

3) Composite Keys: When They Simplify or Complicate the Model

A composite primary key uses multiple columns together to uniquely identify a row. Composite keys are common in tables that represent a relationship or a “line item” concept.

When composite keys simplify

Join/association tables: A table that exists to link two entities often has a natural composite key of the two foreign keys.
Uniqueness is inherently multi-part: For example, a price list entry might be unique by (price_list_id, product_id).
No need for a separate identifier: If the only purpose is to record the association and maybe a few attributes, a composite PK can be clean and self-enforcing.

Example: many-to-many association with composite PK

CREATE TABLE OrderItem (  order_id BIGINT NOT NULL,  line_number INT NOT NULL,  product_id BIGINT NOT NULL,  quantity INT NOT NULL,  PRIMARY KEY (order_id, line_number));

Here, (order_id, line_number) is a natural composite identifier for line items within an order.

When composite keys complicate

Foreign keys become wider: Any table referencing this row must carry all PK columns, increasing storage and query complexity.
Key propagation: Multi-column keys tend to spread through the schema, making joins and indexes more complex.
Harder integration: External systems often expect a single identifier; mapping multi-column identifiers can be awkward.
Updates become riskier: If any part of the composite key can change, you have a multi-column PK update problem.

Step-by-step: decide composite PK vs. surrogate PK for a line table

Step 1: Is the row naturally identified by parent + sequence? If yes, composite PK is a candidate.
Step 2: Will other tables need to reference individual lines? If yes, consider a surrogate key (order_item_id) to avoid wide foreign keys.
Step 3: Will line identifiers be stable? If line_number can change due to reordering, it is unstable; consider a surrogate key.
Step 4: Are you integrating with external systems that provide a line ID? If yes, you may want a surrogate PK plus an external ID column.

Alternative: surrogate key plus natural uniqueness

CREATE TABLE OrderItem (  order_item_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  order_id BIGINT NOT NULL,  line_number INT NOT NULL,  product_id BIGINT NOT NULL,  quantity INT NOT NULL,  CONSTRAINT uq_orderitem_order_line UNIQUE (order_id, line_number));

This keeps a single-column PK for referencing while still enforcing “no duplicate line numbers within an order.”

4) Designing for Imports and Integration (External IDs)

Real systems rarely live alone. You may import customers from a CRM, products from an ERP, or orders from a marketplace. Those systems often have their own identifiers. Treat those as external IDs: values that identify the record in another system.

Key principle: separate internal identity from external identity

Internal PK: Your database’s stable identifier (often a surrogate key).
External ID(s): One or more columns that store identifiers from other systems, usually with a uniqueness constraint scoped by the source system.

Common patterns for external IDs

Single source integration: If you integrate with exactly one external system for an entity, store external_id directly on the table and make it UNIQUE.
Multiple sources: Use a separate mapping table keyed by (source_system, external_id) pointing to your internal PK.
Upserts: Imports often need “insert if new, update if existing.” External IDs are the matching key for upserts, not your internal PK.

Example: multiple external IDs via mapping table

CREATE TABLE Customer (  customer_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  full_name VARCHAR(200) NOT NULL);  CREATE TABLE CustomerExternalId (  customer_id BIGINT NOT NULL,  source_system VARCHAR(50) NOT NULL,  external_id VARCHAR(100) NOT NULL,  PRIMARY KEY (source_system, external_id),  CONSTRAINT uq_customer_source UNIQUE (customer_id, source_system));

This design supports multiple external systems, prevents two customers from sharing the same (source_system, external_id), and prevents a customer from having two IDs for the same source.

Step-by-step: importing with external IDs

Step 1: Identify the source system: Store a stable code like 'crm', 'erp', 'shopify'.
Step 2: Choose the matching key: Usually (source_system, external_id).
Step 3: Enforce uniqueness: Add PRIMARY KEY or UNIQUE constraints to prevent duplicates.
Step 4: Resolve to internal PK: Look up internal customer_id/product_id using the mapping.
Step 5: Insert/update attributes: Update non-key attributes as needed, but avoid changing internal PKs.

Exercises: Pick Keys and Defend Your Choice

Exercise 1: Customer

Scenario: You are designing a Customer table for an e-commerce site. You have these candidate identifiers: email, phone number, a government-issued ID (optional), and an internal generated number.

Task A: Choose a primary key.
Task B: Decide which fields (if any) should have UNIQUE constraints.
Task C: Write 3–5 sentences defending your choice using: stability, privacy, and change frequency.

Hints to consider: emails can change; phone numbers can be recycled; government IDs are sensitive and may be missing; internal IDs are stable but not meaningful.

Exercise 2: Product

Scenario: A Product has: SKU (marketing can change it), barcode/GTIN (not always present), and a supplier’s product code (unique only per supplier). Products may be imported from multiple suppliers.

Task A: Choose a primary key.
Task B: Propose constraints for SKU, GTIN, and supplier product codes (consider scope of uniqueness).
Task C: If you support multiple suppliers, describe whether you would use a mapping table for external IDs and what its key would be.

Exercise 3: Order

Scenario: Orders have an order number shown to customers (format can change, may reset per year), and you also integrate with a payment provider that assigns a payment_reference. Orders have line items.

Task A: Choose the Order table primary key and justify it using stability and change frequency.
Task B: Decide whether order_number should be UNIQUE, and if so, whether it should be globally unique or scoped (for example, by year or store).
Task C: For line items, choose between (order_id, line_number) as a composite PK or a surrogate order_item_id. Defend your choice based on whether other tables might reference a line item and whether line numbers can change.
Task D: Decide how to store payment provider identifiers: as a column on Order or in a separate mapping table, and what uniqueness constraint you would enforce.

Now answer the exercise about the content:

You want to use a surrogate primary key for Customer but still prevent two customers from sharing the same email. Which design best enforces this rule while keeping a stable internal identifier?

You are right! Congratulations, now go to the next page

You missed! Try again.

A surrogate PK provides a stable internal identifier. A separate UNIQUE constraint on the business field (email) enforces business uniqueness that the PK alone does not guarantee.