1) Primary keys and candidate keys
A key is a column (or set of columns) that uniquely identifies a row. Good key design makes rows easy to reference, prevents duplicates, and keeps relationships stable over time.
Primary key (PK)
A primary key is the chosen unique identifier for a table. SQL Server enforces it with a PRIMARY KEY constraint, which guarantees uniqueness and disallows NULLs. A table can have only one primary key, but it can be made of one or multiple columns.
Candidate keys
A candidate key is any column (or set of columns) that could serve as the primary key because it is unique and non-null. You pick one candidate key as the primary key; the others are typically enforced with UNIQUE constraints.
Example: In a Users table, both Email and UserId might be unique. If you choose UserId as the primary key, Email can remain a candidate key enforced with a UNIQUE constraint.
Designing stable identifiers
- Prefer stable, meaningless identifiers for the primary key (surrogate keys), such as an integer identity, when natural values can change (email, phone, product code revisions).
- Still enforce business uniqueness with UNIQUE constraints (for example, unique email) so the database prevents duplicates even if the PK is a surrogate.
- Avoid “smart” keys that embed meaning (dates, region codes) if that meaning can change or if it encourages updates to key values.
2) Implementing IDENTITY columns (seed, increment) and how values are generated
An IDENTITY column is a numeric column where SQL Server automatically generates values when you insert rows. It is commonly used as a surrogate primary key.
- Listen to the audio with the screen off.
- Earn a certificate upon completion.
- Over 5000 courses for you to explore!
Download the app
Seed and increment
You define an identity as IDENTITY(seed, increment):
- Seed: the first generated value.
- Increment: how much to add for each new row.
Common pattern: IDENTITY(1,1) generates 1, 2, 3, 4, ...
Hands-on: create a table with an IDENTITY primary key
-- Example table for hands-on experiments (run in your own database context) CREATE TABLE dbo.Customer ( CustomerId int IDENTITY(1,1) NOT NULL, FullName nvarchar(200) NOT NULL, Email nvarchar(320) NULL, CONSTRAINT PK_Customer PRIMARY KEY (CustomerId), CONSTRAINT UQ_Customer_Email UNIQUE (Email) );Notes:
CustomerIdis generated automatically and is the primary key.Emailis enforced as unique (a candidate key) but is not the primary key.
Insert rows and observe generated values
INSERT INTO dbo.Customer (FullName, Email) VALUES ('Ava Chen', 'ava@example.com'); INSERT INTO dbo.Customer (FullName, Email) VALUES ('Noah Patel', 'noah@example.com'); INSERT INTO dbo.Customer (FullName, Email) VALUES ('Mia Rivera', 'mia@example.com'); SELECT CustomerId, FullName, Email FROM dbo.Customer ORDER BY CustomerId;You should see CustomerId values 1, 2, 3 (assuming the table was empty and seed was 1).
How SQL Server generates identity values
- When you insert a row without specifying the identity column, SQL Server assigns the next identity value.
- The identity value is generated as part of the insert operation, but it is not guaranteed to be “gap-free” (more on that below).
- Identity generation is per table, not per session.
3) Common behaviors: gaps, rollbacks, reseeding, and inserts that interact with identity
Gaps after deletes
Deleting rows does not “reuse” identity values. Identity is about generating new values, not maintaining a consecutive sequence.
DELETE FROM dbo.Customer WHERE CustomerId = 2; INSERT INTO dbo.Customer (FullName, Email) VALUES ('Liam Scott', 'liam@example.com'); SELECT CustomerId, FullName FROM dbo.Customer ORDER BY CustomerId;Expected behavior: the new row gets CustomerId = 4, not 2.
Gaps after rollbacks
If an insert is rolled back, the identity value that was generated is typically not reused. This means you can see gaps even if the row never “really” existed.
BEGIN TRAN; INSERT INTO dbo.Customer (FullName, Email) VALUES ('Temp Person', 'temp@example.com'); -- Check the identity value generated in this scope SELECT SCOPE_IDENTITY() AS NewCustomerId; ROLLBACK; -- Now insert a real row INSERT INTO dbo.Customer (FullName, Email) VALUES ('Sophia King', 'sophia@example.com'); SELECT CustomerId, FullName FROM dbo.Customer ORDER BY CustomerId;Expected behavior: the identity value shown inside the transaction will not be reused after the rollback; the next insert continues from the last generated value, leaving a gap.
Reseeding an identity
Sometimes you need to adjust the next identity value (for example, after loading data or cleaning a table). SQL Server provides DBCC CHECKIDENT to check and reseed.
-- Check current identity information DBCC CHECKIDENT ('dbo.Customer', NORESEED); -- Reseed so that the next insert uses 1000 (i.e., current identity becomes 999) DBCC CHECKIDENT ('dbo.Customer', RESEED, 999); INSERT INTO dbo.Customer (FullName, Email) VALUES ('Reseed Test', 'reseed@example.com'); SELECT CustomerId, FullName FROM dbo.Customer WHERE Email = 'reseed@example.com';Important: reseeding can cause collisions if rows already exist with the same values you are reseeding into. Use it carefully, especially in shared environments.
Inserting explicit values into an IDENTITY column
By default, you should not insert into an identity column. If you must (for example, during a data migration), you can temporarily enable IDENTITY_INSERT for the table.
-- Example: inserting a specific identity value (migration scenario) SET IDENTITY_INSERT dbo.Customer ON; INSERT INTO dbo.Customer (CustomerId, FullName, Email) VALUES (500, 'Imported Customer', 'imported@example.com'); SET IDENTITY_INSERT dbo.Customer OFF; SELECT CustomerId, FullName FROM dbo.Customer WHERE CustomerId = 500;Notes:
- Only one table in a session can have
IDENTITY_INSERTset to ON at a time. - After inserting explicit values, the next generated identity value may need reseeding if you inserted higher values than the current identity.
Retrieving the identity value you just inserted (safely)
Applications often need the generated key immediately after an insert. Use a method that is safe in the presence of triggers and concurrent inserts.
Preferred: OUTPUT clause
OUTPUT INSERTED.CustomerId returns the actual values inserted, and it works for single-row and multi-row inserts.
DECLARE @NewIds TABLE (CustomerId int); INSERT INTO dbo.Customer (FullName, Email) OUTPUT INSERTED.CustomerId INTO @NewIds(CustomerId) VALUES ('Olivia Park', 'olivia@example.com'); SELECT CustomerId FROM @NewIds;Common: SCOPE_IDENTITY()
SCOPE_IDENTITY() returns the last identity value generated in the current scope (for example, the current stored procedure or batch). This is typically safer than @@IDENTITY because it avoids identity values generated by triggers in a different scope.
INSERT INTO dbo.Customer (FullName, Email) VALUES ('Ethan Brooks', 'ethan@example.com'); SELECT SCOPE_IDENTITY() AS NewCustomerId;Guidance:
- Use
SCOPE_IDENTITY()for single-row inserts when you just need the last identity value from your statement scope. - Use
OUTPUTwhen inserting multiple rows or when you want a robust pattern that returns inserted values directly.
4) When to consider alternatives such as UNIQUEIDENTIFIER (with trade-offs)
An identity integer is a great default for many tables, but it is not the only option. Sometimes you need identifiers that are globally unique across systems or that can be generated outside the database.
UNIQUEIDENTIFIER (GUID) basics
UNIQUEIDENTIFIER stores a 16-byte GUID. You can generate it in SQL Server (for example with NEWID()) or in the application, which is useful for distributed systems and offline inserts.
Example: GUID primary key
CREATE TABLE dbo.OrderHeader ( OrderId uniqueidentifier NOT NULL CONSTRAINT DF_OrderHeader_OrderId DEFAULT NEWID(), OrderNumber nvarchar(30) NOT NULL, CONSTRAINT PK_OrderHeader PRIMARY KEY (OrderId), CONSTRAINT UQ_OrderHeader_OrderNumber UNIQUE (OrderNumber) ); INSERT INTO dbo.OrderHeader (OrderNumber) VALUES ('SO-10001'); SELECT OrderId, OrderNumber FROM dbo.OrderHeader;Trade-offs: IDENTITY vs UNIQUEIDENTIFIER
- Size and index impact:
intis 4 bytes;uniqueidentifieris 16 bytes. Larger keys can increase index size and affect performance. - Insert locality: identity values are increasing, which tends to be friendly to clustered indexes. Random GUIDs (from
NEWID()) can cause more page splits and fragmentation when used as a clustered key. - Distributed generation: GUIDs can be generated outside SQL Server without coordination, which is helpful for merging data from multiple sources.
- Predictability: identity values are predictable (1,2,3...). GUIDs are harder to guess, which can be a minor security advantage in some scenarios (but should not replace proper authorization).
Practical guidance
- Use IDENTITY int/bigint as a common default for internal surrogate primary keys.
- Use UNIQUEIDENTIFIER when you need global uniqueness across databases/services, or when keys must be generated client-side.
- Keep business identifiers (like order numbers) as separate columns with UNIQUE constraints, regardless of the primary key type.