Duration of the online course: 33 hours and 54 minutes
New
Free course on SQL and database internals: indexing, query optimization, concurrency control, recovery, and distributed databases.
In this free course, learn about
Course Foundations and the Relational Model
Storage Engines and Data Layout
Indexing Data Structures and Index Concurrency
Core Query Processing: Sorting, Joins, and Execution
Query Optimization and Planning
Transaction Concurrency Control
Logging and Recovery
Distributed Databases and Modern Data Warehouses
Course Description
Database Systems: SQL, Indexing, Query Optimization, Transactions and Distributed Databases is a free online course in the Technology and Programming category, focused on modern database fundamentals and real-world system design. It guides you from the relational model and practical SQL to the internal mechanics that make databases fast, reliable, and scalable.
You will explore how data is stored and accessed, including file layouts, log-structured merge trees, and the tradeoffs between row and column storage with compression considerations. The course also covers memory and core data structures such as hash tables, then moves into indexing techniques like B-trees as well as vector and inverted indexes, filters, and tries, including how indexes behave under concurrent workloads.
On the query side, it digs into sorting, join algorithms, and query execution, then explains how optimizers plan efficient strategies. You will learn the theory and practice of concurrency control, including two-phase locking, optimistic methods, and multi-version concurrency control, with a strong focus on correctness and performance.
Reliability topics include logging and recovery, including ARIES-style recovery principles. Finally, the course expands to distributed databases, covering architectures, distributed transactional systems, distributed analytics, and how modern platforms and warehouses fit into the broader landscape. Ideal for learners aiming to strengthen SQL skills and understand what happens beneath the surface of database engines.
Course content
Video class: #00 - Course Overview21m
Video class: #01 - Relational Model1h23m
Exercise: What is a key benefit of the relational model’s physical data independence?
Video class: #02 - Modern SQL (CMU Intro to Database Systems)1h18m
Exercise: In SQL, why is a GROUP BY clause required when selecting a non-aggregated column alongside an aggregate (e.g., course_id with AVG(gpa))?
Video class: #03 - Database Storage: Files1h22m
Exercise: In a disk-oriented database system, why is random access on nonvolatile storage typically slower than sequential access?
Video class: #04 - Database Storage: Log-Structured Merge Trees1h22m
Exercise: What is the main purpose of using a slotted page (slot array) in a tuple-oriented storage layout?
Video class: #05 - Row vs. Column Storage Compression ? StarTree Database Talk (CMU Intro to Database Systems)1h28m
Video class: #06 - Memory1h23m
Exercise: Why do many database systems avoid using OS memory mapping (mmap) as the primary buffer management mechanism?
Video class: #07 - Hash Tables ? RelationalAI Database Talk (CMU Intro to Database Systems)1h23m
Exercise: In linear probing hash tables, what is the standard way to handle deletions without breaking future lookups?
Video class: #08 - Tree Indexes: B Trees (CMU Intro to Database Systems)1h22m
Exercise: Which statement best describes a key advantage of a B+ tree (vs. a B-tree) for range scans?
Video class: #09 - Vector Indexes, Inverted Indexes, Filters, Tries ? TiDB Talk (CMU Intro to Database Systems)1h21m
Exercise: Which statement best describes a Bloom filter compared to an index like a B+ tree?
Video class: #10 - Index Concurrency Control (CMU Intro to Database Systems)1h16m
Exercise: In database index concurrency control, what is the primary purpose of a latch (as opposed to a lock)?
Video class: #11 - Sorting1h17m
Exercise: In external merge sort with B buffer pages available, how many pages are typically reserved for input runs during a merge pass?
Video class: #12 - Join Algorithms: Hash, Sort-Merge, Nested Loop Joins (CMU Intro to Database Systems)1h14m
Exercise: Which join algorithm is typically preferred for large datasets when the join predicate is an equality match and the goal is to minimize disk I/O?
Video class: #13 - Query Execution Part 1 (CMU Intro to Database Systems)1h23m
Exercise: In the iterator (Volcano) processing model, how does an operator signal that it has no more output tuples?
Video class: #14 - Query Execution Part 2 ? ClickHouse Database Talk (CMU Intro to Database Systems)1h02m
Exercise: In parallel query execution, what is the main purpose of an exchange operator (e.g., gather)?
Video class: #15 - Query Planning1h21m
Exercise: Which optimization change most directly reduces unnecessary work by applying a filter before a join?
Video class: #16 - Concurrency Control Theory ? Firebolt Database Talk (CMU Intro to Database Systems)1h27m
Exercise: Which condition indicates that a schedule is conflict-serializable using a dependency (precedence) graph?
Video class: #17 - Two-Phase Locking Concurrency Control (CMU Intro to Database Systems)1h05m
Exercise: What rule defines the transition between the growing and shrinking phases in two-phase locking (2PL)?
Video class: #18 - Optimistic Concurrency Control ? Weaviate Database Talk (CMU Intro to Database Systems)1h20m
Exercise: In Optimistic Concurrency Control (OCC), when is a transaction typically assigned its timestamp in the described protocol?
Video class: #19 - Multi-Version Concurrency Control (CMU Intro to Database Systems)1h27m
Exercise: In MVCC, what ensures that a transaction reads a consistent snapshot of the database as of its start time?
Video class: #20 - Database Logging ? Confluent Database Talk (CMU Intro to Database Systems)1h19m
Exercise: In write-ahead logging (WAL), what must be guaranteed before acknowledging that a transaction has committed?
Video class: #21 - Database Recovery with ARIES (CMU Intro to Database Systems)1h12m
Exercise: In ARIES recovery, what is the purpose of a pageLSN stored on each page?
Video class: #22 - Distributed Database Architectures ? DataStax Database Talk (CMU Intro to Database Systems)1h23m
Exercise: In a shared-disk distributed database architecture, what is a key advantage compared to shared-nothing when adding more compute nodes?
Video class: #23 - Distributed Transactional Databases (CMU Intro to Database Systems)1h24m
Exercise: In a primary-replica replication setup, which statement best describes how writes and reads are handled?
Video class: #24 - Distributed Analytical Databases (CMU Intro to Database Systems)1h22m
Video class: #25 - BigQuery Snowflake Redshift Databricks DuckDB (CMU Intro to Database Systems)1h25m
Exercise: Why can a branchless (predicate) scan outperform a branching scan on modern CPUs when predicate selectivity is around 50%?
This free course includes:
33 hours and 54 minutes of online video course
Digital certificate of course completion (Free)
Exercises to train your knowledge
100% free, from content to certificate
Ready to get started?Download the app and get started today.