How a Timeline Thinks: Time, Not Files
A timeline is a time-based container where clips become decisions: what plays, when it plays, and what overlaps. Your media can be any mix of formats, but the timeline has a single set of rules (frame rate, resolution, audio sample rate). Those rules determine playback smoothness, sync reliability, and which formats you can export without extra conversion.
Think of the timeline as a ruler. Video clips are blocks placed on that ruler; audio clips are waveforms placed on their own lanes. When you trim, ripple, roll, or slip, you are changing relationships in time. Clean timelines reduce accidental desync, wrong exports, and “mystery” glitches that come from mismatched settings or messy track structure.
1) Timeline Settings That Matter
Frame rate (fps)
What it controls: how many frames exist per second on the timeline. This is the backbone of motion cadence and timing.
- Match source when: you have a single-camera shoot with one consistent frame rate (e.g., all 23.976 or all 25), and you want the most faithful motion without frame interpolation.
- Set deliverable when: you must deliver a specific standard (broadcast spec, client requirement, platform requirement) and you’re willing to convert any off-speed sources.
- Mixed frame rates: choose the deliverable frame rate early, then be consistent. Mixing 24/30/60 in one timeline is possible, but it increases the chance of stutter, duplicated frames, or motion artifacts depending on how the NLE conforms footage.
Practical rule: If your project is mostly one frame rate, build the timeline at that frame rate unless delivery forces a different one. If delivery forces a different one, test a short segment early to confirm motion looks acceptable.
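To see why mixed rates stutter, here is a minimal sketch (Python, illustrative only) of nearest-frame conforming, one common strategy; real NLEs may instead blend frames or use optical flow, and the `conform` function here is hypothetical, not any editor's API.

```python
# Nearest-frame conforming: map each timeline frame to the closest source
# frame. When rates don't divide evenly, frames repeat (stutter) or drop.

def conform(source_fps: float, timeline_fps: float, seconds: float = 1.0):
    """Return, for each timeline frame, the index of the nearest source frame."""
    timeline_frames = int(round(timeline_fps * seconds))
    mapping = []
    for t in range(timeline_frames):
        time_s = t / timeline_fps                        # this frame's timestamp
        mapping.append(int(round(time_s * source_fps)))  # nearest source frame
    return mapping

print(conform(24, 30))  # duplicates appear: [0, 1, 2, 2, 3, 4, 5, 6, 6, ...]
print(conform(60, 24))  # source frames are skipped: [0, 2, 5, 8, 10, ...]
```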
Resolution (frame size)
What it controls: the pixel dimensions of the timeline (e.g., 1920×1080, 3840×2160). This affects scaling, sharpness, and how much reframing room you have.
- Match source when: you want minimal scaling and the source is already your intended output size.
- Set deliverable when: you know the final format (e.g., 1080p) even if you shot higher (e.g., 4K). Editing in a 1080p timeline with 4K footage can be efficient and gives reframing headroom.
- Vertical vs horizontal: decide early. A 9:16 deliverable should usually start as a 9:16 timeline so titles, safe areas, and framing decisions are made correctly.
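To quantify that reframing headroom, a quick pixel-math sketch (illustrative; actual scaling behavior varies by NLE):

```python
# Reframing headroom when larger footage sits in a smaller timeline.

def reframe_headroom(src_w, src_h, tl_w, tl_h):
    """At 100% scale, how many pixels of pan room exist in each axis?"""
    pan_x, pan_y = src_w - tl_w, src_h - tl_h
    fit_scale = tl_w / src_w  # scale needed to show the full source width
    return pan_x, pan_y, fit_scale

print(reframe_headroom(3840, 2160, 1920, 1080))
# (1920, 1080, 0.5): a full 1080p frame of pan room, or scale to 50% to fit.
```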
Audio sample rate (Hz)
What it controls: how many audio samples exist per second (commonly 48,000 Hz for video). Sample rate mismatches can cause drift over long durations, especially when external audio is involved.
- Typical video standard: 48 kHz.
- Common mismatch: music or consumer recorders at 44.1 kHz mixed with 48 kHz video timelines. Most editors convert automatically, but drift or tiny sync errors can appear in long takes if something was recorded at an unusual rate or with unstable clocks.
Practical rule: Keep the timeline at 48 kHz for video work. If you receive 44.1 kHz audio, convert it to 48 kHz before or during editing (your NLE may do this on import or render).
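To make drift concrete, a small sketch of the clock math (the numbers are illustrative):

```python
# Drift from a clock mismatch: a recorder whose "48 kHz" clock really runs
# at 48,005 Hz produces audio that slips when played at exactly 48 kHz.

def drift_ms(nominal_hz: float, actual_hz: float, minutes: float) -> float:
    """Milliseconds of accumulated drift over the given duration."""
    seconds = minutes * 60
    samples_recorded = actual_hz * seconds
    playback_seconds = samples_recorded / nominal_hz
    return (playback_seconds - seconds) * 1000

print(drift_ms(48_000, 48_005, 30))  # ~187.5 ms after 30 minutes: visibly off
```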
When to match source vs set deliverable: a quick decision table
| Scenario | Best starting point | Why |
|---|---|---|
| Single camera, consistent fps, client flexible | Match source fps + match/near source resolution | Least conversion, most natural motion |
| Multiple cameras with different fps | Set deliverable fps | Forces consistent timing and export spec |
| Shot 4K, delivering 1080p | 1080p timeline (deliverable) | Faster editing + reframing headroom |
| Social vertical deliverable | Vertical timeline (deliverable) | Framing and graphics decisions are format-specific |
| External audio recorder involved | 48 kHz timeline | Reduces drift risk and matches video standard |
2) Tracks and Layers: Building an Organized Stack
Video tracks vs audio tracks
Video tracks stack visually: higher tracks typically appear “on top” of lower tracks. If two clips overlap in time, the upper track usually covers the lower one (unless transparency/masks are involved).
Audio tracks mix together: overlapping audio generally plays simultaneously. Track order matters less for “visibility,” but matters a lot for organization, routing, and avoiding accidental edits.
Why track order matters
- Predictable layering: keep primary picture on lower video tracks and overlays (graphics, b-roll overlays, adjustment layers) above. This makes it obvious what is “base” vs “on top.”
- Fewer mistakes: if dialogue is always on A1–A2, you won’t accidentally cut music when you meant to cut speech.
- Faster troubleshooting: when something is too loud or missing, you can isolate the correct track quickly.
A practical track layout template
Use a consistent template so every project feels familiar. Example:
- V3: Titles / Graphics / Overlays
- V2: B-roll
- V1: A-roll (primary talking head)
- A1: Dialogue (lav or boom)
- A2: Dialogue (camera scratch or second mic)
- A3–A4: SFX / Foley
- A5–A6: Music
If you work with dual-system audio often, reserve A1 for the “best mic” and A2 for backup/scratch. Consistency helps you make quick decisions under time pressure.
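One way to make the template stick is to keep it as data you can reuse per project. A minimal sketch in plain Python (not any NLE's preset format):

```python
# The track template above, stored as reusable data. A real editor would
# apply this through its own project presets; this keeps one source of truth.

TRACK_TEMPLATE = {
    "video": ["V1: A-ROLL", "V2: B-ROLL", "V3: GFX"],
    "audio": [
        "A1: DIALOGUE MAIN", "A2: DIALOGUE SCRATCH",
        "A3: SFX", "A4: SFX",
        "A5: MUSIC", "A6: MUSIC",
    ],
}

for name in TRACK_TEMPLATE["video"] + TRACK_TEMPLATE["audio"]:
    print(name)
```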
Track naming
Name tracks based on function, not file names. Good names describe what should live there:
- V1: A-ROLL
- V2: B-ROLL
- V3: GFX
- A1: DIALOGUE MAIN
- A2: DIALOGUE SCRATCH
- A3: SFX
- A5: MUSIC
When you return weeks later, functional names prevent re-learning your own timeline.
Track targeting (what gets affected by edits)
Most editors let you choose which tracks are “targeted” for operations like insert/overwrite edits, paste, ripple deletes, or keyboard trims. If the wrong tracks are targeted, you can unintentionally shift music, cut SFX, or paste clips onto the wrong layer.
Practical habit: before any big operation (paste, ripple delete, insert), glance at which tracks are targeted. Target only what you intend to change.
Lock, mute, and solo
- Lock: prevents changes. Use it to protect finished sections (e.g., locked music bed) while you refine dialogue edits above/below.
- Mute: silences a track. Use it to quickly A/B check “with music vs without music” or to focus on dialogue cleanup.
- Solo: plays only that track (or that track group). Use it to find pops, hum, or unwanted room noise in dialogue without distraction.
Workflow tip: lock tracks you are not actively editing. Many timeline accidents happen during fast trimming when a track you forgot about is still editable.
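The targeting-plus-lock habit can be modeled directly. A sketch (the `Track` structure is hypothetical, not any editor's API) showing which tracks a destructive edit would actually touch:

```python
from dataclasses import dataclass

@dataclass
class Track:
    name: str
    targeted: bool = False  # will insert/overwrite/paste/ripple affect it?
    locked: bool = False    # protected from all changes
    muted: bool = False     # silenced on playback

def tracks_affected_by_edit(tracks: list[Track]) -> list[str]:
    """A destructive edit only touches tracks that are targeted AND unlocked."""
    return [t.name for t in tracks if t.targeted and not t.locked]

timeline = [
    Track("V1: A-ROLL", targeted=True),
    Track("A1: DIALOGUE MAIN", targeted=True),
    Track("A5: MUSIC", targeted=True, locked=True),  # lock wins over targeting
]
print(tracks_affected_by_edit(timeline))  # ['V1: A-ROLL', 'A1: DIALOGUE MAIN']
```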
3) Syncing Methods: Getting Picture and Sound to Agree
Sync means aligning audio and video so that events happen at the same moment (mouth movements match speech, claps land on the impact frame). The best method depends on what information you have: waveforms, timecode, a slate, or just your eyes and ears.
Waveform sync (audio matching)
What it uses: the shape of the audio waveform from camera audio (scratch) and external audio (recorder/mic). The editor compares patterns and aligns them.
- Works best when: the camera recorded usable scratch audio and the environment isn’t extremely noisy.
- Common failure cases: very quiet scratch audio, heavy wind noise, or long takes with little distinct sound.
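The pattern-matching idea is essentially cross-correlation. A minimal sketch with NumPy (assumes mono arrays at the same sample rate; real tools add filtering and confidence checks on top):

```python
import numpy as np

def find_offset_samples(scratch: np.ndarray, external: np.ndarray) -> int:
    """Samples to shift `external` so its waveform lines up with `scratch`."""
    corr = np.correlate(scratch, external, mode="full")
    # The correlation peak marks the lag with the strongest pattern match.
    return int(np.argmax(corr)) - (len(external) - 1)

# Toy case: the same audio appears 480 samples later in the scratch track.
rng = np.random.default_rng(0)
external = rng.standard_normal(8_000)
scratch = np.concatenate([np.zeros(480), external])[:8_000]
print(find_offset_samples(scratch, external))  # 480
print(480 / 48_000 * 1000)                     # = 10.0 ms at 48 kHz
```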
Timecode sync
What it uses: matching timecode metadata recorded by camera and audio recorder (often from a shared timecode source).
- Works best when: production used timecode properly and devices were jam-synced or continuously synced.
- Benefits: fast, reliable, and scalable for multi-camera shoots.
- Watch out for: mismatched frame rate timecode settings (e.g., 23.976 vs 24, drop-frame vs non-drop-frame in 29.97 workflows), which can cause offsets or drift.
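Timecode offsets reduce to frame arithmetic. A sketch for non-drop-frame timecode (drop-frame and fractional rates like 23.976 need extra handling, which is exactly where the mismatches above creep in):

```python
def tc_to_frames(tc: str, fps: int) -> int:
    """'HH:MM:SS:FF' -> absolute frame count, assuming non-drop-frame timecode."""
    hh, mm, ss, ff = (int(p) for p in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

cam = tc_to_frames("01:00:10:12", fps=24)
rec = tc_to_frames("01:00:10:00", fps=24)
print(cam - rec)  # 12 frames (0.5 s at 24 fps) to offset when lining up sources
```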
Slate/clap sync
What it uses: a visible clap (hands or clapperboard) and the corresponding sharp audio spike.
- Why it’s reliable: it gives you a clear visual frame (the moment of contact) and a clear audio transient (the spike).
- Best practice: clap once, clearly, in frame, near the start of the take.
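On the audio side, the clap is easy to find programmatically: it is usually the loudest sample near the head of the take. A sketch (assumes a mono NumPy array; real detectors also check for a fast attack):

```python
import numpy as np

def find_clap_sample(audio: np.ndarray, sample_rate: int,
                     search_seconds: float = 10.0) -> int:
    """Index of the loudest sample within the first few seconds of a take."""
    head = audio[: int(sample_rate * search_seconds)]
    return int(np.argmax(np.abs(head)))

sr = 48_000
take = np.zeros(sr * 5)      # 5 s of silence...
take[sr * 2] = 1.0           # ...with a clap spike at 2.0 s
print(find_clap_sample(take, sr) / sr)  # 2.0
```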
Manual alignment (eyes + ears)
What it uses: any identifiable event (a clap, a word with a strong consonant like “p” or “t,” a door slam) to line up audio and video by hand.
Manual sync checklist:
- Find a sharp sound event in the audio waveform (a spike).
- Find the matching visual moment (clap contact, mouth closure/opening on a plosive).
- Align the external audio spike to the camera audio spike (or to the visual frame if camera audio is unusable).
- Play back and watch lips. If it feels “off,” nudge by 1–2 frames and re-check.
Important: If sync is correct at the start but drifts later, you may have a sample rate mismatch, a variable frame rate issue, or a recorder/camera clock mismatch. Drift is different from a constant offset.
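A quick way to tell the two apart: measure the sync error at two points, say a clap near the start and a plosive a minute in. A sketch with illustrative numbers:

```python
def diagnose(offset_start_ms: float, offset_late_ms: float,
             tolerance_ms: float = 5.0) -> str:
    """Constant offset: slide the clip once. Growing offset: you have drift."""
    if abs(offset_late_ms - offset_start_ms) <= tolerance_ms:
        return "constant offset: slide the clip into sync once"
    return "drift: check sample rates, variable frame rate, or device clocks"

print(diagnose(40.0, 41.5))   # constant offset
print(diagnose(40.0, 230.0))  # drift
```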
4) Multi-Clip and Multi-Cam Concepts (Software-Agnostic)
Multi-clip (grouped sync)
A multi-clip is a grouped set of media (often one video clip plus one or more audio clips) that behaves like a single clip in the timeline. The goal is to preserve sync while you edit.
- Use it when: you have dual-system audio for many takes and want each take to stay linked.
- Benefit: you can trim and move the “clip” without accidentally leaving audio behind.
Multi-cam (switching angles)
Multi-cam editing is designed for multiple cameras (and often a master audio source) recorded at the same time. The system keeps all angles in sync and lets you switch between them while maintaining continuous time.
- Use it when: interviews with two cameras, live events, podcasts with multiple angles.
- Core idea: one synchronized “bundle” containing multiple video angles and one or more audio sources; your edit becomes a sequence of angle choices.
Key concept: Whether multi-clip or multi-cam, the foundation is the same: establish reliable sync first, then edit in a way that protects that sync.
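As a mental model, a multi-cam edit is a synced bundle plus a list of angle choices along continuous master time. A purely illustrative sketch (not any NLE's internal format):

```python
from dataclasses import dataclass

@dataclass
class MulticamBundle:
    angles: dict[str, float]  # angle name -> sync offset into that clip (s)
    audio: str                # master audio source

bundle = MulticamBundle(angles={"CAM A": 0.0, "CAM B": 0.48}, audio="RECORDER")

# The edit itself is just "switch to this angle at this master time".
cut_list = [("CAM A", 0.0), ("CAM B", 12.5), ("CAM A", 31.0)]

for angle, master_t in cut_list:
    clip_t = master_t + bundle.angles[angle]  # master time -> that clip's time
    print(f"master {master_t:5.1f}s: {angle} from its own {clip_t:.2f}s")
```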
5) Practical Exercise: Sync External Audio to a Talking-Head Clip and Build a Labeled Timeline
Goal
You will sync a talking-head video clip (with camera scratch audio) to an external audio recording (lav/boom/recorder), then build a clean timeline with labeled tracks for dialogue, music, and SFX.
What you need
- One talking-head video clip that includes scratch audio from the camera mic.
- One external audio file recorded for the same take.
- Optional: a slate/clap at the start (recommended).
Step A — Create a timeline with correct settings
Decide deliverable specs: choose the frame rate and resolution you intend to export (or match the source if that’s the plan). Set audio sample rate to 48 kHz for video work.
Create your track template: add and name tracks before you start cutting. Example:
- V1: A-ROLL
- V2: B-ROLL
- V3: GFX
- A1: DIALOGUE MAIN
- A2: DIALOGUE SCRATCH
- A3: SFX
- A5: MUSIC
Set targeting intentionally: target V1 and A1/A2 for the sync step. Keep MUSIC and SFX tracks untargeted for now to avoid accidental edits.
Step B — Place the talking-head clip and external audio
Put the talking-head video on V1. Its camera scratch audio should land on A2 (DIALOGUE SCRATCH).
Put the external audio on A1 (DIALOGUE MAIN), aligned roughly near the start of the video clip (exact alignment comes next).
Step C — Sync using the best available method
Method 1: Waveform sync (if available)
Zoom in enough to see waveform detail on A1 and A2.
Find a distinctive waveform feature near the start (ideally the clap spike).
Slide the external audio (A1) until its waveform spike matches the camera scratch spike (A2).
Play back a few seconds and watch lip sync. If needed, nudge the external audio by 1 frame (or smaller increments if your editor supports subframe audio nudging) and re-check.
Method 2: Slate/clap sync (highly reliable)
Find the video frame where hands/clapper make contact.
On the external audio waveform, find the sharp transient spike from the clap.
Align the spike to the clap contact frame (or align spike-to-spike if you also have the scratch spike).
Verify with lip sync on a spoken line a few seconds later.
Method 3: Manual alignment without a clap
Find a word with a strong consonant (like “p,” “b,” “t,” “k”) and locate the moment the lips close/open.
Find the corresponding transient in the audio waveform.
Align and then verify across multiple points (start, middle, later). If it drifts, investigate sample rate or variable frame rate issues.
Step D — Confirm and protect sync
Mute A2 (scratch) and listen to A1 (external). Then briefly unmute A2 to compare. External audio should sound cleaner and fuller.
Check for drift: scrub to 30–60 seconds later and verify lip sync again. If it’s off later but correct at the start, you likely have drift (not just an offset).
Link/group the synced items: group the V1 clip with the A1 external audio so trims and moves keep them together. Keep A2 scratch linked only if you want it as a backup; otherwise, disable or detach it to reduce clutter.
Lock what’s stable: once you trust the sync, consider locking A1 temporarily while you do rough picture trims, then unlock when you need detailed audio edits.
Step E — Build a labeled foundation for dialogue, music, and SFX
Dialogue: keep the external mic on A1 DIALOGUE MAIN. Keep scratch on A2 but muted (or delete it after you’re confident).
Music: place your music bed on A5 MUSIC. Keep it separate from dialogue so you can adjust levels and make cuts without touching speech.
SFX: place any sound effects (whooshes, hits, room tone fills, transitions) on A3 SFX.
Track hygiene: avoid placing music or SFX on dialogue tracks “just for now.” That habit creates confusion later when you need quick fixes.
Use mute/solo for checks: solo A1 to listen for dialogue issues; solo A5 to check music edits; mute A5 to confirm speech clarity.
Quick self-check rubric
- Settings: timeline fps and resolution match your plan (source or deliverable), audio sample rate is 48 kHz.
- Tracks: dialogue, music, and SFX are on dedicated, named tracks.
- Sync: lips match speech at the start and later in the clip (no drift).
- Protection: synced items are grouped/linked; non-active tracks are locked or untargeted during major edits.