Audio Basics for Video Editing: Dialogue, Music, and Clean Mixes

Chapter 7

1) Audio priorities: intelligible dialogue first

In most videos, viewers will forgive imperfect visuals faster than they’ll tolerate hard-to-understand speech. A practical rule: dialogue is the “hero” track, music supports it, and everything else is optional. Your workflow should reflect that priority.

  • Start with dialogue only: mute music and effects while you clean and level speech.
  • Decide what “clean” means for your project: a quiet interview needs low noise and steady loudness; a street vlog can keep some ambience as long as words stay clear.
  • Keep one consistent reference: pick a representative sentence and use it to judge changes (EQ, compression, noise reduction) so you don’t chase your tail.

Practical mindset: if you must choose between “natural” and “understandable,” choose understandable—then add back a little natural ambience (room tone) so it doesn’t feel sterile.

2) Levels and metering: headroom, clipping, and what to aim for

What meters are telling you

Most editors show some combination of:

  • Peak level: the loudest instantaneous moments (plosives, laughs, bumps). Peaks are what clip first.
  • Average loudness (often RMS or LUFS in more advanced meters): closer to how loud something feels over time.

If you only watch peaks, you may end up with dialogue that never clips but still feels too quiet. If you only chase loudness, you may push peaks into distortion. You need both concepts, even with basic meters.
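To make the difference between the two readings concrete, here is a minimal Python sketch that measures both on the same signal. It assumes floating-point samples where full scale is ±1.0; the function names and the test signal are illustrative, not taken from any particular editor. Real meters also differ in their conventions (some RMS meters add an offset, and LUFS adds frequency weighting and gating), so treat the numbers as a rough guide.

```python
import math

def peak_dbfs(samples):
    """Peak level in dBFS for float samples where full scale is +/-1.0."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def rms_dbfs(samples):
    """RMS (average) level in dB relative to a full-scale value of 1.0."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")

# Hypothetical test signal: one second of a quiet 220 Hz tone at 48 kHz,
# with a single loud spike standing in for a bump or plosive.
RATE = 48000
samples = [0.1 * math.sin(2 * math.pi * 220 * n / RATE) for n in range(RATE)]
samples[1000] = 0.9

print(f"peak: {peak_dbfs(samples):.1f} dBFS")  # close to the ceiling
print(f"rms:  {rms_dbfs(samples):.1f} dB")     # much lower: the clip still sounds quiet
```

The single spike dominates the peak reading while barely moving the average, which is the situation described above: a clip can sit near clipping on the peak meter and still feel quiet.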

Clipping and headroom in plain language

Clipping happens when audio exceeds the system’s maximum level (0 dBFS in digital). The tops of the waveform get “chopped,” creating harsh distortion that is hard to fix. Headroom is the safety space below that ceiling so unexpected peaks don’t clip.
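As a rough illustration of what "chopped" means, the sketch below boosts a few sample values past full scale and hard-clips them. It assumes float samples with a ceiling of 1.0 (equivalent to 0 dBFS); the helper names are made up for this example.

```python
def headroom_db(peak_dbfs):
    """Distance in dB between a peak reading and the 0 dBFS ceiling."""
    return 0.0 - peak_dbfs

def hard_clip(samples, ceiling=1.0):
    """Flatten anything beyond the ceiling, as a digital system does."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]

def count_clipped(samples, ceiling=1.0):
    """Count samples sitting at or beyond the ceiling."""
    return sum(1 for s in samples if abs(s) >= ceiling)

original = [0.2, 0.6, -0.7, 0.4]
boosted = [s * 2 for s in original]   # two values now exceed full scale
clipped = hard_clip(boosted)          # those values are flattened to +/-1.0

print(clipped)
print(count_clipped(clipped), "samples hit the ceiling")
print(headroom_db(-6.0), "dB of headroom at a -6 dBFS peak")
```

Once the values are flattened, the original shape of the waveform is gone, which is why clipping is hard to repair after the fact.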

Practical targets (simple and safe)

  • Dialogue peaks: aim for roughly -6 to -3 dBFS on loud words and laughs, which leaves headroom below the 0 dBFS ceiling.
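  • Dialogue average: keep it consistent. If your software shows LUFS, a common target for online video is around -16 LUFS integrated (stereo); platforms that normalize loudness may use a different reference (often near -14 LUFS), so check the delivery spec if you have one. If you don’t have LUFS, focus on “steady meter movement” and consistent perceived loudness.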
  • Music under dialogue: often sits 10–20 dB lower than speech while someone is talking (exact amount depends on arrangement and voice).
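Decibels are logarithmic, so it can help to see what these targets mean as plain amplitude multipliers. The conversion below uses the standard amplitude formula (gain = 10^(dB/20)); the function names are just for illustration.

```python
import math

def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude multiplier."""
    return 10 ** (db / 20)

def gain_to_db(gain):
    """Convert a linear amplitude multiplier back to dB."""
    return 20 * math.log10(gain)

# The targets mentioned above, expressed as amplitude multipliers.
for db in (-3, -6, -10, -20):
    print(f"{db:>4} dB -> x{db_to_gain(db):.3f}")
```

A 20 dB drop leaves the music at one tenth of its original amplitude, and 10 dB leaves it at roughly a third, which is why that range usually keeps music clearly underneath speech.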

Gain vs. volume (why it matters)

Clip gain (or item gain) changes the level of the audio clip itself before effects. Track fader/volume changes the level after clip-level adjustments (and often after effects). A practical approach:

  • Use clip gain to even out inconsistent lines within a dialogue clip.
  • Use the track fader to set the overall dialogue track level against music.
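The two-stage approach can be sketched with simple arithmetic. The example assumes you have measured a rough level for each spoken line (the numbers and the -20 dB target here are hypothetical): clip gain gets a per-line offset, while the fader applies one offset to everything.

```python
def clip_gain_offsets(line_levels_db, target_db):
    """Per-line clip gain (in dB) needed to bring each line to the target."""
    return [target_db - level for level in line_levels_db]

def apply_fader(levels_db, fader_db):
    """The track fader moves every line by the same amount."""
    return [level + fader_db for level in levels_db]

# Hypothetical measured levels for three lines in one dialogue clip.
line_levels = [-24.0, -18.0, -21.0]

offsets = clip_gain_offsets(line_levels, target_db=-20.0)
evened = [level + off for level, off in zip(line_levels, offsets)]
final = apply_fader(evened, fader_db=-2.0)

print(offsets)  # a different correction for each line
print(evened)   # lines now match each other
print(final)    # whole track shifted together against the music
```

Clip gain fixes the differences between lines; the fader then positions the already-consistent track in the mix.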

3) Basic cleanup steps: room tone, bumps, crossfades, simple noise reduction

Step-by-step cleanup workflow (no advanced tools required)

  1. Listen for problems first: wear headphones and mark issues (bumps, chair squeaks, mic rub, breath pops, long silences).
  2. Trim obvious junk: remove handling noise at the start/end of takes and between sentences if it’s distracting.
  3. Replace dead silence with room tone: instead of cutting to absolute silence (which sounds unnatural), fill gaps with consistent background from the same recording environment.
  4. Add short crossfades on every dialogue cut: this prevents clicks and makes edits feel seamless.
  5. Control peaks (manual first): reduce clip gain on single loud words, laughs, or plosives before reaching for heavy processing.

Room tone: how to use it

Room tone is the natural background sound of the space (air conditioner, distant traffic, subtle mic hiss). It helps edits feel continuous.

  • Find a clean sample: locate 5–15 seconds where nobody speaks.
  • Loop or duplicate it to cover gaps between lines.
  • Keep it subtle: room tone should be felt, not noticed. If it becomes obvious, lower it slightly or choose a better sample.
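Looping a room tone sample to cover a gap amounts to repeating it and trimming to length. The sketch below works on a plain list of samples and uses hypothetical names; in an editor you would do this by duplicating the clip, and you would still add short crossfades at the seams.

```python
def fill_gap(room_tone, gap_samples, level=1.0):
    """Repeat a room tone sample to cover a gap, optionally lowering it."""
    if not room_tone:
        raise ValueError("room tone sample is empty")
    repeats = -(-gap_samples // len(room_tone))  # ceiling division
    looped = (room_tone * repeats)[:gap_samples]
    return [s * level for s in looped]

# Hypothetical tiny room tone sample and a gap longer than the sample.
tone = [0.01, -0.02, 0.015, -0.005]
filled = fill_gap(tone, gap_samples=10, level=0.5)

print(len(filled))  # matches the gap length
```

The `level` parameter reflects the "felt, not noticed" advice: if the fill draws attention, lower it slightly.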

Removing bumps and thumps

Short bumps often live in the low frequencies and can spike peaks. Basic fixes:

  • Cut or reduce the region: if it’s between words, remove it and fill with room tone.
  • Fade in/out around it: if you can’t remove it, use tiny fades to soften the transient.
  • High-pass filter (if available): gently roll off very low rumble (common starting point: 70–100 Hz for many voices, adjusted by ear).
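To show what a gentle high-pass does, here is a minimal first-order (6 dB per octave) filter, the textbook discrete RC high-pass. It is a sketch rather than what any particular editor implements; real EQ plugins usually offer steeper slopes. The 80 Hz cutoff is one value from the range above.

```python
import math

def high_pass(samples, cutoff_hz, sample_rate):
    """First-order RC-style high-pass filter (gentle 6 dB/octave roll-off)."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = prev_out = 0.0
    for i, x in enumerate(samples):
        # The first output copies the input; afterwards only changes pass.
        y = x if i == 0 else alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

RATE = 48000
# A constant offset stands in for very low rumble; a 1 kHz tone for speech energy.
rumble = [1.0] * RATE
tone = [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(RATE)]

rumble_out = high_pass(rumble, cutoff_hz=80, sample_rate=RATE)
tone_out = high_pass(tone, cutoff_hz=80, sample_rate=RATE)

print(abs(rumble_out[-1]))                     # decays toward zero
print(max(abs(v) for v in tone_out[RATE // 2:]))  # stays close to 1.0
```

Content far below the cutoff is removed while the speech range passes almost untouched, which is why a high-pass is a low-risk first move for bumps and rumble.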

Crossfades: the “invisible edit” for audio

Even if a cut sounds fine on speakers, it can click on headphones. A safe default is a very short crossfade (2–10 ms) on dialogue edits. Use longer crossfades (20–100 ms) when blending room tone or smoothing between different takes.
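For a sense of scale, the sketch below converts crossfade lengths to sample counts and performs a simple linear crossfade between the end of one clip and the start of the next. It assumes a 48 kHz sample rate and equal-length overlap regions; editors typically offer other fade shapes (such as equal-power) as well.

```python
def ms_to_samples(ms, sample_rate=48000):
    """Number of samples covered by a fade of the given length."""
    return round(ms * sample_rate / 1000)

def crossfade(a_tail, b_head):
    """Linear crossfade: clip A fades out while clip B fades in."""
    if len(a_tail) != len(b_head):
        raise ValueError("overlap regions must have the same length")
    n = len(a_tail)
    out = []
    for i, (a, b) in enumerate(zip(a_tail, b_head)):
        t = (i + 1) / (n + 1)  # position through the fade, between 0 and 1
        out.append(a * (1 - t) + b * t)
    return out

print(ms_to_samples(5))    # a short dialogue crossfade
print(ms_to_samples(50))   # a longer blend for room tone
print(crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

Even a 2 ms fade spans dozens of samples, which is enough to avoid the abrupt jump in the waveform that causes a click.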

Simple noise reduction principles (and the common trap)

Noise reduction can help constant background hiss or hum, but overuse creates watery, swirly artifacts that draw attention. Principles:

  • Reduce, don’t erase: aim for “less noticeable,” not “perfectly silent.”
  • Apply only where needed: if only one clip is noisy, don’t process everything.
  • Do it early, then stop: once noise is acceptable, move on to EQ/leveling; stacking multiple noise tools often makes it worse.

4) EQ and compression in plain language (problems they solve, typical misuse)

EQ: shaping tone so speech reads clearly

EQ (equalization) changes the balance of frequencies. For dialogue, the goal is usually clarity and consistency, not a “radio voice.”

Problem you hear | What it often means | Simple EQ move
Rumble / mic handling / HVAC | Too much low-end energy | High-pass filter gently (start ~70–100 Hz, adjust by voice)
Muddy / boxy speech | Too much low-mid buildup | Small cut in low-mids (often ~200–500 Hz), narrow to moderate width
Hard to understand consonants | Not enough presence | Small boost in presence region (often ~2–5 kHz), subtle
Harsh / piercing “S” and “T” | Too much high presence | Small cut around ~5–8 kHz or use a de-esser if available

Typical EQ misuse

  • Over-boosting highs: makes speech sound crisp at first, then fatiguing and hissy.
  • Huge scoops/boosts: extreme EQ can make edits between clips obvious because tone changes too much.
  • EQing without level-matching: louder often sounds “better.” Compare before/after at similar loudness.

Compression: controlling dynamics so words stay present

Compression reduces the difference between loud and quiet parts. For dialogue, it helps keep soft words audible without letting loud words jump out.

Plain-language controls you may see:

  • Threshold: level where compression starts. Lower threshold = more compression.
  • Ratio: how strongly it compresses once above threshold (e.g., 3:1 is moderate).
  • Attack: how quickly it reacts. Too fast can dull consonants; too slow can miss peaks.
  • Release: how quickly it stops compressing. Too fast can “pump”; too slow can feel squashed.
  • Makeup gain: raises the output after compression so overall loudness returns.

Simple dialogue compression starting point

  • Ratio: 2:1 to 4:1
  • Attack: 10–30 ms (lets some natural transients through)
  • Release: 60–150 ms (smooth recovery)
  • Adjust threshold until loud phrases compress a few dB (watch gain reduction if shown).
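The relationship between threshold, ratio, and gain reduction can be worked out with simple arithmetic. This sketch models only the static (hard-knee) behaviour of a compressor and ignores attack and release; the -18 dB threshold is an assumed example value, not a recommendation from this chapter.

```python
def compress_level(input_db, threshold_db=-18.0, ratio=3.0, makeup_db=0.0):
    """Output level for a given input level (static hard-knee compressor)."""
    if input_db <= threshold_db:
        return input_db + makeup_db           # below threshold: untouched
    over = input_db - threshold_db            # how far above the threshold
    return threshold_db + over / ratio + makeup_db

def gain_reduction_db(input_db, threshold_db=-18.0, ratio=3.0):
    """How many dB the compressor is pulling the signal down."""
    return input_db - compress_level(input_db, threshold_db, ratio)

# A loud phrase 6 dB over the threshold at 3:1 comes out only 2 dB over.
print(compress_level(-12.0))      # loud phrase
print(gain_reduction_db(-12.0))   # "a few dB" of gain reduction
print(compress_level(-24.0))      # quiet phrase passes unchanged
```

Lowering the threshold increases how much of the signal is "over" and therefore how much gain reduction you see, which is the adjustment described in the last bullet.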

Typical compression misuse

  • Too much compression: brings up room noise and makes breathing and mouth sounds distracting.
  • Using compression to fix bad levels: first even out with clip gain; then compress gently.
  • Not controlling peaks: a compressor is not a limiter, and at moderate ratios fast peaks can still get through. If peaks still jump, reduce them manually or use a limiter if available.

5) Music editing: clean loops, ducking under speech, matching phrases to beats

Looping music cleanly

A “clean loop” means the listener can’t tell where the music repeats. Practical steps:

  1. Find a repeating section: look for a bar/phrase that clearly cycles (often 4 or 8 bars).
  2. Cut on musical boundaries: align cuts on downbeats (the “1” of a measure) or clear chord changes.
  3. Use short crossfades: 20–100 ms often hides tiny discontinuities.
  4. Check with eyes and ears: zoom into the waveform near the cut; if there’s a sudden jump, adjust the cut point or fade shape.

If the loop still “clicks,” try cutting at a zero crossing (where the waveform crosses the center line) and re-apply a short crossfade.
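Finding a zero crossing by eye means zooming in until the waveform crosses the center line. Programmatically it is a sign change between neighbouring samples, as in this small sketch (the function name and sample values are illustrative).

```python
def nearest_zero_crossing(samples, index):
    """Index of the sign change closest to `index`, or None if there is none.

    A crossing at i means the waveform changes sign between i-1 and i.
    """
    crossings = [
        i for i in range(1, len(samples))
        if (samples[i - 1] < 0 <= samples[i]) or (samples[i - 1] >= 0 > samples[i])
    ]
    if not crossings:
        return None
    return min(crossings, key=lambda i: abs(i - index))

# Hypothetical waveform snippet around an intended cut point.
wave = [0.5, 0.2, -0.1, -0.4, -0.2, 0.3, 0.6]

print(nearest_zero_crossing(wave, 4))  # move the cut here instead of index 4
```

Cutting where the waveform is near zero keeps the jump at the edit small, so the short crossfade has less to hide.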

Ducking music under dialogue (manual and simple)

Ducking means lowering music when someone speaks and raising it back between lines. You can do this without special plugins using keyframes/automation.

  1. Set a base music level: where it feels good during pauses (not competing with speech).
  2. Add keyframes around each spoken section: one just before speech starts, one shortly after.
  3. Lower the music during speech: typically by 10–20 dB depending on how busy the track is.
  4. Use gentle ramps: 200–500 ms fades often sound natural; faster if you need a punchy style.
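The steps above can be sketched as a function that turns speech regions into music-volume keyframes. Everything here is hypothetical (the region times, the -12 dB base level, the function name), and it assumes speech regions are far enough apart that their ramps do not overlap.

```python
def ducking_keyframes(speech_regions, base_db=-12.0, duck_db=12.0, ramp_s=0.3):
    """Build (time_seconds, music_level_db) keyframes for manual ducking.

    The music is fully lowered by the time speech starts and begins to
    rise again only after speech ends.
    """
    keyframes = []
    for start, end in speech_regions:
        keyframes.append((round(max(0.0, start - ramp_s), 3), base_db))  # ramp down begins
        keyframes.append((round(start, 3), base_db - duck_db))           # ducked at first word
        keyframes.append((round(end, 3), base_db - duck_db))             # hold until speech ends
        keyframes.append((round(end + ramp_s, 3), base_db))              # back to base level
    return keyframes

# Two hypothetical spoken sections, in seconds on the timeline.
regions = [(2.0, 5.0), (8.0, 11.0)]
for time_s, level_db in ducking_keyframes(regions):
    print(f"{time_s:6.2f} s  {level_db:6.1f} dB")
```

Each spoken section produces four keyframes: two that define the ramp down and two that define the ramp back up. If two sections are closer together than twice the ramp length, it usually sounds better to keep the music ducked across the gap.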

Matching music phrases to beats (so edits feel intentional)

Even basic edits feel professional when music changes happen on beat or at the end of a phrase.

  • Cut on the downbeat: place section transitions (new topic, new scene) on a strong beat.
  • Respect phrase length: many tracks resolve every 4, 8, or 16 bars. Try to end a section at a musical “landing.”
  • Use “button” endings: if the track has a clear ending hit, align it with your final moment of the segment.
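If you know a track's tempo, phrase boundaries can be estimated with arithmetic. The sketch below assumes a constant tempo, a known BPM, 4 beats per bar, and music that starts on a downbeat at time zero; these assumptions often do not hold for live or rubato recordings, so confirm by ear.

```python
def seconds_per_bar(bpm, beats_per_bar=4):
    """Length of one bar in seconds at a constant tempo."""
    return beats_per_bar * 60.0 / bpm

def phrase_boundaries(bpm, total_bars, bars_per_phrase=4, beats_per_bar=4):
    """Times (in seconds) where phrases start or end."""
    bar = seconds_per_bar(bpm, beats_per_bar)
    return [n * bar for n in range(0, total_bars + 1, bars_per_phrase)]

def snap_to_phrase(time_s, bpm, bars_per_phrase=4, beats_per_bar=4):
    """Nearest phrase boundary to a desired edit time."""
    phrase = bars_per_phrase * seconds_per_bar(bpm, beats_per_bar)
    return round(time_s / phrase) * phrase

print(seconds_per_bar(120))         # one bar at 120 BPM
print(phrase_boundaries(120, 16))   # 4-bar phrase boundaries over 16 bars
print(snap_to_phrase(30.0, 120))    # where a roughly 30 s section could land
```

Nudging a section change by a second or two so that it lands on one of these boundaries is usually less noticeable than a music edit that falls mid-phrase.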

When you must cut mid-phrase, hide it with a whoosh/transition sound (if appropriate) or a slightly longer crossfade, but avoid making music edits more noticeable than the video edit.

6) Exercise: mix dialogue with background music for consistent clarity

Goal

Create a 30–60 second sequence with one dialogue clip and one music track where speech remains consistently clear, music transitions are smooth, and there are no clicks, bumps, or distracting level jumps.

What you need

  • One dialogue clip (ideally with a few sentences and at least one pause)
  • One music track (any genre, preferably with a steady beat)

Step-by-step

  1. Prepare the dialogue track
    • Mute music.
    • Trim obvious start/end noise.
    • Add short crossfades on dialogue edits (2–10 ms).
    • Fill awkward gaps with room tone instead of silence (duplicate a clean room tone sample).
  2. Set dialogue level
    • Adjust clip gain so sentences feel even (reduce loud words, lift quiet ones slightly).
    • Watch peaks and keep the loudest moments roughly around -6 to -3 dBFS.
  3. Apply gentle processing (optional but recommended)
    • EQ: add a high-pass filter to reduce rumble; make small clarity adjustments only if needed.
    • Compression: apply light compression (2:1–4:1) to keep speech steady; avoid pumping.
    • If noise is constant and distracting, apply mild noise reduction—stop as soon as it’s acceptable.
  4. Add music and build a clean bed
    • Place music under the whole section.
    • Loop it cleanly if needed (cut on beats, add 20–100 ms crossfades).
    • Set a base level where music sounds good during pauses.
  5. Ducking for clarity
    • Add keyframes around each spoken region.
    • Lower music during speech by 10–20 dB (adjust until every word is easy to understand).
    • Use smooth ramps (about 200–500 ms) into and out of ducked sections.
  6. Transitions between sections
    • If your dialogue has a topic shift, align a music change (loop point, new phrase, or downbeat) with that moment.
    • Fade music slightly earlier than you think if the first word of a new section needs extra clarity.
  7. Quality check pass
    • Listen on headphones: confirm no clicks at cuts, no sudden tone changes, no harsh “S” sounds, no pumping.
    • Listen on small speakers/phone: confirm dialogue is still intelligible.
    • Make final small adjustments with clip gain and music keyframes (avoid big last-minute EQ boosts).

Self-check rubric

  • Intelligibility: can you understand every sentence without effort?
  • Consistency: do any words suddenly jump louder or drop too quiet?
  • Clean edits: are there clicks, pops, or abrupt ambience changes?
  • Music support: does music feel present but never competitive under speech?
  • Transitions: do music changes land on beats/phrases and feel smooth?

Now answer the exercise about the content:

When mixing dialogue with background music, what workflow best follows standard audio priorities for clarity?

Answer: dialogue is the “hero” track. A practical workflow is to mute music while you clean and level speech, then add music and use ducking (keyframes) so music sits under dialogue and every word stays clear.

Video Editing Fundamentals: From Raw Footage to Finished Cut