Why “spoken writing” is different
A YouTube script isn’t meant to be read silently—it’s meant to be heard. That changes what “good writing” looks like. Spoken delivery needs: (1) sentences that land on the first listen, (2) phrasing that fits the mouth, and (3) rhythm that matches a human voice. Your goal is to sound like a person thinking clearly, not a document being recited.
(1) Conversational syntax
Use shorter clauses (one idea per breath)
Listeners can’t re-read. If a sentence contains multiple commas, parentheticals, or stacked ideas, split it.
- Rule of thumb: one sentence = one main point.
- Technique: turn commas into periods.
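One way to apply the rule of thumb above is to scan a draft for sentences that are probably doing too much. The sketch below is only a rough aid: the 20-word ceiling, the one-comma limit, and the function names are assumptions chosen for illustration, and the sentence splitter is deliberately naive.

```python
import re

MAX_WORDS = 20   # assumed ceiling, not a fixed rule
MAX_COMMAS = 1   # assumed: more than one comma hints at stacked ideas


def split_sentences(text):
    """Naively split on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def flag_long_sentences(text, max_words=MAX_WORDS, max_commas=MAX_COMMAS):
    """Return (sentence, word_count, comma_count) for sentences worth splitting."""
    flagged = []
    for sentence in split_sentences(text):
        words = len(sentence.split())
        commas = sentence.count(",")
        if words > max_words or commas > max_commas:
            flagged.append((sentence, words, commas))
    return flagged


draft = (
    "Move the mic closer. "
    "If your audio is echoey, which happens a lot in empty rooms, you can fix it "
    "by moving closer to the mic, adding soft materials, and turning down the gain."
)

for sentence, words, commas in flag_long_sentences(draft):
    print(f"{words} words, {commas} commas: {sentence}")
```

A flagged sentence is a candidate for splitting, not an error. Your ear still makes the final call when you read the line aloud.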
Prefer active voice
Active voice is easier to process and usually shorter.
- Active: “I tested three microphones.”
- Passive: “Three microphones were tested by me.”
Choose concrete nouns and verbs
Abstract language sounds like a report. Concrete language paints a picture and feels conversational.
- Abstract: “Optimize your audio capture.”
- Concrete: “Move the mic closer to your mouth.”
Before/after: robotic vs. natural
| Robotic line | Natural line |
|---|---|
| “In this video, we will be discussing the various methodologies for improving vocal delivery.” | “Today I’ll show you a few simple ways to sound better on camera.” |
| “It is important to note that clarity is facilitated by articulation.” | “If you want clarity, slow down and hit the ends of your words.” |
| “This will allow you to achieve an improved outcome.” | “This makes your voice easier to follow.” |
(2) Breath and cadence markers
Great scripts include invisible “music”: pauses, emphasis, and pacing. You can write those cues directly into the script so your delivery stays consistent across takes.
Simple marking system (copy/paste)
- / = short pause (beat)
- // = longer pause (reset)
- CAPS = emphasis word (don’t shout; stress it)
- … = trailing thought or suspense
- (breath) = intentional inhale before a longer sentence
Example: same line, with cadence cues
Plain: “If your audio is echoey you can fix it by moving closer to the mic and adding soft materials to the room.”
Marked: “If your audio is echoey / you can fix it fast. // Move closer to the mic / and add something soft in the room—blanket, curtain, rug.”
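If you keep marked scripts in plain text, a few lines of code can turn a marked line back into clean text (for captions, say) and count the cues in it. This is a minimal sketch, assuming each cue stands alone as a token separated by spaces. The function names are hypothetical, and the sample line is a shortened version of the marked example above.

```python
CUES = ("/", "//", "(breath)")


def strip_cues(marked):
    """Drop pause and breath cues, keeping only the words to be spoken."""
    return " ".join(t for t in marked.split() if t not in CUES)


def count_cues(marked):
    """Count short pauses, long pauses, and planned breaths in a marked line."""
    tokens = marked.split()
    return {
        "short": tokens.count("/"),
        "long": tokens.count("//"),
        "breath": tokens.count("(breath)"),
    }


def emphasis_words(marked):
    """List words written in CAPS (two letters or more), without punctuation."""
    words = [t.strip(".,!?") for t in marked.split()]
    return [w for w in words if len(w) > 1 and w.isupper()]


marked = (
    "If your audio is echoey / you can fix it fast. // "
    "Move closer to the mic / and add something soft in the room."
)

print(strip_cues(marked))
print(count_cues(marked))
print(emphasis_words("(breath) This is the BIG one // trust me."))
```

Because cues are matched as whole tokens, a slash inside a word such as "and/or" is left alone. The trade-off is that a cue typed without surrounding spaces will not be recognized.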
Where to place pauses (practical rules)
- Before a key point: pause to create attention.
  “Here’s the part most people miss //”
- After a number or list: give the brain time to file it.
  “Three fixes / in under five minutes.”
- After a punchline or strong claim: let it land.
  “That one change doubled my retention //”
(3) Reduce jargon and define terms fast
Jargon makes you sound distant and forces the viewer to translate. If you must use a technical term, define it immediately in plain language—ideally in the same sentence.
Jargon reduction ladder
- Remove it: replace with a common word.
- Translate it: keep the term, add a quick meaning.
- Anchor it: add a concrete example.
Examples (term → fast definition)
- “Use a noise gate—that’s a filter that turns down your mic when you’re not talking.”
- “Watch your dynamic range, meaning the gap between your quiet and loud parts.”
- “Add room treatment—basically soft stuff that kills echo.”
Before/after: jargon-heavy vs. viewer-friendly
| Jargon-heavy | Viewer-friendly |
|---|---|
| “Apply compression to reduce dynamic range and increase perceived loudness.” | “Use light compression so your quiet words don’t disappear and your loud words don’t spike.” |
| “Mitigate plosives with a pop filter and off-axis placement.” | “If your P’s sound like explosions, add a pop filter and aim the mic slightly to the side.” |
(4) Use “you” language and guided attention
Spoken scripts work best when they feel like a one-on-one conversation. “You” language pulls the viewer into the moment, and guided attention tells them exactly what to notice next.
Swap “we will” for direct guidance
- Less natural: “We will now look at the settings.”
- More natural: “Look at this setting.”
- Even better: “Look at this setting—see the slider on the left?”
Guided attention phrases (use sparingly)
- Point: “Notice this…” “Look at…” “Watch what happens when…”
- Zoom: “The key word is…” “This part matters…”
- Compare: “Listen to the difference…” “Here’s the before / and here’s the after…”
- Confirm: “If you’re hearing X, that’s normal.”
Micro-structure for clarity (say what, why, do)
For any instruction, keep the order listener-friendly:
- What: “Move the mic closer.”
- Why: “It makes your voice louder than the room.”
- Do: “Aim for about a fist’s distance.”
Before/after: distant vs. conversational
| Distant | Conversational |
|---|---|
| “Users should ensure that their recording environment is optimized.” | “If your room is echoey, you’ll hear it right away—so let’s fix the room first.” |
| “One may observe improved clarity by adjusting settings.” | “You’ll hear the clarity jump as soon as you tweak this one setting.” |
(5) Avoid filler while still sounding human
Filler words can make you sound casual, but too many weaken authority and slow pacing. The trick is to keep human texture without verbal clutter.
Common filler to cut (or replace)
- Weak openers: “So,” “Basically,” “I mean,” “Kind of,” “Sort of”
- Hedges: “Maybe,” “Probably,” “I think” (keep them only when you are genuinely uncertain)
- Empty phrases: “At the end of the day,” “In terms of,” “It’s important to note”
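A quick scan can surface these phrases before you read the draft aloud. The sketch below is an illustration only: the phrase lists are copied from the bullets above, the function name is hypothetical, and "so" is checked only at the start of a line, on the assumption that it is harmless mid-sentence.

```python
import re

# Phrases taken from the lists above; extend or trim to taste.
START_ONLY = ["so"]  # assumed: only a problem as an opener
ANYWHERE = [
    "basically", "i mean", "kind of", "sort of",
    "maybe", "probably", "i think",
    "at the end of the day", "in terms of", "it's important to note",
]


def find_fillers(line):
    """Return the filler phrases found in one script line."""
    text = line.lower().replace("’", "'")  # normalize curly apostrophes
    found = []
    for phrase in START_ONLY:
        if re.match(rf"\W*{re.escape(phrase)}\b", text):
            found.append(phrase)
    for phrase in ANYWHERE:
        if re.search(rf"\b{re.escape(phrase)}\b", text):
            found.append(phrase)
    return found


lines = [
    "So basically what you want to do is kind of move the mic, like, closer.",
    "I mean, it’s important to note that you should probably test it.",
    "Move the mic closer. / That’s the fastest fix.",
]

for line in lines:
    print(find_fillers(line), "<-", line)
```

A hit is a prompt to reconsider the line, not an automatic deletion. A hedge such as "probably" should stay when the uncertainty is real.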
Human alternatives that still sound clean
- Use a pause instead of “um/so”: write / and breathe.
- Use a simple reset phrase: “Here’s the thing.” “Quick example.” “Try this.”
- Use one intentional aside (max): “And yes, this works on a phone mic too.”
Before/after: filler-heavy vs. clean-human
| Filler-heavy | Clean-human |
|---|---|
| “So basically what you want to do is kind of move the mic, like, closer.” | “Move the mic closer. / That’s the fastest fix.” |
| “I mean, it’s important to note that you should probably test it.” | “Test it. / Record ten seconds and listen back.” |
Speak-test procedure (make it sound right out loud)
Do this after your draft is “done.” The goal is to find lines that look fine on the page but fail in the mouth.
Step-by-step
- Read aloud at performance speed (not slow proofreading speed). Record yourself on your phone.
- Mark tongue-twisters and mouth-stumbles. Highlight any line where you trip, run out of breath, or lose your place.
- Simplify the sentence:
  - Cut extra clauses.
  - Swap complex words for shorter ones.
  - Move the main verb earlier.
- Add emphasis cues: mark 1–3 words per paragraph that carry meaning (CAPS or *asterisks*).
- Add breath points: insert / or // where you naturally inhale or where the viewer needs a beat.
- Re-read and re-record. If you still stumble, split the line again.
What to mark (quick legend)
- [TT] = tongue-twister (rewrite)
- / = short pause
- // = long pause
- CAPS = emphasis (stress)
- (breath) = planned inhale
Example: speak-test rewrite
Draft: “To improve intelligibility, prioritize consonant articulation and reduce reverberation in your recording environment.”
Speak-test notes: [TT] “intelligibility” “prioritize consonant articulation” (too formal, too long)
Rewrite: “To sound clearer / hit your consonants. // And kill the echo in your room.”
Checklist: readability and mouth-feel
- First-listen clarity: Does each sentence make sense without re-reading?
- Sentence length: Are most sentences roughly 15–20 words or fewer?
- One idea per line: Did you split stacked thoughts into separate sentences?
- Active voice: Did you replace “is/was done” with a doer and an action?
- Concrete language: Did you use specific nouns (mic, room, slider) instead of abstractions (optimization, methodology)?
- Jargon control: If a technical term appears, is it defined immediately in plain language?
- Guided attention: Do you tell the viewer what to look at, listen for, or notice?
- Breath points: Are there pauses where you naturally inhale and where key points need space?
- Emphasis cues: Did you mark the few words that must land?
- Filler discipline: Did you remove “basically/just/kind of” unless it adds intentional tone?
- Mouth-feel: Any line you stumble on twice gets rewritten, not rehearsed.