The Speech System: Air, Vibration, and Shaping
Every speech sound you produce is built from the same basic ingredients: airflow (air moving out of the lungs), voicing (vibration of the vocal folds), and articulation (the way the tongue, lips, jaw, and other parts shape that airflow into recognizable sounds). Understanding this “sound-making pipeline” helps you diagnose pronunciation issues more precisely: if a sound is unclear, you can ask whether the problem is the air, the voice, or the shaping.
Speech is usually made on an outgoing breath (egressive pulmonic airflow). You don’t need a large breath for each word; instead, you manage a steady stream of air and then create different sound types by changing what happens in the throat and mouth. Some sounds require a strong burst of air, some require continuous airflow, and some require airflow through the nose.
Three core stages
Power (lungs and breath control): The lungs push air upward through the windpipe. The amount and steadiness of the air affect loudness and clarity.
Source (larynx/voice box): The vocal folds can vibrate (voiced sounds) or stay open (voiceless sounds). They can also close briefly to stop air (glottal stop) or partially narrow to create friction.
Filter (vocal tract shaping): The throat, mouth, and nose act like a flexible “tube.” By moving the tongue, lips, jaw, and soft palate, you create different consonants and vowels.
The Main “Parts” You Use to Speak
To improve pronunciation, it helps to know what each part does and what it should feel like. You don’t need medical detail; you need practical control cues.
Lungs and diaphragm: managing airflow
Your lungs provide the air. The diaphragm and surrounding muscles help regulate the pressure. For clear speech, aim for a steady, controlled stream rather than pushing hard. Over-pushing often causes tense consonants, strained voice, or unstable vowels.
Larynx and vocal folds: voicing and pitch
The vocal folds sit in the larynx. When they vibrate, you get voicing (a buzzing sensation often felt in the throat or chest). When they are apart, air passes without vibration, producing voiceless sounds.
You can test voicing by placing two fingers lightly on your throat (Adam’s apple area) and alternating pairs like /s/ vs /z/ or /f/ vs /v/. The voiced sound should create a clear buzz; the voiceless sound should not.
Soft palate (velum): oral vs nasal airflow
The soft palate is the “gate” between the mouth and the nose. When it lifts, air goes out through the mouth (most English sounds). When it lowers, air can flow through the nose, creating nasal sounds like /m/, /n/, and /ŋ/ (as in “sing”).
A quick self-check: say “ssss” and pinch your nose. The sound should not change much (oral sound). Then say “mmmm” and pinch your nose; the sound should stop or change strongly (nasal sound).
Tongue: the main shaper
The tongue is the most important articulator. Different parts of the tongue can contact or approach different places in the mouth:
Tip (apex): used for sounds like /t d n l/ in many accents.
Blade: area just behind the tip, often used for /s z ʃ ʒ/.
Front: important for many vowels and sounds like /j/ (“yes”).
Back: used for /k g ŋ/ and back vowels.
Lips and jaw: opening, rounding, and closure
The lips can close (as in /p b m/), narrow (as in /w/), or round/spread to shape vowels. The jaw controls how open the mouth is, which strongly affects vowel quality. Many unclear vowels come from a jaw that is too closed or too tense.
Teeth and alveolar ridge: contact points
Your teeth and the ridge just behind the upper front teeth (the alveolar ridge) are common contact points. Sounds like /t d n l s z/ often involve the tongue touching or approaching the alveolar ridge.
Voiced vs Voiceless: The “On/Off” Switch
Many English consonant pairs differ mainly by voicing. If you can control voicing, you can fix a large set of pronunciation problems quickly.
Common voicing pairs
/p/ vs /b/ (pat/bat)
/t/ vs /d/ (two/do)
/k/ vs /g/ (coat/goat)
/f/ vs /v/ (fan/van)
/s/ vs /z/ (sip/zip)
/ʃ/ vs /ʒ/ (ship/measure; not a minimal pair, since English has very few for this contrast)
/tʃ/ vs /dʒ/ (cheap/jeep)
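If you keep practice notes or build flashcards, the pairs above can be stored as a small lookup table. The following is only a sketch in Python: the names VOICING_PAIRS and voicing_partner are made up for illustration, and the table covers only the seven pairs listed here.

```python
from typing import Optional

# Voiceless sound -> voiced partner, copied from the list of pairs above.
VOICING_PAIRS = {
    "p": "b",
    "t": "d",
    "k": "g",
    "f": "v",
    "s": "z",
    "ʃ": "ʒ",
    "tʃ": "dʒ",
}

# Reverse table so the lookup works in both directions.
VOICED_TO_VOICELESS = {voiced: voiceless for voiceless, voiced in VOICING_PAIRS.items()}


def voicing_partner(sound: str) -> Optional[str]:
    """Return the sound that differs only in voicing, or None if it has no listed partner."""
    if sound in VOICING_PAIRS:
        return VOICING_PAIRS[sound]
    return VOICED_TO_VOICELESS.get(sound)


if __name__ == "__main__":
    for sound in ["s", "dʒ", "m"]:
        print(sound, "->", voicing_partner(sound))
```

Sounds without a listed partner, such as /m/, return None, which is a reminder that the voicing switch only applies to paired consonants.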
Step-by-step: training voicing control
Step 1: Put your fingers lightly on your throat.
Step 2: Sustain a voiceless sound: “ssssss”. Feel no buzz.
Step 3: Sustain the voiced partner: “zzzzzz”. Feel the buzz.
Step 4: Switch slowly: ssss → zzzz → ssss → zzzz. Keep the mouth shape similar; change only the voice.
Step 5: Add a vowel: “see–zee–see–zee”. Aim for a clean contrast.
This exercise teaches you to separate “larynx work” (voicing) from “mouth work” (articulation). That separation is a powerful diagnostic tool.
How Consonants Are Made: Place and Manner
Consonants are created by narrowing or blocking airflow somewhere in the vocal tract. Two questions define a consonant:
Place of articulation: Where is the main narrowing or closure?
Manner of articulation: How is the airflow modified (stopped, squeezed, released, etc.)?
Stops (plosives): complete closure + release
Stops are made by fully blocking airflow, building pressure, then releasing it. English stops include /p b t d k g/.
Lips (bilabial): /p b/ as in “pat/bat”
Tongue tip at alveolar ridge (alveolar): /t d/ as in “two/do”
Back of tongue at soft palate (velar): /k g/ as in “coat/goat”
Step-by-step: making a clean stop
Step 1: Form the closure (for /t/, tongue tip to alveolar ridge).
Step 2: Keep voicing off for /t/ (voiceless) and on for /d/ (voiced).
Step 3: Hold the closure briefly (don’t over-hold; English stops are quick).
Step 4: Release sharply into a vowel: “ta”, “da”.
If your stop sounds weak, you may be leaking air (incomplete closure). If it sounds too harsh, you may be over-pressurizing or tensing the jaw.
Fricatives: narrow channel + continuous friction
Fricatives are made by bringing two articulators close enough to create friction as air passes through. English fricatives include /f v θ ð s z ʃ ʒ h/.
/f v/: lower lip against upper teeth (“fan/van”)
/θ ð/: tongue tip near or lightly between teeth (“thin/this”)
/s z/: tongue near alveolar ridge (“sip/zip”)
/ʃ ʒ/: tongue slightly farther back with lip rounding (“ship/measure”)
/h/: friction at the larynx with open mouth shape (“hat”)
Step-by-step: stabilizing a fricative
Step 1: Choose a fricative, e.g., /s/. Smile slightly to keep the lips from rounding too much.
Step 2: Make a narrow groove with the tongue (don’t press hard).
Step 3: Blow steady air: “ssssss”. Keep the sound even (no pulsing).
Step 4: Add voicing for the pair: “zzzzzz”. Keep the same tongue position.
Step 5: Attach vowels: “see, say, so, sue” and “zee, zay, zo, zoo”.
If a fricative sounds “slushy” or unclear, the tongue may be too far back, the lips may be rounding unexpectedly, or the airflow may be too weak.
Affricates: stop + fricative as one unit
Affricates combine a stop closure with a fricative release. English has /tʃ/ (“chip”) and /dʒ/ (“job”). They are not two separate sounds; the timing is tight.
Practice by holding the stop briefly and then releasing into friction without adding a vowel in between: “t…sh” becomes /tʃ/.
Nasals: oral closure + air through the nose
Nasals are made by closing the mouth at some point while lowering the soft palate so air exits through the nose. English nasals are /m n ŋ/.
/m/: lips closed (“man”)
/n/: tongue at alveolar ridge (“no”)
/ŋ/: back of tongue at velum (“sing”)
A common clarity issue is replacing /ŋ/ with /n/ or adding an extra /g/. To feel /ŋ/, sustain it: “ŋŋŋ” (like the end of “sing”) with the tongue back and the lips relaxed.
Approximants: shaping without strong friction
Approximants are made by narrowing the vocal tract, but not enough to create strong friction. English approximants include /r l w j/. These sounds often carry accent differences because small shape changes matter.
/w/: lip rounding + back tongue raised (“we”)
/j/: front tongue raised (“yes”)
/l/: tongue tip contact with airflow around the sides (“light”)
/r/: tongue shape varies by accent; typically a “bunched” or “retroflex” tongue with lip rounding (“red”)
For /l/, focus on the tongue tip touching the alveolar ridge while the sides of the tongue relax to let air pass. For /r/, focus on keeping the tongue from touching the roof of the mouth; the sound is shaped by narrowing, not by contact.
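The consonant descriptions in this section can be summarized as three features per sound: place, manner, and voicing. The sketch below, in Python, collects them into one table and reports which features differ between two sounds. The names Consonant, CONSONANTS, and differing_features are illustrative, and the place labels are the usual textbook terms. Some of them (labiodental, dental, postalveolar, glottal, palatal, labial-velar) are not named in the text above, and the entry for /r/ is a simplification because its tongue shape varies by accent.

```python
from typing import List, NamedTuple


class Consonant(NamedTuple):
    place: str    # where the main closure or narrowing is
    manner: str   # how the airflow is modified
    voiced: bool  # True if the vocal folds vibrate


CONSONANTS = {
    "p": Consonant("bilabial", "stop", False),
    "b": Consonant("bilabial", "stop", True),
    "t": Consonant("alveolar", "stop", False),
    "d": Consonant("alveolar", "stop", True),
    "k": Consonant("velar", "stop", False),
    "g": Consonant("velar", "stop", True),
    "f": Consonant("labiodental", "fricative", False),
    "v": Consonant("labiodental", "fricative", True),
    "θ": Consonant("dental", "fricative", False),
    "ð": Consonant("dental", "fricative", True),
    "s": Consonant("alveolar", "fricative", False),
    "z": Consonant("alveolar", "fricative", True),
    "ʃ": Consonant("postalveolar", "fricative", False),
    "ʒ": Consonant("postalveolar", "fricative", True),
    "h": Consonant("glottal", "fricative", False),
    "tʃ": Consonant("postalveolar", "affricate", False),
    "dʒ": Consonant("postalveolar", "affricate", True),
    "m": Consonant("bilabial", "nasal", True),
    "n": Consonant("alveolar", "nasal", True),
    "ŋ": Consonant("velar", "nasal", True),
    "w": Consonant("labial-velar", "approximant", True),
    "j": Consonant("palatal", "approximant", True),
    "l": Consonant("alveolar", "approximant", True),
    "r": Consonant("postalveolar", "approximant", True),  # simplified; varies by accent
}


def differing_features(a: str, b: str) -> List[str]:
    """List the feature names (place, manner, voiced) on which two consonants differ."""
    first, second = CONSONANTS[a], CONSONANTS[b]
    return [name for name in Consonant._fields
            if getattr(first, name) != getattr(second, name)]


if __name__ == "__main__":
    print(differing_features("s", "z"))  # voicing only
    print(differing_features("n", "ŋ"))  # place only
    print(differing_features("t", "s"))  # manner only
```

Comparing the sound you produced with the sound you intended shows which single feature to work on: replacing /ŋ/ with /n/ is a place problem, while a /t/ that comes out like /s/ is a manner problem.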
How Vowels Are Made: Tongue Shape + Lip Shape + Resonance
Vowels are produced with an open vocal tract (no complete blockage). The main differences between vowels come from the tongue’s position and shape, lip rounding/spreading, and jaw openness. These changes alter resonance (the “tone color”) of the sound.
Key vowel controls
Tongue height: high (as in “see”) vs low (as in “cat”)
Tongue front/back: front (as in “see”) vs back (as in “food”)
Lip shape: spread (often front vowels) vs rounded (often back vowels)
Jaw openness: more open often creates lower vowels
Vowels can be monophthongs (steady) or diphthongs (gliding from one shape to another). In English, many common vowels involve a glide, meaning your tongue and lips move during the vowel.
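The vowel controls can be written down the same way as targets to compare. This is a rough Python sketch using only the reference words from the list above. The names VowelTarget, REFERENCE_VOWELS, and vowel_contrast are invented for the example, the feature values are simplified, and exact vowel qualities depend on accent.

```python
from typing import List, NamedTuple


class VowelTarget(NamedTuple):
    height: str    # tongue height: "high" or "low"
    backness: str  # tongue position: "front" or "back"
    rounded: bool  # True if the lips are rounded


# Simplified targets for the reference words used in this section.
REFERENCE_VOWELS = {
    "see": VowelTarget("high", "front", False),
    "food": VowelTarget("high", "back", True),
    "cat": VowelTarget("low", "front", False),
}


def vowel_contrast(word_a: str, word_b: str) -> List[str]:
    """List the controls that must change when moving from one vowel to the other."""
    first, second = REFERENCE_VOWELS[word_a], REFERENCE_VOWELS[word_b]
    return [name for name in VowelTarget._fields
            if getattr(first, name) != getattr(second, name)]


if __name__ == "__main__":
    print(vowel_contrast("see", "food"))
    print(vowel_contrast("see", "cat"))
```

The output tells you what to exaggerate in practice: between “see” and “food” the tongue moves back and the lips round, while between “see” and “cat” the main change is tongue height, helped by a more open jaw.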
Step-by-step: finding a vowel target
Step 1: Choose a reference word, e.g., “see” for a high front vowel.
Step 2: Hold the vowel longer than normal: “seeeee”. This makes the shape easier to feel.
Step 3: Notice tongue position: high and forward. Notice lips: slightly spread.
Step 4: Compare with a contrasting vowel, e.g., “saw” or “sue,” and exaggerate the difference in tongue and lip shape.
Step 5: Return to normal length while keeping the same target shape.
If your vowels sound similar to each other, the tongue may not be moving enough between targets, or the jaw may be “locked” in one position.
Coarticulation: Sounds Influence Each Other
In real speech, you don’t produce sounds one by one like separate beads. Your mouth prepares for the next sound while finishing the current one. This is called coarticulation. It is normal and necessary for fluent speech, but it can also cause confusion if you expect “dictionary-perfect” shapes.
Examples of coarticulation effects
Lip rounding spreads: In “too,” the lips round early because of the following /u/. The /t/ may sound slightly different than in “tea.”
Nasal influence: In “man,” the vowel may become slightly nasalized because the soft palate starts lowering before /n/.
Place adjustments: In fast speech, tongue contact points can shift slightly to maintain speed and ease.
When practicing, it helps to isolate sounds first, then practice them in syllables, then in words, and finally in short phrases. This trains your articulators to coordinate smoothly.
A Practical Diagnostic Routine: Air, Voice, Shape
When a sound is unclear, use a consistent troubleshooting order. This prevents random guessing and helps you fix the real cause.
Step-by-step diagnostic checklist
Step 1: Airflow — Can you sustain the sound type? For fricatives, can you hold steady air (“ssss”)? For stops, can you build and release pressure cleanly (“ta”)?
Step 2: Voicing — Should the sound buzz? Check with your fingers on the throat. If a voiced sound is coming out voiceless (or the opposite), fix voicing before changing tongue position.
Step 3: Place — Where is the main contact or narrowing? Move only one variable at a time (tongue tip forward/back, lip rounding on/off) until the sound matches the target.
Step 4: Manner — Are you stopping the air, letting it hiss, or letting it flow smoothly? For example, if /t/ sounds like /s/, you may be failing to make full closure.
Step 5: Timing — Does the sound connect smoothly to the next vowel or consonant? Many errors are timing errors rather than “wrong position” errors.
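The point of the checklist is its order: you stop at the first stage that fails and fix that before moving on. A minimal Python sketch of that logic follows. The names CHECK_ORDER and first_problem are hypothetical, and the True/False values stand for your own judgment of each stage after testing it.

```python
from typing import Dict, Optional

# Stages in the order given by the checklist above.
CHECK_ORDER = ["airflow", "voicing", "place", "manner", "timing"]


def first_problem(checks: Dict[str, bool]) -> Optional[str]:
    """Return the earliest stage marked False, or None if every stage passes.

    Stages missing from `checks` are treated as passing (an assumption of this sketch).
    """
    for stage in CHECK_ORDER:
        if not checks.get(stage, True):
            return stage
    return None


if __name__ == "__main__":
    # Example: a /z/ that comes out without a buzz and slightly too far back.
    observed = {"airflow": True, "voicing": False, "place": False}
    print(first_problem(observed))
```

In the example both voicing and place are marked as problems, but the function returns voicing first, matching the advice to fix voicing before changing tongue position.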
Mini Practice Lab: Feeling the Mechanics
Use these short drills to build awareness of how speech sounds are made. Keep them slow and controlled; speed comes later.
Drill 1: Voicing switch with the same mouth shape
ssssss → zzzzzz → ssssss → zzzzzz (steady airflow, only voice changes)
Drill 2: Stop release into vowels
pa, pe, pi, po, pu (lips close fully, quick release)
ta, te, ti, to, tu (tongue tip to ridge)
ka, ke, ki, ko, ku (back tongue contact)
Drill 3: Nasal gate control
ssss (pinch nose: no change)
mmmm (pinch nose: stops/changes)
aaa → mmm → aaa (feel soft palate switching)
Drill 4: Fricative clarity and placement
f → v → f → v (lip-teeth contact, add/remove voicing)
th (θ) → th (ð) (tongue near teeth, add/remove voicing)
s → z → s → z (tongue groove, steady hiss)
Drill 5: Vowel shaping with jaw and lips
see (spread) → sue (round) → see → sue
cat (open jaw) → cut (more central) → cat → cut
As you practice, aim for repeatable physical cues: where the tongue touches, whether the lips round, whether the throat buzzes, and whether air is steady. These cues are more reliable than trying to “copy” a sound only by listening.