Capstone Overview: A Repeatable Pipeline (Milestones + Checkpoints)
This capstone is a single end-to-end workflow you can repeat for any short-form video. You will move through six milestones in order: assembly cut → pacing pass → caption pass → style pass → polish pass → export pass. Each milestone has a clear “definition of done” and a troubleshooting checkpoint so you can diagnose issues early instead of fixing everything at the end.
Project goal: turn raw clips (A-roll + optional B-roll) into a polished vertical short that uses captions and templates consistently, stays on-beat, and feels platform-native.
What you need before you start
- Raw clips: at least 1 main talking clip (or voiceover) + 3–8 supporting clips (B-roll) if available.
- One music track (or a beat loop) that matches the mood.
- A simple “brand kit” decision: 1 font family, 2 text sizes (title + captions), 1 accent color.
Milestone definitions (so you know when to move on)
| Milestone | Output | Definition of done |
|---|---|---|
| Assembly cut | Full story on timeline | All usable clips placed in order; nothing is “missing,” even if it’s messy. |
| Pacing pass | Tight, beat-aware structure | Hook lands fast; dead air removed; major cuts align with beat or phrasing. |
| Caption pass | Accurate, readable captions | Captions match spoken words; line breaks and emphasis are intentional. |
| Style pass | Consistent templates + text system | Captions + titles share consistent style; templates reused, not reinvented. |
| Polish pass | Subtle effects, reframes, color, audio | Enhancements are felt, not noticed; nothing distracts from the message. |
| Export pass | Platform-ready deliverables | Correct aspect ratio, safe margins, loudness balance, no clipped text. |
Pipeline Step 0: Setup Decisions (Before You Touch the Timeline)
1) Choose your aspect ratio and safe layout
Decide the target first so you don’t fight framing later. For vertical shorts, work in 9:16. Keep critical text and faces away from UI overlays (top and bottom areas often get covered by platform controls).
- Rule of thumb: keep captions in the lower-middle, not hugging the bottom edge.
- Framing rule: eyes roughly in the upper third for talking head.
2) Pick a “caption system” and a “title system”
To stay consistent, decide now:
- Caption style: size, weight, background/box, highlight color.
- Title template: one reusable intro title or label style (e.g., “3 Tips”, “Mistake #1”).
This prevents the common capstone failure: making style decisions repeatedly and inconsistently.
- Listen to the audio with the screen off.
- Earn a certificate upon completion.
- Over 5000 courses for you to explore!
Download the app
Milestone 1: Assembly Cut (Get the Whole Story Down)
Goal
Build a complete rough version from start to finish. Do not chase perfection yet.
Step-by-step
- Import media (A-roll, B-roll, music, SFX if any) and place them in the media bin so you can find them quickly.
- Lay down A-roll first: put the main speaking clip(s) in order. If you have multiple takes, choose the clearest take per section.
- Mark the hook: identify the strongest 1–2 seconds that earns attention. Place it at the start even if it originally happened later.
- Fill gaps with B-roll placeholders: drop B-roll above A-roll where it supports what’s being said (even if timing is rough).
- Add music to the bottom track and trim it to cover the full timeline length (no fine mixing yet).
Definition of done
- You can watch from start to end and understand the message.
- No missing sections, even if pacing is slow.
- B-roll is roughly placed where it belongs.
Troubleshooting checkpoint (Assembly)
- Problem: The story feels unclear.
Fix: Write a one-sentence promise for the viewer (what they’ll get). If a clip doesn’t support that promise, move it to a “maybe” area or remove it. - Problem: You have too much footage and feel stuck.
Fix: Limit yourself to 3 core points (or 1 core idea). Create a “parking lot” track for extra clips you might reuse later. - Problem: Hook feels weak.
Fix: Try one of these hook patterns: “Stop doing X,” “Here’s the fastest way to Y,” “I tested X so you don’t have to.” Replace the first line of A-roll accordingly.
Milestone 2: Pacing Pass (Tighten + Sync to Beat)
Goal
Make the short feel intentional: fast enough to hold attention, but not rushed. This is where you remove friction.
Step-by-step
- Trim dead air aggressively: remove pauses, repeated words, and long breaths that slow momentum.
- Compress the setup: if your explanation starts with context, reduce it to one line and move into the value quickly.
- Beat-aware alignment: nudge key cuts (hook, point transitions, final payoff) to land on strong beats or musical changes.
- Use micro-structure: break the short into visible sections (Hook → Point 1 → Point 2 → Payoff). Even without titles, the rhythm should signal transitions.
- Check pacing with a “mute test”: mute audio and watch. If it still feels dynamic and understandable visually, pacing is likely strong.
Definition of done
- Hook lands quickly (typically within the first 1–2 seconds).
- Each section moves forward; no “hanging” moments.
- Major transitions feel aligned to beat or speech phrasing.
Troubleshooting checkpoint (Pacing)
- Problem: It feels jumpy or chaotic.
Fix: Reduce cut frequency during key explanations; let one shot breathe for 1–2 beats. Use B-roll to smooth visual continuity. - Problem: It feels slow even after trimming.
Fix: Move the payoff earlier, then support it with quick proof/examples. Consider removing one point entirely. - Problem: Beat sync is fighting the speech.
Fix: Prioritize speech clarity. Sync only the section transitions and B-roll swaps; don’t force every cut to the beat.
Milestone 3: Caption Pass (Accuracy + Intentional Emphasis)
Goal
Captions should be accurate, readable, and timed to help comprehension. This pass is about correctness and structure, not decoration.
Step-by-step
- Generate auto-captions for the primary spoken track.
- Correct errors first: names, numbers, brand terms, and key verbs (these carry meaning).
- Fix segmentation: adjust where captions break so each line is a complete thought (avoid splitting “not” from the verb, or separating numbers from units).
- Emphasis plan: choose 1–2 words per sentence to emphasize (via highlight color, bold weight, or slightly larger size). Keep it consistent.
- Timing pass: ensure captions appear slightly before the word is spoken (subtle lead) and disappear cleanly at phrase ends.
Definition of done
- Captions are accurate and not distracting.
- Line breaks look intentional and help reading speed.
- Emphasis is consistent and not overused.
Troubleshooting checkpoint (Captions)
- Problem: Captions cover important visuals or UI overlays.
Fix: Raise the caption block upward and reduce line count by tightening segmentation. Consider a semi-transparent background box for contrast. - Problem: Captions feel too fast to read.
Fix: Shorten phrases on screen (split long sentences into two caption events). Remove filler words from captions if needed while keeping meaning. - Problem: Emphasis looks random.
Fix: Use a rule: emphasize only the “action word” (verb) or the “result” (outcome), not both.
Milestone 4: Style Pass (Templates + Consistent Visual System)
Goal
Apply a cohesive look using templates and text styles without turning the edit into a design project. The viewer should feel consistency across captions, titles, and callouts.
Step-by-step
- Apply your chosen caption style across the full timeline (font, size, stroke/box, highlight color).
- Add a simple title template for the hook or the first section (e.g., a clean header that frames the topic).
- Use one callout template for key moments (e.g., “Do this”, “Avoid this”, “Step 1”). Reuse the same template rather than mixing multiple.
- Check spacing and alignment: keep consistent margins from edges and consistent vertical placement for captions.
- Consistency sweep: scan the timeline for any text that deviates (font changes, mismatched colors, inconsistent capitalization).
Definition of done
- Captions and titles look like they belong to the same video.
- Templates are reused consistently (not a different look every 3 seconds).
- Text placement respects safe areas.
Troubleshooting checkpoint (Style)
- Problem: The video looks “too template-y.”
Fix: Reduce template frequency. Keep templates for section headers and key callouts only; let captions carry most of the on-screen text. - Problem: Readability changes shot-to-shot (bright backgrounds, busy scenes).
Fix: Add a subtle caption background/box or increase stroke/shadow slightly. Avoid changing colors per shot. - Problem: Titles compete with captions.
Fix: Stagger them: show the title first, then bring captions in after the title clears, or move the title to the top area while captions stay mid-lower.
Milestone 5: Polish Pass (Subtle Effects, Keyframe Reframes, Color, Audio)
Goal
Enhance clarity and professionalism without overprocessing. This pass is where you make the edit feel “finished” while staying clean.
Step-by-step
- Subtle effects only where they serve a purpose: apply light sharpening/clarity, gentle vignette, or minimal motion blur only if it improves focus or smoothness.
- Keyframe reframes: fix framing issues (subject drifting, awkward headroom) and add gentle push-ins on key lines. Keep motion slow and motivated.
- Simple color corrections: match clips so skin tones and exposure feel consistent across cuts. Avoid dramatic shifts between A-roll and B-roll unless intentional.
- Audio finalize: ensure voice is consistently intelligible; music supports without masking. Add subtle SFX only to reinforce transitions or on-screen actions.
- Polish watch-through: watch full-screen, then watch on a phone-sized preview. Note anything that feels “loud” visually (too much motion, too much glow, too many effects).
Definition of done
- Reframes feel natural; no distracting jumps in composition.
- Color feels consistent; nothing looks accidentally tinted or dim.
- Voice is clear; music feels controlled and intentional.
Troubleshooting checkpoint (Polish)
- Problem: Keyframed motion feels shaky or seasick.
Fix: Reduce the distance of the move, lengthen the duration, and avoid stacking multiple motions at once (zoom + pan + rotation). - Problem: Effects look “crunchy” or artificial.
Fix: Halve the intensity. If you can clearly notice the effect, it’s probably too strong for a clean short. - Problem: Audio pumps or music fights the voice.
Fix: Lower music during speech sections and keep transitions smooth. If needed, simplify: fewer SFX, fewer volume changes. - Problem: Color shifts between cuts are obvious.
Fix: Pick one “reference” clip (best exposure/skin tone) and match others to it. Avoid mixing multiple looks.
Milestone 6: Export Pass (Platform Fit + Final Checks)
Goal
Deliver a file that looks correct on the target platform: no cropped text, no unexpected borders, and consistent loudness.
Step-by-step
- Safe-area sweep: scrub through and confirm captions/titles never collide with platform UI zones.
- Quality sweep: check for accidental low-res clips, heavy compression artifacts, or blurry reframes.
- Final timing sweep: confirm the first second is strong and the ending doesn’t linger.
- Export using your platform-ready settings and naming convention (include platform + version).
Troubleshooting checkpoint (Export)
- Problem: Captions look smaller after upload.
Fix: Increase caption size slightly and keep them away from the bottom edge. Test with a private upload if possible. - Problem: Video looks darker on phone.
Fix: Slightly raise exposure/midtones in the edit and re-export; avoid extreme contrast. - Problem: Text appears soft or fuzzy.
Fix: Ensure you’re exporting at a high enough resolution for 9:16 and avoid scaling text layers excessively.
Self-Review Rubric (Score Yourself Before Publishing)
Use this rubric to evaluate the short quickly and objectively. Score each category 1–5 and write one fix if you score 3 or below.
| Category | 5 (Excellent) | 3 (Okay) | 1 (Needs work) |
|---|---|---|---|
| Clarity | Message is instantly clear; viewer knows what they’ll get | Mostly clear but some lines feel vague | Unclear purpose; viewer must guess the point |
| Pacing | No dead air; energy matches topic; transitions feel intentional | Some slow spots or rushed explanations | Feels dragging or chaotic; hard to follow |
| Readability | Captions readable on phone; good contrast; clean line breaks | Readable but occasionally cramped or fast | Too small, low contrast, or poorly timed |
| Consistency | Templates, fonts, colors, and placement are uniform | Minor inconsistencies that don’t ruin it | Mixed styles; looks patched together |
| Platform fit | Hook + layout feel native; safe areas respected | Mostly fits but could be more tailored | Text/UI collisions; pacing mismatched to platform |
Quick self-review checklist (fast pass)
- Hook in the first 1–2 seconds: yes/no
- One clear promise: yes/no
- Captions never touch the bottom edge: yes/no
- Only 1–2 template styles used: yes/no
- No effect draws attention to itself: yes/no
- Voice always understandable over music: yes/no
Platform Variants (Same Project, Two Optimizations)
Variant A: TikTok Version (Fast hook, larger captions)
Goal: maximize immediate retention and readability in a fast-scrolling feed.
- Hook strategy: start with the most surprising result or the “don’t do this” warning. Consider opening on a punchy B-roll shot with the key phrase on screen.
- Pacing adjustments: shorten pauses further; tighten transitions between points; keep sections compact.
- Captions: increase caption size; use stronger emphasis (highlight color) but keep it consistent. Keep captions slightly higher to avoid UI overlays.
- Titles: use a bold, simple header template for the first beat only (then let captions carry).
- Visual rhythm: more frequent B-roll swaps and reframes, but keep motion subtle to avoid looking noisy.
Suggested naming: ProjectName_TikTok_v1Variant B: YouTube Shorts Version (Calmer pacing, cleaner titles)
Goal: maintain clarity and a slightly more “clean editorial” feel while staying engaging.
- Hook strategy: still quick, but allow one extra beat for context if it improves understanding.
- Pacing adjustments: fewer rapid-fire cuts during explanations; let key lines breathe slightly longer.
- Captions: slightly smaller than TikTok version, prioritizing clean line breaks and consistent placement.
- Titles: cleaner title template (less decoration), possibly a short top title that doesn’t compete with captions.
- Polish preference: reduce effect intensity; prioritize stable reframes and consistent color.
Suggested naming: ProjectName_Shorts_v1