Computer Vision: From Image Formation to Neural Radiance Fields

Free online course

Duration of the online course: 21 hours and 59 minutes


Build real computer vision skills in this free online course, from image formation to NeRFs: learn 3D reconstruction, segmentation, and modern machine learning through hands-on exercises.

In this free course, you will learn about:

  • Core goals, scope, and history of computer vision; why it is an ill-posed inverse problem
  • Geometric image formation: projective transforms, homographies (minimum of four correspondences), camera models
  • Photometric formation: rendering equation (hemisphere integral), reflectance, and sensing pipeline basics
  • Feature matching & SfM: the SIFT ratio test, the essential matrix, rank-3 factorization, bundle adjustment
  • Stereo depth: rectification, disparity-to-depth, 1D epipolar search, and learned/end-to-end stereo matching
  • Spatial regularization for stereo via MRFs: smoothness benefits vs winner-take-all; soft-argmin disparity
  • Graphical models: MRFs vs factor graphs, partition function, belief propagation efficiency on trees
  • Graphical-model applications: denoising (alpha), stereo (lambda), multiview intensity model, optical flow vs stereo
  • Learning in structured models: CRFs vs MRFs, log-linear parameter gradients, unrolled inference in deep models
  • Shape-from-X: shape-from-shading assumptions, photometric stereo needs 3+ lights, structured light advantage
  • 3D fusion & implicit reps: TSDF/SDF volumetric fusion; occupancy nets; differentiable rendering; NeRF/GRAF ideas
  • Recognition: classification, semantic segmentation (pixel labels), dilated convolutions, detection/segmentation, and the IoU metric
  • Self-supervised CV: pretext vs contrastive learning; photometric reprojection for depth+pose without labels
  • Diverse topics: input optimization, superquadric sparsity, SMPL pose/shape blend shapes, deepfakes overview
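
As a small taste of the geometry covered early in the course, here is a minimal NumPy sketch of direct linear transform (DLT) homography estimation from four point correspondences. The point values are made up for illustration, and the code is an independent sketch rather than the course's own material:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate a 3x3 homography H (dst ~ H @ src) from >= 4 point pairs via DLT."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Cross-multiplied projection equations: each pair gives 2 linear rows.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # H has 8 degrees of freedom (scale is free), so 4 correspondences suffice.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)      # null-space vector = homography up to scale
    return H / H[2, 2]

# Hypothetical unit square mapped by "scale 2, translate (2, 1)":
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(2, 1), (4, 1), (4, 3), (2, 3)]
H = homography_dlt(src, dst)
```

In practice one would normalize the points first and wrap the estimate in RANSAC, but the rank argument behind "four correspondences" is already visible here.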

Course Description

Turn images into measurable geometry, interpretable models, and intelligent predictions. This free online course guides you through computer vision from first principles of image formation to modern neural methods such as Neural Radiance Fields (NeRFs). Instead of treating vision as a black box, you will learn why it is fundamentally an inverse problem: the world causes images, and vision must reason backward under ambiguity, noise, and missing information. That mindset connects classic geometry with today’s learning-based systems and helps you build intuition that transfers across projects.

You will develop a strong foundation in how cameras form images, how sensing choices affect what data you can trust, and how geometric transformations relate views of the same scene. From there, you’ll progress to 3D understanding: matching features across images, reasoning with epipolar constraints, and scaling reconstruction with optimization techniques used in practice. You’ll also see how stereo depth is computed in rectified settings, why local matching often fails, and how spatial regularization improves consistency when estimating dense disparity.
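
The rectified-stereo relation mentioned above reduces to depth = f · B / d (focal length times baseline over disparity). A tiny sketch with hypothetical calibration values, not taken from the course:

```python
import numpy as np

# Rectified binocular stereo: depth z = f * B / d, with focal length f (pixels),
# baseline B (meters), and disparity d (pixels). All values are illustrative.
f = 700.0                                  # hypothetical focal length in pixels
B = 0.54                                   # hypothetical baseline in meters
disparity = np.array([70.0, 35.0, 7.0])    # larger disparity -> closer point

depth = f * B / disparity
# Halving the disparity doubles the depth; depth is undefined where d = 0.
```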

A key theme is structured prediction. You’ll explore probabilistic graphical models, Markov random fields, and factor graphs as tools for expressing dependencies in vision problems like denoising, stereo, and optical flow. That perspective makes it clearer how inference works, what the partition function is doing, and why message passing can be efficient when the structure permits it. You’ll then connect these ideas to learning: conditional random fields, parameter estimation, and modern deep structured approaches that unroll inference into trainable networks.
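
The efficiency of message passing mentioned above can be seen on a toy chain: sum-product messages reproduce the naive marginal at a cost linear in chain length instead of exponential in it. All potentials below are arbitrary illustrative numbers, not from the course:

```python
import numpy as np

# Sum-product on a 3-variable chain x1 - x2 - x3 with binary states.
unary = [np.array([1.0, 2.0]), np.array([1.5, 0.5]), np.array([2.0, 1.0])]
pair = np.array([[2.0, 1.0], [1.0, 2.0]])   # same pairwise table on both edges

m12 = pair.T @ unary[0]          # message x1 -> x2: sum over x1's states
m32 = pair @ unary[2]            # message x3 -> x2: sum over x3's states
belief2 = unary[1] * m12 * m32
p2 = belief2 / belief2.sum()     # normalizing plays the role of the partition function

# Brute-force check: enumerate the full joint and marginalize naively.
joint = np.einsum('i,j,k,ij,jk->ijk', unary[0], unary[1], unary[2], pair, pair)
p2_naive = joint.sum(axis=(0, 2)) / joint.sum()
```

On a tree, the same two-pass scheme yields every marginal exactly; on loopy graphs it becomes an approximation.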

Beyond geometry and graphs, the course bridges into shape-from-X methods and multi-view fusion, helping you understand how light, reflectance assumptions, and sensor cues influence 3D recovery. You’ll then step into implicit neural representations and differentiable volumetric rendering, where models learn 3D structure and appearance by backpropagating through the rendering process from RGB images. NeRFs and generative radiance field ideas show how view synthesis and reconstruction converge in contemporary machine learning.
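
At its core, the differentiable volumetric rendering used by NeRF-style models is alpha compositing of sampled densities and colors along each camera ray. A minimal sketch with made-up sample values (one ray, uniform spacing):

```python
import numpy as np

# Densities and colors at samples along one ray; all numbers are illustrative.
sigma = np.array([0.0, 0.5, 2.0, 4.0])          # volume density per sample
color = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 1.0, 1.0]])
delta = 0.25                                     # spacing between samples

alpha = 1.0 - np.exp(-sigma * delta)             # per-sample opacity
trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance
weights = trans * alpha                          # contribution of each sample
rgb = (weights[:, None] * color).sum(axis=0)     # composited pixel color
# Every operation is differentiable, so a loss on rgb sends gradients
# back to the predicted densities and colors -- RGB-only supervision.
```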

To round out your toolkit, you’ll work through recognition tasks such as classification, semantic segmentation, and object detection, and study self-supervised and contrastive learning for reducing reliance on labels. By the end, you’ll be prepared to read research with confidence, prototype robust pipelines, and speak the language of both classical vision and modern AI—useful for careers in machine learning engineering, robotics, AR/VR, and 3D content.
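
Intersection over Union, the standard detection metric referenced here, is simple to sketch for axis-aligned 2D boxes (the example boxes are made up):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle; clamp to zero when the boxes do not overlap.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7
score = iou((0, 0, 2, 2), (1, 1, 3, 3))
```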

Course content

  • Video class: Computer Vision - Lecture 1.1 (Introduction: Organization) 05m
  • Exercise: What is a central goal of computer vision as described in the course introduction?
  • Video class: Computer Vision - Lecture 1.2 (Introduction: Introduction) 26m
  • Exercise: Why is computer vision often described as an ill-posed inverse problem?
  • Video class: Computer Vision - Lecture 1.3 (Introduction: History of Computer Vision) 1h03m
  • Exercise: Which development is highlighted as enabling large-scale Structure-from-Motion reconstructions from many Internet photos in the 2000s?
  • Video class: Computer Vision - Lecture 2.1 (Image Formation: Primitives and Transformations) 52m
  • Exercise: How many point correspondences are the minimum needed to estimate a 2D homography (projective transformation), and why?
  • Video class: Computer Vision - Lecture 2.2 (Image Formation: Geometric Image Formation) 34m
  • Video class: Computer Vision - Lecture 2.3 (Image Formation: Photometric Image Formation) 23m
  • Exercise: In the rendering equation, why is there an integral over the hemisphere of incoming directions?
  • Video class: Computer Vision - Lecture 2.4 (Image Formation: Image Sensing Pipeline) 12m
  • Exercise: What is the main purpose of the shutter speed (exposure time) in the image sensing pipeline?
  • Video class: Computer Vision - Lecture 3.1 (Structure-from-Motion: Preliminaries) 26m
  • Exercise: In feature matching with SIFT descriptors, what does a large ratio between the best and second-best nearest-neighbor distances (e.g., around 0.8) indicate?
  • Video class: Computer Vision - Lecture 3.2 (Structure-from-Motion: Two-frame Structure-from-Motion) 33m
  • Exercise: In two-view epipolar geometry, what does the essential matrix express?
  • Video class: Computer Vision - Lecture 3.3 (Structure-from-Motion: Factorization) 22m
  • Exercise: In Tomasi–Kanade factorization (under orthographic projection with centered measurements), what is the key rank property of the measurement matrix?
  • Video class: Computer Vision - Lecture 3.4 (Structure-from-Motion: Bundle Adjustment) 30m
  • Video class: Computer Vision - Lecture 4.1 (Stereo Reconstruction: Preliminaries) 44m
  • Exercise: In rectified binocular stereo, how is depth (c) computed from disparity (d)?
  • Video class: Computer Vision - Lecture 4.2 (Stereo Reconstruction: Block Matching) 22m
  • Exercise: In rectified stereo block matching, where do you search for a corresponding patch for a pixel in the left image?
  • Video class: Computer Vision - Lecture 4.3 (Stereo Reconstruction: Siamese Networks) 17m
  • Exercise: In learned stereo block matching with siamese networks, why is the cosine-similarity architecture much faster at inference than the learned-similarity (MLP) architecture?
  • Video class: Computer Vision - Lecture 4.4 (Stereo Reconstruction: Spatial Regularization) 14m
  • Exercise: What is the main benefit of adding spatial regularization via a Markov Random Field (MRF) to disparity estimation compared to a local winner-takes-all approach?
  • Video class: Computer Vision - Lecture 4.5 (Stereo Reconstruction: End-to-End Learning) 15m
  • Exercise: In GC-Net style stereo matching, how is the final disparity commonly obtained from the predicted per-disparity matching costs?
  • Video class: Computer Vision - Lecture 5.1 (Probabilistic Graphical Models: Structured Prediction) 20m
  • Exercise: In stereo matching, what is the main purpose of spatial regularization when using a graphical model?
  • Video class: Computer Vision - Lecture 5.2 (Probabilistic Graphical Models: Markov Random Fields) 32m
  • Exercise: In a Markov random field, what is the role of the partition function?
  • Video class: Computer Vision - Lecture 5.3 (Probabilistic Graphical Models: Factor Graphs) 08m
  • Exercise: What key property defines a factor graph compared to a Markov random field (MRF) graph representation?
  • Video class: Computer Vision - Lecture 5.4 (Probabilistic Graphical Models: Belief Propagation) 33m
  • Exercise: What is the key idea that makes belief propagation (sum-product) efficient on chain or tree factor graphs compared to naive marginalization?
  • Video class: Computer Vision - Lecture 5.5 (Probabilistic Graphical Models: Examples) 13m
  • Exercise: In the image denoising Markov random field example, what is the role of the parameter alpha in the pairwise smoothness term?
  • Video class: Computer Vision - Lecture 6.1 (Applications of Graphical Models: Stereo Reconstruction) 17m
  • Exercise: In an MRF model for stereo disparity estimation on a 4-connected grid, what does the regularization weight (λ) primarily control?
  • Video class: Computer Vision - Lecture 6.2 (Applications of Graphical Models: Multi-View Reconstruction) 37m
  • Exercise: In the probabilistic multiview reconstruction model, how is the intensity of a pixel (ray) explained in the image formation process?
  • Video class: Computer Vision - Lecture 6.3 (Applications of Graphical Models: Optical Flow) 46m
  • Exercise: What is a key difference between stereo matching and optical flow estimation?
  • Video class: Computer Vision - Lecture 7.1 (Learning in Graphical Models: Conditional Random Fields) 18m
  • Exercise: What is the key change when moving from a Markov Random Field (MRF) to a Conditional Random Field (CRF) for learning?
  • Video class: Computer Vision - Lecture 7.2 (Learning in Graphical Models: Parameter Estimation) 47m
  • Exercise: When learning parameters of a log-linear Conditional Random Field (CRF), what does the gradient of the negative conditional log-likelihood correspond to?
  • Video class: Computer Vision - Lecture 7.3 (Learning in Graphical Models: Deep Structured Models) 26m
  • Exercise: What is a key idea behind unrolled inference in deep structured models?
  • Video class: Computer Vision - Lecture 8.1 (Shape-from-X: Shape-from-Shading) 56m
  • Exercise: Which combination of assumptions most directly leads to the simplified reflectance map used in classical shape-from-shading?
  • Video class: Computer Vision - Lecture 8.2 (Shape-from-X: Photometric Stereo) 20m
  • Exercise: In basic Lambertian photometric stereo, why are at least three images with different known light directions needed (same camera viewpoint)?
  • Video class: Computer Vision - Lecture 8.3 (Shape-from-X: Shape-from-X) 09m
  • Exercise: What is a key advantage of structured light over classical passive stereo in textureless regions?
  • Video class: Computer Vision - Lecture 8.4 (Shape-from-X: Volumetric Fusion) 37m
  • Exercise: In volumetric fusion, how are multiple depth-map-derived signed distance fields (SDFs) combined into one consistent scene reconstruction?
  • Video class: Computer Vision - Lecture 9.1 (Coordinate-based Networks: Implicit Neural Representations) 45m
  • Exercise: In an occupancy network, what does the neural network primarily learn to represent as the 3D surface?
  • Video class: Computer Vision - Lecture 9.2 (Coordinate-based Networks: Differentiable Volumetric Rendering) 28m
  • Exercise: What key technique enables training implicit 3D geometry and appearance models using only RGB images (without 3D supervision) by backpropagating through the rendering step?
  • Video class: Computer Vision - Lecture 9.3 (Coordinate-based Networks: Neural Radiance Fields) 17m
  • Exercise: In Neural Radiance Fields (NeRF), what is the primary objective compared to classic 3D reconstruction?
  • Video class: Computer Vision - Lecture 9.4 (Coordinate-based Networks: Generative Radiance Fields) 20m
  • Exercise: What key idea makes training Generative Radiance Fields (GRAF) feasible using only unposed 2D image collections?
  • Video class: Computer Vision - Lecture 10.1 (Recognition: Image Classification) 57m
  • Exercise: Which task assigns a semantic label to every pixel in an image (including both objects like cars and stuff like sky)?
  • Video class: Computer Vision - Lecture 10.2 (Recognition: Semantic Segmentation) 16m
  • Exercise: What is the key idea behind using dilated convolutions for semantic segmentation?
  • Video class: Computer Vision - Lecture 10.3 (Recognition: Object Detection and Segmentation) 42m
  • Exercise: In object detection evaluation, what does Intersection over Union (IoU) measure for two 2D bounding boxes?
  • Video class: Computer Vision - Lecture 11.1 (Self-Supervised Learning: Preliminaries) 22m
  • Exercise: What is the core idea of self-supervised learning in computer vision?
  • Video class: Computer Vision - Lecture 11.2 (Self-Supervised Learning: Task-specific Models) 27m
  • Exercise: In self-supervised monocular depth + ego-motion learning, what is minimized to train the depth network and pose network without ground-truth labels?
  • Video class: Computer Vision - Lecture 11.3 (Self-Supervised Learning: Pretext Tasks) 25m
  • Exercise: In self-supervised learning with pretext tasks, what is the primary goal of the pretext task (e.g., rotation prediction) during pre-training?
  • Video class: Computer Vision - Lecture 11.4 (Self-Supervised Learning: Contrastive Learning) 30m
  • Exercise: What is a key limitation of classical pretext tasks (e.g., solving jigsaw puzzles) for self-supervised pre-training?
  • Video class: Computer Vision - Lecture 12.1 (Diverse Topics in Computer Vision: Input Optimization) 36m
  • Exercise: In input optimization, what is updated to achieve goals like adversarial attacks or neural style transfer?
  • Video class: Computer Vision - Lecture 12.2 (Diverse Topics in Computer Vision: Compositional Models) 24m
  • Exercise: In unsupervised 3D shape abstraction with superquadrics, what is the key role of the parsimony (sparsity) loss term?
  • Video class: Computer Vision - Lecture 12.3 (Diverse Topics in Computer Vision: Human Body Models) 34m
  • Exercise: What key modification does SMPL add to basic Linear Blend Skinning (LBS) to reduce joint artifacts (e.g., at the elbow)?
  • Video class: Computer Vision - Lecture 12.4 (Diverse Topics in Computer Vision: Deepfakes) 17m
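
One recurring idea in the stereo lectures listed above is the soft-argmin readout of a disparity cost volume, a differentiable, sub-pixel alternative to a hard argmin. A small sketch for a single pixel, with made-up matching costs:

```python
import numpy as np

disparities = np.arange(5, dtype=float)        # candidate disparities 0..4
cost = np.array([4.0, 2.0, 0.5, 2.0, 4.0])     # illustrative matching costs

# Softmax over negated costs turns costs into a probability distribution;
# the expected disparity is then differentiable w.r.t. the costs.
probs = np.exp(-cost) / np.exp(-cost).sum()
d_hat = (probs * disparities).sum()
# Costs symmetric around d = 2 give an expected disparity of exactly 2.
```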

This free course includes:

  • 21 hours and 59 minutes of online video lessons
  • Exercises to test your knowledge
  • A free digital certificate of course completion
  • 100% free, from content to certificate

Ready to get started? Install the app to access the course.

Over 5,000 free courses

Programming, English, Digital Marketing and much more! Learn whatever you want, for free.


Study plan with AI

Our app's Artificial Intelligence can create a study schedule for the course you choose.


From zero to professional success

Improve your resume with our free Certificate and then use our Artificial Intelligence to find your dream job.


More free courses in Artificial Intelligence and Machine Learning

Free Ebook + Audiobooks! Learn by listening or reading!

Download the app now to access more than 5,000 free courses, exercises, certificates, and plenty of content without paying anything!

  • 100% free online courses from start to finish

    Thousands of online courses in video, ebooks and audiobooks.

  • More than 60,000 free exercises

    To test your knowledge during online courses

  • Valid free digital certificate with QR code

    Saved to your cell phone's photo gallery and sent to your email
