Free online course: Modern Computer Vision and Deep Learning for Images - CNNs, RNNs, 3D Vision, Detection

Duration of the online course: 31 hours and 33 minutes

Free NPTEL course on modern computer vision and deep learning: CNNs, RNNs, 3D vision, segmentation, stereo, SfM, and detection.

In this free course, learn about

  • Course Overview and Deep Learning Foundations
  • Neural Networks: Perceptrons, Training, and Optimization
  • Convolutional Neural Networks and Architectures
  • Recurrent Neural Networks for Vision Sequences
  • Classical Low-Level Vision: Filtering, Edges, and Features
  • Single-View Geometry and Planar Transformations
  • Camera Models, Two-View Geometry, and Epipolar Constraints
  • Structure from Motion and Dense Stereo Reconstruction
  • Mid-Level Vision: Motion and Segmentation
  • Deep Learning for Object Detection and Course Wrap-Up

Course Description

Modern Computer Vision and Deep Learning for Images is a free online course by NPTEL in Technology and Programming, designed for learners who want a solid, end-to-end understanding of how machines interpret images. It bridges classical computer vision foundations with modern deep learning methods, helping you build intuition for both the math and the practical design choices behind today’s vision systems.

You will start with neural network essentials: neurons, multilayer perceptrons, regression, training workflows, gradient descent, activation functions, backpropagation, optimization strategies, regularization techniques such as dropout, and data preprocessing. From there, the course dives into convolutional neural networks, covering core CNN concepts, properties, and influential architectures, then expands into sequence modeling with recurrent networks, encoder-decoder approaches, and LSTMs to connect deep learning ideas across different data structures.

On the classical vision side, it explores low-level image processing topics like spatial and frequency domain filtering, edge and line detection, and feature detection and description with well-known methods such as Harris corners, blob detection, SIFT, and SURF. The geometry and 3D vision portion introduces camera modeling and intrinsics, 2D geometric transformations, two-view stereo, epipolar geometry, fundamental matrix estimation, and structure from motion, including factorization methods, bundle adjustment, and dense 3D reconstruction. Finally, it moves into mid-level vision and modern deep approaches for optical flow, segmentation, clustering with GMMs, and object detection, giving you a broad toolkit for real-world computer vision problems.

Course content

  • Video class: #1 Course Introduction | Part 1 | Modern Computer Vision 18m
  • Exercise: Which option best describes the difference between metric and semantic information extracted from images?
  • Video class: #2 Course Introduction | Part 2 | Modern Computer Vision 28m
  • Video class: #3 Introduction to Deep Learning | Part 1 | Modern Computer Vision 15m
  • Exercise: In a modern vision pipeline, which statement best describes how early CNN layers relate to traditional low-level vision?
  • Video class: #4 Introduction to Deep Learning | Part 2 | Modern Computer Vision 19m
  • Video class: #5 Introduction to Deep Learning | Part 3 | Modern Computer Vision 13m
  • Exercise: What key factor enabled the major performance leap in image classification around 2012, alongside the introduction of AlexNet?
  • Video class: #6 Introduction to Neuron | Part 1 | Modern Computer Vision 11m
  • Video class: #7 Introduction to Neuron | Part 2 | Modern Computer Vision 26m
  • Exercise: Why can a single perceptron (single linear decision boundary) not model the XOR function?
  • Video class: #8 Introduction to Neuron | Part 3 | Modern Computer Vision 15m
  • Video class: #9 Multilayer Perceptron | Modern Computer Vision 24m
  • Exercise: In the perceptron update rule, what change is applied when a positive example is misclassified (i.e., wᵀx < 0)?
  • Video class: #10 Regression 16m
  • Video class: #11 Training a Neural Network | Modern Computer Vision 12m
  • Exercise: Why can the loss surface of a deep network be non-convex even if all activations are linear?
  • Video class: #12 Gradient Descent | Modern Computer Vision 28m
  • Video class: #13 Activation Function | Modern Computer Vision 26m
  • Exercise: Why is the standard sigmoid activation often avoided in early hidden layers of deep networks?
  • Video class: #14 Backpropagation in MLP | Part 1 | Modern Computer Vision 27m
  • Video class: #15 Backpropagation in MLP | Part 2 | Modern Computer Vision 22m
  • Exercise: In backpropagation for a network with loss L = (1/2)∑(ŷᵢ − yᵢ)², what is the gradient of the loss with respect to a bias term Bᵢᴸ (bias feeding into Zᵢᴸ⁺¹)?
  • Video class: #16 Optimization 26m
  • Video class: #17 Optimization 27m
  • Exercise: In adaptive gradient methods like Adagrad, what is the main purpose of dividing the learning rate by a term involving accumulated past squared gradients (e.g., √r_t)?
  • Video class: #18 Regularization | Modern Computer Vision 25m
  • Video class: #19 Dropout | Modern Computer Vision 17m
  • Video class: #20 Pre Processing | Modern Computer Vision 09m
  • Video class: #21 Convolutional Neural Networks | Part 1 | Modern Computer Vision 14m
  • Exercise: In Xavier initialization, from which range are weights typically drawn (uniformly) to help keep activation variance stable across layers?
  • Video class: #22 Convolutional Neural Networks | Part 2 | Modern Computer Vision 17m
  • Video class: #23 Convolutional Neural Networks | Part 3 | Modern Computer Vision 15m
  • Exercise: In a CNN used for digit classification, what does each output channel after a convolution typically represent?
  • Video class: #24 CNN Properties | Modern Computer Vision 30m
  • Video class: #25 Alexnet | Modern Computer Vision 14m
  • Exercise: In AlexNet, where do the majority of learnable parameters (unknown weights) reside?
  • Video class: #26 CNN Architectures | Part 1 | Modern Computer Vision 15m
  • Video class: #27 CNN Architectures | Part 2 | Modern Computer Vision 22m
  • Exercise: In an Inception module, what is the main purpose of using a 1×1 convolution before larger filters like 3×3 and 5×5?
  • Video class: #28 CNN Architectures | Part 3 | Modern Computer Vision 13m
  • Video class: #29 Introduction to RNN | Part 1 | Modern Computer Vision 27m
  • Exercise: Which key property of an RNN enables it to model temporal dependence in sequences?
  • Video class: #30 Introduction to RNN | Part 2 | Modern Computer Vision 19m
  • Video class: #31 Encoder | Decoder | Models in RNN | Modern Computer Vision 27m
  • Exercise: In an encoder–decoder setup for image captioning, which pairing of models is most appropriate for the encoder and decoder?
  • Video class: #32 LSTM | Modern Computer Vision 21m
  • Video class: #33 Low Level Vision | Part 1 | Modern Computer Vision 14m
  • Exercise: Why are local features like corners considered useful in low-level vision?
  • Video class: #34 Low Level Vision | Part 2 | Modern Computer Vision 22m
  • Video class: #35 Low Level Vision | Part 3 | Modern Computer Vision 09m
  • Video class: #36 Spatial Domain Filtering | Modern Computer Vision 26m
  • Video class: #37 Frequency Domain Filtering | Modern Computer Vision 23m
  • Exercise: In frequency-domain filtering, what operation is typically performed to apply a filter mask to an image?
  • Video class: #38 Edge Detection | Part 1 | Modern Computer Vision 23m
  • Video class: #39 Edge Detection | Part 2 | Modern Computer Vision 26m
  • Exercise: In the Canny edge detector, what is the main purpose of non-maxima suppression (NMS)?
  • Video class: #40 DeepNets for Edge Detection | Modern Computer Vision 21m
  • Video class: #41 Line Detection | Modern Computer Vision 27m
  • Exercise: Why does the Hough transform for line detection prefer the normal form over the slope-intercept form?
  • Video class: #42 Feature Detectors | Modern Computer Vision 26m
  • Video class: #43 Harris Corner Detector | Part 1 | Modern Computer Vision 23m
  • Exercise: In the Harris corner detector intuition, how does the appearance of a small patch change when the patch is centered on a corner and shifted slightly?
  • Video class: #44 Harris Corner Detector | Part 2 | Modern Computer Vision 19m
  • Video class: #45 Harris Corner Detector | Part 3 | Modern Computer Vision 21m
  • Exercise: In sub-pixel corner refinement, what condition is used at the true corner location (the maximum of the corner response) to solve for (Δx, Δy) using a Taylor expansion?
  • Video class: #46 Blob Detection | Part 1 | Modern Computer Vision 17m
  • Video class: #47 Blob Detection | Part 2 | Modern Computer Vision 26m
  • Exercise: Why is the Laplacian of Gaussian (LoG) often multiplied by σ² to form a scale-normalized LoG?
  • Video class: #48 Blob Detection | Part 3 | Modern Computer Vision 08m
  • Video class: #49 SIFT | Part 1 | Modern Computer Vision 22m
  • Video class: #50 SIFT | Part 2 | Modern Computer Vision 23m
  • Video class: #51 Feature Descriptors | Part 1 | Modern Computer Vision 20m
  • Exercise: How does the SIFT descriptor become a 128-dimensional vector?
  • Video class: #52 Feature Descriptors | Part 2 | Modern Computer Vision 25m
  • Video class: #53 SURF | Part 1 | Modern Computer Vision 22m
  • Exercise: In SURF-style keypoint detection, which quantity is used as the strength measure for non-maxima suppression across a 3×3×3 neighborhood?
  • Video class: #54 SURF | Part 2 | Modern Computer Vision 16m
  • Video class: #55 Single View Geometry | Part 1 | Modern Computer Vision 21m
  • Exercise: Why does panorama stitching typically assume a planar (or sufficiently far) scene?
  • Video class: #56 Single View Geometry | Part 2 | Modern Computer Vision 30m
  • Video class: #57 2D Geometric Transformations | Part 1 | Modern Computer Vision 23m
  • Exercise: Which statement best describes a general affine transformation in 2D?
  • Video class: #58 2D Geometric Transformations | Part 2 | Modern Computer Vision 29m
  • Video class: #59 Camera Intrinsics 13m
  • Exercise: Under what condition can a single homography be used to stitch images of a 3D scene into a panorama?
  • Video class: #60 Camera Intrinsics 36m
  • Video class: #61 Two View Stereo | Part 1 | Modern Computer Vision 13m
  • Exercise: In a stereo vision pipeline for estimating depth, which pair of steps is essential to recover a 3D point from its two image observations?
  • Video class: #62 Two View Stereo | Part 2 | Modern Computer Vision 20m
  • Video class: #63 Two View Stereo | Part 3 | Modern Computer Vision 12m
  • Exercise: In epipolar geometry, what does the fundamental matrix mainly provide for a point in the left image?
  • Video class: #64 Algebraic Representation of Epipolar Geometry | Part 1 | Modern Computer Vision 25m
  • Video class: #65 Algebraic Representation of Epipolar Geometry | Part 2 | Modern Computer Vision 26m
  • Exercise: In epipolar geometry, what does the fundamental matrix F do to a point \(\tilde{x}\) in the left image?
  • Video class: #66 Fundamental Matrix Computation | Part 1 | Modern Computer Vision 29m
  • Video class: #67 Fundamental Matrix Computation | Part 2 | Modern Computer Vision 18m
  • Exercise: In a rectified (parallel) stereo camera setup, how is depth z related to disparity δ?
  • Video class: #68 Structure from Motion | Part 1 | Modern Computer Vision 09m
  • Video class: #69 Structure from Motion | Part 2 | Modern Computer Vision 30m
  • Exercise: When decomposing an essential matrix, how is the correct (R, T) solution selected from the four candidates?
  • Video class: #70 Structure from Motion | Part 3 | Modern Computer Vision 14m
  • Video class: #71 Batch Processing in SFM | Modern Computer Vision 32m
  • Exercise: In structure from motion, what is the inherent ambiguity in reconstructed 3D structure for calibrated cameras (known intrinsics)?
  • Video class: #72 Multi View SFM | Modern Computer Vision 20m
  • Video class: #73 Factorization Methods in SFM | Modern Computer Vision 15m
  • Exercise: In the factorization method for structure from motion, what is the main purpose of forming the mean-centered measurement matrix \(\tilde{W} = W\left(I - \frac{1}{Q}\mathbf{1}\mathbf{1}^T\right)\)?
  • Video class: #74 Bundle Adjustment | Modern Computer Vision 21m
  • Video class: #75 Dense 3D Reconstruction | Modern Computer Vision 13m
  • Exercise: In plane sweep stereo for dense 3D reconstruction, how is the depth for a pixel typically selected?
  • Video class: #76 Some Results in Stereo 14m
  • Video class: #77 Deepnets for Stereo 21m
  • Exercise: In unsupervised stereo depth estimation, what key idea can be used to train a network without ground-truth disparity?
  • Video class: #78 Deepnets for Stereo 16m
  • Video class: #79 Mid Level Vision | Part 1 | Modern Computer Vision 25m
  • Video class: #80 Mid Level Vision | Part 2 | Modern Computer Vision 16m
  • Video class: #81 Lucas Kanade Method for OF | Modern Computer Vision 09m
  • Exercise: In the Lucas–Kanade optical flow method, what extra constraint is added beyond brightness constancy to make the problem better-posed?
  • Video class: #82 Handling Large Motion in Optical Flow | Modern Computer Vision 07m
  • Video class: #83 Image Segmentation | Modern Computer Vision 23m
  • Exercise: In k-means clustering used for image segmentation, how is a data point assigned to a cluster?
  • Video class: #84 GMM for Clustering | Modern Computer Vision 19m
  • Video class: #85 Deepnets for Segmentation 29m
  • Exercise: In mean shift segmentation, what does the algorithm iteratively do in the feature space?
  • Video class: #86 Deepnets for Segmentation 17m
  • Video class: #87 Deepnets for Segmentation 17m
  • Exercise: In upsampling for segmentation, what is the key idea behind a transposed convolution (often used in deconvolution networks)?
  • Video class: #88 Deepnets for Object Detection | Part 1 | Modern Computer Vision 36m
  • Video class: #89 Deepnets for Object Detection | Part 2 | Modern Computer Vision 28m
  • Exercise: In Faster R-CNN, what is the primary role of the Region Proposal Network (RPN)?
  • Video class: #90 Vision 22m

This free course includes:

31 hours and 33 minutes of online video course

Digital certificate of course completion (Free)

Exercises to train your knowledge

100% free, from content to certificate

