The perceptron as “weighted sum + bias”
A perceptron is the simplest kind of neuron: it takes an input vector, computes a single numeric score, then turns that score into a class decision. Even though it is simple, it already captures an important idea: a model can separate classes by drawing a linear boundary in the input space.
(1) Compute a score with a dot product
Assume an input vector x with d features: x = [x1, x2, …, xd]. The perceptron has a weight vector w = [w1, w2, …, wd] and a bias b (a single scalar). It computes a score (sometimes called a logit):
score = w · x + b = (w1*x1 + w2*x2 + ... + wd*xd) + b

Interpretation: each feature contributes proportionally to its weight. Positive weights push the score up when the feature is large; negative weights push it down. The bias shifts the score up or down regardless of the input, which lets the boundary move away from the origin.
Practical step-by-step example (2 features). Suppose x = [2, 1], w = [0.7, -1.2], b = 0.3.
- Compute dot product: w · x = 0.7*2 + (-1.2)*1 = 1.4 - 1.2 = 0.2
- Add bias: score = 0.2 + 0.3 = 0.5
The perceptron has not made a class decision yet; it has only produced a score.
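If code helps, here is a minimal Python sketch of this step; the function name perceptron_score is an illustrative choice, not a standard API.

```python
def perceptron_score(x, w, b):
    """Return w · x + b for plain lists x and w plus a scalar bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Reproduce the worked example: x = [2, 1], w = [0.7, -1.2], b = 0.3
print(perceptron_score([2, 1], [0.7, -1.2], 0.3))  # ≈ 0.5 (up to floating-point rounding)
```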
(2) Apply a threshold / decision rule
For binary classification, the classic perceptron uses a hard threshold (a step function):
predict y = 1 if score ≥ 0 else 0

This means the model predicts class 1 on one side of the boundary (where the score is nonnegative) and class 0 on the other side (where the score is negative).
Continuing the example. We computed score = 0.5, which is ≥ 0, so the perceptron predicts y = 1.
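As a self-contained Python sketch (the name perceptron_predict is again just illustrative), the full score-then-threshold pipeline looks like this:

```python
def perceptron_predict(x, w, b):
    """Score the input, then apply the hard threshold: 1 if score >= 0, else 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0

print(perceptron_predict([2, 1], [0.7, -1.2], 0.3))  # 1, because the score is about 0.5
```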
Many modern linear classifiers replace the hard threshold with a smooth function (for example, a sigmoid) to get probabilities and enable gradient-based training. But the geometric boundary is still determined by the same linear expression w · x + b = 0.
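For instance, a sigmoid maps the same score to a probability; since sigmoid(0) = 0.5, the boundary w · x + b = 0 is exactly where the predicted probability crosses 0.5. A minimal sketch:

```python
import math

def sigmoid(score):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))

print(sigmoid(0.5))  # ≈ 0.62 for the running example's score
print(sigmoid(0.0))  # exactly 0.5, right on the decision boundary
```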
(3) Visualize in 2D: a line separating classes
When there are two input features (x1 and x2), the decision boundary is the set of points where the score equals zero:
w1*x1 + w2*x2 + b = 0

This is the equation of a line in the (x1, x2) plane. One side of the line yields positive scores (predict class 1), and the other side yields negative scores (predict class 0).
Turn the boundary into a familiar slope-intercept form. If w2 ≠ 0, you can solve for x2:
x2 = -(w1/w2)*x1 - (b/w2)

So:
- The slope is -(w1/w2)
- The intercept is -(b/w2)
Concrete boundary example. Let w = [1, -1] and b = -0.2. The boundary is:
1*x1 + (-1)*x2 - 0.2 = 0 → x2 = x1 - 0.2

Points above this line (where x2 is larger than x1 - 0.2) produce a negative score (class 0), and points below produce a positive score (class 1), because the sign depends on w · x + b.
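Here is a small Python sketch that turns these weights into slope and intercept and checks which side a couple of points fall on; the variable names are illustrative.

```python
w = [1.0, -1.0]
b = -0.2

# Slope-intercept form of the boundary (valid because w[1] != 0).
slope = -w[0] / w[1]      # -(w1/w2) = 1.0
intercept = -b / w[1]     # -(b/w2) = -0.2
print(f"boundary: x2 = {slope} * x1 + {intercept}")

# The sign of the score tells us which side of the line each point is on.
for x in ([0.0, 1.0], [1.0, 0.0]):   # one point above the line, one below
    score = w[0] * x[0] + w[1] * x[1] + b
    print(x, "score:", round(score, 3), "-> class", 1 if score >= 0 else 0)
```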
What a single layer can represent
A single perceptron (or a single linear layer followed by a threshold) can represent any decision rule that separates the two classes with a single hyperplane:
- In 2D, the boundary is a line.
- In 3D, the boundary is a plane.
- In d dimensions, the boundary is a (d−1)-dimensional hyperplane.
This is why perceptrons and linear models are often described as “linear classifiers”: their decision boundary is linear in the input features.
Practical checklist: is your problem a good fit for a perceptron?
- Plot two features (or use dimensionality reduction for a rough view) and see whether a single line could separate the classes reasonably well.
- Check whether adding or transforming features makes the separation more linear (for example, include x1*x2 or x1^2 as additional features); a short sketch of this appears after the checklist. A linear model in an expanded feature space can still be powerful.
- If the boundary clearly needs to bend around regions, a single linear boundary will struggle.
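To make the feature-expansion idea concrete, here is a Python sketch: the model is still a plain weighted sum plus bias, but it is linear in an expanded feature vector. The particular expansion (adding x1*x2 and x1^2) and the weight values are illustrative, not trained.

```python
def expand(x):
    """Map [x1, x2] to [x1, x2, x1*x2, x1**2]."""
    x1, x2 = x
    return [x1, x2, x1 * x2, x1 ** 2]

def linear_score(features, w, b):
    """Same weighted-sum-plus-bias score, now over expanded features."""
    return sum(wi * fi for wi, fi in zip(w, features)) + b

w = [0.5, -0.3, 1.0, -0.2]   # one weight per expanded feature (illustrative values)
b = 0.1
print(linear_score(expand([2.0, 1.0]), w, b))
```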
Limitations: non-linearly separable patterns
The key limitation is structural: because the decision boundary is a single hyperplane, the perceptron cannot represent patterns that require multiple regions or curved boundaries in the original input space.
Example: XOR cannot be solved by one perceptron
Consider the XOR pattern with two binary inputs:
- (0, 0) → 0
- (1, 0) → 1
- (0, 1) → 1
- (1, 1) → 0
If you plot these four points in 2D, the positive class points sit on opposite corners, and the negative class points sit on the remaining corners. No single straight line can separate the positives from the negatives: any half-plane that contains both positive corners also contains the midpoint (0.5, 0.5) between them, and the same is true for the negative corners, so the two classes cannot lie on opposite sides of one line.
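A quick numerical illustration (not a proof): scan a coarse, arbitrary grid of weights and biases and check whether any single threshold unit classifies all four XOR points correctly. None will, because XOR is not linearly separable.

```python
xor_data = [((0, 0), 0), ((1, 0), 1), ((0, 1), 1), ((1, 1), 0)]

def predict(x, w1, w2, b):
    return 1 if w1 * x[0] + w2 * x[1] + b >= 0 else 0

grid = [i / 4 for i in range(-8, 9)]   # -2.0 to 2.0 in steps of 0.25
solutions = [
    (w1, w2, b)
    for w1 in grid for w2 in grid for b in grid
    if all(predict(x, w1, w2, b) == y for x, y in xor_data)
]
print("perceptrons that solve XOR on this grid:", len(solutions))  # 0
```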
Why adding nonlinearity and depth becomes necessary
To represent non-linearly separable patterns, you need the model to create boundaries that are not just one hyperplane. There are two common ways this happens in neural networks:
- Nonlinear activation functions are what make stacking layers worthwhile: without them, a stack of linear layers collapses into a single linear map, so the overall input-output mapping would still be linear no matter how many layers you add.
- Multiple layers (depth) let the network build intermediate features that “re-express” the data so that a later linear separation becomes possible.
In other words, a single layer can only draw one straight cut through the space; nonlinearity and depth let the model combine multiple cuts into more complex shapes, enabling it to separate patterns like XOR and many real-world datasets where classes are intertwined.
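As a closing sketch, here is a hand-built two-layer network (using the same step units) that solves XOR. The hidden units compute OR and AND of the inputs, and the output fires when OR is on but AND is off; the weights are hand-picked for illustration, not learned by any training procedure.

```python
def step(score):
    return 1 if score >= 0 else 0

def xor_net(x1, x2):
    h_or = step(x1 + x2 - 0.5)            # hidden unit 1: x1 OR x2
    h_and = step(x1 + x2 - 1.5)           # hidden unit 2: x1 AND x2
    return step(h_or - 2 * h_and - 0.5)   # output: OR and not AND

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print((x1, x2), "->", xor_net(x1, x2))   # prints 0, 1, 1, 0
```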