The perceptron as “weighted sum + bias”
A perceptron is the simplest kind of neuron: it takes an input vector, computes a single numeric score, then turns that score into a class decision. Even though it is simple, it already captures an important idea: a model can separate classes by drawing a linear boundary in the input space.
(1) Compute a score with a dot product
Assume an input vector x with d features: x = [x1, x2, …, xd]. The perceptron has a weight vector w = [w1, w2, …, wd] and a bias b (a single scalar). It computes a score (sometimes called a logit):
score = w · x + b = (w1*x1 + w2*x2 + ... + wd*xd) + b

Interpretation: each feature contributes proportionally to its weight. Positive weights push the score up when the feature is large; negative weights push it down. The bias shifts the score up or down regardless of the input, which lets the boundary move away from the origin.
Practical step-by-step example (2 features). Suppose x = [2, 1], w = [0.7, -1.2], b = 0.3.
- Compute dot product: w · x = 0.7*2 + (-1.2)*1 = 1.4 - 1.2 = 0.2
- Add bias: score = 0.2 + 0.3 = 0.5
The perceptron has not made a class decision yet; it has only produced a score.
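If code helps, here is a minimal Python sketch of this step; the function name perceptron_score is an illustrative choice, not a standard API.

```python
def perceptron_score(x, w, b):
    """Return w · x + b for plain lists x and w plus a scalar bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Reproduce the worked example: x = [2, 1], w = [0.7, -1.2], b = 0.3
print(perceptron_score([2, 1], [0.7, -1.2], 0.3))  # ≈ 0.5 (up to floating-point rounding)
```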
(2) Apply a threshold / decision rule
For binary classification, the classic perceptron uses a hard threshold (a step function):
predict y = 1 if score ≥ 0 else 0

This means the model predicts class 1 on one side of the boundary (where the score is nonnegative) and class 0 on the other side (where the score is negative).
Continuing the example. We computed score = 0.5, which is ≥ 0, so the perceptron predicts y = 1.
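As a self-contained Python sketch (the name perceptron_predict is again just illustrative), the full score-then-threshold pipeline looks like this:

```python
def perceptron_predict(x, w, b):
    """Score the input, then apply the hard threshold: 1 if score >= 0, else 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0

print(perceptron_predict([2, 1], [0.7, -1.2], 0.3))  # 1, because the score is about 0.5
```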
Many modern linear classifiers replace the hard threshold with a smooth function (for example, a sigmoid) to get probabilities and enable gradient-based training. But the geometric boundary is still determined by the same linear expression w · x + b = 0.
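For instance, a sigmoid maps the same score to a probability; since sigmoid(0) = 0.5, the boundary w · x + b = 0 is exactly where the predicted probability crosses 0.5. A minimal sketch:

```python
import math

def sigmoid(score):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))

print(sigmoid(0.5))  # ≈ 0.62 for the running example's score
print(sigmoid(0.0))  # exactly 0.5, right on the decision boundary
```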
(3) Visualize in 2D: a line separating classes
When there are two input features (x1 and x2), the decision boundary is the set of points where the score equals zero:
w1*x1 + w2*x2 + b = 0

This is the equation of a line in the (x1, x2) plane. One side of the line yields positive scores (predict class 1), and the other side yields negative scores (predict class 0).
Turn the boundary into a familiar slope-intercept form. If w2 ≠ 0, you can solve for x2:
x2 = -(w1/w2)*x1 - (b/w2)

So:
- The slope is -(w1/w2)
- The intercept is -(b/w2)
Concrete boundary example. Let w = [1, -1] and b = -0.2. The boundary is:
1*x1 + (-1)*x2 - 0.2 = 0 → x2 = x1 - 0.2

Points above this line (where x2 is larger than x1 - 0.2) produce a negative score (class 0), and points below produce a positive score (class 1), because the sign depends on w · x + b.
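Here is a small Python sketch that turns these weights into slope and intercept and checks which side a couple of points fall on; the variable names are illustrative.

```python
w = [1.0, -1.0]
b = -0.2

# Slope-intercept form of the boundary (valid because w[1] != 0).
slope = -w[0] / w[1]      # -(w1/w2) = 1.0
intercept = -b / w[1]     # -(b/w2) = -0.2
print(f"boundary: x2 = {slope} * x1 + {intercept}")

# The sign of the score tells us which side of the line each point is on.
for x in ([0.0, 1.0], [1.0, 0.0]):   # one point above the line, one below
    score = w[0] * x[0] + w[1] * x[1] + b
    print(x, "score:", round(score, 3), "-> class", 1 if score >= 0 else 0)
```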
What a single layer can represent
A single perceptron (or a single linear layer followed by a threshold) can represent any decision rule that separates the two classes with a single hyperplane:
- In 2D, the boundary is a line.
- In 3D, the boundary is a plane.
- In d dimensions, the boundary is a (d−1)-dimensional hyperplane.
This is why perceptrons and linear models are often described as “linear classifiers”: their decision boundary is linear in the input features.
Practical checklist: is your problem a good fit for a perceptron?
- Plot two features (or use dimensionality reduction for a rough view) and see whether a single line could separate the classes reasonably well.
- Check whether adding or transforming features makes the separation more linear (for example, include x1*x2 or x1^2 as additional features); a short sketch of this appears after the checklist. A linear model in an expanded feature space can still be powerful.
- If the boundary clearly needs to bend around regions, a single linear boundary will struggle.
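To make the feature-expansion idea concrete, here is a Python sketch: the model is still a plain weighted sum plus bias, but it is linear in an expanded feature vector. The particular expansion (adding x1*x2 and x1^2) and the weight values are illustrative, not trained.

```python
def expand(x):
    """Map [x1, x2] to [x1, x2, x1*x2, x1**2]."""
    x1, x2 = x
    return [x1, x2, x1 * x2, x1 ** 2]

def linear_score(features, w, b):
    """Same weighted-sum-plus-bias score, now over expanded features."""
    return sum(wi * fi for wi, fi in zip(w, features)) + b

w = [0.5, -0.3, 1.0, -0.2]   # one weight per expanded feature (illustrative values)
b = 0.1
print(linear_score(expand([2.0, 1.0]), w, b))
```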
Limitations: non-linearly separable patterns
The key limitation is structural: because the decision boundary is a single hyperplane, the perceptron cannot represent patterns that require multiple regions or curved boundaries in the original input space.
Example: XOR cannot be solved by one perceptron
Consider the XOR pattern with two binary inputs:
- (0, 0) → 0
- (1, 0) → 1
- (0, 1) → 1
- (1, 1) → 0
If you plot these four points in 2D, the positive class points sit on opposite corners, and the negative class points sit on the remaining corners. No single straight line can separate the positives from the negatives: any half-plane that contains both positive corners also contains the midpoint (0.5, 0.5) between them, and the same is true for the negative corners, so the two classes cannot lie on opposite sides of one line.
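A quick numerical illustration (not a proof): scan a coarse, arbitrary grid of weights and biases and check whether any single threshold unit classifies all four XOR points correctly. None will, because XOR is not linearly separable.

```python
xor_data = [((0, 0), 0), ((1, 0), 1), ((0, 1), 1), ((1, 1), 0)]

def predict(x, w1, w2, b):
    return 1 if w1 * x[0] + w2 * x[1] + b >= 0 else 0

grid = [i / 4 for i in range(-8, 9)]   # -2.0 to 2.0 in steps of 0.25
solutions = [
    (w1, w2, b)
    for w1 in grid for w2 in grid for b in grid
    if all(predict(x, w1, w2, b) == y for x, y in xor_data)
]
print("perceptrons that solve XOR on this grid:", len(solutions))  # 0
```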
Why adding nonlinearity and depth becomes necessary
To represent non-linearly separable patterns, you need the model to create boundaries that are not just one hyperplane. There are two common ways this happens in neural networks:
- Nonlinear activation functions are what make stacking layers worthwhile: without them, a stack of linear layers collapses into a single linear map, so the overall input-output mapping would still be linear no matter how many layers you add.
- Multiple layers (depth) let the network build intermediate features that “re-express” the data so that a later linear separation becomes possible.
In other words, a single layer can only draw one straight cut through the space; nonlinearity and depth let the model combine multiple cuts into more complex shapes, enabling it to separate patterns like XOR and many real-world datasets where classes are intertwined.
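As a closing sketch, here is a hand-built two-layer network (using the same step units) that solves XOR. The hidden units compute OR and AND of the inputs, and the output fires when OR is on but AND is off; the weights are hand-picked for illustration, not learned by any training procedure.

```python
def step(score):
    return 1 if score >= 0 else 0

def xor_net(x1, x2):
    h_or = step(x1 + x2 - 0.5)            # hidden unit 1: x1 OR x2
    h_and = step(x1 + x2 - 1.5)           # hidden unit 2: x1 AND x2
    return step(h_or - 2 * h_and - 0.5)   # output: OR and not AND

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print((x1, x2), "->", xor_net(x1, x2))   # prints 0, 1, 1, 0
```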