
TensorFlow for Beginners: Building and Serving Your First Models

Designing Models with Keras: Layers, Shapes, and Functional APIs

Chapter 4

Estimated reading time: 8 minutes

From Problem Statement to Input/Output Shapes

Before choosing layers, translate your problem into the shapes your model must accept and produce. Keras layers operate on tensors with a leading batch dimension, so you typically describe shapes without the batch size.

  • Tabular features: a single vector per example, shape (num_features,).
  • Images: height, width, channels, shape (H, W, C).
  • Text sequences (already tokenized): shape (T,) or (T, D) if embedded.

Then decide the output shape based on the task:

  • Binary classification: output units 1, output shape (1,).
  • Multiclass (single-label) classification: output units num_classes, output shape (num_classes,).
  • Regression: output units 1 (or k for multi-target), output shape (1,) or (k,).

These shape decisions drive the final layer configuration and the choice of activation and loss.
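
As a minimal sketch (the sizes below are illustrative assumptions, not values from a real dataset), these decisions translate directly into keras.Input shapes and final Dense layers:

from tensorflow import keras
from tensorflow.keras import layers

# Per-example input shapes (the batch dimension is omitted)
tabular_in = keras.Input(shape=(20,))                    # 20 numeric features
image_in = keras.Input(shape=(224, 224, 3))              # H, W, C
tokens_in = keras.Input(shape=(128,), dtype="int32")     # 128 token ids per sequence

# Output layers matched to the task
binary_out = layers.Dense(1, activation="sigmoid")       # output shape (1,)
multiclass_out = layers.Dense(10, activation="softmax")  # output shape (num_classes,)
regression_out = layers.Dense(1)                         # output shape (1,), linear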

Start Simple: Sequential Dense Networks

The Sequential API is ideal when your model is a straight stack of layers (no branching, no multiple inputs). A typical workflow is: define input shape, add hidden layers, add an output layer that matches the task.

Example: Dense Network for Tabular Binary Classification

Suppose you have 20 numeric features and want to predict whether a customer will churn (yes/no). Your input per example is shape (20,), and your output is shape (1,).

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

num_features = 20

model = keras.Sequential([
    layers.Input(shape=(num_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"]
)

model.summary()

Why these choices?

  • Dense(1, activation="sigmoid") produces a probability-like value in [0, 1] for the positive class.
  • binary_crossentropy matches a single sigmoid output.

Reading model.summary(): Shapes and Parameter Counts

model.summary() prints a table with three key columns:

  • Output Shape: the shape of each layer's output (the batch dimension appears as None).
  • Param #: the number of parameters (weights and biases) in that layer.
  • Connected to (for Functional models): which tensors feed into the layer.

For a Dense layer, parameter count is:

  • (input_units * output_units) + output_units

The extra + output_units accounts for the bias vector. For example, if the first Dense layer has 64 units and receives 20 inputs, parameters are (20 * 64) + 64 = 1344.
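
As a quick sanity check, you can compare this hand calculation with what Keras reports for the churn model defined above (a minimal sketch; it assumes that model is still in scope):

# Per-layer parameter counts
for layer in model.layers:
    print(layer.name, layer.count_params())

# Dense layers: (20 * 64) + 64 = 1344, (64 * 32) + 32 = 2080, (32 * 1) + 1 = 33
print(model.count_params())  # 1344 + 2080 + 33 = 3457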

Choosing the Final Layer: Binary, Multiclass, Regression

The last layer must match the meaning and shape of your target y.

Binary Classification

  • Output units: 1
  • Activation: sigmoid
  • Loss: binary_crossentropy

layers.Dense(1, activation="sigmoid")

Multiclass Classification (Single Label)

Assume 5 classes. You have two common label formats:

  • Integer labels (e.g., 0..4): use sparse_categorical_crossentropy.
  • One-hot labels (e.g., length-5 vectors): use categorical_crossentropy.

num_classes = 5

model = keras.Sequential([
    layers.Input(shape=(num_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax")
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # or categorical_crossentropy
    metrics=["accuracy"]
)

Why softmax? It produces a probability distribution across classes (sums to 1), matching a single-label multiclass target.
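
For illustration, the two label formats look like this (a small sketch with made-up labels; only the loss changes, the model stays the same):

import numpy as np
from tensorflow import keras

# Integer labels: one class index per example -> sparse_categorical_crossentropy
y_int = np.array([0, 3, 1, 4])

# One-hot labels: length-num_classes vectors -> categorical_crossentropy
y_onehot = keras.utils.to_categorical(y_int, num_classes=5)
# y_onehot[1] is [0., 0., 0., 1., 0.]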

Regression

For a single numeric target (e.g., house price), use a linear output (no activation) so the model can predict any real value.

model = keras.Sequential([
    layers.Input(shape=(num_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1)  # linear output
])

model.compile(
    optimizer="adam",
    loss="mse",
    metrics=["mae"]
)

If you have multiple regression targets (e.g., predict (x, y, z)), set output units to the number of targets: layers.Dense(3).

When Sequential Is Not Enough: The Functional API

Use the Functional API when your architecture is not a simple stack: multiple inputs, multiple outputs, shared layers, or branching/merging paths. The key idea is that you explicitly connect layers by calling them on tensors.

Step-by-Step Pattern

  • Create one or more keras.Input tensors with explicit shapes.
  • Apply layers to build computation paths.
  • Combine paths with merge layers (e.g., Concatenate).
  • Create a keras.Model(inputs=..., outputs=...).

Example: Multi-Input Model (Numeric + Categorical ID)

Imagine a prediction task where each example has:

  • 10 numeric features (shape (10,))
  • a product ID (an integer) that you want to embed (shape (1,))

You can process each input differently, then merge.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

num_numeric = 10
num_products = 5000
embed_dim = 16

numeric_in = keras.Input(shape=(num_numeric,), name="numeric")
product_in = keras.Input(shape=(1,), dtype="int32", name="product_id")

# Branch 1: numeric features
x_num = layers.Dense(32, activation="relu")(numeric_in)

# Branch 2: product id embedding
x_prod = layers.Embedding(input_dim=num_products, output_dim=embed_dim)(product_in)
# Embedding output shape: (batch, 1, embed_dim). Flatten to (batch, embed_dim)
x_prod = layers.Flatten()(x_prod)

# Merge
x = layers.Concatenate()([x_num, x_prod])
x = layers.Dense(64, activation="relu")(x)
output = layers.Dense(1, activation="sigmoid", name="churn_prob")(x)

model = keras.Model(inputs=[numeric_in, product_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()

Shape reasoning:

  • numeric_in is (10,) per example.
  • product_in is (1,) per example; embedding yields (1, 16), then Flatten makes it (16,).
  • Concatenate joins feature dimensions: (32,) + (16,) => (48,) (per example).
  • Final output is (1,) for binary classification.

Interpreting summary() for Functional Models

In Functional models, model.summary() helps you verify that:

  • Each input has the expected shape and dtype.
  • Branch outputs are compatible for merging (e.g., Concatenate requires tensors of the same rank whose dimensions match on every axis except the concatenation axis).
  • Parameter counts align with your expectations (especially embeddings and large dense layers).

For example, an embedding layer has parameters input_dim * output_dim. With 5000 products and 16 dimensions, that is 80,000 parameters.
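
A quick way to confirm this (a minimal sketch, assuming the multi-input model built above is still in scope):

# Locate the Embedding layer and check its parameter count
embedding = next(l for l in model.layers if isinstance(l, layers.Embedding))
print(embedding.count_params())  # 5000 * 16 = 80000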

Example: Branching for a Shared Representation (Two Outputs)

Sometimes you want one shared “trunk” and multiple “heads”, such as predicting both a probability (classification) and a numeric value (regression) from the same features.

num_features = 20

inputs = keras.Input(shape=(num_features,), name="features")

trunk = layers.Dense(64, activation="relu")(inputs)
trunk = layers.Dense(32, activation="relu")(trunk)

# Head 1: binary classification
class_out = layers.Dense(1, activation="sigmoid", name="will_buy")(trunk)

# Head 2: regression
value_out = layers.Dense(1, name="expected_spend")(trunk)

model = keras.Model(inputs=inputs, outputs=[class_out, value_out])

model.compile(
    optimizer="adam",
    loss={"will_buy": "binary_crossentropy", "expected_spend": "mse"},
    metrics={"will_buy": ["accuracy"], "expected_spend": ["mae"]}
)

model.summary()

Shape reasoning: both heads output shape (1,), but they represent different targets and use different losses.
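
To train a multi-output model, pass the targets as a dict keyed by the output layer names. A minimal sketch with synthetic NumPy arrays (placeholders standing in for real data):

import numpy as np

x = np.random.rand(256, num_features).astype("float32")
y_buy = np.random.randint(0, 2, size=(256, 1)).astype("float32")
y_spend = np.random.rand(256, 1).astype("float32")

model.fit(
    x,
    {"will_buy": y_buy, "expected_spend": y_spend},
    epochs=2,
    batch_size=32
)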

Common Shape Pitfalls and Quick Fixes

  • Forgetting the feature dimension: for tabular data, use Input(shape=(num_features,)), not Input(shape=num_features).
  • Rank mismatch when concatenating: if one branch outputs (batch, 1, D) and another outputs (batch, D2), use Flatten or Reshape to align ranks.
  • Wrong output units: multiclass single-label needs num_classes units; binary needs 1 unit.
  • Wrong activation/loss pairing: sigmoid pairs with binary_crossentropy; softmax pairs with categorical losses; linear output pairs with regression losses like mse.

Keeping Models Self-Contained with Preprocessing Layers

To make a model easier to serve and less error-prone, include preprocessing inside the model graph. This ensures the same transformations are applied during training and inference.

Normalization for Numeric Features

Normalization learns mean and variance from training data via adapt(), then standardizes inputs.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

num_features = 20
normalizer = layers.Normalization(axis=-1)

# Assume you have a NumPy array or tensor of training features: x_train
# normalizer.adapt(x_train)

model = keras.Sequential([
    layers.Input(shape=(num_features,)),
    normalizer,
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])

Practical tip: call adapt() only on training data (not validation/test) to avoid leaking information.
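
A minimal sketch of that workflow (the random array below is a placeholder for your real training split):

import numpy as np

x_train = np.random.rand(1000, num_features).astype("float32")

# Learn one mean and variance per feature from the training data only
normalizer.adapt(x_train)

# From here on, the model above standardizes inputs automatically,
# both during fit() and at inference time.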

Rescaling for Image Inputs

Raw images are often uint8 in [0, 255]. Rescaling converts to a float range like [0, 1].

H, W, C = 224, 224, 3

model = keras.Sequential([
    layers.Input(shape=(H, W, C)),
    layers.Rescaling(1./255),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid")
])

Even if you later change how you load images, the model still expects raw pixel values and handles scaling internally, which simplifies deployment.
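
For example, you can pass raw uint8 pixels straight to predict() and the model scales them internally (a sketch using a random placeholder image):

import numpy as np

raw_image = np.random.randint(0, 256, size=(1, 224, 224, 3), dtype="uint8")
probs = model.predict(raw_image)  # pixels are rescaled to [0, 1] inside the model
print(probs.shape)  # (1, 1)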

Preprocessing with Multiple Inputs (Functional API)

When inputs differ by type, attach preprocessing to each input branch.

num_numeric = 10

numeric_in = keras.Input(shape=(num_numeric,), name="numeric")
product_in = keras.Input(shape=(1,), dtype="int32", name="product_id")

numeric_norm = layers.Normalization(axis=-1)
# numeric_norm.adapt(x_numeric_train)

x_num = numeric_norm(numeric_in)
x_num = layers.Dense(32, activation="relu")(x_num)

x_prod = layers.Embedding(5000, 16)(product_in)
x_prod = layers.Flatten()(x_prod)

x = layers.Concatenate()([x_num, x_prod])
x = layers.Dense(64, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)

model = keras.Model([numeric_in, product_in], out)

This pattern keeps all transformations close to the data they apply to, and makes the exported model more robust because it includes both preprocessing and prediction.
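
Because the inputs are named, you can feed data as a dict keyed by input name, which avoids ordering mistakes. A minimal sketch with placeholder arrays:

import numpy as np

# Assumes numeric_norm.adapt(...) has already been called on the training features
batch = {
    "numeric": np.random.rand(4, num_numeric).astype("float32"),
    "product_id": np.random.randint(0, 5000, size=(4, 1)).astype("int32"),
}

preds = model.predict(batch)  # shape (4, 1): one probability per example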

Now answer the exercise about the content:

In a Functional API model that merges numeric features with an embedded product ID, why is a Flatten (or Reshape) typically applied after the Embedding layer before Concatenate?

Answer: Embedding on a (1,) input yields a 3D tensor (batch, 1, D). Flatten (or Reshape) converts it to (batch, D), aligning ranks so Concatenate can merge it with a 2D branch like (batch, D2).

Next chapter

Training with model.fit: Losses, Optimizers, and Metrics
