Deep Learning Foundations Without the Hype: Neural Networks Explained Clearly

When Deep Learning Is Appropriate (and When It Isn't)

Chapter 12

Estimated reading time: 9 minutes

Start With the Decision, Not the Model

Choosing deep learning is less about “is it powerful?” and more about “is it the right tool under my constraints?” A practical way to decide is to evaluate four axes: (1) problem type and data modality, (2) labeled data availability, (3) tolerance for opacity and risk, and (4) deployment constraints. You can often reach a better overall outcome (time-to-value, reliability, maintainability) with simpler methods.

Axis 1: Problem Type and Data Modality

Deep learning is often appropriate when the input is unstructured

Deep learning tends to shine when the raw input is high-dimensional and unstructured, and hand-crafting features is hard or brittle.

  • Images/video: object detection, defect detection, medical imaging triage, visual search.

  • Audio: speech recognition, acoustic event detection, call-center analytics.

  • Text: classification, extraction, semantic search, summarization (with careful evaluation).

  • Complex signals: sensor streams, multivariate time series with nontrivial patterns.

Decision point: If your current approach requires extensive manual feature engineering that keeps breaking when conditions change (lighting, accents, phrasing, device drift), deep learning may reduce feature-maintenance burden.

Simpler methods are often appropriate for structured/tabular data

For many business problems with tabular data (rows = entities, columns = measured variables), tree-based methods and linear models are strong baselines and often remain best-in-class given typical dataset sizes and noise.

  • Examples: churn prediction, credit risk scoring, demand forecasting with engineered features, fraud detection with aggregated behavior features.

  • Why: tabular problems often have mixed feature types, missingness patterns, and strong signal in a few variables; tree ensembles handle this well with minimal preprocessing.

Decision point: If your data is mostly numeric/categorical columns and you can define meaningful aggregates (counts, rates, recency), start with tree-based models or linear models before deep learning.
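To make this concrete, here is a minimal baseline sketch in Python with scikit-learn. It builds a tiny synthetic table (a categorical plan column plus count and recency aggregates, all invented for illustration), then fits gradient-boosted trees and a logistic regression and reports a test metric for each; swap in your own data and target.

```python
# A minimal sketch, not a full workflow: fit two tabular baselines (boosted trees and
# a linear model) on a small synthetic table. Column names and data are invented.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import make_column_transformer, make_column_selector
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "plan": rng.choice(["basic", "pro", "enterprise"], n),  # categorical column
    "logins_30d": rng.poisson(5, n),                        # engineered count
    "days_since_last_login": rng.exponential(10.0, n),      # engineered recency
})
noise = rng.random(n) < 0.1
df["churned"] = (((df["days_since_last_login"] > 15) & (df["logins_30d"] < 5)) ^ noise).astype(int)

X, y = df.drop(columns=["churned"]), df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

cat_cols = make_column_selector(dtype_include=object)
num_cols = make_column_selector(dtype_include="number")

# Boosted trees: ordinal-encode strings, pass numerics through with no scaling.
trees = make_pipeline(
    make_column_transformer(
        (OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), cat_cols),
        remainder="passthrough",
    ),
    HistGradientBoostingClassifier(random_state=0),
)

# Linear baseline: impute and scale numerics, one-hot encode strings.
linear = make_pipeline(
    make_column_transformer(
        (OneHotEncoder(handle_unknown="ignore"), cat_cols),
        (make_pipeline(SimpleImputer(), StandardScaler()), num_cols),
    ),
    LogisticRegression(max_iter=1000),
)

for name, model in [("boosted trees", trees), ("logistic regression", linear)]:
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: test ROC AUC = {auc:.3f}")
```

If the boosted-tree baseline already meets your target metric here, the case for moving to deep learning is weak.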

Complex interactions: deep learning vs trees

“Complex interactions” alone does not automatically imply deep learning. Gradient-boosted trees can capture many nonlinear interactions in tabular data. Deep learning becomes more compelling when interactions are both complex and tied to unstructured inputs, or when you need representation learning across many sparse categories (e.g., large-scale recommendation) under ample data.

Decision point: If the complexity is primarily within tabular features, try boosted trees first; if complexity comes from raw unstructured inputs or extremely high-cardinality sparse inputs at scale, consider deep learning.

Axis 2: Availability (and Quality) of Labeled Data

Deep learning is data-hungry in practice

Even when you use pretrained models, you still need enough task-specific labeled data to (a) adapt the model to your domain and (b) validate performance across edge cases. The key is not just quantity, but coverage: do labels represent the real-world variety you will see?

Step-by-step: assess whether your labels are “enough”

  • Step 1: Define the target and evaluation slices. List critical subgroups and conditions (device types, regions, product categories, lighting conditions, languages, rare but costly cases).

  • Step 2: Estimate label coverage per slice. You want enough examples in each important slice to measure performance with acceptable uncertainty (a quick way to check this is sketched after this list).

  • Step 3: Check label reliability. If humans disagree often, the model will learn inconsistent targets. Measure inter-annotator agreement or run spot audits.

  • Step 4: Compare against a simple baseline. Train a linear model or tree-based model on available features. If it already meets requirements, deep learning may be unnecessary.

  • Step 5: Plan the labeling loop. If you choose deep learning, budget for iterative labeling (collect failures, label them, retrain). Without this loop, performance often stalls.
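A minimal sketch of Steps 2 and 3, assuming a labels table with a hypothetical slice column ("region"), a binary label, and a small doubly-annotated audit sample:

```python
# Sketch for Steps 2-3 above: per-slice label coverage with a rough confidence
# interval half-width, plus inter-annotator agreement (Cohen's kappa).
# Column names ("region", "label", "annotator_a", "annotator_b") are hypothetical.
import numpy as np
import pandas as pd
from sklearn.metrics import cohen_kappa_score

labels = pd.DataFrame({
    "region": ["NA", "NA", "EU", "EU", "EU", "APAC"],
    "label":  [1, 0, 1, 1, 0, 1],
})

# Step 2: how many labeled examples per slice, and how wide is the uncertainty on any
# metric measured there? (95% normal-approximation half-width at worst case p = 0.5)
coverage = labels.groupby("region")["label"].agg(n="size", positive_rate="mean")
coverage["ci_half_width"] = 1.96 * np.sqrt(0.25 / coverage["n"])
print(coverage)

# Step 3: label reliability on a doubly-annotated audit sample.
audit = pd.DataFrame({
    "annotator_a": [1, 0, 1, 1, 0, 1, 0, 0],
    "annotator_b": [1, 0, 0, 1, 0, 1, 1, 0],
})
print("Cohen's kappa:", cohen_kappa_score(audit["annotator_a"], audit["annotator_b"]))
```

Slices with only a handful of labels, or audits with low agreement, are a signal to fix the data before scaling up the model.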

When labeled data is limited: prefer simpler models or feature-engineered approaches

If you have only a small labeled dataset and the input is structured, linear models and tree-based methods often generalize better and are easier to debug. If the input is unstructured, consider leveraging pretrained embeddings and training a simpler classifier on top, rather than training a large model end-to-end.
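For text, one way to do this is sketched below, under the assumption that the sentence-transformers package is available; the model name is just a common default, not a requirement.

```python
# Sketch: frozen pretrained text embeddings + a simple linear classifier on top,
# instead of training a large model end-to-end. Assumes the sentence-transformers
# package is installed; the model name is one common choice (an assumption).
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

texts = ["refund not received", "love the new feature", "app crashes on login", "great support"]
labels = [1, 0, 1, 0]  # 1 = complaint, 0 = praise (tiny illustrative set)

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # frozen pretrained encoder
X = encoder.encode(texts)                          # one fixed-size vector per text

clf = LogisticRegression(max_iter=1000)            # small, debuggable model on top
print(cross_val_score(clf, X, labels, cv=2))
```

The classifier on top is cheap to retrain and easy to inspect, which matters when labels are scarce.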

Decision point: If you cannot realistically obtain more labels (cost, privacy, time), avoid solutions that require ongoing labeling to remain stable.

Axis 3: Tolerance for Opacity, Risk, and Accountability

Interpretability requirements can rule out deep learning

Some domains require explanations that are understandable to humans and defensible in audits (finance, healthcare, regulated decisioning). Deep learning can provide post-hoc explanations, but these may not satisfy regulatory or internal governance needs.

  • High interpretability needed: prefer linear models with clear coefficients, monotonic constraints, or transparent scorecards; or tree models with constrained depth and documented rules (a brief sketch follows this list).

  • Moderate interpretability needed: gradient-boosted trees with feature importance and partial dependence can be acceptable, combined with strong validation and documentation.

  • Low interpretability needed: deep learning is more feasible, especially when the task is perceptual (vision/audio) and the output is advisory rather than fully automated.
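A brief sketch of the "high interpretability" options, using synthetic data and hypothetical feature names: a logistic regression whose coefficients can be documented one by one, and a depth-limited tree exported as plain-text rules.

```python
# Sketch of two interpretable options: coefficients you can document, and a shallow
# tree printed as explicit rules. Feature names and data are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = ["debt_to_income", "late_payments_12m", "account_age_years"]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (0.8 * X[:, 0] + 1.2 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Linear scorecard-style model: one coefficient per feature, easy to audit.
linear = LogisticRegression(max_iter=1000).fit(X, y)
for name, coef in zip(feature_names, linear.coef_[0]):
    print(f"{name}: {coef:+.2f}")

# Shallow tree: a handful of human-readable rules.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=feature_names))
```

Whether artifacts like these satisfy your auditors is a governance question, not a modeling one; confirm before committing.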

Step-by-step: decide if opacity is acceptable

  • Step 1: Identify who must trust the model. End users, auditors, internal risk teams, customers?

  • Step 2: Define the required form of explanation. Global (how the model works overall) vs local (why this decision). Also define whether explanations must be causal or just descriptive.

  • Step 3: Determine the cost of a wrong prediction. If errors are high-stakes, you may need simpler models, conservative thresholds, or human-in-the-loop review regardless of model type.

  • Step 4: Decide on governance artifacts. Documentation, monitoring plans, bias checks, and escalation procedures. If you can’t support these, avoid complex models.

Axis 4: Deployment Constraints (Latency, Hardware, Reliability, Updates)

Deep learning can be expensive to run and maintain

Deep learning models may require specialized hardware, careful optimization, and more complex serving infrastructure. Even if training is done once, deployment often includes model packaging, versioning, monitoring for drift, and periodic retraining.

  • Latency constraints: real-time systems (e.g., ad auctions, fraud checks at checkout) may not tolerate large models unless optimized.

  • Edge deployment: mobile/IoT may require small models, quantization, or entirely different approaches.

  • Reliability constraints: if the system must behave predictably under unusual inputs, simpler models may be safer and easier to bound.

  • Data privacy constraints: if data cannot leave a device or region, training and monitoring pipelines may be constrained.

Step-by-step: deployment feasibility checklist

  • Step 1: Set a latency and throughput budget. Define p95/p99 latency targets and expected QPS.

  • Step 2: Identify available compute. CPU-only, GPU, accelerator? In cloud, on-prem, or edge?

  • Step 3: Estimate serving cost. Cost per 1,000 predictions and expected monthly spend; compare to business value (a back-of-the-envelope sketch follows this list).

  • Step 4: Define update cadence. How often will the model need retraining due to drift or new classes?

  • Step 5: Plan monitoring. Track input distribution shifts, performance on key slices, and operational metrics (latency, errors). If you can’t monitor, prefer simpler models.
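For Steps 1 and 3, a back-of-the-envelope sketch; every number below is an illustrative assumption, not a benchmark.

```python
# Rough feasibility check: does the candidate fit the latency budget, and what would
# serving cost per month? All numbers are assumptions for illustration.
budget_p95_ms = 50            # product requirement for p95 latency
measured_p95_ms = 38          # measured on a load test of the candidate model
peak_qps = 200                # expected peak queries per second
qps_per_replica = 50          # throughput one serving replica sustains (assumed)

cost_per_1k_predictions = 0.40             # assumed serving cost per 1,000 predictions
monthly_predictions = 60 * 3600 * 24 * 30  # e.g., 60 average QPS over a 30-day month
monthly_cost = monthly_predictions / 1000 * cost_per_1k_predictions

print("latency budget met:", measured_p95_ms <= budget_p95_ms)
print("replicas needed (rough):", -(-peak_qps // qps_per_replica))  # ceiling division
print(f"estimated monthly serving cost: {monthly_cost:,.0f}")
```

If the estimated cost or replica count dwarfs the expected business value, that alone can settle the model choice.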

Alternatives to Deep Learning (and When to Use Them)

Linear models

Use when: you need strong interpretability, stable behavior, fast training/serving, and your signal is mostly additive or can be made so with reasonable feature transformations.

  • Good for: scoring, ranking with engineered features, baseline classification/regression, calibrated probabilities.

  • Decision point: If stakeholders need a clear rationale per feature and you can’t afford complex infrastructure, start here.
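As a sketch of the calibrated-probabilities point above, here is one way to wrap a linear model in cross-validated calibration with scikit-learn and check the Brier score on synthetic data:

```python
# Sketch: cross-validated probability calibration around a simple classifier,
# evaluated with the Brier score. Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

calibrated = CalibratedClassifierCV(LogisticRegression(max_iter=1000), method="isotonic", cv=5)
calibrated.fit(X_train, y_train)

probs = calibrated.predict_proba(X_test)[:, 1]
print("Brier score (lower is better):", brier_score_loss(y_test, probs))
```

Well-calibrated probabilities are often more valuable to downstream decisions than a small gain in raw accuracy.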

Tree-based methods (decision trees, random forests, gradient boosting)

Use when: your data is tabular, nonlinear interactions matter, missing values and mixed feature types are common, and you want strong performance without heavy feature learning.

  • Good for: many production tabular ML tasks, especially with limited-to-moderate data sizes.

  • Decision point: If you can reach target metrics with boosted trees, deep learning may add complexity without clear benefit.

Feature-engineered approaches

Use when: domain knowledge can produce robust signals, labels are limited, or you need predictable behavior.

  • Examples: rule-based filters + linear model, time-series features (rolling stats, seasonality indicators), text TF-IDF + linear classifier, image descriptors in constrained settings.

  • Decision point: If you can define features that directly capture what matters (e.g., “number of failed logins in last hour”), you may not need deep learning.
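The "text TF-IDF + linear classifier" example from the list above, as a minimal sketch with toy texts and labels:

```python
# Sketch: a small, fast, feature-engineered text pipeline that is easy to inspect.
# Texts, labels, and class names are toy data for illustration.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["password reset failed again", "how do I export my invoices",
         "login error after update", "invoice download works great"]
labels = ["auth_issue", "billing", "auth_issue", "billing"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
model.fit(texts, labels)
print(model.predict(["cannot log in", "where is my invoice"]))
```

Pipelines like this train in seconds, serve cheaply, and their features can be read directly, which is often exactly the predictability this section is about.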

A Practical Decision Flow You Can Apply

Step-by-step selection process

  • Step 1: Classify the input type. If unstructured (image/audio/text), deep learning is a strong candidate; if tabular, start with trees/linear.

  • Step 2: Establish a baseline. Build the simplest model that could plausibly work (linear or boosted trees; or pretrained embeddings + linear head for text). Record metrics and failure cases.

  • Step 3: Check data readiness. Do you have enough labeled data and coverage of edge cases? If not, prioritize data collection and labeling strategy before scaling model complexity.

  • Step 4: Check governance and interpretability needs. If you must explain decisions in a strict way, constrain model choice accordingly.

  • Step 5: Check deployment feasibility. If latency/cost/compute constraints are tight, prefer smaller models or simpler methods.

  • Step 6: Only then consider deep learning upgrades. Move to deep learning when the baseline is clearly limited by representation (features can’t capture the signal) and you can support the data and deployment lifecycle.
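The flow above, condensed into a small helper for illustration; the argument names and the returned recommendations are deliberate simplifications of the text, not a complete policy.

```python
# The selection flow above as a rough helper. Categories, argument names, and
# recommendations are simplifications for illustration only.
def recommend_starting_point(input_type: str,
                             baseline_meets_target: bool,
                             enough_labeled_coverage: bool,
                             strict_explanations_required: bool,
                             tight_serving_constraints: bool) -> str:
    # Step 1: classify the input type.
    if input_type == "tabular":
        first_try = "boosted trees or a linear model"
    else:  # image, audio, text, or other unstructured input
        first_try = "pretrained embeddings plus a simple head"

    # Step 2: if the simple baseline already meets requirements, stop there.
    if baseline_meets_target:
        return f"keep the baseline: {first_try}"
    # Steps 3-5: data readiness, governance, and deployment gates.
    if not enough_labeled_coverage:
        return "invest in labeling and slice coverage before adding model complexity"
    if strict_explanations_required:
        return "stay with constrained, interpretable models and revisit requirements"
    if tight_serving_constraints:
        return "prefer smaller or simpler models that fit the latency and cost budget"
    # Step 6: only now consider a deep learning upgrade.
    return "consider deep learning, with a plan for labeling, monitoring, and retraining"

print(recommend_starting_point("text", baseline_meets_target=False,
                               enough_labeled_coverage=False,
                               strict_explanations_required=False,
                               tight_serving_constraints=False))
```

In practice the gates are not this binary, but writing them down forces the team to agree on which constraint actually decides the choice.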

How to Articulate Trade-offs to Stakeholders

Accuracy vs interpretability

Frame the choice as a spectrum: “We can likely gain X points of accuracy with a more complex model, but explanations will be less direct.” Specify what interpretability means in your context: auditability, user-facing explanations, or debugging ability.

  • Useful phrasing: “If we need decisions that can be justified feature-by-feature, we should prefer a simpler model. If the task is perceptual and the output is advisory, we can accept a more opaque model with stronger monitoring.”

Training cost vs maintenance cost

Stakeholders often focus on initial build cost, but maintenance dominates long-lived systems. Deep learning may require ongoing labeling, retraining, and infrastructure tuning.

  • Useful phrasing: “The training run is not the main cost; the main cost is keeping performance stable as data changes. If we choose deep learning, we need a budget and process for monitoring and periodic updates.”

Robustness vs complexity

More complex models can be more capable, but also introduce more failure modes and harder-to-predict behavior under distribution shift. Robustness is not guaranteed by complexity; it is achieved through validation on realistic slices, conservative deployment, and monitoring.

  • Useful phrasing: “We can choose a simpler model that is easier to bound and test, or a more complex model that may perform better on average but needs stronger guardrails and monitoring to manage rare failures.”

A stakeholder-ready comparison table (template)

Option A: Linear / Simple model
- Expected accuracy: medium
- Interpretability: high
- Data needs: low to medium
- Serving cost/latency: low
- Maintenance: low
- Best when: regulated decisions, limited labels, tight latency

Option B: Tree-based model
- Expected accuracy: medium to high (tabular)
- Interpretability: medium
- Data needs: medium
- Serving cost/latency: low to medium
- Maintenance: medium
- Best when: tabular data, nonlinear interactions

Option C: Deep learning
- Expected accuracy: high (unstructured or large-scale)
- Interpretability: low to medium (with tooling)
- Data needs: medium to high (plus coverage)
- Serving cost/latency: medium to high
- Maintenance: medium to high (monitoring, retraining)
- Best when: unstructured inputs, representation learning needed, scale supports it

Now answer the exercise about the content:

You are choosing a modeling approach for a prediction task where the data is mainly structured tabular columns (numeric/categorical) and you can engineer meaningful aggregates like counts and recency. According to the decision flow, what should you try first?

For structured/tabular data, the recommended starting point is tree-based methods or linear models. They are strong baselines and often best-in-class given typical tabular dataset sizes and noise, especially when meaningful aggregates can be defined.
