What “Performance” Means in a Django ORM Backend
Most backend slowdowns in Django apps come from database work: too many queries, queries that fetch too much data, missing indexes, or repeated work that could be cached. The Django ORM makes it easy to write correct code quickly, but it can also hide expensive query patterns unless you inspect what’s actually happening.
In this chapter you’ll learn to recognize common bottlenecks, fix them with ORM tools like select_related() and prefetch_related(), add indexes for common filters, and use a repeatable workflow to measure improvements and prevent regressions.
Core Bottleneck #1: N+1 Queries
What N+1 Looks Like
N+1 happens when you run 1 query to fetch a list of objects, then run an additional query per object to fetch related data. It often appears in list endpoints, admin-like pages, or API responses that include related fields.
Example scenario: you list Order objects and show each order’s customer.email. If Order.customer is a foreign key, accessing order.customer without eager loading can trigger one query per order.
# Bad: likely N+1 if you later access order.customer in a loop/serializer/template
orders = Order.objects.filter(status="paid")
for order in orders:
    print(order.customer.email)

Fix N+1 with select_related (ForeignKey / OneToOne)
Use select_related() when the relationship is single-valued (ForeignKey, OneToOne). It performs a SQL JOIN and fetches related rows in the same query.
# Good: 1 query total (typically)
orders = (
    Order.objects
    .filter(status="paid")
    .select_related("customer")
)
for order in orders:
    print(order.customer.email)

Step-by-step workflow:
- Identify where you iterate over a queryset and access a related field.
- Determine the relationship type: ForeignKey/OneToOne → select_related.
- Add .select_related(...) for the needed relation(s).
- Re-check the query count (see “Query Inspection” below).
Fix N+1 with prefetch_related (ManyToMany / reverse FK)
Use prefetch_related() when the relationship is multi-valued (ManyToMany, reverse ForeignKey). It runs a second query to fetch all related rows in bulk, then matches them in Python.
# Example: BlogPost has many Comment (reverse FK)
posts = BlogPost.objects.filter(published=True)
for post in posts:
    # Without prefetch_related, this can be N+1
    print(post.comment_set.count())

# Good: 2 queries total (posts + all comments)
posts = (
    BlogPost.objects
    .filter(published=True)
    .prefetch_related("comment_set")
)
for post in posts:
    print(len(post.comment_set.all()))

Note: calling .count() on a related manager can still hit the DB depending on context. If you need counts for many objects, consider annotating counts (see below) or using prefetched data carefully.
Advanced Prefetch: Prefetch Objects for Filtering/Ordering Related Data
Sometimes you want only a subset of related objects (e.g., only approved comments). Use Prefetch to control the related queryset and optionally store it on a custom attribute.
from django.db.models import Prefetch

posts = BlogPost.objects.prefetch_related(
    Prefetch(
        "comment_set",
        queryset=Comment.objects.filter(approved=True).order_by("-created_at"),
        to_attr="approved_comments",
    )
)
for post in posts:
    # Uses in-memory list; no extra queries
    print(len(post.approved_comments))

Core Bottleneck #2: Unnecessary Database Hits
Repeated Evaluation of the Same Queryset
Django querysets are lazy. Each time you evaluate them (iterate, list(), len(), bool(), slicing in some cases), you may trigger a query. A common issue is evaluating the same queryset multiple times in one request.
# Potentially runs the same query twice
qs = Order.objects.filter(status="paid")
if qs:
    for order in qs:
        ...

Better patterns:
# Option A: evaluate once
orders = list(Order.objects.filter(status="paid"))
if orders:
    for order in orders:
        ...

# Option B: use .exists() for the check, then fetch only if needed
qs = Order.objects.filter(status="paid")
if qs.exists():
    for order in qs:
        ...

Fetching Too Many Columns
If you only need a few fields, avoid loading entire model rows (especially if models have large text fields or many columns).
- only() loads a subset of fields but can cause extra queries if you later access deferred fields.
- values() / values_list() return dictionaries/tuples and can be efficient for read-only endpoints.
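For example, a minimal sketch of the only() pitfall (the bio field here is a hypothetical large column, not from the examples above):

customers = Customer.objects.only("id", "email")
for customer in customers:
    print(customer.email)  # already loaded: no extra query
    print(customer.bio)    # deferred field: triggers an extra query per row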
# Efficient for read-only list responses
rows = (
    Customer.objects
    .filter(active=True)
    .values_list("id", "email")
)

Using count() vs exists() vs len()
- .exists() is usually the cheapest way to check if any row matches.
- .count() runs COUNT(*); good when you truly need the count.
- len(qs) evaluates the queryset and loads rows; expensive for large sets.
# Good
if Order.objects.filter(status="paid").exists():
    ...

# Good when you need the number
paid_count = Order.objects.filter(status="paid").count()

Core Bottleneck #3: Inefficient Filtering and Missing Indexes
Recognize “Slow Filters”
Filters become slow when they scan many rows. Common causes:
- Filtering on non-indexed columns that are used frequently.
- Filtering on computed expressions (e.g., applying functions to columns) that prevent index usage (see the sketch after this list).
- Using icontains on large text fields without specialized indexing.
- Sorting (order_by) large result sets without indexes that support the ordering.
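A common instance of the computed-expression problem is date filtering on a DateTimeField. Whether the first form below can use a plain index depends on your database, but the half-open range is reliably index-friendly:

from datetime import timedelta
from django.utils import timezone

day_start = timezone.now().replace(hour=0, minute=0, second=0, microsecond=0)

# May defeat a plain index on created_at: __date casts the column to a date
todays_orders = Order.objects.filter(created_at__date=day_start.date())

# Index-friendly rewrite: compare the raw column against a half-open range
todays_orders = Order.objects.filter(
    created_at__gte=day_start,
    created_at__lt=day_start + timedelta(days=1),
)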
Add Indexes for Common Query Patterns
Indexes are a database feature, but Django lets you define them in models. Add indexes for fields frequently used in filter(), exclude(), order_by(), and joins.
Examples:
from django.db import models

class Order(models.Model):
    status = models.CharField(max_length=20, db_index=True)
    created_at = models.DateTimeField(db_index=True)
    customer = models.ForeignKey("Customer", on_delete=models.CASCADE)

    class Meta:
        indexes = [
            models.Index(fields=["status", "created_at"]),
        ]

Step-by-step workflow for indexes:
- List your slow endpoints and the exact ORM filters/sorts they use.
- Identify the most common WHERE clauses and ORDER BY fields.
- Add db_index=True for single-field patterns; use Meta.indexes for multi-column patterns.
- Create and apply migrations.
- Verify improvement by timing and by checking the query plan (see below).
Use explain() to Inspect Query Plans
Django supports QuerySet.explain() to ask the database how it will execute a query. This is one of the fastest ways to confirm whether an index is being used.
qs = Order.objects.filter(status="paid").order_by("-created_at")
print(qs.explain())

Look for signs like “Index Scan” (good) versus “Seq Scan” / “Full Table Scan” (often a sign you need an index or a different query shape).
Query Inspection: Seeing What the ORM Actually Does
Print the SQL for a Queryset
When debugging, you can inspect the SQL generated by the ORM:
qs = Order.objects.filter(status="paid").select_related("customer")
print(str(qs.query))

This helps you verify JOINs, WHERE clauses, and ordering.
Django Debug Toolbar (Development)
In development, Django Debug Toolbar is a common tool to inspect:
- Number of SQL queries per request
- Time spent per query
- Duplicate queries
- Stack traces showing where queries originate
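A typical development-only setup, abridged from the package’s documented install steps (check the django-debug-toolbar docs for your version):

# settings.py (development only)
INSTALLED_APPS = [
    # ...
    "debug_toolbar",
]
MIDDLEWARE = [
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    # ...
]
INTERNAL_IPS = ["127.0.0.1"]

# urls.py
from django.urls import include, path

urlpatterns = [
    # ...
    path("__debug__/", include("debug_toolbar.urls")),
]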
Practical usage checklist:
- Load the slow page/endpoint in development.
- Open the SQL panel and sort by time.
- Look for repeated similar queries (classic N+1).
- Apply select_related / prefetch_related and re-check.
- Confirm query count and total SQL time decreased.
Capture Queries in Tests (Regression Control)
To prevent performance regressions, add tests that assert query counts for critical code paths.
from django.test import TestCase

class OrderListPerfTests(TestCase):
    def test_order_list_query_count(self):
        with self.assertNumQueries(2):
            # Example: 1 query for orders + 1 for prefetched items
            response = self.client.get("/orders/")
        self.assertEqual(response.status_code, 200)

Use this sparingly and only for endpoints where query count is stable and meaningful. If query counts vary across databases or feature flags, assert an upper bound by measuring and setting a reasonable threshold.
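One way to assert an upper bound rather than an exact count is Django’s CaptureQueriesContext; the budget of 5 below is an arbitrary placeholder you would calibrate by measuring:

from django.db import connection
from django.test import TestCase
from django.test.utils import CaptureQueriesContext

class OrderListQueryBudgetTests(TestCase):
    def test_order_list_stays_under_query_budget(self):
        with CaptureQueriesContext(connection) as ctx:
            response = self.client.get("/orders/")
        self.assertEqual(response.status_code, 200)
        # Fail only if the endpoint exceeds the measured budget
        self.assertLessEqual(len(ctx.captured_queries), 5)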
ORM Patterns That Commonly Cause Hidden Slowness
Annotate for Aggregates Instead of Per-Row Queries
If you need counts/sums per object, avoid calling related queries inside loops. Use annotate() to compute aggregates in SQL.
from django.db.models import Count

posts = (
    BlogPost.objects
    .filter(published=True)
    .annotate(comment_count=Count("comment"))
)
for post in posts:
    print(post.comment_count)

Be Careful with distinct()
distinct() can be expensive, especially combined with joins and ordering. If you’re using it to “fix duplicates” introduced by joins, consider whether you should restructure the query (e.g., prefetch instead of join, or query the primary table first).
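A sketch of the restructure, reusing the BlogPost/Comment models from earlier. The two shapes are not identical (the join form returns only posts that have approved comments, while the prefetch form keeps all published posts), but when you mainly need the related data, the prefetch shape avoids duplicate rows entirely:

from django.db.models import Prefetch

# Join-based filter: one joined row per matching comment, so posts repeat
posts = BlogPost.objects.filter(comment__approved=True).distinct()

# Often cheaper: query the primary table once, prefetch filtered comments in bulk
posts = BlogPost.objects.filter(published=True).prefetch_related(
    Prefetch("comment_set", queryset=Comment.objects.filter(approved=True))
)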
Large IN Clauses and Bulk Operations
Filtering with large lists (id__in=[...]) can become slow. If you’re processing many rows, prefer bulk operations:
- bulk_create for inserts
- bulk_update for updates
- update() for set-based updates
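For example, a sketch of the first two (customer_ids is a hypothetical list of IDs):

# Insert many rows in one query (or a few, batched)
Order.objects.bulk_create(
    [Order(status="pending", customer_id=cid) for cid in customer_ids],
    batch_size=500,
)

# Update chosen fields on many already-fetched instances in batches
orders = list(Order.objects.filter(status="pending"))
for order in orders:
    order.status = "expired"
Order.objects.bulk_update(orders, ["status"], batch_size=500)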
# Set-based update (single query)
Order.objects.filter(status="pending", created_at__lt=cutoff).update(status="expired")

Caching Basics: When and How to Use It
Caching helps when you repeatedly compute the same result (often for read-heavy endpoints). It does not fix inefficient queries by itself; first remove N+1 and add indexes, then cache stable results.
Per-View Caching
Per-view caching stores the entire response for a period of time. It’s useful for endpoints that are identical for many users (or vary only by a small set of parameters).
from django.views.decorators.cache import cache_page

@cache_page(60)  # seconds
def product_catalog(request):
    ...
...Be careful with personalization and permissions. If the response depends on the logged-in user, you must vary the cache key appropriately (or avoid per-view caching).
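If you do need per-user variation, one option (mirroring the pattern in Django’s docs) is to vary the cache on the session cookie; the dashboard view here is a hypothetical example:

from django.views.decorators.cache import cache_page
from django.views.decorators.vary import vary_on_cookie

@cache_page(60)
@vary_on_cookie  # separate cache entries per cookie (e.g., per session)
def dashboard(request):
    ...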
Low-Level Caching (Cache API)
Low-level caching lets you cache specific expensive computations, such as a list of IDs, a computed summary, or a rendered fragment.
from django.core.cache import cache

CACHE_KEY = "top_customers:v1"

def get_top_customers():
    data = cache.get(CACHE_KEY)
    if data is not None:
        return data
    data = list(
        Customer.objects
        .filter(active=True)
        .order_by("-lifetime_value")
        .values_list("id", "email")[:100]
    )
    cache.set(CACHE_KEY, data, timeout=300)
    return data

Practical tips:
- Version your keys (e.g., :v1) so you can invalidate by changing the version.
- Cache data that is expensive to compute and relatively stable.
- Keep cached payloads small; cache IDs or summaries rather than huge objects.
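Invalidation for the example above can be as simple as deleting the key when the underlying data changes, or bumping the version suffix so stale entries age out:

from django.core.cache import cache

# Explicit invalidation when customer data changes
cache.delete("top_customers:v1")

# Or roll the version: new reads/writes use the new key,
# and the old entry simply expires on its own
CACHE_KEY = "top_customers:v2"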
Pagination: When to Consider It (and What It Fixes)
Pagination reduces response size and database work by limiting rows returned. Consider it when:
- List endpoints can return hundreds/thousands of rows.
- Clients only need the first page most of the time.
- Sorting large datasets is slow.
Typical ORM pattern:
PAGE_SIZE = 50

page = int(request.GET.get("page", 1))
offset = (page - 1) * PAGE_SIZE

qs = Order.objects.order_by("-created_at")
orders = qs[offset:offset + PAGE_SIZE]

Note: offset pagination can become slower for very large offsets. If you hit that problem, consider keyset pagination (e.g., “created_at < last_seen”) for better scalability.
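A minimal keyset sketch, assuming the client echoes back the created_at of the last row it received (the last_seen parameter is hypothetical):

from django.utils.dateparse import parse_datetime

PAGE_SIZE = 50

last_seen = parse_datetime(request.GET["last_seen"])
orders = (
    Order.objects
    .filter(created_at__lt=last_seen)
    .order_by("-created_at")[:PAGE_SIZE]
)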
A Repeatable Workflow to Improve Performance (and Keep It Improved)
1) Establish a Baseline
- Pick one slow endpoint or page.
- Measure: total response time, number of SQL queries, and total SQL time.
- Record the baseline in a short note (or in an issue tracker): “Before: 120 queries, 450ms SQL time”.
2) Identify the Biggest Cost First
- If query count is high: look for N+1 and duplicates.
- If query count is low but slow: inspect the slowest query and check indexes and query plans.
- If payload is huge: add pagination and reduce selected columns.
3) Apply Targeted Fixes
- N+1: add select_related / prefetch_related.
- Slow filters/sorts: add indexes; rewrite filters to be index-friendly.
- Repeated work: add low-level caching for expensive computations.
4) Re-measure and Compare
- Re-run the same request with the same data volume.
- Compare query count and timing to the baseline.
- If improvement is unclear, inspect SQL again and verify the ORM change actually affected the query.
5) Add Guardrails Against Regressions
- Add assertNumQueries tests for critical endpoints.
- Keep a small “performance checklist” in code review: “Any loops over querysets with related field access? Any new filters without indexes?”
- Use development query inspection tools regularly when changing list endpoints or serializers.
Quick Reference Table: Common Problems and Fixes
| Symptom | Likely Cause | Primary Fix |
|---|---|---|
| Dozens/hundreds of similar queries | N+1 on relations | select_related (FK/O2O), prefetch_related (M2M/reverse FK) |
| Single query is very slow | Missing index, expensive sort/filter | Add index, check explain(), rewrite query |
| Large response and slow rendering | Too many rows/columns | Pagination, values_list/values, limit fields |
| Same expensive computation repeated | No caching | Per-view or low-level caching with sensible keys/timeouts |
| Performance improves then regresses later | No measurement/guardrails | Baseline metrics + query-count tests + review checklist |