Why cost-aware design matters for web hosting
On Azure, most web-hosting spend is driven by a few recurring meters: how much compute you reserve or consume, how you scale, how much data you store, how much data leaves Azure (egress), how long you retain logs, and whether you pay for resources that sit idle. Cost-aware design means making these drivers explicit in your architecture and putting guardrails in place so monthly spend stays predictable—even when traffic changes.
This chapter focuses on practical cost control for small-to-medium sites across three common hosting models: App Service, Container Apps, and Virtual Machines. The goal is not to minimize cost at all times, but to align spend with business value and avoid surprises.
Cost driver 1: Compute sizing (the biggest lever)
How hosting choice changes compute billing
- App Service: You pay for the App Service Plan instances (reserved capacity) while they are running, regardless of whether your app is idle. Multiple apps can share the same plan, which can reduce cost if you consolidate.
- Container Apps: You typically pay based on vCPU/memory consumed by active replicas (and sometimes a baseline depending on configuration). It can scale down aggressively, which can be cost-effective for bursty workloads.
- Virtual Machines: You pay for the VM size while it is allocated (running), plus OS disk and any attached storage. You manage the web stack and patching, but you also control the exact size and can stop/deallocate when not needed.
Right-sizing: a practical approach
Right-sizing is choosing the smallest compute that meets performance and reliability needs with headroom. For small-to-medium sites, over-provisioning is common because initial sizing is guessed and never revisited.
- Start with a target: define acceptable response time and CPU/memory thresholds (for example, keep average CPU under 60% and memory under 75% during peak).
- Measure real usage: observe CPU, memory, and request rate during typical and peak periods.
- Resize in small steps: move one tier/size at a time, validate, then repeat.
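The measure-then-resize loop above can be sketched as a small decision helper. The thresholds mirror the targets in this section (CPU under 60% and memory under 75% at peak; downsize when peaks stay under 20% CPU and 40% memory); the function name and defaults are illustrative, not an Azure API:

```python
def rightsize_recommendation(peak_cpu_pct, peak_mem_pct,
                             downsize_cpu=20, downsize_mem=40,
                             upsize_cpu=60, upsize_mem=75):
    """Recommend one sizing step from observed peak utilization.

    Always move one tier/size at a time and re-measure, as the
    step-by-step guidance below describes.
    """
    if peak_cpu_pct > upsize_cpu or peak_mem_pct > upsize_mem:
        return "scale up (or out) one step, then re-measure"
    if peak_cpu_pct < downsize_cpu and peak_mem_pct < downsize_mem:
        return "scale down one step, then re-measure"
    return "keep current size"

print(rightsize_recommendation(15, 35))  # consistently low at peak
print(rightsize_recommendation(70, 50))  # CPU above the target ceiling
```

Feeding the helper real peak numbers (not averages over quiet periods) is what keeps the loop honest.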
Step-by-step: right-size an App Service Plan
- Open the App Service Plan in the Azure portal.
- Review recent CPU and memory usage (and request rate) for the plan.
- If CPU and memory are consistently low (for example, CPU < 20% and memory < 40% at peak), consider moving down one SKU or reducing instance count.
- Change the pricing tier (Scale up) or instance count (Scale out) and monitor for at least one business cycle (often 1–2 weeks).
- Repeat until you reach a stable, cost-efficient baseline.
Step-by-step: right-size a VM
- Check VM CPU and memory utilization over at least 7 days (include peak traffic windows).
- If CPU is low and memory is not pressured, resize to a smaller VM size.
- Validate application performance and disk I/O after resizing.
- Consider using smaller OS disks if appropriate, and avoid premium disks unless you need the IOPS/latency.
Cost driver 2: Scaling settings (and runaway scale)
Scaling is essential for reliability, but it can also create cost spikes if limits are not set. The most common cost incident for web workloads is an autoscale rule that keeps adding instances due to a misconfiguration, a traffic spike, or a bot attack.
Guardrails for predictable scaling
- Set maximum instance/replica limits: always define a hard cap that matches your budget tolerance.
- Use scale-out cooldowns: avoid rapid oscillation that increases cost without improving user experience.
- Choose sensible metrics: CPU alone can be misleading; combine with request rate, queue length, or response time where possible.
- Plan for “bad traffic”: rate limiting/WAF is a security topic, but it also protects your budget by preventing scale driven by abusive traffic.
Practical example: autoscale limits
For a small-to-medium site, you might set a baseline of 1–2 instances and a maximum of 4 instances. This provides headroom while keeping the worst-case monthly compute spend bounded.
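The "bounded worst case" claim can be computed directly: with a hard cap, the worst month is every capped instance running the whole month. The hourly rate below is a placeholder, not a real Azure price:

```python
HOURS_PER_MONTH = 730  # common monthly billing convention

def monthly_compute_bound(instances, hourly_rate):
    """Monthly compute cost if `instances` run continuously."""
    return instances * hourly_rate * HOURS_PER_MONTH

# Baseline of 1 instance, autoscale cap of 4, at a hypothetical $0.10/hour
baseline = monthly_compute_bound(1, 0.10)
worst_case = monthly_compute_bound(4, 0.10)
print(f"baseline ${baseline:.2f}/month, worst case ${worst_case:.2f}/month")
```

Without a maximum, there is no such bound, which is why the cap is the first guardrail to set.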
Example guardrail mindset (not exact portal steps):
- Baseline: 1 instance
- Minimum: 1 instance
- Maximum: 4 instances
- Scale out when: CPU > 70% for 10 minutes
- Scale in when: CPU < 40% for 20 minutes
- Cooldown: 10 minutes
Cost driver 3: Storage (app data, disks, and backups)
Storage costs are usually steady and predictable, but they can grow silently. The key is to understand what you are storing (and why), and to apply lifecycle rules.
Where storage costs show up
- VM disks: OS disk and data disks; premium tiers cost more. Over-sized disks increase cost even if mostly empty.
- App content and artifacts: build artifacts, container images, and uploaded media can accumulate.
- Backups/snapshots: frequent backups retained for long periods can become a major line item.
Techniques to control storage spend
- Right-size disks: choose the smallest disk tier that meets performance needs.
- Lifecycle management: automatically move older blobs to cooler tiers or delete after a retention period (especially for logs and exports stored in Storage).
- Backup retention policy: keep daily backups for a short window and weekly/monthly for longer, based on recovery requirements.
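A daily/weekly/monthly retention policy has a predictable storage footprint. The sketch below assumes full (not incremental) backups, which is the simple upper bound; the function name is illustrative:

```python
def backup_storage_gb(backup_size_gb, daily_kept, weekly_kept, monthly_kept):
    """Retained backup footprint for a daily/weekly/monthly policy,
    assuming each retained copy is a full backup."""
    copies = daily_kept + weekly_kept + monthly_kept
    return backup_size_gb * copies

# 20 GB site: keep 7 dailies + 4 weeklies + 3 monthlies = 14 copies
print(backup_storage_gb(20, 7, 4, 3))  # 280 GB retained
```

Doubling the daily window from 7 to 14 days adds 7 more full copies, which is why retention windows, not backup frequency alone, drive this line item.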
Cost driver 4: Bandwidth and egress (data leaving Azure)
Inbound data is typically free, but outbound data (egress) is often billed. For web hosting, egress cost can become significant when serving large assets (images, video, downloads) or when traffic grows.
Common egress multipliers
- Large static assets served directly from the app/VM.
- Frequent downloads (PDFs, installers, media files).
- APIs returning large payloads.
- Cross-region traffic (for example, app in one region calling a database in another).
Cost-control techniques for bandwidth
- Cache aggressively: set appropriate cache headers for static assets to reduce repeated downloads.
- Compress responses: enable gzip/brotli where applicable.
- Offload static content: serve static assets from a storage-backed static endpoint or CDN to reduce compute load and often lower effective egress cost.
- Keep services co-located: place dependent services in the same region to avoid cross-region data transfer charges.
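The effect of caching and static-content offload can be estimated by splitting egress between cache hits (served from the edge) and origin traffic. All per-GB rates here are hypothetical placeholders, not real Azure prices:

```python
def monthly_egress_cost(total_gb, cache_hit_ratio, origin_rate_per_gb,
                        edge_rate_per_gb):
    """Split monthly egress between edge-served cache hits and origin."""
    edge_gb = total_gb * cache_hit_ratio
    origin_gb = total_gb - edge_gb
    return edge_gb * edge_rate_per_gb + origin_gb * origin_rate_per_gb

# 200 GB/month of static assets, hypothetical rates
no_cache = monthly_egress_cost(200, 0.0, 0.087, 0.081)
with_cache = monthly_egress_cost(200, 0.80, 0.087, 0.081)
print(f"no caching ${no_cache:.2f}, 80% cache hits ${with_cache:.2f}")
```

The bigger win from offloading is often the reduced compute load on the app tier, which compounds with the compute sizing lever above.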
Cost driver 5: Logging retention and telemetry volume
Logging is essential for operations, but it is easy to overspend by collecting too much data or retaining it too long. The two main levers are ingestion volume (how much you send) and retention (how long you keep it).
Practical logging cost controls
- Set retention intentionally: keep high-detail logs for a short window (for example, 7–30 days), and keep only aggregated metrics longer.
- Filter noisy logs: reduce verbose application logs in production unless actively troubleshooting.
- Sample traces: for distributed tracing, sampling can cut ingestion dramatically while preserving diagnostic value.
- Separate environments: non-production environments often generate lots of logs; apply stricter retention and sampling there.
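The two levers named above (ingestion volume and retention) combine multiplicatively, which a simple model makes visible. The prices and the included-retention window below are placeholders; check your workspace's actual pricing:

```python
def monthly_log_cost(gb_per_day, sample_rate, ingest_per_gb,
                     retention_days, included_retention_days,
                     retain_per_gb_month):
    """Ingestion cost plus retention charged beyond the included window."""
    ingested_gb = gb_per_day * 30 * sample_rate
    ingest_cost = ingested_gb * ingest_per_gb
    extra_days = max(0, retention_days - included_retention_days)
    retain_cost = ingested_gb * (extra_days / 30) * retain_per_gb_month
    return ingest_cost + retain_cost

# 2 GB/day in production, no sampling, vs 50% trace sampling
full = monthly_log_cost(2, 1.0, 2.30, 30, 31, 0.10)
sampled = monthly_log_cost(2, 0.5, 2.30, 30, 31, 0.10)
print(f"full ${full:.2f}/month, 50% sampled ${sampled:.2f}/month")
```

Note that sampling cuts both terms, since retention is charged on what was ingested.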
Cost driver 6: Idle resources (paying for “nothing”)
Idle resources are the easiest savings. Common examples: non-production environments running 24/7, old test VMs left allocated, unused public IPs, orphaned disks, and container images that are never deployed.
Scheduling shutdown for non-production (where applicable)
For environments that do not need to run outside business hours, scheduling can reduce compute cost substantially.
- Virtual Machines: stopping and deallocating a VM typically stops compute charges (storage still applies). This is often the biggest non-prod saving.
- App Service: you can reduce instance count or move to a cheaper tier for non-prod; note that stopping an app does not by itself stop charges, because the App Service Plan continues to bill while it exists.
- Container Apps: configure scaling to allow scale-to-zero for non-prod where appropriate, and ensure min replicas is set to 0 if you want true idle savings.
Step-by-step: implement an “idle resource” cleanup routine
- Create a naming/tagging convention (for example: env=dev/test/prod, owner, expiresOn).
- Weekly: review resources with no recent activity (VMs with low CPU, unused disks, old container images).
- Remove or downsize anything without a clear owner or purpose.
- For non-prod: apply schedules (VM deallocate, scale-to-zero, or reduce tiers) and document exceptions.
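The compute saving from a non-prod schedule is easy to quantify as the fraction of 24/7 hours avoided (storage charges remain, as noted above). This helper is a sketch, not a scheduling tool:

```python
def schedule_savings_pct(hours_per_day, days_per_week):
    """Percentage of 24/7 compute hours avoided by a schedule."""
    scheduled_hours = hours_per_day * days_per_week
    always_on_hours = 24 * 7
    return (1 - scheduled_hours / always_on_hours) * 100

# Non-prod needed only 10 hours/day on weekdays (the chapter's scenario)
print(round(schedule_savings_pct(10, 5), 1))  # ~70% of compute hours avoided
```

A saving of this size usually makes the schedule worth the small automation effort, even for a single dev VM.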
Choosing appropriate tiers without overpaying
Tier selection is a balance: lower tiers reduce cost but may limit features (scaling options, networking integrations, performance). A cost-aware approach is to choose the lowest tier that meets requirements, then add features only when there is a clear need.
Practical tiering guidelines for small-to-medium sites
- Production baseline: choose a tier that supports your reliability needs (for example, at least two instances for redundancy where required) and has enough headroom for peak traffic.
- Non-production: use smaller tiers and fewer instances; prioritize cost over performance.
- Consolidate where safe: multiple small apps can share an App Service Plan to improve utilization, but avoid mixing workloads with very different scaling patterns if it causes over-scaling.
Ongoing cost control: budgets, alerts, and accountability
Design choices help, but predictable spending requires ongoing controls. The most effective pattern is: define a budget, alert early, and assign ownership.
Step-by-step: set a budget and alerts
- Decide the scope: subscription, resource group, or a set of resources (for example, production web hosting).
- Set a monthly budget amount based on expected baseline plus a buffer for scaling.
- Create alert thresholds (for example, 50%, 80%, 100%).
- Route alerts to the right people (email/action group) and define what happens at each threshold (investigate, scale limits, disable non-essential environments).
Cost allocation hygiene
- Tag consistently: environment, application, owner, cost center.
- Review regularly: a short monthly review of top cost contributors prevents slow cost creep.
- Track unit economics: for example, cost per 1,000 requests or cost per active user; this helps detect inefficiency even when total spend is stable.
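Cost per 1,000 requests, mentioned above, is a one-line calculation once you have monthly spend and average traffic. The figures in the example are illustrative:

```python
def cost_per_1000_requests(monthly_cost, avg_requests_per_second):
    """Unit cost from monthly spend and average request rate."""
    monthly_requests = avg_requests_per_second * 86400 * 30
    return monthly_cost / (monthly_requests / 1000)

# e.g. $300/month of hosting spend at an average of 20 requests/second
print(round(cost_per_1000_requests(300, 20), 4))
```

If total spend stays flat while this number climbs, traffic is falling and you are drifting toward over-provisioning, which is exactly the creep a monthly review should catch.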
Wrap-up exercise: estimate and compare monthly costs (App Service vs Container Apps vs VM)
Scenario
You are hosting the same small-to-medium site with these characteristics:
- Average traffic: 20 requests/second during business hours, 5 requests/second off-hours
- Peak traffic: 80 requests/second for 2 hours/day
- Compute need at peak: about 2 vCPU and 4–6 GB RAM total
- Static assets: 200 GB/month outbound
- Logs: 2 GB/day ingested, retain 30 days in production; non-prod retain 7 days
- Non-production environment exists and is only needed 10 hours/day on weekdays
Part A: Build a cost worksheet (fill in with your region’s pricing)
Create a simple table with these rows and columns. Use the Azure Pricing Calculator (or your organization’s price sheet) to fill in numbers for your chosen region.
Columns: App Service | Container Apps | Virtual Machine
Rows:
- Compute baseline (monthly)
- Compute scale-out (monthly worst-case)
- Storage (disks / app data / backups)
- Bandwidth egress (200 GB/month)
- Logging ingestion (2 GB/day)
- Logging retention (30 days)
- Non-production compute (with schedule)
- Total estimated monthly cost
Part B: Make assumptions for each hosting option
- App Service: choose a plan tier and instance count that covers baseline; set autoscale max to cover peak; remember you pay for plan instances continuously.
- Container Apps: choose vCPU/memory per replica; set min replicas (consider 0 for non-prod); set max replicas to cap cost; estimate active time at peak vs baseline.
- VM: choose a VM size that covers baseline; decide whether you need multiple VMs for redundancy; include OS disk and any data disks; consider reserved capacity only if you are confident in long-term steady usage.
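One way to keep the worksheet honest is to total it in code so every hosting option is scored on the same rows. All figures are zeroed placeholders to fill in from the Azure Pricing Calculator for your region:

```python
# One column per hosting option; fill in from your region's pricing.
worksheet = {
    "App Service": {
        "compute_baseline": 0.0,
        "compute_scaleout_worst_case": 0.0,
        "storage": 0.0,
        "egress_200gb": 0.0,
        "logging": 0.0,
        "nonprod_scheduled": 0.0,
    },
    # Repeat for "Container Apps" and "Virtual Machine".
}

def total(column):
    """Sum all cost rows for one hosting option."""
    return sum(column.values())

for option, column in worksheet.items():
    print(f"{option}: ${total(column):.2f}/month estimated")
```

Keeping the rows identical across columns prevents the common mistake of comparing a VM's compute-only cost against a fully loaded App Service estimate.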
Part C: Identify optimization actions (at least 6)
Based on your totals, list concrete actions to reduce cost while keeping the site reliable. Include at least one action per cost driver:
- Compute sizing: downsize one tier/size after measuring peak utilization; consolidate apps onto a shared plan if safe.
- Scaling settings: set max instances/replicas; increase cooldown; adjust thresholds to avoid over-scaling.
- Storage: reduce disk tier/size; apply lifecycle rules to old artifacts; tighten backup retention.
- Bandwidth/egress: enable caching/compression; offload static assets; reduce payload sizes.
- Logging retention: reduce retention for verbose logs; sample traces; filter noisy categories.
- Idle resources: schedule non-prod shutdown/deallocation; remove orphaned disks and unused resources.
Part D: Choose a “predictable spend” configuration
For each hosting model, write a final configuration that prioritizes predictable monthly spend:
- Define a baseline capacity (instances/replicas/VM size).
- Define a hard maximum scale limit.
- Define log retention and sampling settings.
- Define a non-prod schedule (or scale-to-zero) and confirm what costs remain (for example, storage).
- Define a monthly budget and alert thresholds.