Why cost-aware design matters for web hosting
On Azure, most web-hosting spend is driven by a few recurring meters: how much compute you reserve or consume, how you scale, how much data you store, how much data leaves Azure (egress), how long you retain logs, and whether you pay for resources that sit idle. Cost-aware design means making these drivers explicit in your architecture and putting guardrails in place so monthly spend stays predictable—even when traffic changes.
This chapter focuses on practical cost control for small-to-medium sites across three common hosting models: App Service, Container Apps, and Virtual Machines. The goal is not to minimize cost at all times, but to align spend with business value and avoid surprises.
Cost driver 1: Compute sizing (the biggest lever)
How hosting choice changes compute billing
- App Service: You pay for the App Service Plan instances (reserved capacity) while they are running, regardless of whether your app is idle. Multiple apps can share the same plan, which can reduce cost if you consolidate.
- Container Apps: You typically pay based on vCPU/memory consumed by active replicas (and sometimes a baseline depending on configuration). It can scale down aggressively, which can be cost-effective for bursty workloads.
- Virtual Machines: You pay for the VM size while it is allocated (running), plus OS disk and any attached storage. You manage the web stack and patching, but you also control the exact size and can stop/deallocate when not needed.
Right-sizing: a practical approach
Right-sizing is choosing the smallest compute that meets performance and reliability needs with headroom. For small-to-medium sites, over-provisioning is common because initial sizing is guessed and never revisited.
- Start with a target: define acceptable response time and CPU/memory thresholds (for example, keep average CPU under 60% and memory under 75% during peak).
- Measure real usage: observe CPU, memory, and request rate during typical and peak periods.
- Resize in small steps: move one tier/size at a time, validate, then repeat.
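The measure-then-resize loop above can be sketched as a small decision helper. The thresholds mirror the targets in this section (CPU under 60% and memory under 75% at peak; downsize when peaks stay under 20% CPU and 40% memory); the function name and defaults are illustrative, not an Azure API:

```python
def rightsize_recommendation(peak_cpu_pct, peak_mem_pct,
                             downsize_cpu=20, downsize_mem=40,
                             upsize_cpu=60, upsize_mem=75):
    """Recommend one sizing step from observed peak utilization.

    Always move one tier/size at a time and re-measure, as the
    step-by-step guidance below describes.
    """
    if peak_cpu_pct > upsize_cpu or peak_mem_pct > upsize_mem:
        return "scale up (or out) one step, then re-measure"
    if peak_cpu_pct < downsize_cpu and peak_mem_pct < downsize_mem:
        return "scale down one step, then re-measure"
    return "keep current size"

print(rightsize_recommendation(15, 35))  # consistently low at peak
print(rightsize_recommendation(70, 50))  # CPU above the target ceiling
```

Feeding the helper real peak numbers (not averages over quiet periods) is what keeps the loop honest.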
Step-by-step: right-size an App Service Plan
- Open the App Service Plan in the Azure portal.
- Review recent CPU and memory usage (and request rate) for the plan.
- If CPU and memory are consistently low (for example, CPU < 20% and memory < 40% at peak), consider moving down one SKU or reducing instance count.
- Change the pricing tier (Scale up) or instance count (Scale out) and monitor for at least one business cycle (often 1–2 weeks).
- Repeat until you reach a stable, cost-efficient baseline.
Step-by-step: right-size a VM
- Check VM CPU and memory utilization over at least 7 days (include peak traffic windows).
- If CPU is low and memory is not pressured, resize to a smaller VM size.
- Validate application performance and disk I/O after resizing.
- Consider using smaller OS disks if appropriate, and avoid premium disks unless you need the IOPS/latency.
Cost driver 2: Scaling settings (and runaway scale)
Scaling is essential for reliability, but it can also create cost spikes if limits are not set. The most common cost incident for web workloads is an autoscale rule that keeps adding instances due to a misconfiguration, a traffic spike, or a bot attack.
Guardrails for predictable scaling
- Set maximum instance/replica limits: always define a hard cap that matches your budget tolerance.
- Use scale-out cooldowns: avoid rapid oscillation that increases cost without improving user experience.
- Choose sensible metrics: CPU alone can be misleading; combine with request rate, queue length, or response time where possible.
- Plan for “bad traffic”: rate limiting/WAF is a security topic, but it also protects your budget by preventing scale driven by abusive traffic.
Practical example: autoscale limits
For a small-to-medium site, you might set a baseline of 1–2 instances and a maximum of 4 instances. This provides headroom while keeping the worst-case monthly compute spend bounded.
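The "bounded worst case" claim can be computed directly: with a hard cap, the worst month is every capped instance running the whole month. The hourly rate below is a placeholder, not a real Azure price:

```python
HOURS_PER_MONTH = 730  # common monthly billing convention

def monthly_compute_bound(instances, hourly_rate):
    """Monthly compute cost if `instances` run continuously."""
    return instances * hourly_rate * HOURS_PER_MONTH

# Baseline of 1 instance, autoscale cap of 4, at a hypothetical $0.10/hour
baseline = monthly_compute_bound(1, 0.10)
worst_case = monthly_compute_bound(4, 0.10)
print(f"baseline ${baseline:.2f}/month, worst case ${worst_case:.2f}/month")
```

Without a maximum, there is no such bound, which is why the cap is the first guardrail to set.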
Example guardrail mindset (not exact portal steps):
- Baseline: 1 instance
- Minimum: 1 instance
- Maximum: 4 instances
- Scale out when: CPU > 70% for 10 minutes
- Scale in when: CPU < 40% for 20 minutes
- Cooldown: 10 minutes
Cost driver 3: Storage (app data, disks, and backups)
Storage costs are usually steady and predictable, but they can grow silently. The key is to understand what you are storing (and why), and to apply lifecycle rules.
Where storage costs show up
- VM disks: OS disk and data disks; premium tiers cost more. Over-sized disks increase cost even if mostly empty.
- App content and artifacts: build artifacts, container images, and uploaded media can accumulate.
- Backups/snapshots: frequent backups retained for long periods can become a major line item.
Techniques to control storage spend
- Right-size disks: choose the smallest disk tier that meets performance needs.
- Lifecycle management: automatically move older blobs to cooler tiers or delete after a retention period (especially for logs and exports stored in Storage).
- Backup retention policy: keep daily backups for a short window and weekly/monthly for longer, based on recovery requirements.
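A daily/weekly/monthly retention policy has a predictable storage footprint. The sketch below assumes full (not incremental) backups, which is the simple upper bound; the function name is illustrative:

```python
def backup_storage_gb(backup_size_gb, daily_kept, weekly_kept, monthly_kept):
    """Retained backup footprint for a daily/weekly/monthly policy,
    assuming each retained copy is a full backup."""
    copies = daily_kept + weekly_kept + monthly_kept
    return backup_size_gb * copies

# 20 GB site: keep 7 dailies + 4 weeklies + 3 monthlies = 14 copies
print(backup_storage_gb(20, 7, 4, 3))  # 280 GB retained
```

Doubling the daily window from 7 to 14 days adds 7 more full copies, which is why retention windows, not backup frequency alone, drive this line item.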
Cost driver 4: Bandwidth and egress (data leaving Azure)
Inbound data is typically free, but outbound data (egress) is often billed. For web hosting, egress cost can become significant when serving large assets (images, video, downloads) or when traffic grows.
Common egress multipliers
- Large static assets served directly from the app/VM.
- Frequent downloads (PDFs, installers, media files).
- APIs returning large payloads.
- Cross-region traffic (for example, app in one region calling a database in another).
Cost-control techniques for bandwidth
- Cache aggressively: set appropriate cache headers for static assets to reduce repeated downloads.
- Compress responses: enable gzip/brotli where applicable.
- Offload static content: serve static assets from a storage-backed static endpoint or CDN to reduce compute load and often lower effective egress cost.
- Keep services co-located: place dependent services in the same region to avoid cross-region data transfer charges.
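The effect of caching and static-content offload can be estimated by splitting egress between cache hits (served from the edge) and origin traffic. All per-GB rates here are hypothetical placeholders, not real Azure prices:

```python
def monthly_egress_cost(total_gb, cache_hit_ratio, origin_rate_per_gb,
                        edge_rate_per_gb):
    """Split monthly egress between edge-served cache hits and origin."""
    edge_gb = total_gb * cache_hit_ratio
    origin_gb = total_gb - edge_gb
    return edge_gb * edge_rate_per_gb + origin_gb * origin_rate_per_gb

# 200 GB/month of static assets, hypothetical rates
no_cache = monthly_egress_cost(200, 0.0, 0.087, 0.081)
with_cache = monthly_egress_cost(200, 0.80, 0.087, 0.081)
print(f"no caching ${no_cache:.2f}, 80% cache hits ${with_cache:.2f}")
```

The bigger win from offloading is often the reduced compute load on the app tier, which compounds with the compute sizing lever above.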
Cost driver 5: Logging retention and telemetry volume
Logging is essential for operations, but it is easy to overspend by collecting too much data or retaining it too long. The two main levers are ingestion volume (how much you send) and retention (how long you keep it).
Practical logging cost controls
- Set retention intentionally: keep high-detail logs for a short window (for example, 7–30 days), and keep only aggregated metrics longer.
- Filter noisy logs: reduce verbose application logs in production unless actively troubleshooting.
- Sample traces: for distributed tracing, sampling can cut ingestion dramatically while preserving diagnostic value.
- Separate environments: non-production environments often generate lots of logs; apply stricter retention and sampling there.
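The two levers named above (ingestion volume and retention) combine multiplicatively, which a simple model makes visible. The prices and the included-retention window below are placeholders; check your workspace's actual pricing:

```python
def monthly_log_cost(gb_per_day, sample_rate, ingest_per_gb,
                     retention_days, included_retention_days,
                     retain_per_gb_month):
    """Ingestion cost plus retention charged beyond the included window."""
    ingested_gb = gb_per_day * 30 * sample_rate
    ingest_cost = ingested_gb * ingest_per_gb
    extra_days = max(0, retention_days - included_retention_days)
    retain_cost = ingested_gb * (extra_days / 30) * retain_per_gb_month
    return ingest_cost + retain_cost

# 2 GB/day in production, no sampling, vs 50% trace sampling
full = monthly_log_cost(2, 1.0, 2.30, 30, 31, 0.10)
sampled = monthly_log_cost(2, 0.5, 2.30, 30, 31, 0.10)
print(f"full ${full:.2f}/month, 50% sampled ${sampled:.2f}/month")
```

Note that sampling cuts both terms, since retention is charged on what was ingested.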
Cost driver 6: Idle resources (paying for “nothing”)
Idle resources are the easiest savings. Common examples: non-production environments running 24/7, old test VMs left allocated, unused public IPs, orphaned disks, and container images that are never deployed.
Scheduling shutdown for non-production (where applicable)
For environments that do not need to run outside business hours, scheduling can reduce compute cost substantially.
- Virtual Machines: stopping and deallocating a VM typically stops compute charges (storage still applies). This is often the biggest non-prod saving.
- App Service: you can reduce instance count or move to a cheaper tier for non-prod; note that stopping an app does not by itself stop charges, because the App Service Plan continues to bill while it exists.
- Container Apps: configure scaling to allow scale-to-zero for non-prod where appropriate, and ensure min replicas is set to 0 if you want true idle savings.
Step-by-step: implement an “idle resource” cleanup routine
- Create a naming/tagging convention (for example: env=dev/test/prod, owner, expiresOn).
- Weekly: review resources with no recent activity (VMs with low CPU, unused disks, old container images).
- Remove or downsize anything without a clear owner or purpose.
- For non-prod: apply schedules (VM deallocate, scale-to-zero, or reduce tiers) and document exceptions.
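The compute saving from a non-prod schedule is easy to quantify as the fraction of 24/7 hours avoided (storage charges remain, as noted above). This helper is a sketch, not a scheduling tool:

```python
def schedule_savings_pct(hours_per_day, days_per_week):
    """Percentage of 24/7 compute hours avoided by a schedule."""
    scheduled_hours = hours_per_day * days_per_week
    always_on_hours = 24 * 7
    return (1 - scheduled_hours / always_on_hours) * 100

# Non-prod needed only 10 hours/day on weekdays (the chapter's scenario)
print(round(schedule_savings_pct(10, 5), 1))  # ~70% of compute hours avoided
```

A saving of this size usually makes the schedule worth the small automation effort, even for a single dev VM.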
Choosing appropriate tiers without overpaying
Tier selection is a balance: lower tiers reduce cost but may limit features (scaling options, networking integrations, performance). A cost-aware approach is to choose the lowest tier that meets requirements, then add features only when there is a clear need.
Practical tiering guidelines for small-to-medium sites
- Production baseline: choose a tier that supports your reliability needs (for example, at least two instances for redundancy where required) and has enough headroom for peak traffic.
- Non-production: use smaller tiers and fewer instances; prioritize cost over performance.
- Consolidate where safe: multiple small apps can share an App Service Plan to improve utilization, but avoid mixing workloads with very different scaling patterns if it causes over-scaling.
Ongoing cost control: budgets, alerts, and accountability
Design choices help, but predictable spending requires ongoing controls. The most effective pattern is: define a budget, alert early, and assign ownership.
Step-by-step: set a budget and alerts
- Decide the scope: subscription, resource group, or a set of resources (for example, production web hosting).
- Set a monthly budget amount based on expected baseline plus a buffer for scaling.
- Create alert thresholds (for example, 50%, 80%, 100%).
- Route alerts to the right people (email/action group) and define what happens at each threshold (investigate, scale limits, disable non-essential environments).
Cost allocation hygiene
- Tag consistently: environment, application, owner, cost center.
- Review regularly: a short monthly review of top cost contributors prevents slow cost creep.
- Track unit economics: for example, cost per 1,000 requests or cost per active user; this helps detect inefficiency even when total spend is stable.
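Cost per 1,000 requests, mentioned above, is a one-line calculation once you have monthly spend and average traffic. The figures in the example are illustrative:

```python
def cost_per_1000_requests(monthly_cost, avg_requests_per_second):
    """Unit cost from monthly spend and average request rate."""
    monthly_requests = avg_requests_per_second * 86400 * 30
    return monthly_cost / (monthly_requests / 1000)

# e.g. $300/month of hosting spend at an average of 20 requests/second
print(round(cost_per_1000_requests(300, 20), 4))
```

If total spend stays flat while this number climbs, traffic is falling and you are drifting toward over-provisioning, which is exactly the creep a monthly review should catch.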
Wrap-up exercise: estimate and compare monthly costs (App Service vs Container Apps vs VM)
Scenario
You are hosting the same small-to-medium site with these characteristics:
- Average traffic: 20 requests/second during business hours, 5 requests/second off-hours
- Peak traffic: 80 requests/second for 2 hours/day
- Compute need at peak: about 2 vCPU and 4–6 GB RAM total
- Static assets: 200 GB/month outbound
- Logs: 2 GB/day ingested, retain 30 days in production; non-prod retain 7 days
- Non-production environment exists and is only needed 10 hours/day on weekdays
Part A: Build a cost worksheet (fill in with your region’s pricing)
Create a simple table with these rows and columns. Use the Azure Pricing Calculator (or your organization’s price sheet) to fill in numbers for your chosen region.
Columns: App Service | Container Apps | Virtual Machine
Rows:
- Compute baseline (monthly)
- Compute scale-out (monthly worst-case)
- Storage (disks / app data / backups)
- Bandwidth egress (200 GB/month)
- Logging ingestion (2 GB/day)
- Logging retention (30 days)
- Non-production compute (with schedule)
- Total estimated monthly cost
Part B: Make assumptions for each hosting option
- App Service: choose a plan tier and instance count that covers baseline; set autoscale max to cover peak; remember you pay for plan instances continuously.
- Container Apps: choose vCPU/memory per replica; set min replicas (consider 0 for non-prod); set max replicas to cap cost; estimate active time at peak vs baseline.
- VM: choose a VM size that covers baseline; decide whether you need multiple VMs for redundancy; include OS disk and any data disks; consider reserved capacity only if you are confident in long-term steady usage.
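One way to keep the worksheet honest is to total it in code so every hosting option is scored on the same rows. All figures are zeroed placeholders to fill in from the Azure Pricing Calculator for your region:

```python
# One column per hosting option; fill in from your region's pricing.
worksheet = {
    "App Service": {
        "compute_baseline": 0.0,
        "compute_scaleout_worst_case": 0.0,
        "storage": 0.0,
        "egress_200gb": 0.0,
        "logging": 0.0,
        "nonprod_scheduled": 0.0,
    },
    # Repeat for "Container Apps" and "Virtual Machine".
}

def total(column):
    """Sum all cost rows for one hosting option."""
    return sum(column.values())

for option, column in worksheet.items():
    print(f"{option}: ${total(column):.2f}/month estimated")
```

Keeping the rows identical across columns prevents the common mistake of comparing a VM's compute-only cost against a fully loaded App Service estimate.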
Part C: Identify optimization actions (at least 6)
Based on your totals, list concrete actions to reduce cost while keeping the site reliable. Include at least one action per cost driver:
- Compute sizing: downsize one tier/size after measuring peak utilization; consolidate apps onto a shared plan if safe.
- Scaling settings: set max instances/replicas; increase cooldown; adjust thresholds to avoid over-scaling.
- Storage: reduce disk tier/size; apply lifecycle rules to old artifacts; tighten backup retention.
- Bandwidth/egress: enable caching/compression; offload static assets; reduce payload sizes.
- Logging retention: reduce retention for verbose logs; sample traces; filter noisy categories.
- Idle resources: schedule non-prod shutdown/deallocation; remove orphaned disks and unused resources.
Part D: Choose a “predictable spend” configuration
For each hosting model, write a final configuration that prioritizes predictable monthly spend:
- Define a baseline capacity (instances/replicas/VM size).
- Define a hard maximum scale limit.
- Define log retention and sampling settings.
- Define a non-prod schedule (or scale-to-zero) and confirm what costs remain (for example, storage).
- Define a monthly budget and alert thresholds.