Negotiation Basics for Buyers: Service Levels and SLAs for Ongoing Performance

Chapter 11

Estimated reading time: 10 minutes


What service levels and SLAs do (and don’t) do

When you buy an ongoing service (or a product that needs ongoing support), the commercial deal is only half the value. The other half is performance over time: how fast issues are handled, how reliably the service runs, and how predictable the supplier is when something goes wrong. Service levels translate those expectations into measurable commitments. An SLA (Service Level Agreement) is the document section (often an exhibit) that defines those commitments, how they’re measured, and what happens if they’re missed.

Service levels are not “nice-to-have promises.” They are operational requirements expressed as metrics with clear measurement rules. A strong SLA protects both sides: buyers get predictable outcomes; suppliers get unambiguous targets and a fair process for proving performance.

Typical service-level categories buyers should define

  • Response times: how quickly the supplier acknowledges and begins working on an issue.
  • Technical support availability: hours of coverage (e.g., 24/7, business hours), channels (phone, portal), language, and time zones.
  • Onsite service: dispatch time, arrival window, geographic coverage, and prerequisites (site access, safety induction).
  • Spare parts: availability, stocking locations, delivery times, and substitution rules.
  • Uptime/availability targets: percentage availability, maintenance windows, and what counts as downtime.
  • Escalation: named roles, escalation triggers, and time-to-engage at each level.

Make service levels measurable: the “metric anatomy” checklist

Before negotiating numbers, structure each service level so it can be measured and enforced. Use this checklist for every metric:

  • Metric name (e.g., “P1 Incident Response Time”).
  • Definition: what event starts the clock and what stops it.
  • Scope: which products/sites/users are covered.
  • Target: the required performance level (e.g., “15 minutes”).
  • Measurement window: monthly/quarterly; business hours vs 24/7.
  • Data source: ticketing system, monitoring tool, dispatch logs.
  • Exclusions: planned maintenance, buyer-caused delays, force majeure.
  • Remedy: service credits, re-performance, root-cause action plan.

If any element is missing, you risk an SLA that sounds good but can’t be proven or enforced.
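The checklist above can be sketched as a simple data structure. This is an illustrative sketch only; the class, field names, and example metric are assumptions, not contract language:

```python
from dataclasses import dataclass, field

@dataclass
class ServiceLevelMetric:
    # Fields mirror the "metric anatomy" checklist; names are illustrative.
    name: str                 # e.g. "P1 Incident Response Time"
    definition: str           # what event starts the clock and what stops it
    scope: str                # covered products/sites/users
    target: str               # required performance level
    measurement_window: str   # monthly/quarterly; business hours vs 24/7
    data_source: str          # ticketing system, monitoring tool, dispatch logs
    exclusions: list = field(default_factory=list)
    remedy: str = ""

    def is_complete(self) -> bool:
        """A metric is enforceable only if every checklist element is filled in."""
        return all([self.name, self.definition, self.scope, self.target,
                    self.measurement_window, self.data_source, self.remedy])

# Hypothetical example metric, fully specified:
p1_response = ServiceLevelMetric(
    name="P1 Incident Response Time",
    definition="Clock starts at ticket creation; stops at qualified-engineer engagement",
    scope="Model X compressors, Sites A, B, and C",
    target="15 minutes, 24/7",
    measurement_window="Monthly",
    data_source="Supplier ticketing system (buyer read access)",
    exclusions=["Planned maintenance", "Documented buyer-caused delays"],
    remedy="3% monthly fee credit if missed in 2+ cases per month",
)
print(p1_response.is_complete())  # True
```

Running `is_complete()` over a draft SLA's metrics is a quick way to spot the "sounds good but can't be enforced" gaps before negotiation.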

Practical service-level definitions buyers can adapt

1) Response time (acknowledgement and engagement)

What to define: severity levels, start/stop events, and whether response means “acknowledged” or “actively working.”


| Severity | Example impact | Response target | Restore/Workaround target |
| --- | --- | --- | --- |
| P1 Critical | Service down / safety risk / production halted | 15 minutes (24/7) | 4 hours |
| P2 High | Major degradation, no workaround | 1 hour (24/7) | 12 hours |
| P3 Medium | Degraded, workaround exists | 4 business hours | 5 business days |
| P4 Low | How-to / minor issue | 1 business day | Next planned release / 20 business days |

Negotiation tip: Ask the supplier to propose severity definitions first, then tighten them to match your operational reality. Many disputes come from misclassified incidents.
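A response-time check against the 24/7 targets in the table above can be sketched in a few lines. The function and mapping are illustrative assumptions (business-hours clocks for P3/P4 would need a working-day calendar and are omitted):

```python
from datetime import datetime, timedelta

# Hypothetical severity -> response target mapping, per the 24/7 rows above.
RESPONSE_TARGETS = {
    "P1": timedelta(minutes=15),
    "P2": timedelta(hours=1),
}

def response_met(severity: str, ticket_created: datetime,
                 engineer_engaged: datetime) -> bool:
    """Clock starts at ticket creation, stops at human engagement (not an auto-reply)."""
    return engineer_engaged - ticket_created <= RESPONSE_TARGETS[severity]

# A P1 acknowledged by a qualified engineer after 12 minutes meets the 15-minute target:
print(response_met("P1",
                   datetime(2024, 3, 1, 2, 0),
                   datetime(2024, 3, 1, 2, 12)))  # True
```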

2) Technical support availability

What to define: coverage hours, holidays, channels, and staffing commitments.

  • Coverage: “24/7/365 for P1–P2; business hours for P3–P4.”
  • Channels: “Phone and portal for P1; portal for P2–P4.”
  • Time zones: specify the reference time zone and whether coverage is local per site.
  • Language: “English and Spanish support for P1–P2.”

Pitfall to avoid: “24/7 support” that is actually an answering service. Require that after-hours calls reach a qualified engineer within the response target.

3) Onsite service (dispatch and arrival)

What to define: dispatch time vs arrival time, and what conditions must be met for the clock to start.

  • Dispatch target: “Technician dispatched within 2 hours of P1 confirmation.”
  • Arrival target: “Onsite within 8 hours for sites within 100 km; within 24 hours otherwise.”
  • Prerequisites: “Buyer provides site access, safety escort, and remote diagnostics approval within 30 minutes.”

Negotiation tip: If the supplier pushes back on arrival times, trade for a remote workaround target plus a firm onsite commitment for unresolved cases.

4) Spare parts (availability and logistics)

What to define: which parts are “critical,” stocking model, and delivery lead times.

  • Critical spares list: attach as an exhibit with part numbers and revision compatibility.
  • Stocking: “Supplier maintains minimum stock of X at regional depot; replenished within Y days.”
  • Delivery: “Critical spares delivered within 12 hours for P1; within 2 business days for P2.”
  • Substitutions: allowed only with written approval and no loss of warranty/support.

Pitfall to avoid: “Parts available” without specifying where they are stocked and the shipping method. Availability without logistics is not availability.

5) Uptime / availability targets

What to define: the formula, measurement period, and what counts as downtime.

Example definition:

Availability (%) = ((Total Minutes in Period - Downtime Minutes) / Total Minutes in Period) * 100
  • Period: monthly.
  • Downtime: unplanned outage impacting production use; excludes scheduled maintenance within agreed windows.
  • Maintenance window: “Sundays 02:00–06:00 local time, max 2 per month, 7 days’ notice.”
  • Dependencies: clarify whether third-party network/power is excluded and how shared responsibility is handled.

Negotiation tip: If the supplier insists on excluding third-party dependencies, require a commitment to provide diagnostics and vendor coordination within defined timeframes.
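The availability formula above translates directly into code; the numbers in the example are illustrative:

```python
def availability_pct(total_minutes: int, downtime_minutes: int) -> float:
    """Availability (%) per the formula above. Downtime here should already
    exclude scheduled maintenance within agreed windows."""
    return (total_minutes - downtime_minutes) / total_minutes * 100

# A 30-day month (43,200 minutes) with 90 minutes of unplanned downtime:
print(round(availability_pct(43_200, 90), 3))  # 99.792
```

Running the formula against realistic downtime scenarios before agreeing a target (e.g. 99.5% monthly allows about 3.6 hours of downtime) helps both sides understand what they are actually committing to.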

6) Escalation (time-to-engage leadership)

What to define: triggers and named roles (or role titles) with response times.

| Escalation level | Trigger | Supplier role | Time to engage |
| --- | --- | --- | --- |
| Level 1 | P1 opened | On-call engineer | 15 minutes |
| Level 2 | P1 not restored within 2 hours | Support manager | 30 minutes |
| Level 3 | P1 not restored within 4 hours or repeat incident | Operations director | 1 hour |
| Level 4 | Chronic breach / safety event | Executive sponsor | 1 business day |

Pitfall to avoid: escalation paths that exist on paper but don’t require actual engagement times or decision authority.
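The time-based triggers in the table above can be expressed as a simple rule, which makes it easy to audit whether escalation actually happened on time. This is a sketch; Level 4 (chronic breach / safety event) is event-driven and handled outside this function:

```python
def escalation_level(hours_open: float, repeat_incident: bool = False) -> int:
    """Map elapsed P1 time to the required escalation level (per the table above)."""
    if hours_open >= 4 or repeat_incident:
        return 3  # Operations director, engaged within 1 hour
    if hours_open >= 2:
        return 2  # Support manager, engaged within 30 minutes
    return 1      # On-call engineer, engaged within 15 minutes

# A P1 open for 2.5 hours without restore requires the support manager:
print(escalation_level(2.5))  # 2
```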

Step-by-step: drafting an SLA that works in real operations

Step 1: Define scope (what is covered)

Write scope in operational terms, not marketing terms. Include:

  • Services covered: support, maintenance, monitoring, onsite, parts logistics.
  • Assets/users/sites: list locations, environments, serial numbers, or tenant IDs.
  • Service hours: per severity level.
  • Interfaces: ticketing portal, phone numbers, monitoring endpoints.

Practical example: “This SLA applies to Model X compressors at Sites A, B, and C, including remote monitoring, incident support, and onsite repair. It excludes operator training and consumables unless listed in Exhibit D.”

Step 2: Choose metrics (what you will measure)

Pick a small set of metrics that reflect outcomes you care about and that the supplier can influence. Common choices:

  • Incident response time (by severity)
  • Time to restore service (by severity)
  • Onsite arrival time (by geography)
  • First-time fix rate (for field service)
  • Parts delivery time (critical vs non-critical)
  • Availability/uptime
  • Customer satisfaction (optional, but define method carefully)

Pitfall to avoid: too many metrics. Over-instrumentation creates reporting noise and weakens enforcement because everything becomes negotiable.

Step 3: Define measurement method (how the clock runs)

This is where most SLAs fail. Specify:

  • Start event: ticket creation time, monitoring alert time, or buyer call time.
  • Stop event: service restored, workaround delivered, or buyer confirmation.
  • Business hours: define calendar, holidays, and time zone.
  • Pauses: when the clock pauses (e.g., waiting for buyer access) and how pauses are logged.
  • Tool of record: “Supplier ticketing system is system of record; buyer has read access.”

Baseline requirement: include a “first 30–60 days baseline” clause if you don’t have historical data, but do not allow it to become an indefinite pilot with no enforceable targets.
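The start/stop/pause mechanics above can be sketched as a clock calculation over logged timestamps. Function and variable names are illustrative; in practice this data would come from the tool of record:

```python
from datetime import datetime

def sla_clock_minutes(start: datetime, stop: datetime, pauses: list) -> float:
    """Elapsed SLA minutes between the start and stop events, excluding logged
    pauses (e.g. waiting for buyer access). `pauses` is a list of
    (pause_start, pause_end) tuples from the system of record."""
    total = (stop - start).total_seconds() / 60
    paused = sum((end - begin).total_seconds() / 60 for begin, end in pauses)
    return total - paused

start = datetime(2024, 3, 1, 9, 0)    # ticket created (start event)
stop = datetime(2024, 3, 1, 13, 0)    # service restored (stop event)
pauses = [(datetime(2024, 3, 1, 10, 0),
           datetime(2024, 3, 1, 10, 30))]  # clock paused awaiting site access
print(sla_clock_minutes(start, stop, pauses))  # 210.0
```

Note that the pause rule only works if pauses are logged with reason codes in the tool of record; an unlogged pause is unenforceable either way.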

Step 4: Set reporting cadence (how you will see performance)

Reporting should be frequent enough to catch issues early, and structured enough to be actionable.

  • Operational dashboard: weekly (P1/P2 incidents, open backlog, parts delays).
  • SLA scorecard: monthly (metric attainment, credits, trends).
  • Problem management review: monthly/quarterly (repeat incidents, root causes, preventive actions).

Practical requirement: “Supplier provides a monthly SLA report within 5 business days of month-end, including raw ticket extracts and downtime logs.”

Step 5: Define remedies and service credits (what happens if targets are missed)

Remedies should be proportional, predictable, and designed to drive correction—not to create constant conflict. Options include:

  • Service credits: a percentage of monthly fees tied to specific breaches.
  • Re-performance: additional support hours, expedited onsite visits, or replacement units.
  • Corrective action plan: mandatory RCA (root cause analysis) and prevention plan for repeated breaches.
  • Step-in rights: buyer can use third parties for urgent fixes at supplier cost (carefully drafted).

Example credit table (adapt to your context):

| Metric | Threshold | Credit | Cap |
| --- | --- | --- | --- |
| Monthly availability | < 99.5% | 5% of monthly service fee | Total credits capped at 20% per month |
| Monthly availability | < 99.0% | 10% of monthly service fee | |
| P1 response time | > target in 2+ cases/month | 3% of monthly service fee | |

Pitfall to avoid: credits that are so small they don’t matter, or so large they make the supplier defensive and unwilling to agree. Also avoid credits as the only remedy for chronic failure—pair them with corrective actions and escalation.
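The example credit table above can be encoded as a small calculation, which is also a useful way to sanity-check a proposed credit schedule. Thresholds and percentages here are the illustrative ones from the table and must match your actual contract:

```python
def monthly_credit_pct(availability: float, p1_response_misses: int) -> float:
    """Credit as a % of the monthly service fee, per the example table above."""
    credit = 0.0
    if availability < 99.0:
        credit += 10.0          # the 10% tier replaces, not stacks on, the 5% tier
    elif availability < 99.5:
        credit += 5.0
    if p1_response_misses >= 2:
        credit += 3.0
    return min(credit, 20.0)    # total credits capped at 20% per month

# 98.8% availability plus 3 missed P1 responses -> 10% + 3% = 13% credit:
print(monthly_credit_pct(98.8, 3))  # 13.0
```

Whether availability tiers stack or replace each other is exactly the kind of ambiguity worth resolving in drafting, since it changes the payout.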

Step 6: Write exclusions and dependencies (what is not counted)

Exclusions are necessary, but they must be specific and not swallow the SLA.

  • Planned maintenance: only within defined windows and notice periods.
  • Force majeure: define and require mitigation steps.
  • Buyer-caused delays: must be documented (e.g., access not provided within X minutes).
  • Third-party systems: specify shared responsibility and coordination obligations.

Practical safeguard: “Exclusions apply only to the extent they directly cause the breach, and supplier must provide evidence in the monthly report.”

Common SLA pitfalls and how buyers avoid them

Pitfall: Unmeasurable promises

Problem language: “Best efforts,” “as soon as possible,” “industry standard,” “high availability.”

Fix: Replace with quantified targets and definitions. If the supplier resists, ask: "What number do you manage against internally?" Then negotiate around that operational reality.

Pitfall: Missing baselines and unclear starting points

Problem: You can’t prove a breach because the clock start is ambiguous (e.g., email timestamp vs ticket timestamp).

Fix: Define the system of record and require buyer visibility. If email is allowed, require conversion to a ticket within a defined time (e.g., 10 minutes) and use the earliest timestamp.

Pitfall: Metrics that can be gamed

Examples: closing tickets without resolution, reclassifying severity to meet targets, counting “response” as an auto-reply.

Fix: Add anti-gaming rules:

  • Ticket closure requires documented resolution and buyer confirmation for P1/P2 (or auto-confirm after X hours with no objection).
  • Severity changes require reason codes and are audited in the monthly report.
  • Response requires human engagement by qualified staff for P1/P2.

Pitfall: No linkage between SLA and operations

Problem: SLA exists, but there’s no process to review it, so issues accumulate until renewal.

Fix: Build a performance review framework (below) with regular cadence, owners, and action tracking.

Pitfall: One-size-fits-all uptime targets

Problem: A single uptime number ignores critical periods (e.g., peak production hours) or maintenance realities.

Fix: Use tiered commitments (e.g., higher availability during critical windows) or define separate metrics for “core hours availability” vs “overall availability.”
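Separating core-hours from overall availability can be sketched as two parallel calculations; the core window and all figures below are illustrative assumptions:

```python
def tiered_availability(total_min: int, down_min: int,
                        core_total_min: int, core_down_min: int) -> tuple:
    """Overall vs core-hours availability, so a single monthly number
    cannot hide outages during critical production windows."""
    overall = (total_min - down_min) / total_min * 100
    core = (core_total_min - core_down_min) / core_total_min * 100
    return round(overall, 3), round(core, 3)

# 60 downtime minutes in a 30-day month, all during core production hours
# (assumed 08:00-18:00, i.e. 18,000 core minutes):
print(tiered_availability(43_200, 60, 18_000, 60))  # (99.861, 99.667)
```

In this example the overall figure still clears a 99.8% target while core-hours availability does not, which is precisely the gap a one-size-fits-all metric conceals.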

Supplier performance review framework: collaborative, accountable, repeatable

1) Establish governance roles

  • Operational owners: day-to-day ticket and service delivery leads (both sides).
  • Service manager: owns SLA reporting, trend analysis, and improvement plan.
  • Executive sponsor: removes roadblocks and approves investments/escalations.

2) Use a three-layer meeting cadence

  • Weekly operations check-in (30–45 min): open P1/P2 incidents, aging tickets, parts delays, upcoming maintenance.
  • Monthly SLA scorecard (60 min): metric attainment, credits (if any), disputes on measurement, corrective actions with due dates.
  • Quarterly business review (90–120 min): trend themes, root causes, capacity/coverage changes, roadmap alignment, risk register updates.

3) Standardize the scorecard

Keep it consistent month to month so trends are visible. A practical scorecard includes:

  • Headline metrics: availability, P1/P2 response, restore times, onsite arrival, parts delivery.
  • Volume context: ticket counts by severity, top categories, repeat incidents.
  • Breaches: what happened, impact, whether exclusion claimed (with evidence).
  • Corrective actions: owner, due date, status, verification method.
  • Improvement backlog: preventive maintenance, monitoring enhancements, training, spares optimization.

4) Apply a “no-surprises” escalation rule

To keep collaboration strong, agree that potential SLA misses are flagged early:

  • Supplier notifies buyer when a P1 is at risk of missing restore target (e.g., at 50% of target time elapsed).
  • Buyer commits to fast decisions on access, approvals, and workaround acceptance within defined times.

5) Tie accountability to learning, not blame

When breaches occur, require an RCA that focuses on prevention:

  • Timeline: what happened and when.
  • Root cause: technical and process contributors.
  • Containment: immediate fix and customer communication.
  • Corrective/preventive actions: specific changes, owners, dates.
  • Effectiveness check: how you’ll confirm the fix worked (e.g., no recurrence for 60 days).

This framework keeps the relationship constructive while ensuring the SLA remains a living tool for performance, not a document that only appears during disputes.

Now answer the exercise about the content:

Which approach best makes a service-level commitment enforceable in an SLA?


Service levels should be operational requirements expressed as measurable metrics. Including definition, scope, targets, measurement rules, data sources, exclusions, and remedies ensures performance can be proven and enforced.

Next chapter

Negotiation Basics for Buyers: Closing, Documenting, and Maintaining the Deal
