SLA Expectations for MSP Services
This article is for educational purposes only and does not constitute professional compliance advice or legal counsel. Your specific situation may vary, and you should evaluate any service provider relationship based on your organization's unique requirements.
You're looking at an MSP's availability guarantee—99.9% uptime, they say—and you're trying to figure out what that actually means for your business. Does 99.9% mean you get four minutes of downtime per month or something else? What about response times? What about when they meet the uptime goal technically but your users are still experiencing problems? Service level agreements are supposed to define what an MSP commits to, but most SLAs are written in ways that protect the vendor while leaving you with questions about what you're actually getting.
Understanding realistic SLA targets, how they're measured, and what happens when they're missed is essential because SLAs form the basis of your entire service relationship. A good SLA is specific, realistic, and enforceable. A bad SLA creates false confidence because it sounds good but doesn't actually protect you.
Uptime Percentages and What They Actually Mean
MSPs love to quote uptime percentages, and the numbers sound impressive until you understand what they mean operationally. "99.9% uptime" sounds like near-perfect service. In reality, over a 30-day month it allows about 43 minutes of downtime. "99.95%" allows about 22 minutes, and "99.99%" allows about 4 minutes. These allowances add up when a service depends on several systems, each with its own SLA, and most organizations don't realize how much permitted downtime they are accepting, or how much availability they are paying for that they don't need.
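The arithmetic behind these figures is easy to check yourself. The following Python sketch assumes a 30-day month and, for the combined figure, a service that needs several systems up at once with independent failures (a simplification). The function names are illustrative and do not come from any vendor's tooling.

```python
from typing import Iterable


def downtime_budget_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of downtime allowed over `days` at the given uptime target."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


def serial_availability(uptime_percents: Iterable[float]) -> float:
    """Combined availability (percent) when every listed system must be up.

    Assumes the systems fail independently, which real environments
    only approximate.
    """
    combined = 1.0
    for pct in uptime_percents:
        combined *= pct / 100.0
    return combined * 100.0


if __name__ == "__main__":
    for target in (99.0, 99.5, 99.9, 99.95, 99.99):
        minutes = downtime_budget_minutes(target)
        print(f"{target}% uptime allows {minutes:.1f} minutes per 30 days")

    chained = serial_availability([99.9, 99.9, 99.9])
    print(f"Three 99.9% systems in series: {chained:.2f}% combined")
```

Under these assumptions, three systems that each meet 99.9% give a combined figure of roughly 99.7%, which is why per-system percentages can overstate what users actually experience.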
The first thing to understand is that not every system needs high availability. Your email system is business-critical, and email downtime is catastrophic. Your development environment is important but non-critical, and developers can be offline for a few hours without causing major business impact. Your internal wiki or knowledge base is nice to have, and downtime is an inconvenience but not a business threat. Different systems need different uptime targets.
For most business-critical systems, 99.5% uptime is reasonable and achievable. That's about 3.6 hours of acceptable downtime in a 30-day month. For important systems that aren't critical, 99% uptime is adequate, which is about 7.2 hours per month. For everything else, 95% uptime (about 36 hours per month) is fine, and some systems don't need an uptime target at all because they're rarely used or have external redundancy.
The problem is that 99.99% uptime is genuinely expensive to guarantee because it requires redundancy, failover systems, and continuous monitoring. If you're buying 99.99% uptime when you actually need 99%, you're paying premium pricing for protection you don't need. Be realistic about what your business actually requires instead of buying the highest percentage the vendor offers.
How Uptime Is Actually Measured
This is where SLAs get tricky. Uptime measurement sounds straightforward but it's full of opportunities for ambiguity. Does the MSP measure uptime on individual systems or aggregate uptime? If you have three email servers and one goes down, is that a total outage or a partial outage? Does measurement include scheduled maintenance, or is scheduled maintenance excluded from the SLA?
Ask the MSP specifically: how do you measure uptime? Do you measure each system individually or in aggregate? If I have redundancy, how are partial failures counted? Most reasonable MSPs will exclude scheduled maintenance from SLA calculations, which is fair because you know it's coming. But confirm this. If scheduled maintenance is counting against your uptime SLA, that's unreasonable.
Ask about third-party outages. If your internet provider has an outage and your systems are down, does that count against the MSP's SLA or is it excluded? Most reasonable MSPs exclude outages outside their control. But some SLAs count everything, which means the MSP is guaranteeing uptime on things it doesn't control. That is not a promise a vendor can realistically keep, so treat it with suspicion.
Ask what happens if you have redundancy. If the MSP provides a primary system and you have your own backup system, is downtime measured on the primary system only or on your actual service availability? You want measurement to reflect your actual experience, not just whether the MSP's specific system is up.
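To see how much the measurement rules matter, it helps to run the same month of outages through two different sets of exclusions. The sketch below is only an illustration: the cause labels, the `Outage` class, and the sample outage durations are hypothetical, and it assumes excluded time is simply not counted as downtime (some contracts also remove it from the measurement period).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass
class Outage:
    minutes: float
    cause: str  # hypothetical labels: "msp", "scheduled_maintenance", "third_party"


def measured_uptime_percent(
    outages: Iterable[Outage],
    period_minutes: float,
    excluded_causes: FrozenSet[str] = frozenset(),
) -> float:
    """Uptime percent after ignoring outages whose cause is excluded by the SLA."""
    counted = sum(o.minutes for o in outages if o.cause not in excluded_causes)
    return 100.0 * (period_minutes - counted) / period_minutes


if __name__ == "__main__":
    period = 30 * 24 * 60  # a 30-day month, in minutes
    month = [
        Outage(30, "msp"),
        Outage(120, "scheduled_maintenance"),
        Outage(45, "third_party"),
    ]
    experienced = measured_uptime_percent(month, period)
    reported = measured_uptime_percent(
        month, period, frozenset({"scheduled_maintenance", "third_party"})
    )
    print(f"What users experienced: {experienced:.2f}%")
    print(f"What the SLA report shows: {reported:.2f}%")
```

With these made-up numbers, users experienced about 99.55% availability while the SLA report shows about 99.93%. Both figures are accurate, but only one of them breaches a 99.9% commitment.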
Response Time and Resolution Time Matter Differently
Response time and resolution time are often confused, and that confusion creates frustration. Response time is how fast the MSP acknowledges your problem and starts working on it. Resolution time is how fast they actually fix it. You need both metrics in your SLA, and you need to understand they mean different things.
A reasonable response time for critical issues is 15-30 minutes, 24/7. This means if your systems are down at 2 AM, someone is getting paged and responding within 30 minutes. For high-priority issues, 1-2 hours is reasonable. For medium priority, 4 hours is reasonable. For low priority, 8 business hours or next business day is fine.
Resolution time is where it gets tricky because it depends heavily on what's actually wrong. Some issues are fixed in minutes. Some take hours. Some require third-party vendor involvement and take days. Most reasonable MSPs differentiate resolution time by severity. For critical issues, an MSP might commit to a 4-hour resolution. For high-priority issues, the target might be 8-24 hours. For medium priority it might be 24-48 hours, and for low priority 3-5 business days.
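One way to keep response and resolution targets straight is to write them down as a table and check tickets against it. The sketch below uses the upper end of each range mentioned above. As a simplifying assumption, it treats business hours and business days as elapsed clock time. The `TARGETS` table and `check_ticket` function are hypothetical names, not part of any ticketing system.

```python
from datetime import datetime, timedelta

# (response target, resolution target) per severity.
# Assumption: business hours and business days are approximated as elapsed time.
TARGETS = {
    "critical": (timedelta(minutes=30), timedelta(hours=4)),
    "high": (timedelta(hours=2), timedelta(hours=24)),
    "medium": (timedelta(hours=4), timedelta(hours=48)),
    "low": (timedelta(hours=8), timedelta(days=5)),
}


def check_ticket(
    severity: str, opened: datetime, acknowledged: datetime, resolved: datetime
) -> dict:
    """Report whether a ticket met its response and resolution targets."""
    response_target, resolution_target = TARGETS[severity]
    return {
        "response_met": (acknowledged - opened) <= response_target,
        "resolution_met": (resolved - opened) <= resolution_target,
    }


if __name__ == "__main__":
    opened = datetime(2024, 1, 6, 2, 0)        # outage reported at 2:00 AM
    acknowledged = datetime(2024, 1, 6, 2, 20)  # engineer responds after 20 minutes
    resolved = datetime(2024, 1, 6, 7, 30)      # service restored after 5.5 hours
    print(check_ticket("critical", opened, acknowledged, resolved))
```

In this example the response target is met and the resolution target is missed, which is exactly the situation where an SLA that only covers response time would report success.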
But here's the reality: resolution time commitments are sometimes unrealistic, and good vendors know this. Some issues simply can't be resolved quickly. If your ERP system has a rare database corruption that requires vendor involvement, that might take days to resolve regardless of how fast the MSP responds. Ask the vendor: what types of issues can you realistically commit to resolving in 4 hours? What types take longer? If they claim everything can be resolved in 4 hours, they're not being realistic.
Support Hours and When Help Is Available
"24/7 support" sounds good but it means different things depending on the vendor. To some vendors, 24/7 support means a staffed security operations center with real analysts watching your systems around the clock and responding to issues immediately. To others, 24/7 support means someone will respond to your ticket within 24 hours. These are wildly different service levels.
Ask specifically: if something happens at 2 AM on a Saturday, who sees it first? How long before a human reviews it? Are you getting immediate escalation or are you waiting until Monday morning? If the MSP can't commit to incident response at night and on weekends, they can't honestly call it 24/7 support.
Some organizations don't actually need 24/7 support. If your critical systems are monitored and your MSP has an on-call rotation for genuine emergencies, that's often sufficient even if your named support hours are business hours. What matters is understanding what 24/7 actually means and whether it matches your actual needs. 24/7 support is expensive, and you should only pay for it if you genuinely need it.
Excluded Outages and What Doesn't Count
Every SLA has exclusions—situations where downtime doesn't count against the MSP's commitment. These are supposed to protect the MSP from liability for things outside their control. Common exclusions include third-party vendor outages, customer-caused issues, scheduled maintenance, and force majeure events like natural disasters.
These exclusions are reasonable, but read them carefully. "Third-party vendor outages" makes sense—if your cloud provider is down, that's not the MSP's fault. "Customer misconfiguration" makes sense—if you delete a critical database, that's not the MSP's fault. "Scheduled maintenance" makes sense if announced in advance. "Force majeure" makes sense.
But some MSPs try to exclude things that should be covered. If the exclusion says "operator error" and that includes human mistakes by the MSP's staff, that's too broad. If the exclusion says "any issue caused by third-party software" and that includes software the MSP selected and configured, that's overreach. Read the exclusions and understand what's not covered.
Ask the MSP: what's included and what's not covered by the SLA? If you see an exclusion that concerns you, push back. Ask whether the MSP can accept responsibility for broader categories. A good vendor will be willing to negotiate reasonable exclusions.
SLA Metrics Beyond Uptime
Uptime is the most common SLA metric, but it's not the only one that matters. Response time and resolution time should be in your SLA with specific targets by severity. Some vendors also commit to performance baselines: your systems will respond to requests within a defined time, not just be "up." This matters because a system can technically be up but running so slowly it's unusable.
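A performance baseline is usually expressed as a share of requests that must complete within a threshold. The sketch below shows one simple way to evaluate that from a list of response times. The 500 ms threshold, the 95% requirement, and the function name are assumptions chosen for illustration, not industry standards.

```python
from typing import Sequence


def within_baseline(
    latencies_ms: Sequence[float],
    threshold_ms: float,
    required_fraction: float = 0.95,
) -> bool:
    """True if at least `required_fraction` of requests finished within the threshold."""
    if not latencies_ms:
        raise ValueError("no latency samples to evaluate")
    fast_enough = sum(1 for value in latencies_ms if value <= threshold_ms)
    return fast_enough / len(latencies_ms) >= required_fraction


if __name__ == "__main__":
    # Every request succeeded, so an uptime check would report 100%,
    # but one request in ten took three seconds.
    samples = [120.0] * 90 + [3000.0] * 10
    print("Baseline met:", within_baseline(samples, threshold_ms=500.0))
```

Here the system never went down, yet it fails the baseline because only 90% of requests were fast enough.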
Ask about other metrics: are you committing to any performance baselines? What about ticket resolution rate—what percentage of tickets should be resolved without escalation? What about customer satisfaction—do you measure it? These secondary metrics don't replace uptime but they provide additional accountability.
Consequences for Missing SLAs
An SLA without enforcement is a promise the vendor can break without penalty. The enforcement mechanism is what makes an SLA meaningful. Good MSPs offer service credits when they miss SLAs. These should be automatic: the credit applies to your next bill without you requesting it. If credits have to be claimed manually, many organizations won't bother because the administrative hassle can exceed the credit value.
A reasonable credit structure is 5% of your monthly bill for one missed SLA, 10% for two, and 15% for three or more in the same month. Some vendors cap total credits at 25% of the monthly bill even if they miss constantly. That's reasonable as long as 25% represents real financial impact.
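The tiered structure described above can be written as a small function, which is also a useful way to test whether a vendor's credit clause is unambiguous. This is a sketch of that example structure only. The function name and the way the cap is applied are assumptions, and a real contract may define tiers per metric rather than per month.

```python
def service_credit(
    monthly_bill: float, missed_slas: int, cap_percent: float = 25.0
) -> float:
    """Credit owed for a month under the example tiers: 5%, 10%, then 15%."""
    if missed_slas <= 0:
        percent = 0.0
    elif missed_slas == 1:
        percent = 5.0
    elif missed_slas == 2:
        percent = 10.0
    else:
        percent = 15.0
    # The cap only matters if a contract's tiers can exceed it;
    # with these example tiers it never binds.
    percent = min(percent, cap_percent)
    return monthly_bill * percent / 100.0


if __name__ == "__main__":
    bill = 10_000.0  # hypothetical monthly fee
    for misses in range(0, 5):
        print(f"{misses} missed SLAs -> credit of {service_credit(bill, misses):.2f}")
```

If you cannot express the vendor's credit clause this plainly, that is a sign the clause needs clarifying before you sign.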
Ask the MSP: what happens if you miss an SLA? Is there a credit? Is it automatic? What's the credit amount? If the vendor can't answer these questions clearly, the SLA isn't enforced and doesn't mean much.
Matching SLAs to Your Actual Business
The best SLAs are the ones that match your actual needs, not the highest numbers the vendor offers. If your business can tolerate email being down for 2 hours per month, don't buy 99.99% uptime. If your development environment doesn't need high availability, don't pay for it.
Think through each system: how much downtime can you actually tolerate? If your payment processing system is down for 1 hour, what's the business impact? If your internal collaboration tool is down for 4 hours, what's the impact? Answer those questions and let the answers drive your SLA choices.
A vendor that recommends realistic SLAs based on your business is thinking about value. A vendor that recommends the highest tier for everything is selling, not advising. When an MSP says every system is critical and needs 99.99% uptime, push back and ask them to justify each recommendation.
SLA Gaps and What's Not Covered
Even with a comprehensive SLA, gaps exist. The SLA might guarantee uptime while performance degrades. The SLA might guarantee response time while resolution drags on. The SLA might cover infrastructure but not application support.
Ask: beyond the uptime guarantee, what else do you commit to? What happens if you meet uptime but I'm experiencing performance problems? What happens if response time is fast but resolution time is slow? The more comprehensive the SLA, the better protected you are.
Some contracts include SLA reviews—periodic conversations about whether the SLA metrics remain appropriate as your environment changes. These reviews are valuable because they force both sides to discuss whether the agreement still makes sense.
Calculating and Verifying SLA Metrics
Ask the MSP: how do you calculate and report on SLA metrics? Can I see real-time dashboards showing uptime for each system? Will I get monthly reports with SLA performance? Transparency is a sign of confidence.
You should be able to independently verify that the MSP is meeting their commitments. Ask for access to monitoring data. Ask what tools they use to measure uptime. Ask whether you can audit their measurement methodology.
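Independent verification can be as simple as running your own availability probe and comparing the result with the MSP's report. The sketch below assumes you already have a list of pass/fail results from evenly spaced checks (for example, one per minute). How those checks are collected is outside its scope, and the function names are illustrative.

```python
from typing import Sequence


def observed_uptime_percent(probe_results: Sequence[bool]) -> float:
    """Share of successful probes, as a percentage."""
    if not probe_results:
        raise ValueError("no probe results to evaluate")
    successes = sum(1 for ok in probe_results if ok)
    return 100.0 * successes / len(probe_results)


def meets_target(probe_results: Sequence[bool], target_percent: float) -> bool:
    """True if observed uptime is at or above the SLA target."""
    return observed_uptime_percent(probe_results) >= target_percent


if __name__ == "__main__":
    # One probe per minute for 30 days, with 50 failed checks (made-up data).
    probes = [True] * (43_200 - 50) + [False] * 50
    print(f"Observed uptime: {observed_uptime_percent(probes):.3f}%")
    print("Meets 99.9% target:", meets_target(probes, 99.9))
```

Your numbers will rarely match the vendor's exactly, because probe location, check interval, and exclusions all differ. The useful conversation is about why they differ.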
If the MSP refuses transparency, that's a warning sign. A good vendor will welcome your verification because they're confident they're meeting commitments.
The Closing Framework
A good SLA is specific—it names severity levels and attaches measurable targets to each one. It's realistic—vendors can actually commit to meeting it. It's enforceable—there are clear consequences for missing it. It's comprehensive—it covers what matters to you, not just uptime. And it's transparent—you can verify it's being met.
When you're evaluating MSPs, don't get seduced by impressive uptime percentages. Instead, ask: can you explain what these percentages actually mean for my specific systems? Can you help me define realistic SLAs for my environment? Can you guarantee you'll meet them and show me how you measure? The vendor that engages seriously with these questions is the one that's thinking about your success, not just their revenue.
Fully Compliance provides educational content about IT compliance and cybersecurity. This article reflects general guidance about SLA expectations for MSP services. Individual service agreements vary — evaluate any SLA based on your organization's specific business requirements and risk tolerance.