MSP Escalation Procedures

This article is educational content for understanding escalation procedures in MSP relationships. It is not legal guidance, not an SLA template, and not a substitute for clearly documented escalation procedures in your contract.


Your organization reported an issue to the MSP three days ago. A first-level technician has been working on it without making progress, and the problem is still affecting your business. When you called to ask what's happening, you found out that the technician hasn't escalated to a senior person because there's no clear escalation procedure: they don't know when they're supposed to escalate or to whom. Meanwhile, the MSP's own management expected the issue to be resolved by now.

This is a common failure point in MSP relationships: when problems aren't resolved quickly, it's often because escalation procedures aren't clear. A first-level technician working alone on a hard problem doesn't know when to ask for help. You don't know whether your escalation request will be honored or ignored. The MSP doesn't know whether a user complaint means they need to drop everything or whether it's one of many low-priority requests. Clear escalation procedures solve this by defining when escalation should happen, who it escalates to, what authority each level has, and how long each level has to work on a problem before escalating further.

Defining Severity and Business Impact

Escalation procedures start with severity definitions, but severity is often misunderstood. The technical severity of a problem (how complex it is to fix) is different from business severity (how much it affects your operations). A complex technical issue affecting one user might be low priority. A simple issue affecting 200 users might be critical. Good escalation procedures define severity based on business impact, not technical complexity.

A common framework distinguishes four levels: critical (systems are down or severely degraded, many users can't work, immediate business impact), high (significant systems or functions are impaired, a meaningful number of users are affected, business impact is substantial), medium (some systems or functions work but with significant limitations, a smaller number of users are affected, business impact is noticeable but manageable), and low (systems work with minor limitations, user impact is very limited, the business can operate normally with workarounds).

The key point is that severity is business-specific. For a financial services firm, an outage of a customer-facing trading system is critical from the first minute. For a law firm, an internal research tool being down is high or medium depending on how many lawyers depend on it and whether workarounds exist. The MSP should help you define what severity levels mean in your specific business context so everyone shares the same understanding.
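One way to make a severity definition concrete is to write it down as a rule over business impact rather than technical difficulty. The following is a minimal sketch, not a standard: the field names and the percentage thresholds are assumptions chosen for illustration, and your own agreement with the MSP would supply the real values.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4


@dataclass
class Impact:
    users_affected: int
    total_users: int
    system_down: bool        # system is down or severely degraded
    workaround_exists: bool  # users can keep working some other way


def classify(impact: Impact) -> Severity:
    """Map business impact to a severity level.

    All thresholds below are hypothetical examples, not industry values.
    """
    if impact.total_users <= 0:
        raise ValueError("total_users must be positive")
    share = impact.users_affected / impact.total_users
    if impact.system_down and share >= 0.5:
        return Severity.CRITICAL
    if share >= 0.25 or (impact.system_down and not impact.workaround_exists):
        return Severity.HIGH
    if share >= 0.05 or not impact.workaround_exists:
        return Severity.MEDIUM
    return Severity.LOW
```

Note that technical complexity does not appear anywhere in the rule. A hard problem affecting one user with a workaround still comes out as low, while a simple outage affecting most of the company comes out as critical.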

First-Level Support Responsibilities

Most MSP support structures have multiple levels. First-level support (sometimes called tier one or frontline support) is usually the first point of contact for a problem. Technicians at this level might handle the problem themselves if it's routine, or they might gather information and escalate if needed. First-level support is not expected to resolve every problem; it is expected to resolve routine problems and recognize when escalation is needed.

First-level support responsibilities typically include taking the initial ticket or call from the user, asking diagnostic questions to understand the problem, attempting basic troubleshooting (restarts, checking connectivity, verifying configuration), documenting what they find and what they've tried, and escalating to the next level when appropriate.

The key is that first-level support should know when to escalate. They shouldn't waste hours troubleshooting a complex issue when senior people could resolve it faster. They also shouldn't escalate too quickly—escalating everything means the next level gets overwhelmed and nothing gets resolved fast. Clear criteria help. If basic troubleshooting steps are being tried and the issue isn't resolved within a specified time (maybe 30 minutes to one hour), it's time to escalate. If the problem is affecting multiple users, it's probably time to escalate. If the problem is outside normal operating procedures or basic troubleshooting, it's time to escalate. First-level support should know these rules.

Escalation Criteria and Thresholds

To prevent both unnecessary escalation and problems that linger unresolved, escalation criteria should be specific. These might include time-based criteria (if not resolved within one hour, escalate), complexity-based criteria (if the issue is outside normal troubleshooting, escalate), impact-based criteria (if severity is high or critical, escalate immediately), or pattern-based criteria (if the same issue is recurring, escalate to identify root cause).

Time-based escalation is important for preventing problems from sitting stalled. If first-level support is working on an issue but making no progress, at what point should they escalate to the next level? Thirty minutes for critical issues? Two hours for high-priority issues? These thresholds should be clear and agreed to in advance. Complexity-based escalation recognizes that first-level support has limits. If the issue involves a corrupted database, a failed hard drive, or root-level access to a production system, escalate immediately. First-level support should recognize the limits of what they can troubleshoot.

Impact-based escalation recognizes that high-impact issues need senior attention immediately. If the issue is critical (systems down, business significantly affected), it shouldn't wait for first-level support to work through basic troubleshooting. Escalate immediately to a senior engineer who can make decisions quickly. Pattern-based escalation addresses recurring issues. If you're experiencing the same problem repeatedly, that's a signal that first-level troubleshooting of the symptom isn't the solution. Someone needs to investigate the root cause and fix it. This requires escalating to someone with authority to make infrastructure changes.
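The four kinds of criteria can be combined into a single check that a ticketing workflow might run. This is an illustrative sketch only: the `Ticket` fields, the one-hour limit, and the recurrence limit of three are assumptions, and a real procedure would take them from the agreed escalation document.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class Ticket:
    severity: str                        # "critical", "high", "medium", or "low"
    time_at_level: timedelta             # how long the current level has had it
    outside_basic_troubleshooting: bool  # e.g. corrupted database, failed drive
    recurrences_last_30_days: int        # how often the same issue came back


TIME_LIMIT = timedelta(hours=1)  # assumed time-based threshold
RECURRENCE_LIMIT = 3             # assumed pattern-based threshold


def escalation_reasons(ticket: Ticket) -> List[str]:
    """Return every criterion that says this ticket should be escalated."""
    reasons = []
    if ticket.severity in ("critical", "high"):
        reasons.append("impact")
    if ticket.outside_basic_troubleshooting:
        reasons.append("complexity")
    if ticket.time_at_level >= TIME_LIMIT:
        reasons.append("time")
    if ticket.recurrences_last_30_days >= RECURRENCE_LIMIT:
        reasons.append("pattern")
    return reasons
```

An empty list means the ticket stays where it is. Returning the reasons rather than a plain yes or no gives the next level a record of why the ticket arrived.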

The Escalation Path and Decision Authority

Once an issue is escalated, where does it go? The escalation path should define who takes responsibility at each level and what authority they have to make decisions. A typical escalation path includes first-level support (entry point), senior technical specialist (can troubleshoot complex issues, make configuration changes, authorize short-term workarounds), engineering or infrastructure team (can design solutions, make architectural changes, plan improvements), and management (can make business decisions, prioritize work, allocate resources).

Each level should understand its authority. A senior technician can probably authorize a server restart or apply a configuration change. They probably can't authorize replacing expensive hardware or committing the MSP to a 24-hour on-site presence. MSP management can make business-level decisions like prioritizing your issue over other clients' issues or allocating additional resources. You should also understand your own authority to make decisions. If the senior technician says, "We need to replace this hard drive, and it will take two hours," do you have authority to approve that, or do you need to check with someone? Clear authority prevents delays where decisions get stuck waiting for approval from the wrong person.
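An escalation path with decision authority can be recorded as a simple lookup, so that anyone can see which level needs to be involved for a given decision. The sketch below is hypothetical: the level and action names are invented labels, and which level approves what (for example, placing hardware purchases with management) is an assumption that your contract would settle.

```python
from typing import Optional

# Ordered from entry point to final decision maker.
ESCALATION_PATH = ["first_level", "senior_technician", "engineering", "management"]

# Hypothetical mapping of each level to the decisions it may authorize.
AUTHORITY = {
    "first_level": {"basic_troubleshooting"},
    "senior_technician": {
        "server_restart",
        "configuration_change",
        "short_term_workaround",
    },
    "engineering": {"architectural_change", "solution_design"},
    "management": {"hardware_purchase", "reprioritize_work", "allocate_resources"},
}


def lowest_level_for(action: str) -> Optional[str]:
    """Return the first level on the path allowed to authorize the action."""
    for level in ESCALATION_PATH:
        if action in AUTHORITY[level]:
            return level
    return None  # nobody is listed: a gap in the procedure worth fixing
```

A `None` result is useful in its own right, because it points to a decision that no level has been given authority to make.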

Time Commitments at Each Escalation Level

Escalation procedures should include time commitments for how long an issue should spend at each level before escalating further. This prevents problems from sitting at one level indefinitely. For example, first-level support should make progress within 30 minutes or escalate, a senior technician should make meaningful progress within two hours or escalate to engineering, and engineering should identify a solution within one business day or escalate to management for a business decision.

These time commitments need to be realistic. Don't set a one-hour resolution commitment for complex issues, because you'll be disappointed every time. But do set time commitments for escalation to ensure problems don't get stuck. The time commitments should also be severity-based. Critical issues should escalate faster than medium issues. A critical issue at first-level support might escalate within 30 minutes. A medium issue might spend two hours at first-level support before escalating.
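Severity-based time commitments amount to a small table plus a deadline calculation. The sketch below covers first-level support only. The 30-minute and two-hour values come from the examples above, while the values for high and low severity are assumptions added to complete the table.

```python
from datetime import datetime, timedelta

# Maximum time a ticket may stay at first-level support before escalating.
FIRST_LEVEL_LIMITS = {
    "critical": timedelta(minutes=30),  # example value from the text
    "high": timedelta(hours=1),         # assumed
    "medium": timedelta(hours=2),       # example value from the text
    "low": timedelta(hours=4),          # assumed
}


def escalate_by(severity: str, entered_at: datetime) -> datetime:
    """Return the moment by which the ticket must be resolved or escalated."""
    return entered_at + FIRST_LEVEL_LIMITS[severity]


def is_overdue(severity: str, entered_at: datetime, now: datetime) -> bool:
    """True once the ticket has used up its time at first-level support."""
    return now >= escalate_by(severity, entered_at)
```

These are escalation deadlines, not resolution deadlines. The table says when a ticket must move on if it is still open, which is a commitment an MSP can realistically keep even for hard problems.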

Emergency Procedures and Out-of-Hours Support

Escalation doesn't stop at business hours. What happens if a critical issue occurs at 11 PM on a Sunday? Your escalation procedures should define how to reach someone with authority to make decisions immediately.

This typically includes an emergency contact number that goes directly to an on-call person, not a voicemail system. The on-call person has authority to make immediate decisions about emergency response: whether to restart systems, restore from backup, or call in additional staff. They should be briefed on your critical systems and understand what decisions they can make without checking with management.

Out-of-hours escalation should also define response time expectations. How quickly will someone respond to an emergency call? What's your expectation for how quickly the problem will be addressed? The MSP should be clear about their capabilities during off-hours. Some MSPs have 24/7 staff. Others have on-call staff who can be available but might take 15 minutes to reach. Others only support non-emergency issues off-hours.
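The out-of-hours rule can be expressed as a routing decision based on the time of the report and its severity. This is a sketch under stated assumptions: the business hours (weekdays, 08:00 to 18:00), the queue names, and the choice to send only critical issues to the on-call person are all hypothetical and would differ between MSPs.

```python
from datetime import datetime, time

BUSINESS_START = time(8, 0)   # assumed opening time
BUSINESS_END = time(18, 0)    # assumed closing time


def is_business_hours(moment: datetime) -> bool:
    """Monday to Friday, between the assumed opening and closing times."""
    is_weekday = moment.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and BUSINESS_START <= moment.time() < BUSINESS_END


def route(severity: str, moment: datetime) -> str:
    """Pick a (hypothetical) destination for a newly reported issue."""
    if is_business_hours(moment):
        return "service_desk"
    if severity == "critical":
        return "on_call_engineer"
    return "next_business_day_queue"
```

With these assumptions, a critical issue reported at 11 PM on a Sunday goes to the on-call engineer, while a low-severity issue reported at the same time waits for the next business day.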

Customer Escalation and Complaint Handling

Escalation isn't just about technical problems—it's also about complaints and relationship issues. What if you're unhappy with how an issue is being handled, or you think the MSP is doing something wrong? You need a way to escalate your concern without going through the normal technical support chain.

This typically includes a customer escalation path: raise the concern with your account manager or primary contact at the MSP; if not resolved, escalate to the account manager's manager or the MSP's customer success team; if still not resolved, escalate to MSP leadership. At each level, your concern should be heard and addressed seriously.

Customer escalations are different from technical escalations. A customer escalation might be "we're not happy with the response time to issues," "the MSP is making changes without asking us," or "we don't feel like the MSP understands our business." These are valid concerns that need to be addressed through relationship management, not technical troubleshooting. Your MSP should have a process for customer escalations and should track them to ensure they're resolved. If customer escalations are frequent, that's a signal that something about the relationship isn't working and needs to be fixed.

Reducing Escalations Through Prevention

The best escalations are the ones that don't happen. Problems prevented are better than problems escalated. An MSP that focuses on prevention—doing proactive monitoring, catching issues before they become critical, maintaining systems so failures are rare—will have fewer escalations.

Prevention includes proactive monitoring that detects problems before users notice, regular maintenance and updates that prevent common failures, good documentation so troubleshooting is faster, testing of backups and disaster recovery so restoration works when needed, capacity planning that prevents systems from running out of resources, and root-cause analysis of issues to prevent recurrence.

An MSP that's focused on preventing issues will propose infrastructure improvements, flag systems that are approaching end of life, recommend updates before they become critical, and analyze patterns in your issues to identify systemic problems. An MSP that's just responding to escalations is reactive and less valuable to your organization. The better MSP relationship is one where you rarely need to escalate because the MSP catches problems before they escalate.

Testing Escalation Procedures

Clear escalation procedures are only useful if they work when you need them. The best practice is to test them periodically so you know that the procedures work and the people involved know what to do. A simulated escalation scenario—reporting a test issue and watching how it gets handled—can reveal gaps in procedures or confusion about who's responsible.

Testing should be scheduled and low-risk. You don't want to discover during a real emergency that nobody knows who the on-call person is or that the number doesn't work. Regular testing prevents that surprise and helps ensure that when you need to escalate for real, the procedures work smoothly.

Closing Reflection

Escalation procedures exist so that problems get resolved appropriately at the right level. When they work well, critical issues get immediate senior attention, routine issues are handled quickly by first-level support, and problems that linger get escalated so they can be resolved. When they don't work, problems get stuck, you don't know who to contact, and frustration builds. Clear procedures, tested regularly, ensure that when something goes wrong, the response is appropriate and timely. The MSP relationship becomes much more effective when both parties understand how escalation works and trust that it will happen when needed.


Fully Compliance provides educational content about IT compliance and cybersecurity. This article reflects general guidance about MSP escalation procedures. Individual MSP relationships vary—evaluate any escalation approach based on your organization's specific needs and risk profile.