When Your SLA Depends on Everyone Else

Jim Leone

12/31/2025

We still talk about five nines as if uptime exists in a vacuum.

In reality, modern services depend on carriers, clouds, APIs, SaaS platforms, security vendors, certificate authorities, and upstream providers that sit entirely outside our direct control. Yet customers still expect a single, accountable outcome. When something breaks, the SLA doesn’t fail quietly. Trust does.

This disconnect has become one of the most persistent, and least openly discussed, challenges in modern IT and managed services.

The Legacy Promise of Five Nines...

The concept of “99.999% uptime” was born in a simpler era, one where infrastructure was owned, hosted, and operated by a single organization or a tightly controlled set of partners. Systems were vertically integrated. Dependencies were limited. Accountability was clear.

Five nines, roughly five minutes of downtime a year, became shorthand for operational excellence.

Today, however, that promise is often made in isolation, while delivery happens in a deeply interconnected ecosystem. The language hasn’t evolved at the same pace as the architecture.

The Modern Dependency Web...

A single customer-facing service may rely on -->

  • One or more carriers

  • Multiple cloud regions

  • Third-party authentication providers

  • External APIs

  • DNS and certificate authorities

  • Security tooling and monitoring platforms

  • Vendor SOCs and support teams

Each of those dependencies has its own SLA, escalation model, exclusions, maintenance windows, and interpretation of “availability.” When one link falters, the impact cascades, often invisibly at first, until the customer feels it. And when they do, they don’t care which vendor technically caused it. They care that their service isn’t working.
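The arithmetic behind this is worth making concrete. When every dependency in a chain must be up for the service to work, and failures are assumed to be independent, the service's availability is roughly the product of its dependencies' availabilities, before counting any failures of your own. The Python sketch below uses hypothetical figures, not real vendor SLAs, to show how quickly a chain of individually respectable numbers drifts away from five nines.

```python
from math import prod

# Minutes in an average year (365.25 days), used to turn availability into downtime.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def composite_availability(availabilities):
    """Availability of a serial chain where every dependency must be up.

    Assumes independent failures and treats the service's own components
    as perfect; real figures depend on how outages actually overlap.
    """
    return prod(availabilities)


def downtime_minutes_per_year(availability):
    """Expected minutes of unavailability per year at a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR


def required_per_dependency(target, n):
    """Availability each of n equal, independent dependencies would need to reach target."""
    return target ** (1 / n)


# Hypothetical serial dependencies for one customer-facing service (illustrative only).
dependencies = {
    "carrier": 0.9999,
    "cloud_region": 0.9999,
    "auth_provider": 0.9995,
    "external_api": 0.999,
    "dns": 0.99999,
}

if __name__ == "__main__":
    chain = composite_availability(dependencies.values())
    print(f"composite availability: {chain:.5%}")
    print(f"expected downtime: {downtime_minutes_per_year(chain):.0f} min/year")
    print(f"five nines budget: {downtime_minutes_per_year(0.99999):.2f} min/year")
    needed = required_per_dependency(0.99999, len(dependencies))
    print(f"needed per dependency for five nines: {needed:.7f}")
```

With these illustrative numbers the chain lands near 99.83%, or roughly 15 hours of expected downtime a year, against a five-nines budget of about 5.26 minutes. Hitting five nines across five such dependencies would require each one to sit near 99.9998%.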

When SLAs Collide...

Your SLA may promise rapid response or high availability, but it is often underpinned by vendor SLAs that -->

  • Allow longer response times

  • Exclude certain failure modes

  • Rely on best-effort support

  • Offer credits instead of remediation

  • Reset clocks based on acknowledgment, not resolution

On paper, everyone may be “within SLA.” Operationally, the customer may still be down. This isn’t negligence; it’s misalignment.
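A small worked example shows how this plays out. The Python sketch below models a single hypothetical incident; the SLA terms, timestamps, and field names are illustrative assumptions, and the vendor's clock is assumed to start when the ticket is opened rather than when customers first felt the impact.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical SLA terms, chosen only for illustration.
PROVIDER_RESPONSE_SLA = timedelta(minutes=15)  # your commitment to respond
VENDOR_ACK_SLA = timedelta(minutes=30)         # upstream vendor's commitment to acknowledge


@dataclass
class Incident:
    impact_start: datetime          # when customers first felt the outage
    provider_response: datetime     # when your team responded
    vendor_ticket_opened: datetime  # vendor clock starts here, not at impact
    vendor_ack: datetime            # when the vendor acknowledged the ticket
    restored: datetime              # when service actually came back


def sla_view(incident: Incident) -> dict:
    """Compare each party's 'within SLA' status with what the customer experienced."""
    return {
        "provider_within_sla":
            incident.provider_response - incident.impact_start <= PROVIDER_RESPONSE_SLA,
        "vendor_within_sla":
            incident.vendor_ack - incident.vendor_ticket_opened <= VENDOR_ACK_SLA,
        "customer_outage": incident.restored - incident.impact_start,
    }


day = datetime(2025, 6, 2)
incident = Incident(
    impact_start=day.replace(hour=9, minute=0),
    provider_response=day.replace(hour=9, minute=10),
    vendor_ticket_opened=day.replace(hour=9, minute=25),
    vendor_ack=day.replace(hour=9, minute=50),
    restored=day.replace(hour=13, minute=0),
)
print(sla_view(incident))
```

Both "within SLA" flags come back true, while the customer outage is four hours. Nothing in either SLA was violated; the gap lives between the clocks.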

The KPI Illusion?

Most organizations track familiar metrics such as uptime percentage, MTTR, incident volume, and SLA compliance. These are useful but incomplete. What often goes unmeasured are the factors that actually shape customer experience -->

  • Mean time to vendor acknowledgment

  • Dependency concentration risk

  • Third-party blast radius

  • Time-to-customer-communication

  • Speed of clarity, not just resolution

It’s entirely possible to have a dashboard full of green indicators while customers experience real disruption. When metrics look healthy but confidence erodes, the problem isn’t monitoring; it’s perspective.
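Several of these measures fall out of timestamps most teams already collect. The Python sketch below computes mean time to vendor acknowledgment and mean time to customer communication from incident records, and uses a map of services to vendors as a rough proxy for third-party blast radius and concentration. All record fields, vendor names, and services are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident records; field names are assumptions for illustration.
incidents = [
    {"vendor": "carrier_a",
     "impact_start": datetime(2025, 3, 1, 9, 0),
     "vendor_ack": datetime(2025, 3, 1, 9, 40),
     "customer_notified": datetime(2025, 3, 1, 9, 15)},
    {"vendor": "auth_provider",
     "impact_start": datetime(2025, 3, 8, 14, 0),
     "vendor_ack": datetime(2025, 3, 8, 14, 20),
     "customer_notified": datetime(2025, 3, 8, 14, 50)},
]

# Hypothetical map of customer-facing services to the third parties they rely on.
service_dependencies = {
    "portal": {"carrier_a", "cloud_east", "auth_provider"},
    "billing": {"cloud_east", "payments_api"},
    "support_chat": {"carrier_a", "chat_saas"},
}


def mean_minutes_to(field, records=incidents):
    """Mean minutes from customer impact to the given timestamp field."""
    return mean((r[field] - r["impact_start"]) / timedelta(minutes=1) for r in records)


def blast_radius(vendor):
    """Customer-facing services that depend on a given vendor."""
    return sorted(s for s, deps in service_dependencies.items() if vendor in deps)


def concentration(vendor):
    """Share of services exposed to a single vendor (a simple concentration proxy)."""
    return len(blast_radius(vendor)) / len(service_dependencies)


print("mean time to vendor ack (min):", mean_minutes_to("vendor_ack"))
print("mean time to customer comms (min):", mean_minutes_to("customer_notified"))
print("cloud_east blast radius:", blast_radius("cloud_east"))
print(f"carrier_a concentration: {concentration('carrier_a'):.0%}")
```

The second incident is the pattern to watch for: the vendor acknowledged within 20 minutes, but customers heard nothing for 50. A resolution-only dashboard would never show that gap.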

Availability vs. Resilience

Availability focuses on not failing. Resilience focuses on recovering well when failure is inevitable. In a multi-vendor world, resilience matters more. Resilient organizations plan for imperfection. They assume dependencies will fail. They design for degradation, not just uptime. And they communicate early, even when answers aren’t complete. Customers don’t expect perfection, but they do expect honesty, competence, and leadership when things go wrong.
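Designing for degradation can be as simple as deciding, in advance, what a service does when a dependency is unavailable. The Python sketch below shows one common pattern: serve a last-known-good value and flag the response as degraded instead of failing outright. The function names, the in-memory cache, and the DependencyError exception are hypothetical stand-ins for whatever a real service uses.

```python
from dataclasses import dataclass


class DependencyError(Exception):
    """Hypothetical error raised when an upstream dependency is unavailable."""


@dataclass
class Result:
    value: str
    degraded: bool  # True when serving fallback data instead of a live answer


def fetch_with_fallback(fetch, cache, key):
    """Call the dependency; on failure, fall back to the last known good value."""
    try:
        value = fetch(key)
    except DependencyError:
        if key in cache:
            return Result(cache[key], degraded=True)
        raise  # nothing safe to serve, so surface the failure
    cache[key] = value  # remember the last good answer for next time
    return Result(value, degraded=False)


def healthy_upstream(key):
    return f"fresh:{key}"


def failing_upstream(key):
    raise DependencyError("upstream unavailable")


cache = {}
print(fetch_with_fallback(healthy_upstream, cache, "profile"))
print(fetch_with_fallback(failing_upstream, cache, "profile"))
```

The first call succeeds and primes the cache; the second returns the cached value with degraded set to True, which also gives the team something concrete to tell customers. Whether stale data is acceptable is a per-feature decision, not a default.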

Redefining Accountability

Accountability doesn’t disappear just because dependencies exist, but it does need to be redefined. That means shifting from “Did we meet the SLA?” to “Did we own the outcome?” It means recognizing that service quality is no longer defined solely by infrastructure control, but by coordination, transparency, and response across organizational boundaries.

What I Believe Customers Should Be Asking For

Forward-thinking customers are starting to ask better questions, such as...

  • What third parties does this service depend on?

  • How are incidents communicated when the root cause is external?

  • What happens when vendors miss their own SLAs?

  • How is resilience tested, not just availability measured?

  • What commitments exist around communication, not just uptime?

I feel these questions signal maturity, not distrust.

A Newer, More Honest Model

The future of SLAs isn’t about abandoning standards; it’s about evolving them.

That evolution includes...

  • Dependency-aware SLAs

  • Communication commitments alongside uptime targets

  • Transparency over blame

  • Resilience metrics alongside availability metrics

  • Shared responsibility models that reflect reality

Five nines may still have a place, but only when paired with honesty about the ecosystem required to deliver them. Modern IT doesn’t fail because teams don’t care. It fails because complexity has outgrown the contracts designed to describe it. When your SLA depends on everyone else, success isn’t defined by perfection; it’s defined by how well you lead through uncertainty. That’s the difference between meeting a metric and earning trust.