Designing failure semantics across ingress, Java services, clients, and serverless
The most dangerous thing about Kubernetes probes is not that teams misunderstand their definitions. It is that teams forget probes are actions, not just signals. A failed readiness probe changes routing. A failed liveness probe kills a process. A failed startup probe can stall a rollout. That means probes do not merely detect failure; they shape it.
Kubernetes is explicit here: startup gates the other probes, readiness controls whether traffic is sent to a pod, and liveness exists for restart-worthy states such as deadlock. It also warns that incorrectly implemented liveness probes can cause cascading failures.
That is the lens required for a modern Kubernetes architecture:
Client → NGINX Ingress → API service → backend service → managed SQL + event bus + internal services + LLM APIs
The real design question is not “what is liveness vs. readiness?” It is: When each dependency fails, what behavior do we want from the platform, the service, and the client?
What the Public Incidents Actually Teach
Theory is great, but post-mortems are where we learn how systems actually die. Two major public outages serve as masterclasses in how tools designed to protect systems can turn a minor glitch into a catastrophic outage when they are not layered correctly.
1. GitHub’s August 2024 Outage: The Brittleness of Synthetic Health Checks
In August 2024, GitHub suffered a massive outage affecting core read services. The root cause was not a database crash. Instead, a configuration change inadvertently broke the synthetic health-check pings used by their database routing layer (the proxies managing the MySQL replicas). Because the proxies couldn’t get a successful ping, they marked perfectly capable database nodes as DOWN and stopped sending them traffic.
The Lesson: This highlights the profound danger of giving automation the power to sever traffic based on a synthetic check. The database was likely still capable of serving read queries, but the automation panicked. While a DB proxy must use health checks, this incident proves why we must be incredibly conservative with Application-Layer Probes. If you configure your K8s Readiness probe to fail based on a synthetic check to a downstream dependency, you are begging for a self-inflicted outage.
2. Slack’s February 2022 Outage: The Illusion of “Jitter”
Slack experienced a major outage triggered by a sudden increase in database latency. Slack’s engineers had followed textbook best practices: their clients used Exponential Backoff with Jitter. However, because the degradation lasted for several minutes, millions of clients eventually exhausted their timers and began retrying simultaneously. Even with jitter, the sheer mathematical volume created a Thundering Herd that pinned the database at 100% CPU.
The Lesson: Jittered retries are a micro-level solution to a macro-level problem. Jitter prevents 1,000 clients from retrying at the exact same millisecond, but it does nothing to reduce the total volume of load. To survive this, you need Retry Budgets and Layer 1 Load Shedding.
Probes Define Failure Semantics
My practical rule is simple:
- startupProbe defines boot semantics.
- readinessProbe defines traffic semantics.
- livenessProbe defines restart semantics.
A startup probe should answer only one question: Has the process finished bootstrapping? Kubernetes explicitly says startup probes suppress readiness and liveness until startup succeeds, which is the cleanest way to protect slow starters (like heavy Spring Boot apps) without turning liveness into a blunt delay timer.
A readiness probe should answer: Should this instance receive one more request right now? It should reflect local resource exhaustion, queue growth, or shutdown/drain state.
A liveness probe should answer: Is the process locally broken in a way that a restart can fix? Spring Boot’s guidance is blunt and correct: the liveness probe should not depend on external systems, because if an external system fails, Kubernetes might restart all application instances and create cascading failures.
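The three rules above translate directly into pod spec. A minimal sketch, assuming a Spring Boot service with Actuator’s liveness/readiness health groups enabled on port 8080 (paths and thresholds are illustrative, not prescriptive):

```yaml
# startup gates the other probes; readiness gates traffic; liveness gates restarts
startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 10
  failureThreshold: 30        # tolerates up to 30 x 10s = 5 min of bootstrapping
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 5
  failureThreshold: 3         # ~15s of failures before traffic is removed
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 10
  failureThreshold: 3         # restart only for locally broken processes
```

Note that liveness points at a purely local health group: no database, broker, or LLM check ever appears here.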
Not All Dependencies Fail the Same Way
Classify dependencies by the thing they run out of.
1. Managed SQL and the Autoscaling Death Spiral
Databases rarely fail by going hard-down. They fail by becoming slow enough to exhaust pools, threads, and transaction budgets.
Here lies one of the most dangerous traps in Kubernetes: If you configure your K8s Readiness probe to fail when the HikariCP connection pool is exhausted, you will trigger an Autoscaling Death Spiral.
- Pod A’s DB pool fills up.
- Readiness fails, and Kubernetes removes Pod A from routing.
- Load instantly shifts to Pods B and C. Their pools fill up, and they are removed.
- The Horizontal Pod Autoscaler (HPA) panics and scales up 10 new pods.
- The 10 new pods attempt to open hundreds of new JDBC connections to a database that is already dying. The DB crashes completely.
The SRE Fix: The “Zero-DB” Probe Rule. There is absolutely no valid scenario where an application-layer Kubernetes Readiness or Liveness probe should care about the database connection pool. Probes are strictly for the compute layer (Is Tomcat accepting HTTP connections?).
When the pool fills up, the pod must stay “Ready” in Kubernetes, but the Circuit Breaker must instantly reject new database-bound requests, returning a fast 503 Service Unavailable. The pod absorbs the incoming traffic and cheaply sheds the load without adding pressure to the DB. The HPA sees a healthy pod doing its job, so it doesn’t spin up new pods into a bottleneck.
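A minimal sketch of that shedding behavior, using a plain `Semaphore` as a stand-in for the connection-pool budget (in production you would use a bulkhead or circuit breaker such as Resilience4j rather than this hand-rolled version; the class and names here are illustrative):

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Sketch: shed database-bound work locally instead of failing readiness.
// The pod stays "Ready"; saturated requests get a fast 503 instead of queueing.
class DbLoadShedder {
    private final Semaphore dbPermits;

    DbLoadShedder(int maxConcurrentDbCalls) {
        this.dbPermits = new Semaphore(maxConcurrentDbCalls);
    }

    /** Returns the query result, or a fast 503 when the local budget is gone. */
    Response handle(Supplier<String> dbQuery) {
        if (!dbPermits.tryAcquire()) {        // no waiting: fail fast, shed cheaply
            return new Response(503, "Service Unavailable: DB budget exhausted");
        }
        try {
            return new Response(200, dbQuery.get());
        } finally {
            dbPermits.release();
        }
    }

    record Response(int status, String body) {}
}
```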
(Note: Your architecture must also guarantee that: [HPA Max Replicas] × [HikariCP Max Pool Size] ≤ [Database Max Connections])
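That invariant is cheap to check mechanically, for example in a deploy-time validation step. A sketch (the numbers in the usage are illustrative, not from any real deployment):

```java
// Sketch: enforce maxReplicas * poolSize <= dbMaxConnections before rollout.
class ConnectionBudget {
    /** True when the worst-case autoscaled fleet fits the DB connection limit. */
    static boolean fits(int hpaMaxReplicas, int hikariMaxPoolSize, int dbMaxConnections) {
        return (long) hpaMaxReplicas * hikariMaxPoolSize <= dbMaxConnections;
    }
}
```

For example, 10 replicas with a pool size of 20 fits a 200-connection database exactly, while a pool size of 25 would oversubscribe it.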
2. Event Buses (Flow-Control Dependencies)
The danger with Pub/Sub systems is redelivery pressure and consumer saturation.
- Do not fail liveness because the broker is having a bad day.
- Cap in-flight messages and bytes.
- If the worker is saturated, stop admitting more work locally rather than pretending a pod restart is a cure.
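The flow-control idea above can be sketched as a local admission gate that caps both in-flight messages and bytes; when either budget is spent, the consumer simply stops pulling, and the broker redelivers later. This is a hand-rolled illustration (managed clients such as Google Cloud Pub/Sub expose equivalent built-in flow-control settings):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: local flow control for an event consumer.
class FlowControl {
    private final int maxInFlightMessages;
    private final long maxInFlightBytes;
    private final AtomicInteger messages = new AtomicInteger();
    private final AtomicLong bytes = new AtomicLong();

    FlowControl(int maxInFlightMessages, long maxInFlightBytes) {
        this.maxInFlightMessages = maxInFlightMessages;
        this.maxInFlightBytes = maxInFlightBytes;
    }

    /** Try to admit one message; false means "do not pull more work". */
    boolean tryAdmit(long messageBytes) {
        if (messages.incrementAndGet() > maxInFlightMessages) {
            messages.decrementAndGet();       // roll back the message reservation
            return false;
        }
        if (bytes.addAndGet(messageBytes) > maxInFlightBytes) {
            bytes.addAndGet(-messageBytes);   // roll back the byte reservation
            messages.decrementAndGet();
            return false;
        }
        return true;
    }

    /** Call when a message finishes (ack or nack). */
    void release(long messageBytes) {
        messages.decrementAndGet();
        bytes.addAndGet(-messageBytes);
    }
}
```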
3. LLM APIs (Quota Dependencies)
LLM APIs behave like shared quota systems. Putting this in a readiness probe is a mistake. You need a local rate limiter, a local concurrency cap, a hard timeout, and a fallback path when the model is quota-limited.
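A minimal sketch of those local guardrails: a concurrency cap, a hard timeout, and a fallback answer when the model is slow or quota-limited. The class, names, and fallback text are illustrative; the LLM call itself is a stand-in `Supplier`:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch: local guardrails around a quota-limited LLM API.
class LlmGuard {
    private final Semaphore concurrency;
    private final long timeoutMillis;
    private final String fallback;

    LlmGuard(int maxConcurrent, long timeoutMillis, String fallback) {
        this.concurrency = new Semaphore(maxConcurrent);
        this.timeoutMillis = timeoutMillis;
        this.fallback = fallback;
    }

    String complete(Supplier<String> llmCall) {
        if (!concurrency.tryAcquire()) {
            return fallback;                  // cap reached: degrade, don't queue
        }
        try {
            return CompletableFuture.supplyAsync(llmCall)
                    .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                    .exceptionally(ex -> fallback)   // timeout or quota error
                    .join();
        } finally {
            concurrency.release();
        }
    }
}
```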
The Three Lines of Defense
Layer 1: What Ingress Should Do
Ingress is your first line of defense. The best use of ingress is to enforce short request budgets and coarse traffic discipline: short timeouts, very limited retries (only for idempotent operations), and edge rate limiting to protect the platform during spikes. It should not become your main retry engine.
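With the NGINX Ingress Controller, that discipline maps to a handful of annotations. A sketch (values are illustrative; the Ingress rules and backend spec are omitted):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "5"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "10"       # short request budget
    nginx.ingress.kubernetes.io/proxy-send-timeout: "10"
    nginx.ingress.kubernetes.io/proxy-next-upstream: "error timeout"
    nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "2" # very limited retries
    nginx.ingress.kubernetes.io/limit-rps: "50"                # coarse edge rate limit
# (spec with rules/backends omitted)
```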
Layer 2: What Java Services Should Do
For a Java stack, the most practical baseline is Spring Boot Actuator, Micrometer, and Resilience4j. Resilience4j matches the real classes of failure. Note that if you rely on Spring annotations, the default aspect order is Retry(CircuitBreaker(RateLimiter(TimeLimiter(Bulkhead(Function))))). For critical paths, functional decorators are often clearer than annotation stacks because the order becomes explicit.
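The point about explicit ordering can be made concrete with a toy decoration chain. These are hand-rolled stand-ins, not Resilience4j classes; with Resilience4j you would compose the real Retry, CircuitBreaker, etc. decorators the same innermost-first way:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Sketch: functional decoration makes execution order explicit and testable.
class DecoratorOrder {
    static final List<String> callOrder = new ArrayList<>();

    static Supplier<String> named(String name, Supplier<String> inner) {
        return () -> {
            callOrder.add(name);              // record when this layer runs
            return inner.get();
        };
    }

    static Supplier<String> decorate(Supplier<String> fn) {
        // Built innermost-first, mirroring the default aspect order:
        // Retry(CircuitBreaker(RateLimiter(TimeLimiter(Bulkhead(fn)))))
        Supplier<String> s = named("bulkhead", fn);
        s = named("timeLimiter", s);
        s = named("rateLimiter", s);
        s = named("circuitBreaker", s);
        s = named("retry", s);
        return s;
    }
}
```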
Layer 3: What Clients Should Do (The Commercial Reality)
Most outages become real at the client. As Slack proved, clients need their own discipline. For reads, clients can retry with jittered exponential backoff, bounded by a strict retry budget. For writes, retries must be backed by idempotency keys, not hope.
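Both halves of that read-path discipline can be sketched in a few lines: full jitter (a delay drawn uniformly from zero up to the capped exponential ceiling) plus a client-wide retry budget so a long outage cannot multiply load. Class and parameter names are illustrative:

```java
import java.util.Random;

// Sketch: jittered exponential backoff bounded by a shared retry budget.
class RetryPolicy {
    /** Full jitter: uniform in [0, min(cap, base * 2^attempt)). */
    static long jitteredBackoffMillis(int attempt, long baseMillis, long capMillis, Random rng) {
        long ceiling = Math.min(capMillis, baseMillis << Math.min(attempt, 20));
        return (long) (rng.nextDouble() * ceiling);
    }

    // Budget shared by all requests from this client: when it is spent,
    // callers fail or degrade instead of joining a thundering herd.
    private int remaining;
    RetryPolicy(int maxRetriesPerWindow) { this.remaining = maxRetriesPerWindow; }

    synchronized boolean tryConsumeRetry() {
        if (remaining <= 0) return false;
        remaining--;
        return true;
    }
}
```

A real budget would refill per rolling window (for example, retries capped at 10% of request volume); the fixed counter here keeps the sketch short.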
This is where the commercial angle matters. “Graceful degradation” is not an infrastructure setting. Product and UX have to define it. If the recommendation engine fails, do we show generic content? Engineering cannot invent those business choices alone.
What Survives the Move to Serverless
The universal ideas are deadlines, backoff with jitter, idempotency, bulkheads, retry budgets, concurrency caps, flow control, pool budgets, and degraded modes. Those are distributed-systems ideas, not Kubernetes ideas.
AWS Lambda gives you different levers (no probes, but built-in retry behavior and concurrency limits). Lambda retries asynchronous failures by default, and AWS explicitly recommends reserved concurrency to prevent overwhelming downstream database connections. The mechanism changes, but the architectural problem does not.
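In CloudFormation or SAM terms, that recommendation is a one-line property. A sketch (function name, runtime, and the cap value are illustrative):

```yaml
# Cap a Lambda consumer so it cannot exhaust downstream DB connections.
OrderProcessor:
  Type: AWS::Serverless::Function
  Properties:
    Handler: app.handler
    Runtime: java21
    ReservedConcurrentExecutions: 20   # hard cap, analogous to a connection-pool budget
```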
Conclusion
The mature way to think about Kubernetes probes is not “three health checks.” It is “three platform actions.” Startup decides when the platform trusts a booting process. Readiness decides when the platform routes traffic. Liveness decides when the platform kills an instance.
Everything else, especially dependency health, belongs in the resilience layer. That is the layer that protects SQL from the Autoscaling Death Spiral, event consumers from redelivery floods, LLM integrations from quota spikes, and the customer from turning a partial failure into a full outage.