Resilience Patterns for Robust Systems: Practical Java Examples with Resilience4j
Implement Circuit Breakers, Rate Limiters, Bulkheads, Retries, and Fallbacks with Real-World Examples
Introduction
We’ve all seen outages that cost companies more than just money. A flaky third-party API. A loop of retries taking down the whole system. A dashboard full of red metrics and no one answering PagerDuty. Building features is hard enough; building something that doesn’t fall apart under pressure is a whole different game.
That’s where resilience patterns come in. They’re small, focused ideas that address specific failure modes. On their own, they help. When combined, they keep your system standing.
Note: All examples here use Java and Resilience4j, a popular fault tolerance library, for demonstration purposes. The patterns apply to any language or toolset. Swap in your own stack.
What Makes a System Resilient
A resilient system doesn't collapse when things fail. If one part crashes, everything else keeps going. Slowdowns won't block core features like checkout or login. Sudden traffic spikes won’t overwhelm it. And once issues are resolved, the system recovers automatically—without anyone hovering over it.
You also need good visibility through logs, metrics, or tracing. Otherwise, you're just guessing what's happening.
You want a system that handles problems without drama.
Patterns That Help You Stay Up
Let’s walk through a few battle-tested patterns. Each one tackles a different kind of failure—timeouts, overloads, broken APIs, bad luck.
I’ll show how they work, when to use them, and how they fit together without turning your system into a Rube Goldberg machine.
Circuit Breaker
Say a dependency starts failing: an external API slows to a crawl or times out. You don’t want your system stuck waiting or retrying endlessly.
A Circuit Breaker monitors a service. If failures (like timeouts or 5xx errors) exceed a threshold, it trips and blocks further calls for a bit. Later, it sends test requests to check if the service is back. If they succeed, the circuit closes; if not, it stays open.
Example
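This trips when at least 50% of the last 10 calls fail, stays open for 30 seconds, then tests recovery:
import io.github.resilience4j.circuitbreaker.CallNotPermittedException;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import java.time.Duration;
import java.util.function.Supplier;

CircuitBreakerConfig config = CircuitBreakerConfig.custom()
    .failureRateThreshold(50)                        // open at a 50% failure rate
    .waitDurationInOpenState(Duration.ofSeconds(30)) // stay open for 30 seconds
    .slidingWindowSize(10)                           // measured over the last 10 calls
    .build();
CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(config);
CircuitBreaker circuitBreaker = registry.circuitBreaker("externalApi");
Supplier<String> decoratedCall = CircuitBreaker.decorateSupplier(
    circuitBreaker,
    () -> externalService.call()
);
String result; // declared outside the try block so both branches can assign it
try {
    result = decoratedCall.get();
} catch (CallNotPermittedException e) {
    // The circuit is open, so the call was skipped. Failures of the call itself still propagate.
    result = "default response";
}
When to Use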
Flaky third-party services (e.g., shipping, SMS)
Non-critical dependencies
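To make the closed, open, and half-open cycle concrete, here is a minimal plain-Java sketch of the state machine. It is not how Resilience4j is implemented: the class name SimpleCircuitBreaker is hypothetical, it is single-threaded, it evaluates fixed batches of calls rather than a sliding window, and the current time is passed in so the behavior is easy to follow.

```java
import java.util.function.Supplier;

/** Hypothetical, single-threaded circuit breaker sketch (not Resilience4j code). */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;         // calls evaluated per batch
    private final int failureRatePercent; // trip at or above this failure rate
    private final long openMillis;        // how long to stay open

    private State state = State.CLOSED;
    private int calls;
    private int failures;
    private long openedAtMillis;

    SimpleCircuitBreaker(int windowSize, int failureRatePercent, long openMillis) {
        this.windowSize = windowSize;
        this.failureRatePercent = failureRatePercent;
        this.openMillis = openMillis;
    }

    State state() { return state; }

    /** nowMillis is supplied by the caller so the sketch needs no real clock. */
    <T> T call(Supplier<T> action, Supplier<T> fallback, long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAtMillis < openMillis) {
                return fallback.get();   // open: fail fast without touching the dependency
            }
            state = State.HALF_OPEN;     // wait is over: let one trial call through
        }
        try {
            T result = action.get();
            record(false, nowMillis);
            return result;
        } catch (RuntimeException e) {
            record(true, nowMillis);
            return fallback.get();
        }
    }

    private void record(boolean failed, long nowMillis) {
        if (state == State.HALF_OPEN) {
            if (failed) open(nowMillis); else close(); // the trial call decides
            return;
        }
        calls++;
        if (failed) failures++;
        if (calls >= windowSize) {
            if (failures * 100 >= failureRatePercent * calls) open(nowMillis);
            else close();                // healthy batch: start counting afresh
        }
    }

    private void open(long nowMillis) {
        state = State.OPEN;
        openedAtMillis = nowMillis;
        calls = 0;
        failures = 0;
    }

    private void close() {
        state = State.CLOSED;
        calls = 0;
        failures = 0;
    }
}
```

With new SimpleCircuitBreaker(10, 50, 30_000), ten calls with at least five failures open the circuit, calls during the next 30 seconds get the fallback immediately, and the first call after that acts as the recovery test.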
Rate Limiter
Traffic can spike out of nowhere: think flash sales or bots. A Rate Limiter controls how many requests your service accepts, preventing overload before it starts.
Example
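This allows 10 requests per second. A caller waits up to 500ms for a permit and is rejected if none frees up in time:
import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import io.github.resilience4j.ratelimiter.RateLimiterRegistry;
import io.github.resilience4j.ratelimiter.RequestNotPermitted;
import java.time.Duration;
import java.util.function.Supplier;

RateLimiterConfig config = RateLimiterConfig.custom()
    .limitForPeriod(10)                         // 10 permits...
    .limitRefreshPeriod(Duration.ofSeconds(1))  // ...per one-second period
    .timeoutDuration(Duration.ofMillis(500))    // wait at most 500ms for a permit
    .build();
RateLimiterRegistry registry = RateLimiterRegistry.of(config);
RateLimiter rateLimiter = registry.rateLimiter("myService");
Supplier<String> decorated = RateLimiter.decorateSupplier(
    rateLimiter,
    () -> myService.call()
);
String result; // declared outside the try block so both branches can assign it
try {
    result = decorated.get();
} catch (RequestNotPermitted e) {
    // No permit became available within the timeout.
    result = "rate limited response";
}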
When to Use
Public endpoints like login or search
To enforce usage quotas
Bulkhead (Concurrency Limiter)
While a rate limiter controls how many requests per second enter the system, a bulkhead controls how many can run at once. This is crucial when long-running or slow operations risk exhausting threads, CPU, or memory.
Example
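This runs up to 5 calls concurrently, keeps 2 core threads ready, and queues up to 10 extras:
import io.github.resilience4j.bulkhead.ThreadPoolBulkhead;
import io.github.resilience4j.bulkhead.ThreadPoolBulkheadConfig;
import io.github.resilience4j.bulkhead.ThreadPoolBulkheadRegistry;
import java.util.concurrent.CompletionStage;
import java.util.function.Supplier;

ThreadPoolBulkheadConfig config = ThreadPoolBulkheadConfig.custom()
    .maxThreadPoolSize(5)
    .coreThreadPoolSize(2)
    .queueCapacity(10)
    .build();
ThreadPoolBulkheadRegistry registry = ThreadPoolBulkheadRegistry.of(config);
ThreadPoolBulkhead bulkhead = registry.bulkhead("inventoryService");
// The bulkhead runs the supplier on its own pool, so pass the plain call
// rather than wrapping it in another CompletableFuture.
Supplier<CompletionStage<String>> decorated = ThreadPoolBulkhead.decorateSupplier(
    bulkhead,
    () -> inventoryService.check()
);
decorated.get().thenAccept(result -> {
    // use the inventory result here
}).exceptionally(ex -> {
    // the call failed; log it or substitute a default
    return null;
});
When to Use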
Slow or resource-intensive services (e.g., inventory checks)
Retry Pattern
Retries give temporary glitches another chance without flooding the service. They're helpful for quick network issues or short-lived problems.
Example
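Make up to three attempts (the first call plus two retries) with 500ms between them, only for network errors:
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;
import io.github.resilience4j.retry.RetryRegistry;
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

RetryConfig config = RetryConfig.custom()
    .maxAttempts(3)                        // counts the initial call
    .waitDuration(Duration.ofMillis(500))
    .retryExceptions(IOException.class, TimeoutException.class)
    .build();
RetryRegistry registry = RetryRegistry.of(config);
Retry retry = registry.retry("inventoryRetry");
// A Callable is used because a Supplier cannot throw checked exceptions such as IOException.
Callable<String> decorated = Retry.decorateCallable(
    retry,
    () -> inventoryService.call()
);
String result; // declared outside the try block so both branches can assign it
try {
    result = decorated.call();
} catch (Exception e) {
    // All attempts failed, or the exception was not retryable.
    result = "retry fallback";
}
Fallbacks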
Fallbacks are your safety net, ensuring your app stays responsive when dependencies fail. They provide alternative responses, like cached data, error messages, or queued actions, so users aren’t left hanging.
Example
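Return cached data when the circuit breaker is open or the call fails. This uses the Decorators builder from the resilience4j-all module:
Supplier<String> decorated = Decorators.ofSupplier(() -> externalService.call())
    .withCircuitBreaker(circuitBreaker).withFallback(e -> "cached data").decorate();
When to Use
Pair fallbacks with circuit breakers, rate limiters, and retries.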
Applying Patterns Without Losing Revenue
Patterns like rate limiters and bulkheads sound risky. Who wants to drop paying customers? But big players use smart tweaks to protect revenue while staying resilient.
Rate Limiter (Soft limits)
Don’t just reject requests when you hit the cap. Instead:
Allow short bursts above your limit (e.g. 10 req/s sustained, but 30 req/s burst).
Prioritize traffic based on value: logged-in users or high-value carts bypass limits.
Push excess traffic to a short waiting room or queue ("One moment..."), holding users for just 1-2 seconds rather than rejecting them outright.
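The first two ideas above can be sketched with a token bucket in plain Java. This is an illustration rather than a Resilience4j feature: BurstRateLimiter and the highValue flag are hypothetical names, the bucket size stands in for the burst allowance, and the clock is passed in to keep the example deterministic. The waiting room is left out.

```java
/** Hypothetical token-bucket sketch: a sustained rate plus a burst allowance. */
class BurstRateLimiter {
    private final double tokensPerSecond; // sustained rate, e.g. 10 req/s
    private final double burstCapacity;   // most requests admitted at once, e.g. 30
    private double tokens;
    private long lastRefillMillis;

    BurstRateLimiter(double tokensPerSecond, double burstCapacity, long nowMillis) {
        this.tokensPerSecond = tokensPerSecond;
        this.burstCapacity = burstCapacity;
        this.tokens = burstCapacity;      // start full so an initial burst is allowed
        this.lastRefillMillis = nowMillis;
    }

    /** Returns true if the request may proceed. */
    synchronized boolean tryAcquire(boolean highValue, long nowMillis) {
        if (highValue) {
            return true;                  // assumption: high-value traffic bypasses the limit
        }
        // Refill in proportion to elapsed time, never above the burst capacity.
        long elapsed = Math.max(0, nowMillis - lastRefillMillis);
        tokens = Math.min(burstCapacity, tokens + elapsed * tokensPerSecond / 1000.0);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;                     // over the limit: reject or queue the request
    }
}
```

With new BurstRateLimiter(10, 30, now), a quiet service can absorb 30 requests at once, then admits 10 per second as tokens refill. How to decide that a request is high value (session state, cart total) is up to the application.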
Bulkhead (Isolate Paths)
Don’t just put a concurrency limit across everything. Instead:
Separate critical tasks (checkout, payment) into larger pools sized for peak traffic.
Isolate slower or optional calls (recommendations, loyalty points, shipping) in smaller pools.
Provide fast fallbacks if optional pools saturate, ensuring the main payment path stays fast and reliable.
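A rough plain-Java sketch of path isolation, using one semaphore per path. PathBulkhead is a hypothetical class, not a Resilience4j type, and the pool sizes in the usage note are made-up numbers. The point it shows is that a saturated optional pool answers with a fallback immediately and never borrows capacity from the critical pool.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical semaphore-based bulkhead: one instance per isolated path. */
class PathBulkhead {
    private final Semaphore permits;

    PathBulkhead(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the task if a slot is free; otherwise returns the fallback without waiting. */
    <T> T run(Supplier<T> task, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();   // pool saturated: answer fast instead of queueing
        }
        try {
            return task.get();
        } finally {
            permits.release();       // always free the slot, even if the task throws
        }
    }

    int available() {
        return permits.availablePermits();
    }
}
```

For instance, a checkout path might get new PathBulkhead(50) while recommendations get new PathBulkhead(5) with an empty-list fallback, so a slow recommendation service can hold at most five threads.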
Combining Patterns Without Overdoing It
Don’t use patterns just to use them. Combine them thoughtfully:
Slow external API? Use circuit breaker + fallback.
Public endpoint under high traffic? Rate limiter + bulkhead.
Occasional errors? Retry + fallback.
Pick what's necessary, and keep your system clean and maintainable.
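As one small instance of combining patterns, the retry + fallback pairing might look like this in plain Java. RetryWithFallback is a hypothetical helper, not part of Resilience4j. It retries on any RuntimeException with a fixed wait, which is simpler than a production setup that would usually filter exception types and add backoff.

```java
import java.util.function.Supplier;

/** Hypothetical helper: a bounded number of attempts, then a fallback value. */
class RetryWithFallback {
    static <T> T call(Supplier<T> action, int maxAttempts, long waitMillis, Supplier<T> fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();          // success: stop retrying
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break;                    // out of attempts: use the fallback
                }
                try {
                    Thread.sleep(waitMillis); // fixed pause between attempts
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;                    // interrupted: stop and use the fallback
                }
            }
        }
        return fallback.get();
    }
}
```

The retry is bounded, so a dependency that stays down costs a fixed amount of time per request before the fallback takes over.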
Practical Scenarios
Checkout Flow in E-commerce
A customer clicks "Buy." Your system calls inventory, payment, and shipping. Any could fail or lag.
Rate limiter: Prevent checkout flooding.
Bulkhead: Stop slow inventory checks from blocking payments.
Circuit breaker + retry: Protect the shipping API, and fall back to "shipping pending."
Retry + queue: Payment attempts don’t get lost.
Search Feature in a Web App
Searches can overload your database or backend easily:
Rate limiter: Manage request bursts.
Bulkhead: Prevent slow searches from impacting other features.
Retry + circuit breaker: Handle occasional database glitches gracefully.
Fallback: Show cached results if all else fails.
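The cached-results fallback in the last item can be sketched as a last-known-good cache around the search backend. CachedSearch and its backend function are hypothetical. The sketch keeps every query forever, whereas a real cache would bound its size and expire stale entries.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical wrapper that serves the last good results when the backend fails. */
class CachedSearch {
    private final Function<String, List<String>> backend;
    private final Map<String, List<String>> lastGood = new ConcurrentHashMap<>();

    CachedSearch(Function<String, List<String>> backend) {
        this.backend = backend;
    }

    List<String> search(String query) {
        try {
            List<String> results = backend.apply(query);
            lastGood.put(query, results);   // remember the latest successful answer
            return results;
        } catch (RuntimeException e) {
            // Backend failed: serve the previous answer, or nothing if we never had one.
            return lastGood.getOrDefault(query, Collections.emptyList());
        }
    }
}
```

Users see slightly stale results during an outage instead of an error page, and queries that were never answered before degrade to an empty list.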
Conclusion
Building resilience isn't about using every available tool—it's about choosing wisely. Start simple, observe your failures, and adjust. Your system will handle pressure gracefully, without constant firefighting.
Future posts might dive deeper into advanced techniques like backpressure and load shedding—tools you’ll reach for as you scale.




