Building Resilient APIs: Retries, Circuit Breakers, and Rate Limiting

A practical full-stack developer guide to building APIs that survive production failures using timeouts, retries, circuit breakers, and rate limiting, with examples in Node.js, NestJS, and Go.

17 min • Backend Development • Apr 30, 2026

This article is about

NestJS, Node.js, Gin, Backend Architecture, Python, JavaScript, TypeScript, Go, Application Security

Building Resilient APIs: Retries, Circuit Breakers, Rate Limiting, and the Things That Save You in Production

[Figure: Building Resilient APIs hero image]

Hi, my name is Amr Samir, and I am a Full-Stack Web Developer.

Sooner or later, I think every developer reaches a point where they stop thinking only about “how do I make this API work?” and start asking the more important question:

“What happens when this API fails?”

Because it will fail.

Maybe the database gets slow. Maybe a third-party service goes down. Maybe the network randomly decides to ruin your day. Maybe traffic spikes at the exact wrong time. This is normal in real systems, especially when you are building distributed applications, microservices, payment flows, dashboards, SaaS products, or anything that depends on more than one moving part.

In this blog, I want to talk about API resilience in a practical way: retries, circuit breakers, rate limiting, timeouts, testing, security, and incident response.

Not as fancy theory, but as tools that can keep your application alive when production starts acting like production.


Why API Resilience Matters

When everything is healthy, APIs feel simple.

The client sends a request. The backend receives it. The backend calls a database or another service. A response comes back. Everyone is happy.

But production is not always that clean.

A downstream service can become slow. A request can hang. A client can retry too many times. A queue can fill up. CPU can spike. Memory can climb. Then suddenly one small issue becomes a bigger outage.

That is the main reason resilience matters.

A resilient API is not an API that never fails. That does not exist. A resilient API is one that knows how to fail in a controlled way.

It should:

  • stop waiting forever,
  • retry carefully,
  • avoid retry storms,
  • stop calling unhealthy dependencies,
  • reject excess traffic early,
  • keep the rest of the system safe,
  • recover without needing a full restart or manual panic.

To me, resilience is really about building APIs for bad days, not only demo days.


The Common Failure Pattern

Most API incidents follow a very familiar pattern.

A service becomes slow. The caller does not have a good timeout. Requests start waiting. More requests come in. Some clients retry. Those retries create even more load. The slow service gets slower. Other services begin to suffer. Dashboards go red.

At that point, the original issue might be small, but the system’s reaction makes it much worse.

[Figure: Resilient API request flow]

This is why I care about defensive patterns like:

  • timeouts,
  • retries with backoff,
  • circuit breakers,
  • rate limiting,
  • fallbacks,
  • observability.

These are not “extra features.” They are production safety tools.


Start With Timeouts

Before even talking about retries, I want to say this clearly:

Every outbound call should have a timeout.

An API should not wait forever for another service.

This includes:

  • HTTP requests,
  • database queries,
  • Redis/cache calls,
  • message broker operations,
  • file storage calls,
  • third-party API calls.

Without a timeout, a request can hang indefinitely, holding onto connections, memory, and worker capacity. Multiply that by hundreds or thousands of requests, and your application can get stuck waiting on a dependency that may never respond.

A timeout gives your system a boundary. It says: “I will wait this long, and then I will stop.”

That sounds simple, but it is one of the most important resilience decisions you can make.
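
For example, in Go you can set both a client-wide ceiling and a per-request deadline using only the standard library. This is a minimal sketch; the URL is a placeholder:

package main

import (
	"context"
	"log"
	"net/http"
	"time"
)

func main() {
	// Client-level ceiling: no request through this client may exceed 5s total.
	client := &http.Client{Timeout: 5 * time.Second}

	// Per-request deadline: this specific call gives up after 2s.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://api.example.com/data", nil)
	if err != nil {
		log.Fatal(err)
	}

	resp, err := client.Do(req)
	if err != nil {
		// a context deadline exceeded error surfaces here
		log.Fatalf("request failed or timed out: %v", err)
	}
	defer resp.Body.Close()

	log.Println("status:", resp.Status)
}

The shorter of the two limits wins, which is exactly what you want: a sane default on the client, and a tighter deadline where a specific call deserves one.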


The Retry Pattern

A retry means trying the same request again after it fails.

This is useful because not every failure is permanent. Sometimes the network glitches. Sometimes a service is briefly overloaded. Sometimes a gateway returns a temporary error. In those cases, retrying can make the user experience smoother.

But retries are dangerous when they are done blindly.

If a service is already struggling, sending it more requests immediately can make things worse. That is how retry storms happen.

When Should We Retry?

Retries make sense for temporary errors like:

  • network timeouts,
  • connection resets,
  • 503 Service Unavailable,
  • 502 Bad Gateway,
  • 504 Gateway Timeout,
  • sometimes 429 Too Many Requests, if the server tells you when to retry.

Retries usually do not make sense for:

  • 400 Bad Request,
  • 401 Unauthorized,
  • 403 Forbidden,
  • validation errors,
  • business logic errors,
  • non-idempotent operations without protection.

This last point is important. If you retry an operation like payment, order creation, or money transfer without idempotency, you might accidentally perform it twice.

That is not resilience. That is a new bug.
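
One common protection is an idempotency key: the client sends a unique key with each logical operation, and the server performs the work only once per key. Here is a minimal sketch in Go with Gin, using an in-memory map purely for illustration (a real service would store keys in a database or Redis; the endpoint and response fields are made up):

package main

import (
	"net/http"
	"sync"

	"github.com/gin-gonic/gin"
)

var (
	processed = make(map[string]gin.H) // idempotency key -> stored response
	mu        sync.Mutex
)

func main() {
	r := gin.Default()

	r.POST("/payments", func(c *gin.Context) {
		key := c.GetHeader("Idempotency-Key")
		if key == "" {
			c.JSON(http.StatusBadRequest, gin.H{"error": "Idempotency-Key header is required"})
			return
		}

		mu.Lock()
		if result, done := processed[key]; done {
			mu.Unlock()
			// Replay the stored result instead of charging twice.
			c.JSON(http.StatusOK, result)
			return
		}
		mu.Unlock()

		// ... perform the actual charge exactly once ...
		result := gin.H{"status": "charged"}

		mu.Lock()
		processed[key] = result
		mu.Unlock()

		c.JSON(http.StatusOK, result)
	})

	r.Run(":8080")
}

With this in place, a retried payment request with the same key returns the original result instead of creating a second charge.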


Use Exponential Backoff and Jitter

The worst retry strategy is retrying immediately.

A better approach is exponential backoff. For example:

Attempt 1: wait 100ms
Attempt 2: wait 200ms
Attempt 3: wait 400ms
Attempt 4: wait 800ms

Then we add jitter, which means a small random delay.

Why? Because if every client retries at exactly the same time, you create another traffic spike. Jitter spreads retries out, which makes recovery smoother.

My personal rule is simple:

Retry slowly, retry only a few times, and never retry forever.
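
To make that schedule concrete, here is a small Go sketch of a backoff calculator with jitter (the function and parameter names are my own):

package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoffWithJitter returns the wait before retry attempt n (0-based):
// base * 2^n, capped at max, plus up to 50% random jitter.
func backoffWithJitter(attempt int, base, max time.Duration) time.Duration {
	d := base << attempt // exponential: 100ms, 200ms, 400ms, ...
	if d > max {
		d = max
	}
	jitter := time.Duration(rand.Int63n(int64(d / 2))) // spread clients out
	return d + jitter
}

func main() {
	for attempt := 0; attempt < 4; attempt++ {
		fmt.Println("attempt", attempt+1, "wait", backoffWithJitter(attempt, 100*time.Millisecond, time.Second))
	}
}

The cap matters as much as the growth: without it, a long outage turns your backoff into minutes of silence.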


Retry Example in Node.js

Here is a simple example using axios-retry:

const axios = require('axios');
const axiosRetry = require('axios-retry');

axiosRetry(axios, {
  retries: 3, // hard upper bound; never retry forever
  retryCondition: (error) => {
    // retry only transient failures: network errors and retryable status codes
    return axiosRetry.isNetworkError(error) || axiosRetry.isRetryableError(error);
  },
  retryDelay: axiosRetry.exponentialDelay, // exponential backoff between attempts
});

async function getData() {
  try {
    const response = await axios.get('https://api.example.com/data', {
      timeout: 2000,
    });

    return response.data;
  } catch (error) {
    console.error('Request failed after retries:', error.message);
    throw error;
  }
}

The important details here are:

  • we retry only retryable errors,
  • we limit retries to 3,
  • we use exponential delay,
  • we set a timeout,
  • we log the final failure clearly.

Retries should always have boundaries.


Retry Example in NestJS

In NestJS, we can use RxJS operators around HTTP calls.

import { Injectable } from '@nestjs/common';
import { HttpService } from '@nestjs/axios';
import { firstValueFrom, throwError, timer } from 'rxjs';
import { catchError, retry } from 'rxjs/operators';

@Injectable()
export class ApiClientService {
  constructor(private readonly httpService: HttpService) {}

  async getData() {
    return firstValueFrom(
      this.httpService.get('https://api.example.com/data').pipe(
        retry({
          count: 3,
          // delay must return something subscribable; timer() emits after the wait
          delay: (_error, attempt) => timer(100 * Math.pow(2, attempt)),
        }),
        catchError((error) => {
          console.error('Request failed after retries:', error.message);
          return throwError(() => error);
        }),
      ),
    );
  }
}

In real projects, I prefer keeping retry settings configurable. Hard-coded values are okay in examples, but in production you may need to tune them without touching every file.


Retry Example in Go

In Go, one practical option is go-retryablehttp.

package main

import (
	"log"
	"time"

	"github.com/hashicorp/go-retryablehttp"
)

func main() {
	client := retryablehttp.NewClient()

	client.RetryMax = 3                          // hard upper bound on retries
	client.RetryWaitMin = 100 * time.Millisecond // first backoff step
	client.RetryWaitMax = 1 * time.Second        // cap on the backoff

	resp, err := client.Get("https://api.example.com/data")
	if err != nil {
		log.Fatalf("request failed after retries: %v", err)
	}
	defer resp.Body.Close()

	log.Println("request completed with status:", resp.Status)
}

For production, I would also use context cancellation, HTTP client timeouts, structured logs, and metrics around retry count.


Retry Trade-Offs

Retries improve reliability against short temporary failures, but they also increase load and latency.

So I try to keep this balance in mind:

  • retrying once or twice can help,
  • retrying too much can hurt,
  • retrying without a timeout is risky,
  • retrying non-idempotent operations can cause duplicate actions,
  • retrying during an outage can amplify the outage.

Retries are useful, but they are not magic.

They should work together with timeouts, circuit breakers, and rate limits.


The Circuit Breaker Pattern

A circuit breaker works like an electrical fuse.

If a downstream service keeps failing, the circuit breaker opens and stops sending traffic to it for a while. Instead of waiting for every request to time out, the system fails fast.

This protects two things:

  1. The caller, because it does not waste resources waiting.
  2. The dependency, because it gets time to recover.

[Figure: Circuit breaker state machine]

A circuit breaker usually has three states.

1. Closed

This is the normal state.

Requests pass through. The breaker watches failures, latency, and errors.

If failures reach a certain threshold, the breaker opens.

2. Open

In the open state, the breaker stops calling the downstream service.

Requests fail immediately or return a fallback response.

This may sound bad, but it is better than letting every request hang for several seconds and overload your own application.

3. Half-Open

After a cooldown period, the breaker allows a small number of test requests.

If those requests succeed, the breaker closes again.

If they fail, the breaker opens again and waits longer.

This avoids sending full traffic to a service that has not fully recovered yet.


When Circuit Breakers Help

Circuit breakers are useful when your API depends on something that may become slow or unstable, such as:

  • payment providers,
  • user services,
  • search services,
  • inventory systems,
  • notification services,
  • third-party APIs,
  • databases,
  • internal microservices.

For example, imagine a checkout flow that calls a payment service.

If the payment service starts timing out, you do not want every checkout request to hang for ten seconds. You would rather fail fast, show a clear message, and avoid dragging the rest of the system down.

This is not perfect from a user experience point of view, but it is much better than a full outage.


Circuit Breaker Example in Node.js

Here is a basic example using opossum:

const CircuitBreaker = require('opossum');
const axios = require('axios');

async function fetchData() {
  const response = await axios.get('https://api.example.com/data', {
    timeout: 3000,
  });

  return response.data;
}

const breaker = new CircuitBreaker(fetchData, {
  timeout: 5000,                // calls slower than 5s count as failures
  errorThresholdPercentage: 50, // open once 50% of requests fail
  resetTimeout: 10000,          // move to half-open after 10s
});

breaker.fallback(() => {
  return {
    message: 'Service temporarily unavailable',
    degraded: true,
  };
});

async function getProtectedData() {
  try {
    return await breaker.fire();
  } catch (error) {
    console.error('Circuit breaker request failed:', error.message);
    throw error;
  }
}

I like fallback responses when they make sense. Sometimes cached data, partial data, or a simple “try again later” response is better than a slow failure.


Circuit Breaker Example in Go

Here is a simple example using sony/gobreaker with Gin:

package main

import (
	"errors"
	"net/http"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/sony/gobreaker"
)

func main() {
	settings := gobreaker.Settings{
		Name:        "PaymentServiceBreaker",
		MaxRequests: 3,                // probe requests allowed while half-open
		Interval:    60 * time.Second, // window for resetting counts while closed
		Timeout:     10 * time.Second, // how long to stay open before half-open
		ReadyToTrip: func(counts gobreaker.Counts) bool {
			return counts.ConsecutiveFailures >= 5 // open after 5 failures in a row
		},
	}

	breaker := gobreaker.NewCircuitBreaker(settings)

	r := gin.Default()

	r.GET("/order", func(c *gin.Context) {
		result, err := breaker.Execute(func() (interface{}, error) {
			client := &http.Client{
				Timeout: 2 * time.Second,
			}

			resp, err := client.Get("https://payment.example.com/pay")
			if err != nil {
				return nil, err
			}
			defer resp.Body.Close()

			if resp.StatusCode != http.StatusOK {
				return nil, errors.New("payment service returned non-200 status")
			}

			return "payment ok", nil
		})

		if err != nil {
			c.JSON(http.StatusServiceUnavailable, gin.H{
				"error": "Payment service temporarily unavailable",
			})
			return
		}

		c.JSON(http.StatusOK, gin.H{
			"result": result,
		})
	})

	r.Run(":8080")
}

The example is simple, but the concept is powerful: wrap risky calls and stop them from damaging the rest of the system.


Circuit Breaker Trade-Offs

Circuit breakers need careful tuning.

If the breaker opens too quickly, users may see unnecessary failures. If it opens too slowly, it will not protect the system in time.

You need to think about:

  • failure thresholds,
  • timeout values,
  • reset time,
  • half-open request count,
  • fallback strategy,
  • logging,
  • metrics,
  • alerting.

A circuit breaker without observability is not enough. You should know when it opens, why it opened, and whether recovery is working.


Rate Limiting and Throttling

Rate limiting controls how many requests a client can make within a specific time window.

It protects your API from:

  • abusive clients,
  • bots,
  • brute force attacks,
  • accidental request loops,
  • noisy tenants,
  • sudden traffic spikes,
  • expensive endpoints being overused.

[Figure: Token bucket rate limiting]

A rate limit usually has three parts:

Limit:      100 requests
Window:     1 minute
Identifier: user ID, API key, or IP address

When the client exceeds the limit, the API usually returns:

HTTP 429 Too Many Requests

A good API should also return helpful headers, such as Retry-After, so clients know when to try again.
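
For example, a minimal Go sketch of a 429 response that includes Retry-After (the helper name is mine):

package main

import (
	"net/http"
	"strconv"
)

// tooManyRequests rejects the request and tells well-behaved
// clients exactly when it is safe to try again.
func tooManyRequests(w http.ResponseWriter, retryAfterSeconds int) {
	w.Header().Set("Retry-After", strconv.Itoa(retryAfterSeconds))
	http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
}

func main() {
	http.HandleFunc("/api/data", func(w http.ResponseWriter, r *http.Request) {
		// In a real handler this would be the rate limiter's decision.
		tooManyRequests(w, 30) // safe to retry in 30 seconds
	})
	http.ListenAndServe(":8080", nil)
}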


Rate Limiting Algorithms

There are different ways to implement rate limiting.

Fixed Window

This is the simplest approach.

Example: allow 100 requests per minute.

It is easy to build, but it can allow bursts at the window edges: a client can send 100 requests at the very end of one window and 100 more at the start of the next.

Sliding Window

A sliding window tracks requests more smoothly over time.

It is more accurate than fixed window, but it can be more expensive.

Token Bucket

A token bucket gives each client a bucket of tokens. Every request consumes a token. Tokens refill over time.

This allows short bursts while still enforcing an average rate.

For many APIs, token bucket is a very practical option.
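
To show the mechanics, here is a hand-rolled token bucket sketch in Go. The Gin example later in this post uses golang.org/x/time/rate, which implements the same idea; this version exists only to make the algorithm visible:

package main

import (
	"fmt"
	"sync"
	"time"
)

// tokenBucket refills at refillRate tokens/second, up to capacity.
type tokenBucket struct {
	mu         sync.Mutex
	tokens     float64
	capacity   float64
	refillRate float64
	last       time.Time
}

func newTokenBucket(capacity, refillRate float64) *tokenBucket {
	return &tokenBucket{tokens: capacity, capacity: capacity, refillRate: refillRate, last: time.Now()}
}

func (b *tokenBucket) allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()

	// Refill based on how much time passed since the last check.
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.refillRate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now

	// Each request costs one token; an empty bucket means "slow down".
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	bucket := newTokenBucket(10, 5) // burst of 10, refills 5 tokens/second
	for i := 0; i < 12; i++ {
		fmt.Println("request", i+1, "allowed:", bucket.allow())
	}
}

The capacity controls the burst size and the refill rate controls the long-term average, which is why this algorithm handles real-world spiky traffic so well.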

Leaky Bucket

A leaky bucket processes requests at a steady rate. Extra requests may wait or be rejected.

This is useful when you want to smooth traffic before it reaches a dependency.


Rate Limiting Example in Express

Using express-rate-limit:

const express = require('express');
const rateLimit = require('express-rate-limit');

const app = express();

const apiLimiter = rateLimit({
  windowMs: 60 * 1000, // 1-minute window
  max: 100,            // 100 requests per client per window
  message: {
    error: 'Too many requests. Please try again later.',
  },
});

app.use('/api/', apiLimiter);

app.get('/api/data', (req, res) => {
  res.json({ message: 'ok' });
});

app.listen(3000);

This is a good starting point for basic APIs.

For production, be careful if your app is behind a proxy or load balancer. You need to make sure the limiter sees the real client IP, not just the proxy IP. In Express, that usually means configuring the trust proxy setting correctly.


Rate Limiting Example in NestJS

NestJS provides @nestjs/throttler.

import { Controller, Get, UseGuards } from '@nestjs/common';
import { Throttle, ThrottlerGuard } from '@nestjs/throttler';

@UseGuards(ThrottlerGuard)
@Controller('messages')
export class MessagesController {
  @Get()
  @Throttle({ default: { limit: 10, ttl: 60000 } })
  getMessages() {
    return {
      data: [],
    };
  }
}

In real applications, not every endpoint should have the same limit.

A login endpoint, a search endpoint, and a file export endpoint have different risk levels. They should probably have different limits.


Rate Limiting Example in Go with Gin

Here is a simple in-memory token bucket example:

package main

import (
	"net/http"
	"sync"

	"github.com/gin-gonic/gin"
	"golang.org/x/time/rate"
)

var (
	visitors = make(map[string]*rate.Limiter)
	mu       sync.Mutex
)

func getLimiter(ip string) *rate.Limiter {
	mu.Lock()
	defer mu.Unlock()

	limiter, exists := visitors[ip]
	if exists {
		return limiter
	}

	limiter = rate.NewLimiter(5, 10) // 5 requests/second steady rate, bursts of up to 10
	visitors[ip] = limiter

	return limiter
}

func rateLimiterMiddleware() gin.HandlerFunc {
	return func(c *gin.Context) {
		ip := c.ClientIP()
		limiter := getLimiter(ip)

		if !limiter.Allow() {
			c.AbortWithStatusJSON(http.StatusTooManyRequests, gin.H{
				"error": "Rate limit exceeded",
			})
			return
		}

		c.Next()
	}
}

func main() {
	r := gin.Default()
	r.Use(rateLimiterMiddleware())

	r.GET("/api/data", func(c *gin.Context) {
		c.JSON(http.StatusOK, gin.H{
			"message": "ok",
		})
	})

	r.Run(":8080")
}

This example is useful for learning, but for production with multiple instances, I would not rely only on in-memory maps.

You would usually use Redis, a shared rate-limiting service, or an API gateway.


Distributed Rate Limiting

If your API runs on one server, in-memory rate limiting can work.

But most production systems run multiple instances.

If each instance allows 100 requests per minute and you have 10 instances, your real limit may become 1,000 requests per minute.

That may be a big problem.

For distributed systems, better options include:

  • Redis-backed counters (see the sketch below),
  • API gateway limits,
  • centralized rate-limiting services,
  • service mesh policies,
  • shared token bucket implementations.
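
As an illustration, here is a minimal Redis-backed fixed-window counter in Go, assuming the go-redis client and a local Redis instance. It is a sketch, not production code:

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// allow returns true if this client is still under `limit` requests in the
// current one-minute window. Every instance shares the same Redis counter.
func allow(ctx context.Context, rdb *redis.Client, clientID string, limit int64) (bool, error) {
	key := "ratelimit:" + clientID

	count, err := rdb.Incr(ctx, key).Result()
	if err != nil {
		return false, err
	}

	// The first request in a window starts the one-minute expiry.
	// INCR and EXPIRE are two round trips here; production code would
	// usually wrap them in a Lua script or pipeline for atomicity.
	if count == 1 {
		if err := rdb.Expire(ctx, key, time.Minute).Err(); err != nil {
			return false, err
		}
	}

	return count <= limit, nil
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	ok, err := allow(context.Background(), rdb, "user-123", 100)
	fmt.Println(ok, err)
}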

Tools like Kong, Envoy, NGINX, AWS API Gateway, and similar platforms can enforce limits before traffic reaches your application.

That is usually a good thing, because rejecting bad traffic early saves backend resources.


Security Is Also Part of Resilience

Security and resilience are connected.

If someone can abuse your API, overload it, brute force it, inject bad input, or use leaked credentials, your availability is already at risk.

A resilient API should also be secure.

Use HTTPS

All API traffic should use HTTPS.

This protects tokens, credentials, user data, and request bodies from interception.

Use Strong Authentication and Authorization

Authentication answers:

Who are you?

Authorization answers:

What are you allowed to access?

You need both.

It is not enough to check if a user is logged in. You also need to check whether that user can access the specific resource they are requesting.
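
As a small sketch in Go with Gin: the auth middleware identifies the caller, and the handler separately checks that the requested resource belongs to them (loadOrder and the Order type are hypothetical stand-ins for real data access):

package main

import (
	"errors"
	"net/http"

	"github.com/gin-gonic/gin"
)

type Order struct {
	ID      string `json:"id"`
	OwnerID string `json:"ownerId"`
}

// loadOrder is a hypothetical stand-in for a real database lookup.
func loadOrder(id string) (*Order, error) {
	if id == "42" {
		return &Order{ID: "42", OwnerID: "user-123"}, nil
	}
	return nil, errors.New("not found")
}

func main() {
	r := gin.Default()

	// Stand-in auth middleware: a real one would verify a token first.
	r.Use(func(c *gin.Context) {
		c.Set("userID", "user-123")
		c.Next()
	})

	r.GET("/orders/:id", func(c *gin.Context) {
		userID := c.GetString("userID")

		order, err := loadOrder(c.Param("id"))
		if err != nil {
			c.JSON(http.StatusNotFound, gin.H{"error": "order not found"})
			return
		}

		// Authentication told us who the caller is; authorization
		// checks that this order actually belongs to them.
		if order.OwnerID != userID {
			c.JSON(http.StatusForbidden, gin.H{"error": "forbidden"})
			return
		}

		c.JSON(http.StatusOK, order)
	})

	r.Run(":8080")
}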

Validate Input

Never trust client input.

Validate:

  • request bodies,
  • query params,
  • headers,
  • path params,
  • file uploads,
  • webhook payloads.

In Node.js, I like libraries such as Zod, Joi, or Ajv. In NestJS, validation pipes and class-validator are very useful. In Go, request structs plus validation libraries can keep things clean.
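
As a minimal standard-library sketch in Go (the fields and limits are invented for the example):

package main

import (
	"encoding/json"
	"net/http"
	"strings"
)

type CreateUserRequest struct {
	Email string `json:"email"`
	Age   int    `json:"age"`
}

func createUser(w http.ResponseWriter, r *http.Request) {
	var req CreateUserRequest

	dec := json.NewDecoder(r.Body)
	dec.DisallowUnknownFields() // reject payloads with unexpected fields

	if err := dec.Decode(&req); err != nil {
		http.Error(w, "invalid JSON body", http.StatusBadRequest)
		return
	}

	// Basic sanity checks; a real service might use a validation library.
	if !strings.Contains(req.Email, "@") || req.Age < 0 || req.Age > 150 {
		http.Error(w, "invalid email or age", http.StatusBadRequest)
		return
	}

	w.WriteHeader(http.StatusCreated)
}

func main() {
	http.HandleFunc("/users", createUser)
	http.ListenAndServe(":8080", nil)
}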

Protect Secrets

Do not hard-code secrets.

Do not commit .env files.

Do not print tokens in logs.

Use tools like:

  • Vault,
  • AWS Secrets Manager,
  • Google Secret Manager,
  • Azure Key Vault,
  • environment variables injected securely at deploy time.

Also, rotate secrets and keep permissions limited.

Apply Least Privilege

Every service should have only the permissions it needs.

A reporting service probably does not need admin write access. A read-only API probably does not need delete permissions. A background worker should not have access to everything just because it is internal.

Least privilege reduces the blast radius when something goes wrong.


Testing Resilience

Resilience should be tested before production tests it for you.

And production always has a way of testing things at the worst time.

Unit Tests

Test the failure paths, not only the happy path.

Mock downstream services and simulate:

  • timeouts,
  • 500 errors,
  • 503 errors,
  • 429 responses,
  • slow responses,
  • connection failures.

Then verify that your retries, timeouts, and fallbacks behave correctly.
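
For example, here is a Go test sketch using httptest: a fake dependency returns 503 twice and then recovers, and the test asserts that a small retrying client ends up with a 200 after exactly three calls. The retry helper is written inline so the test is self-contained; it would live in a _test.go file:

package main

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

// getWithRetries is the tiny client under test: it retries up to
// maxAttempts total attempts while the server answers 503.
func getWithRetries(url string, maxAttempts int) (*http.Response, error) {
	var resp *http.Response
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		resp, err = http.Get(url)
		if err == nil && resp.StatusCode != http.StatusServiceUnavailable {
			return resp, nil
		}
		if err == nil && attempt < maxAttempts {
			resp.Body.Close() // release the connection before retrying
		}
	}
	return resp, err // last response (possibly still a 503) or last error
}

func TestRetriesOnServiceUnavailable(t *testing.T) {
	calls := 0
	// Fake dependency: fails twice, then recovers.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		calls++
		if calls <= 2 {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	resp, err := getWithRetries(srv.URL, 4)
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		t.Fatalf("expected 200 after retries, got %d", resp.StatusCode)
	}
	if calls != 3 {
		t.Fatalf("expected 3 calls (2 failures + 1 success), got %d", calls)
	}
}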

Integration Tests

Integration tests help you see how services behave together.

For example:

  • What happens if the user service is slow?
  • Does the order service time out correctly?
  • Does the circuit breaker open?
  • Does the API return a useful error?
  • Are retries limited?

Load Testing

Use tools like:

  • k6,
  • Artillery,
  • Locust,
  • JMeter.

Do not only test normal traffic. Test spikes, bursts, and heavy endpoints.

Good load testing helps answer:

  • Where does latency start increasing?
  • Which dependency fails first?
  • Do rate limits trigger correctly?
  • Do queues grow too much?
  • Does the API fail clearly or randomly?

Chaos Testing

Chaos testing means intentionally creating failure to see how the system reacts.

Examples:

  • kill a container,
  • add network latency,
  • drop packets,
  • restart a database node,
  • make a dependency return 500s,
  • fill a queue,
  • disable a cache.

This is not about breaking things for fun. It is about proving that the system can survive realistic failure.

Start in staging. Build confidence. Then gradually improve.


Observability: If You Cannot See It, You Cannot Fix It

Resilience without observability is guessing.

At minimum, I want metrics and logs for:

  • request latency,
  • error rates,
  • timeout count,
  • retry count,
  • circuit breaker state,
  • rate limit hits,
  • 429 responses,
  • dependency health,
  • queue depth,
  • CPU and memory,
  • database connection usage.

Tracing is also very useful. With distributed tracing, you can follow one request across multiple services and see exactly where time was spent.

When an incident happens, good observability reduces panic. It helps the team move from “something is broken” to “this dependency is slow and this endpoint is affected.”

That difference matters.
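
To make some of these signals concrete, here is a minimal sketch using prometheus/client_golang. The metric names are my own, and the increments would live inside your retry loop and rate limiter:

package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// How often outbound calls are retried, labeled by dependency.
	retriesTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "outbound_retries_total",
		Help: "Number of outbound request retries.",
	}, []string{"dependency"})

	// How often the rate limiter rejects requests with 429.
	rateLimitHits = promauto.NewCounter(prometheus.CounterOpts{
		Name: "rate_limit_hits_total",
		Help: "Number of requests rejected with 429.",
	})
)

func main() {
	// In the retry loop:   retriesTotal.WithLabelValues("payments").Inc()
	// In the rate limiter: rateLimitHits.Inc()

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":9090", nil)
}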


Deployment and Operations Tips

A few practical ideas I like:

Use Existing Libraries

Do not rebuild everything from scratch unless you really need to.

For Node.js:

  • axios-retry
  • opossum
  • cockatiel
  • express-rate-limit
  • rate-limiter-flexible

For NestJS:

  • @nestjs/throttler
  • RxJS retry operators
  • custom interceptors
  • wrappers around Opossum or Cockatiel

For Go:

  • go-retryablehttp
  • sony/gobreaker
  • golang.org/x/time/rate
  • Redis-based rate limiters

Use API Gateways When It Makes Sense

API gateways and service meshes can handle many resilience concerns:

  • global rate limits,
  • authentication,
  • request size limits,
  • connection limits,
  • timeouts,
  • retries,
  • circuit breaking.

This can reduce repeated code across services.

But I still like keeping some resilience logic inside the application too, because the app understands business rules better than the gateway.

Make Values Configurable

During an incident, hard-coded settings are painful.

You may need to quickly change:

  • timeout values,
  • retry count,
  • circuit breaker thresholds,
  • rate limits,
  • feature flags,
  • fallback behavior.

A config system or feature flag platform can make this much easier.
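
A minimal sketch of environment-driven settings in Go (the variable names are invented):

package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// durationFromEnv reads a timeout like "2s" or "500ms" from the
// environment, falling back to a default when unset or invalid.
func durationFromEnv(name string, fallback time.Duration) time.Duration {
	if v := os.Getenv(name); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return fallback
}

func intFromEnv(name string, fallback int) int {
	if v := os.Getenv(name); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return fallback
}

func main() {
	// During an incident, these become a config change instead of a code change.
	timeout := durationFromEnv("HTTP_TIMEOUT", 2*time.Second)
	retries := intFromEnv("RETRY_COUNT", 3)

	fmt.Println("timeout:", timeout, "retries:", retries)
}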


Incident Response: What To Do When Things Are Already Breaking

When production is failing, the first goal is containment.

You need to stop the damage from spreading.

Possible actions:

  • reduce rate limits,
  • disable non-critical endpoints,
  • temporarily disable aggressive retries,
  • manually open a circuit if supported,
  • roll back a bad deployment,
  • scale services if the issue is real traffic,
  • pause background jobs,
  • slow queue consumers,
  • serve cached data,
  • disable expensive features behind a flag.

After the system stabilizes, then investigate the root cause.

Check:

  • logs,
  • metrics,
  • traces,
  • recent deployments,
  • config changes,
  • dependency health,
  • database performance,
  • traffic patterns.

And please write a postmortem.

Not to blame anyone, but to make the system better.

A good postmortem should answer:

  • What happened?
  • What was the impact?
  • Why did it happen?
  • How did we detect it?
  • What made recovery slower?
  • What will we change?

What Happens If We Ignore Resilience?

At first, maybe nothing.

That is the dangerous part.

The API works. Traffic is low. The team moves fast. Nobody wants to spend time on timeouts, fallback logic, or chaos testing.

Then the product grows.

Suddenly:

  • requests hang,
  • retries multiply traffic,
  • one service failure affects many services,
  • queues become huge,
  • users see errors,
  • dashboards are noisy,
  • nobody knows which limit to change,
  • incident response becomes stressful.

This is why I believe resilience should be added early, even in a simple form.

You do not need the perfect architecture from day one. But you should at least start with:

  • timeouts,
  • limited retries,
  • rate limits,
  • logging,
  • basic metrics.

Those small things can save a lot of pain later.


My Practical Roadmap for Resilient APIs

If I were improving an API step by step, I would follow this order.

1. Add Timeouts Everywhere

No dependency should be allowed to hang forever.

2. Add Limited Retries

Retry only transient failures. Use exponential backoff and jitter.

3. Protect Non-Idempotent Operations

Use idempotency keys for payments, orders, transfers, and similar operations.

4. Add Circuit Breakers

Wrap risky dependencies and fail fast when they are unhealthy.

5. Add Rate Limiting

Start with login, signup, public APIs, search, exports, and expensive endpoints.

6. Improve Security

Use HTTPS, strong auth, authorization, validation, secret management, and least privilege.

7. Add Observability

Track latency, errors, retries, circuit breaker states, and rate limit hits.

8. Test Failure Scenarios

Simulate slow services, failed services, traffic spikes, and retry storms.

9. Write Runbooks

During an incident, nobody wants to guess what to do.

10. Keep Tuning

Traffic changes. Products change. Dependencies change. Your resilience settings should evolve too.


Final Thoughts

Building resilient APIs is not about pretending failure will never happen.

It is about accepting that failure is part of the system and designing around it.

Retries help with temporary problems. Circuit breakers stop unhealthy services from hurting everything else. Rate limiting protects shared resources. Timeouts prevent requests from hanging forever. Testing and observability give you confidence. Security protects the system from abuse.

For me, the most important mindset is this:

Do not wait for production to teach you resilience the hard way.

Start simple. Add timeouts. Add careful retries. Add circuit breakers around risky calls. Add rate limits where abuse or overload can happen. Watch the system. Test it. Improve it.

A resilient API may still fail sometimes, but it fails clearly, recovers faster, and protects the rest of the application.

And in real production systems, that is a big win.


Useful References

If you want to go deeper, these topics are worth reading about:

  • Martin Fowler’s Circuit Breaker pattern
  • Microsoft Azure Retry and Circuit Breaker patterns
  • AWS architecture guidance for resilient systems
  • OWASP API Security Top 10
  • Envoy and Kong documentation for gateway-level resilience
  • Chaos engineering practices from Netflix and other production teams
