Health Checks
Health checks allow NGINX to determine whether a backend server is healthy or unhealthy and decide whether to send traffic to it.
Goal:
- If backend is unhealthy → stop sending requests to it
- If backend recovers → resume traffic
This ensures:
- High availability
- Fault tolerance
- Better user experience
Types of Health Checks in NGINX
NGINX supports two kinds of health checks:
| Type | Availability |
|---|---|
| Passive health checks | ✅ NGINX Open Source |
| Active health checks | ❌ Open Source (✅ NGINX Plus) |
Passive Health Checks (Open Source NGINX)
NGINX does not actively probe backends.
Instead, it:
- Sends real client requests
- Monitors backend responses
- Marks a server as failed if errors occur
Failure conditions include:
- Connection timeout
- Connection refused
- Invalid response
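- HTTP 500 / 502 / 503 / 504, but only when those statuses are listed in proxy_next_upstream (by default only errors, timeouts, and invalid responses count)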
Core Directives for Passive Health Checks
These are parameters of the server directive inside the upstream block.
max_fails
server backend1 max_fails=3;
Number of failed attempts, within the fail_timeout window, before the server is marked unavailable
fail_timeout
server backend1 fail_timeout=30s;
- Time window in which failures are counted
- Also the length of time the server is then considered unavailable
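As a minimal sketch of how the two parameters interact, the fragment below spells out the documented defaults of the server directive (max_fails=1, fail_timeout=10s) and shows max_fails=0, which turns failure accounting off for that server. The upstream name example_backend and the hostnames backend2 and backend3 are placeholders for illustration only.

```nginx
upstream example_backend {
    # No parameters: same as max_fails=1 fail_timeout=10s (the defaults).
    server backend1;

    # After 3 failed attempts within 30s, skip this server for 30s.
    server backend2 max_fails=3 fail_timeout=30s;

    # max_fails=0 disables failure accounting for this server.
    server backend3 max_fails=0;
}
```

Note that when an upstream group contains only a single server, NGINX ignores max_fails and fail_timeout and never considers that server unavailable.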
Combined Example
upstream app_backend {
    server 10.0.0.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=30s;
}
- If a server fails 3 times within 30 seconds
- NGINX marks it unavailable
- Traffic is sent only to healthy servers
- After 30 seconds, NGINX tries it again with live client requests
Full Passive Health Check Example
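upstream api_backend {
    least_conn;  # prefer the server with the fewest active connections
    server 10.0.0.11:8080 max_fails=3 fail_timeout=20s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=20s;
}
server {
    listen 80;
    location /api/ {
        proxy_pass http://api_backend;
        proxy_connect_timeout 3s;   # give up quickly if the backend does not accept the connection
        proxy_read_timeout 10s;     # maximum wait between two reads from the backend
    }
}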
Request Flow Explanation
- Client sends request to NGINX
- NGINX proxies request to backend
- If backend:
  - Times out
  - Refuses connection
  - Returns 5xx repeatedly (counted only if listed in proxy_next_upstream)
- NGINX increments failure counter
- Once max_fails threshold reached → backend is skipped
- After fail_timeout, backend is retried
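The retry step in this flow can be bounded. The sketch below assumes the api_backend upstream from the Full Passive Health Check Example and uses proxy_next_upstream_tries and proxy_next_upstream_timeout to limit how often and for how long a single request is passed on to another server. The values are illustrative, not recommendations.

```nginx
server {
    listen 80;

    location /api/ {
        proxy_pass http://api_backend;

        # Limit the number of tries for passing a request to the next
        # server (0 turns the limit off).
        proxy_next_upstream_tries 2;

        # Stop passing the request to further servers once 5 seconds
        # have elapsed overall.
        proxy_next_upstream_timeout 5s;
    }
}
```

Without such limits, a slow failure on every backend can keep one client waiting for the sum of all the per-server timeouts.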
What Happens When All Backends Fail?
If all servers in the upstream are marked down:
- NGINX temporarily retries failed servers
- If still unavailable → client receives 502 Bad Gateway
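One way to soften this case is to replace the default error body with a static fallback page. The sketch below is an assumption-laden example: the file name unavailable.html and the root path are hypothetical, and 504 is included because NGINX returns it when an upstream attempt times out.

```nginx
server {
    listen 80;

    location /api/ {
        proxy_pass http://api_backend;
    }

    # Serve a static page when no backend could produce a response.
    error_page 502 504 /unavailable.html;

    location = /unavailable.html {
        root /usr/share/nginx/html;  # assumed location of the fallback page
        internal;                    # not reachable by direct client request
    }
}
```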
Passive Health Check Failure Conditions (Important)
NGINX counts failures when:
| Condition | Counts as Failure |
|---|---|
| TCP connection refused | ✅ |
| Timeout | ✅ |
| No response | ✅ |
| HTTP 500 / 502 / 503 / 504 | ✅ only if listed in proxy_next_upstream |
| HTTP 404 | ❌ |
| HTTP 401 | ❌ |
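Which responses count toward max_fails is controlled by proxy_next_upstream. Errors, timeouts, and invalid headers always count; 5xx status codes count only when listed explicitly. A minimal sketch, assuming the api_backend upstream used elsewhere in this section:

```nginx
location /api/ {
    proxy_pass http://api_backend;

    # "error timeout" is the default value; the http_5xx entries opt in
    # to treating those status codes as failed attempts.
    proxy_next_upstream error timeout http_500 http_502 http_503 http_504;
}
```

Requests with non-idempotent methods such as POST are normally not passed to the next server once they have been sent to a backend, unless the non_idempotent parameter is added as well.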
backup Servers (Failover Strategy)
upstream app_backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    server 10.0.0.99:8080 backup;
}
- Backup server is used only if all primary servers fail
- Useful for DR or reduced-capacity nodes
Temporarily Disable a Backend
server 10.0.0.11:8080 down;
- Server is manually removed
- Useful for maintenance
- Requires reload to re-enable
Active Health Checks (NGINX Plus Only)
Not available in open-source NGINX
How Active Health Checks Work
- NGINX sends periodic health probe requests
- Uses a dedicated endpoint (e.g. /health)
- Removes backend before user traffic fails
health_check uri=/health interval=5s fails=2 passes=1;
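For context, the health_check directive is placed in the location that proxies to the upstream, and the upstream group needs a shared memory zone. The following is a sketch for NGINX Plus only; the zone size and the match block name health_ok are assumptions chosen for illustration.

```nginx
upstream app_backend {
    # Shared memory zone; required for health_check to track server state.
    zone app_backend 64k;

    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

# Optional: conditions a probe response must satisfy to pass.
match health_ok {
    status 200;
}

server {
    listen 80;

    location / {
        proxy_pass http://app_backend;

        # Probe /health every 5s; 2 consecutive failures mark the server
        # unhealthy, 1 success marks it healthy again.
        health_check uri=/health interval=5s fails=2 passes=1 match=health_ok;
    }
}
```

If no match block is given, a probe passes when the response status is 2xx or 3xx.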
Passive vs Active Health Checks
| Feature | Passive | Active |
|---|---|---|
| Available in OSS | ✅ | ❌ |
| Sends probes | ❌ | ✅ |
| Detects failures early | ❌ | ✅ |
| Uses real traffic | ✅ | ❌ |
| Complexity | Low | Medium |
Real-World Production Example
upstream web_backend {
    least_conn;
    server 10.0.1.10:8080 max_fails=2 fail_timeout=15s;
    server 10.0.1.11:8080 max_fails=2 fail_timeout=15s;
    server 10.0.1.99:8080 backup;
}
server {
    listen 80;
    location / {
        proxy_pass http://web_backend;
        proxy_connect_timeout 2s;
        proxy_read_timeout 30s;
    }
}
Summary
- Health checks prevent traffic to failed backends
- Open-source NGINX supports passive health checks
- Key directives: max_fails, fail_timeout, backup
- Active health checks require NGINX Plus
- Proper tuning is critical for reliability