Health Checks

Health checks allow NGINX to determine whether a backend server is healthy or unhealthy and decide whether to send traffic to it.

Goal:

- If a backend is unhealthy → stop sending requests to it
- If a backend recovers → resume sending traffic to it

This ensures:

- High availability
- Fault tolerance
- Better user experience

Types of Health Checks in NGINX

NGINX supports two kinds of health checks:

| Type | NGINX Open Source | NGINX Plus |
|---|---|---|
| Passive health checks | ✅ | ✅ |
| Active health checks | ❌ | ✅ |

Passive Health Checks (Open Source NGINX)

NGINX does not actively probe backends.

Instead, it:

- Sends real client requests to the backends
- Monitors the backend responses
- Marks a server as failed if errors occur

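Failure conditions include:

- Connection refused or other connection errors
- Timeout while connecting, sending the request, or reading the response header
- Empty or invalid response
- HTTP 500 / 502 / 503 / 504, but only if these codes are listed in `proxy_next_upstream` (not the default)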

Core Parameters for Passive Health Checks

These are parameters of the `server` directive inside the `upstream` block.

`max_fails`

```nginx
server backend1 max_fails=3;
```

Number of failed attempts within `fail_timeout` before the server is marked unavailable

`fail_timeout`

```nginx
server backend1 fail_timeout=30s;
```

- Time window in which `max_fails` failures must occur for the server to be marked unavailable
- Also how long the server then stays marked unavailable
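The `server` directive documentation gives defaults for omitted parameters: `max_fails=1` and `fail_timeout=10s`. Setting `max_fails=0` turns failure counting off for that server. The sketch below shows all three cases. The upstream name and addresses are placeholders.

```nginx
upstream defaults_demo {
    # No parameters: behaves as max_fails=1 fail_timeout=10s
    server 10.0.0.11:8080;

    # The same defaults written out explicitly
    server 10.0.0.12:8080 max_fails=1 fail_timeout=10s;

    # max_fails=0 disables failure accounting:
    # passive checks never mark this server unavailable
    server 10.0.0.13:8080 max_fails=0;
}
```

The same documentation adds a caveat. If an upstream contains only one server, `max_fails` and `fail_timeout` are ignored, and that server is never marked unavailable.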

Combined Example

```nginx
upstream app_backend {
    server 10.0.0.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=30s;
}
```

- If a server fails 3 times within 30 seconds, NGINX marks it unavailable
- Traffic is sent only to the remaining healthy servers
- After 30 seconds, NGINX tries the server again

Full Passive Health Check Example

```nginx
upstream api_backend {
    least_conn;

    server 10.0.0.11:8080 max_fails=3 fail_timeout=20s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=20s;
}

server {
    listen 80;

    location /api/ {
        proxy_pass http://api_backend;

        # Timeouts here are what turn a slow backend into a counted failure
        proxy_connect_timeout 3s;
        proxy_read_timeout 10s;
    }
}
```

Request Flow Explanation

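- Client sends a request to NGINX
- NGINX proxies the request to a backend
- If the backend times out, refuses the connection, or returns a 5xx status listed in `proxy_next_upstream`, NGINX increments that server's failure counter
- Once `max_fails` failures occur within `fail_timeout` → the backend is skipped
- After `fail_timeout` expires, NGINX tries the backend again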
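NGINX also counts the failed attempt and, by default (`proxy_next_upstream error timeout`), passes the same request on to the next server in the group. Non-idempotent requests such as POST are not passed on once they have been sent, unless `non_idempotent` is added. The sketch below bounds this retrying. It assumes the `api_backend` upstream from the full example above, and the limit values are illustrative, not recommendations.

```nginx
location /api/ {
    proxy_pass http://api_backend;

    proxy_connect_timeout 3s;
    proxy_read_timeout 10s;

    # Failures that make NGINX try the next server
    # (error and timeout are also the default)
    proxy_next_upstream error timeout;

    # Cap the number of tries per request (0 = no limit)
    proxy_next_upstream_tries 2;

    # Stop passing the request on after 15s in total (0 = no limit)
    proxy_next_upstream_timeout 15s;
}
```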

What Happens When All Backends Fail?

If all servers in the upstream are marked down:

- NGINX temporarily retries failed servers
- If they are still unavailable → the client receives 502 Bad Gateway
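When no upstream server is available, NGINX generates the 502 itself, and `error_page` can replace it with a friendlier page. The sketch below reuses the `app_backend` upstream from above. The `/50x.html` page and its `root` path are assumptions. Replacing 5xx responses returned *by* a backend would also need `proxy_intercept_errors on;`.

```nginx
server {
    listen 80;

    location / {
        proxy_pass http://app_backend;

        # Serve a static page instead of NGINX's default error page
        error_page 502 503 504 /50x.html;
    }

    # Hypothetical location for the static error page
    location = /50x.html {
        root /usr/share/nginx/html;
        internal;   # reachable only via internal redirect
    }
}
```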

Passive Health Check Failure Conditions (Important)

NGINX counts failures when:

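| Condition | Counts as failure? |
|---|---|
| TCP connection refused | ✅ always |
| Timeout | ✅ always |
| Empty or invalid response | ✅ always |
| HTTP 500 / 502 / 503 / 504 | ✅ only if listed in `proxy_next_upstream` |
| HTTP 404 | ❌ never |
| HTTP 401 | ❌ never |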
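Because the 5xx rows depend on configuration, here is a sketch of a location that makes those responses count toward `max_fails` and retries the request on the next server. It assumes the `api_backend` upstream from the full example above.

```nginx
location /api/ {
    proxy_pass http://api_backend;

    # error, timeout and invalid_header always count as failures.
    # Listing the http_5xx codes makes these responses count
    # toward max_fails as well.
    proxy_next_upstream error timeout http_500 http_502 http_503 http_504;
}
```

`http_403` and `http_404` can also be listed to trigger a retry. According to the `proxy_next_upstream` documentation, however, they are never counted as unsuccessful attempts.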

`backup` Servers (Failover Strategy)

```nginx
upstream app_backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    server 10.0.0.99:8080 backup;
}
```

- The backup server receives requests only when all primary servers are unavailable
- Useful for disaster recovery (DR) or reduced-capacity nodes

Temporarily Disable a Backend

```nginx
server 10.0.0.11:8080 down;
```

- The server is manually marked as permanently unavailable
- Useful for maintenance
- To re-enable it, remove `down` and reload NGINX
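The `down` flag is especially useful with `ip_hash`. The upstream documentation recommends marking a server `down` instead of deleting its line, so that the hashing of client IP addresses to the remaining servers is preserved. The addresses below are placeholders.

```nginx
upstream sticky_backend {
    ip_hash;

    server 10.0.0.11:8080 down;   # in maintenance; hash mapping preserved
    server 10.0.0.12:8080;
    server 10.0.0.13:8080;
}
```

After maintenance, delete `down` and apply the change with `nginx -s reload`.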

Active Health Checks (NGINX Plus Only)

Not available in open-source NGINX

How Active Health Checks Work

- NGINX sends periodic health probe requests to each server
- Probes use a dedicated endpoint (e.g. /health)
- A failing backend can be taken out of rotation before user requests reach it

```nginx
health_check uri=/health interval=5s fails=2 passes=1;
```
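Here is a fuller NGINX Plus sketch. In the NGINX Plus documentation, `health_check` goes in the `location` that proxies to the group, and the group must be kept in shared memory with `zone`. The `match` block tightens the pass condition; its name `health_ok` is hypothetical. Without a `match`, a 2xx or 3xx probe response counts as healthy. Each backend is assumed to serve `/health`. This configuration will not load on open-source NGINX.

```nginx
upstream app_backend {
    # Active checks require the group in shared memory
    zone app_backend 64k;

    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

# Hypothetical condition: a probe passes only on HTTP 200
match health_ok {
    status 200;
}

server {
    listen 80;

    location / {
        proxy_pass http://app_backend;

        # Probe /health every 5s; unhealthy after 2 failed probes,
        # healthy again after 1 passing probe
        health_check uri=/health interval=5s fails=2 passes=1 match=health_ok;
    }
}
```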

Passive vs Active Health Checks

| Feature | Passive | Active |
|---|---|---|
| Available in NGINX Open Source | ✅ | ❌ |
| Sends probes | ❌ | ✅ |
| Detects failures early | ❌ | ✅ |
| Uses real traffic | ✅ | ❌ |
| Complexity | Low | Medium |

Real-World Production Example

```nginx
upstream web_backend {
    least_conn;

    server 10.0.1.10:8080 max_fails=2 fail_timeout=15s;
    server 10.0.1.11:8080 max_fails=2 fail_timeout=15s;
    server 10.0.1.99:8080 backup;
}

server {
    listen 80;

    location / {
        proxy_pass http://web_backend;
        proxy_connect_timeout 2s;
        proxy_read_timeout 30s;
    }
}
```

Summary

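- Health checks prevent traffic from reaching failed backends
- Open-source NGINX supports passive health checks only
- Key `server` parameters:
  - `max_fails`
  - `fail_timeout`
  - `backup`
- Active health checks require NGINX Plus
- `max_fails`, `fail_timeout`, and the proxy timeouts together decide how quickly a failing backend is taken out of rotation, so tune them as a set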
