vllm.entrypoints.serve.instrumentator.health
health async
Health check. Returns 503 when paused or dead.
Note: During a drain shutdown, middleware returns 503 before the request reaches this handler. This endpoint is designed to be used as the readiness probe in a Kubernetes deployment.
Source code in vllm/entrypoints/serve/instrumentator/health.py
live async
Liveness check. Returns 200 while draining; returns 503 only when dead.
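The difference between the two checks can be summarized as a small status table. The following is a minimal sketch of that mapping, not the code in health.py: `EngineState`, `health_status`, and `live_status` are hypothetical names introduced for illustration, and the 200 response for a running server on the health check is assumed rather than stated above. The drain case is folded into `health_status` for brevity, although the documented 503 comes from middleware before the handler runs.

```python
from enum import Enum
from http import HTTPStatus


class EngineState(Enum):
    """Hypothetical server states, named after the cases described above."""
    RUNNING = "running"
    PAUSED = "paused"
    DRAINING = "draining"
    DEAD = "dead"


def health_status(state: EngineState) -> int:
    """Readiness-style check: 503 when paused, draining, or dead."""
    if state is EngineState.DRAINING:
        # Documented behaviour: middleware answers 503 during a drain
        # shutdown, so the handler itself is never reached.
        return int(HTTPStatus.SERVICE_UNAVAILABLE)
    if state in (EngineState.PAUSED, EngineState.DEAD):
        return int(HTTPStatus.SERVICE_UNAVAILABLE)
    # Assumption: a running server reports 200.
    return int(HTTPStatus.OK)


def live_status(state: EngineState) -> int:
    """Liveness-style check: 503 only when dead, 200 otherwise."""
    if state is EngineState.DEAD:
        return int(HTTPStatus.SERVICE_UNAVAILABLE)
    return int(HTTPStatus.OK)


if __name__ == "__main__":
    # Print both status codes side by side for every state.
    for state in EngineState:
        print(f"{state.value:9s} health={health_status(state)} live={live_status(state)}")
```

Under these assumptions, a draining server fails the readiness-style check while still passing the liveness-style check, so an orchestrator would stop routing new traffic to it without restarting it.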