vllm.entrypoints.launcher ¶
_add_shutdown_handlers ¶
The vLLM V1 AsyncLLM catches exceptions and surfaces only two types: EngineGenerateError and EngineDeadError.
EngineGenerateError is raised by the per request generate() method. This error could be request specific (and therefore recoverable - e.g. if there is an error in input processing).
EngineDeadError is raised by the background output_handler method. This error is global and therefore not recoverable.
We register these @app.exception_handlers to return nice responses to the end user if they occur and shut down if needed. See https://fastapi.tiangolo.com/tutorial/handling-errors/ for more details on how exception handlers work.
If an exception is encountered in a StreamingResponse generator, the exception is not raised, since we already sent a 200 status. Rather, we send an error message as the next chunk. Since the exception is not raised, the server will not automatically shut down. Instead, we use the watchdog background task to check for the errored state.
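A minimal sketch of the handler pattern described above. The exception classes, handler signatures, and the `DummyServer` type are simplified stand-ins for the real vLLM and FastAPI types, not the actual implementation:

```python
# Sketch of per-exception handling: a request-specific error returns an
# error response and keeps the server alive; a global engine error also
# asks the server to exit after the response is sent.

class EngineGenerateError(Exception):
    """Per-request failure raised by generate(); possibly recoverable."""

class EngineDeadError(Exception):
    """Global failure from the background output handler; not recoverable."""

class DummyServer:
    """Stand-in for a uvicorn Server; only the exit flag matters here."""
    should_exit = False

def handle_generate_error(exc: Exception) -> dict:
    # Request-specific: report the error, leave the server running.
    return {"status": 400, "detail": str(exc)}

def handle_engine_dead(exc: Exception, server: DummyServer) -> dict:
    # Global: report the error and signal shutdown (see terminate_if_errored
    # below for why we signal rather than await the shutdown here).
    server.should_exit = True
    return {"status": 500, "detail": str(exc)}

server = DummyServer()
resp = handle_generate_error(EngineGenerateError("bad prompt"))
assert resp["status"] == 400 and server.should_exit is False

resp = handle_engine_dead(EngineDeadError("engine died"), server)
assert resp["status"] == 500 and server.should_exit is True
```

In the real app these functions would be registered via FastAPI's `@app.exception_handler(...)` decorator and return `JSONResponse` objects, as described in the linked FastAPI tutorial.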
Source code in vllm/entrypoints/launcher.py
serve_http async ¶
```python
serve_http(
    app: FastAPI,
    sock: socket | None,
    enable_ssl_refresh: bool = False,
    **uvicorn_kwargs: Any,
)
```
Start a FastAPI app using Uvicorn, with support for custom Uvicorn config options. Supports HTTP header limits via the h11_max_incomplete_event_size and h11_max_header_count options.
terminate_if_errored ¶
```python
terminate_if_errored(server: Server, engine: EngineClient)
```
See the discussion on shutting down a Uvicorn server at https://github.com/encode/uvicorn/discussions/1103. We cannot await the server shutdown here because the handler must first return to close the connection for the current request.
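The "signal, don't await" pattern can be sketched as follows. `Server` and `EngineClient` here are minimal stand-ins (Uvicorn's real `Server` exposes a `should_exit` flag that its main loop polls); the `errored` attribute is an assumption about the engine client's interface:

```python
# Instead of awaiting server.shutdown() inside a request handler (which
# would block on the still-open connection for this request), set the
# exit flag and let the server's main loop shut down after the response
# has been sent.

class Server:
    should_exit = False

class EngineClient:
    errored = False

def terminate_if_errored(server: Server, engine: EngineClient) -> None:
    if engine.errored:
        server.should_exit = True

server, engine = Server(), EngineClient()
engine.errored = True
terminate_if_errored(server, engine)
assert server.should_exit is True
```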
watchdog_loop async ¶
```python
watchdog_loop(server: Server, engine: EngineClient)
```
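A watchdog like the one named above can be sketched as a background task that polls the engine's errored state, complementing the StreamingResponse case where no exception is raised. The poll interval, attribute names, and stand-in classes below are illustrative assumptions, not vLLM's actual code:

```python
import asyncio

class Server:
    should_exit = False

class Engine:
    errored = False

async def watchdog_loop(server: Server, engine: Engine,
                        interval: float = 0.01) -> None:
    # Poll the engine; if it reports a fatal error, signal the server
    # to exit (the same flag terminate_if_errored would set).
    while not server.should_exit:
        if engine.errored:
            server.should_exit = True
            break
        await asyncio.sleep(interval)

async def main() -> bool:
    server, engine = Server(), Engine()
    task = asyncio.create_task(watchdog_loop(server, engine))
    await asyncio.sleep(0.03)
    engine.errored = True  # simulate a fatal engine error mid-stream
    await task
    return server.should_exit

assert asyncio.run(main()) is True
```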