vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
To install this package, run one of the following:
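The page's own command list did not survive; one standard route, per the upstream vLLM documentation, is installing from PyPI:

    pip install vllm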
Easy, fast, and cheap LLM serving for everyone
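As a quick illustration of what the engine does, here is a minimal offline-inference sketch following the upstream vLLM quickstart; the model name facebook/opt-125m is only an example, and any HuggingFace-compatible model id could stand in for it:

    from vllm import LLM, SamplingParams

    # Load a model into the vLLM engine (model id here is an example).
    llm = LLM(model="facebook/opt-125m")

    # Sampling settings for generation.
    sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # generate() accepts a batch of prompts; vLLM schedules and batches
    # requests internally for throughput.
    outputs = llm.generate(["The capital of France is"], sampling)
    for out in outputs:
        print(out.outputs[0].text)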
Summary
A high-throughput and memory-efficient inference and serving engine for LLMs
Last Updated
Sep 26, 2025 at 08:29
License
Apache-2.0 AND BSD-3-Clause
Total Downloads
13.7K
Documentation
https://vllm.readthedocs.io/en/latest/