vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
vLLM is a fast and easy-to-use library for LLM inference and serving.
Summary
A high-throughput and memory-efficient inference and serving engine for LLMs
Last Updated
Apr 27, 2026 at 19:20
License
Apache-2.0
Supported Platforms
Home
https://vllm.ai/
GitHub Repository
https://github.com/vllm-project/vllm
Documentation
https://docs.vllm.ai/en/latest/usage/