A high-throughput and memory-efficient inference and serving engine for LLMs
conda install anaconda::vllm
vLLM is a fast and easy-to-use library for LLM inference and serving.
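After running the install command, one way to confirm that the package is visible in the active conda environment is a quick check from Python. The sketch below is only an illustration, not part of vLLM: the helper name `installed_version` is hypothetical, and it relies solely on the standard library (`importlib.util.find_spec` and `importlib.metadata.version`), so it runs whether or not vLLM is present.

```python
import importlib.util
from importlib import metadata


def installed_version(package: str = "vllm"):
    """Return the installed version of `package`, or None if it cannot be found.

    `installed_version` is a hypothetical helper written for this example;
    it is not provided by vLLM or conda.
    """
    # find_spec locates a top-level module without importing it.
    if importlib.util.find_spec(package) is None:
        return None
    try:
        # Look up the version recorded in the package's installed metadata.
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under this exact name.
        return "unknown"


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("vllm is not installed in this environment")
    else:
        print(f"vllm version: {version}")
```

If this reports that vllm is not installed, the environment that ran the `conda install` command is probably not the one currently active.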