LLMLingua speeds up LLM inference and sharpens the model's perception of key information by compressing the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
![Version](https://anaconda.org/conda-forge/llmlingua/badges/version.svg)
![Latest release date](https://anaconda.org/conda-forge/llmlingua/badges/latest_release_date.svg)
![Latest release relative date](https://anaconda.org/conda-forge/llmlingua/badges/latest_release_relative_date.svg)
![Platforms](https://anaconda.org/conda-forge/llmlingua/badges/platforms.svg)
![License](https://anaconda.org/conda-forge/llmlingua/badges/license.svg)
![Downloads](https://anaconda.org/conda-forge/llmlingua/badges/downloads.svg)
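
As a minimal usage sketch, assuming the conda-forge package (e.g. `conda install -c conda-forge llmlingua`) exposes the standard `llmlingua` Python API with its `PromptCompressor` class; the prompt text, `instruction`, `question`, and `target_token` values below are illustrative:

```python
from llmlingua import PromptCompressor

# Loads the default small language model used to score and prune
# prompt tokens (weights are downloaded on first use).
compressor = PromptCompressor()

# An illustrative long context; in practice this would be your
# retrieved documents or few-shot demonstrations.
long_prompt = "LLMLingua compresses prompts by dropping low-information tokens. " * 50

result = compressor.compress_prompt(
    long_prompt,
    instruction="Answer the question based on the context.",
    question="What does LLMLingua do?",
    target_token=200,  # rough token budget for the compressed prompt
)

# The returned dict includes the compressed text, which is then
# sent to the target LLM in place of the original prompt.
print(result["compressed_prompt"])
```

The compressed prompt is a drop-in replacement for the original one, so the downstream LLM call itself is unchanged; only the input it receives is shorter.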