llmlingua
Compresses prompts and the KV-Cache to speed up LLM inference and enhance the model's perception of key information, achieving up to 20x compression with minimal performance loss.
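A minimal usage sketch, assuming the package is installed from this channel and follows the upstream LLMLingua API (`PromptCompressor` and its `compress_prompt` method); the model loaded by the default constructor, the `rate` value, and the example prompt are illustrative assumptions:

```python
# Install first, e.g.: conda install llmlingua
from llmlingua import PromptCompressor

# Loads the default small language model used to score and prune tokens;
# the default model is large, so this step downloads weights on first use.
compressor = PromptCompressor()

result = compressor.compress_prompt(
    "<your long prompt with retrieved context here>",  # placeholder prompt
    rate=0.33,  # illustrative target: keep roughly one third of the tokens
)

print(result["compressed_prompt"])  # the shortened prompt to send to the LLM
print(result["origin_tokens"], "->", result["compressed_tokens"])
```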
| Name | Type | Version | Platform | Labels | Updated | Size | Downloads |
|---|---|---|---|---|---|---|---|
| noarch/llmlingua-0.2.2-pyhd8ed1ab_1.conda | conda | 0.2.2 | noarch | main | Jan 5, 2025, 11:26 AM | 32.5 KB | 782 |
| noarch/llmlingua-0.2.2-pyhd8ed1ab_0.conda | conda | 0.2.2 | noarch | main | Apr 9, 2024, 09:34 AM | 32.67 KB | 1.4K |
| noarch/llmlingua-0.2.1-pyhd8ed1ab_1.conda | conda | 0.2.1 | noarch | main | Mar 25, 2024, 01:59 AM | 31.44 KB | 1.3K |
| noarch/llmlingua-0.2.1-pyhd8ed1ab_0.conda | conda | 0.2.1 | noarch | main | Mar 24, 2024, 04:35 PM | 31.4 KB | 1.2K |