Models

KV cache

KV cache stores key and value tensors for previous tokens.

Quick definition

KV cache stores key and value tensors for previous tokens.

It speeds up generation by avoiding recomputation. In models workflows, kv cache often shapes model capability and fit.

Model architecture and scale determine capability. Context length, parameter count, and modality support vary across models.

Model architecture affects capability, context length, and speed.

Long chats generate faster with a cache.

Bigger is not always better. Match the model to the task and evaluate in production.

In BoltAI, this shows up in model selection and configuration.