Inference

Inference is running a trained model to produce outputs.

Quick definition

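Inference is the production phase after training: the model's parameters are fixed, and it is applied to new inputs. In model workflows, inference requirements often decide which model has the capability and fit for a task.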
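As a minimal sketch of this distinction, assume a tiny linear classifier whose parameters were already fixed by an earlier training run. The weights, bias, and the `predict` function are hypothetical and purely illustrative. Inference here is only the forward computation on a new input, with no parameter updates.

```python
import math

# Hypothetical parameters, assumed to come from an earlier training run.
WEIGHTS = [0.8, -0.4]
BIAS = 0.1


def predict(features):
    """Run inference: apply the fixed parameters to one new input."""
    if len(features) != len(WEIGHTS):
        raise ValueError("expected %d features" % len(WEIGHTS))
    score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    # The sigmoid maps the raw score to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-score))


probability = predict([1.0, 2.0])
```

Nothing in `predict` changes `WEIGHTS` or `BIAS`, which is what separates inference from training.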

Model architecture and scale determine capability, and they also affect inference speed. Context length, parameter count, and modality support vary across models.

Example: generating a summary of a document. The trained model receives the document text as input and produces the summary as output.

Bigger is not always better. Match the model to the task and evaluate in production.
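One way to act on this advice is to filter candidate models by hard requirements and then prefer the smallest one that qualifies. The sketch below assumes a hypothetical catalog. The model names and figures are invented for illustration and do not describe real models or BoltAI's configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    params_billions: float
    context_tokens: int
    modalities: frozenset


# Hypothetical catalog: names and numbers are made up for illustration.
CATALOG = [
    ModelSpec("small-text", 3, 8_000, frozenset({"text"})),
    ModelSpec("medium-text", 30, 32_000, frozenset({"text"})),
    ModelSpec("large-multimodal", 300, 128_000, frozenset({"text", "image"})),
]


def pick_model(required_tokens, required_modalities, catalog=CATALOG):
    """Return the smallest model meeting the requirements, or None."""
    needed = frozenset(required_modalities)
    candidates = [
        m for m in catalog
        if m.context_tokens >= required_tokens and needed <= m.modalities
    ]
    # Prefer the fewest parameters among the models that qualify.
    return min(candidates, key=lambda m: m.params_billions, default=None)
```

With this catalog, `pick_model(20_000, {"text"})` selects `medium-text` and skips the larger model. A filter like this only rules out models that cannot fit the task; the shortlisted model still needs to be evaluated on real inputs.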

In BoltAI, this shows up in model selection and configuration.