Sparse MoE
Quick definition
Sparse MoE activates only a subset of experts per token.
- Category: Models
- Focus: model capability and fit
- Used in: Choosing a model that fits latency and cost constraints.
What it means
A sparse MoE layer holds many expert sub-networks, but for each token a learned router activates only a few of them. This cuts per-token computation compared to a dense model with the same total parameter count, which shapes the capability-versus-cost trade-off when choosing a model.
How it works
A router network scores every expert for each token, selects the top-k scorers, runs the token through only those experts, and combines their outputs weighted by the router scores. Capacity can grow with the number of experts while per-token compute stays roughly fixed.
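As a minimal sketch, here is top-k routing in NumPy with toy sizes. The expert and router weights are random placeholders, and names like `num_experts` and `top_k` are illustrative rather than taken from any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, num_experts, top_k = 8, 16, 2  # illustrative sizes

# Toy "experts": each is a single linear layer (d_model -> d_model).
expert_weights = rng.normal(size=(num_experts, d_model, d_model))
# Router: a linear layer that scores each expert for a given token.
router_weights = rng.normal(size=(d_model, num_experts))

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token through its top-k experts and mix the outputs."""
    logits = token @ router_weights            # (num_experts,) router scores
    top = np.argsort(logits)[-top_k:]          # indices of the k best experts
    gate = np.exp(logits[top])                 # softmax over the selected experts
    gate /= gate.sum()
    # Only the selected experts run; the other 14 are skipped entirely.
    outputs = np.stack([token @ expert_weights[i] for i in top])
    return gate @ outputs                      # gate-weighted combination

token = rng.normal(size=d_model)
print(moe_layer(token).shape)  # (8,)
```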
Why it matters
Sparse MoE decouples capacity from compute: a model can carry far more parameters, and often deliver higher quality, than a dense model with the same per-token latency and cost.
Common use cases
- Choosing a model that fits latency and cost constraints.
- Selecting longer context for document-heavy workflows.
- Using specialized models for code, vision, or speech.
Example
With 16 experts and top-2 routing, each token runs through 2 experts, so only about 1/8 of the expert parameters are active per token.
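A quick back-of-the-envelope calculation makes the saving concrete; every figure here (expert size, shared parameter count) is a made-up placeholder, not drawn from a real model:

```python
# Active-compute estimate for top-2-of-16 routing. All sizes are illustrative.
num_experts, top_k = 16, 2
params_per_expert = 100e6   # hypothetical size of one expert
shared_params = 300e6       # hypothetical attention/embedding parameters

total = shared_params + num_experts * params_per_expert
active = shared_params + top_k * params_per_expert
print(f"total params:  {total / 1e9:.1f}B")                          # 1.9B
print(f"active/token:  {active / 1e9:.1f}B ({active / total:.0%})")  # 0.5B (26%)
```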
Pitfalls and tips
Bigger is not always better. Note that a sparse MoE's memory footprint still reflects its total parameter count even though only a fraction is active per token, so match the model to the task and evaluate under production constraints.
In BoltAI
In BoltAI, this shows up in model selection and configuration.