Evaluation

Perplexity

Perplexity measures how well a model predicts text.

Quick definition

Perplexity measures how well a language model predicts a sample of text, computed as the exponential of the average negative log-likelihood per token.

  • Category: Evaluation
  • Focus: quality measurement
  • Used in: Comparing models or prompt variants.

What it means

Lower perplexity indicates a better predictive fit: the model assigns higher probability to the text it is evaluated on. A perplexity of N can be read as the model being, on average, about as uncertain as if it were choosing uniformly among N tokens at each step. In evaluation workflows, perplexity serves as a quick, label-free signal of language-modeling quality.

How it works

To compute perplexity, run the model over a held-out text, record the log-probability it assigns to each actual next token, average the negative log-likelihoods, and exponentiate the result: perplexity = exp(average negative log-likelihood per token). Because the number depends on the tokenizer and the evaluation text, only compare values between models scored on the same data with the same tokenization.
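
A minimal sketch of this computation in Python, assuming you already have the log-probability the model assigned to each token of the evaluation text (the values and function name below are made-up placeholders):

    import math

    def perplexity(token_logprobs):
        # Average the negative log-likelihood per token, then exponentiate.
        avg_nll = -sum(token_logprobs) / len(token_logprobs)
        return math.exp(avg_nll)

    # Hypothetical per-token log-probabilities (natural log) from a model run.
    logprobs = [-2.1, -0.4, -1.3, -0.9, -3.2]
    print(perplexity(logprobs))  # ≈ 4.86: as uncertain as choosing among ~5 tokens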

Why it matters

Evaluation ensures you can measure and improve quality over time. Perplexity is a useful part of that because it is cheap to compute and needs no human labels, making it a convenient first signal for comparing models and catching regressions.

Common use cases

  • Comparing models or prompt variants.
  • Tracking model quality over time with regression tests (see the sketch after this list).
  • Validating that outputs meet acceptance criteria.
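
A perplexity regression check can be a few lines of Python; the baseline value and tolerance here are hypothetical placeholders you would set from your own evaluation history:

    def check_no_regression(new_ppl, baseline_ppl, tolerance=0.05):
        # Fail if perplexity rose more than `tolerance` (5%) over the baseline.
        assert new_ppl <= baseline_ppl * (1 + tolerance), (
            f"Perplexity regressed: {new_ppl:.2f} vs baseline {baseline_ppl:.2f}"
        )

    # Hypothetical values from two evaluation runs on the same held-out set.
    check_no_regression(new_ppl=12.8, baseline_ppl=12.4)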

Example

Compare perplexity for several model checkpoints on the same held-out text to see which one fits the data best, as in the sketch below.
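
A minimal sketch of such a comparison, assuming the Hugging Face transformers library and small causal language models; the checkpoint names are placeholders, and the second stands in for a hypothetical fine-tuned model:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def perplexity(checkpoint, text):
        tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        model = AutoModelForCausalLM.from_pretrained(checkpoint)
        model.eval()
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            # Passing input_ids as labels makes the model return the mean
            # next-token cross-entropy loss; exponentiating gives perplexity.
            loss = model(**enc, labels=enc["input_ids"]).loss
        return torch.exp(loss).item()

    sample = "A held-out passage the models did not see during training."
    for checkpoint in ["gpt2", "path/to/finetuned-checkpoint"]:  # placeholder names
        print(checkpoint, perplexity(checkpoint, sample))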

Pitfalls and tips

Overfitting to a single benchmark can mislead, so use varied tests and real-world examples. Keep in mind that perplexity values are only comparable when the models share a tokenizer and are scored on the same text, and that lower perplexity does not guarantee better performance on downstream tasks.

In BoltAI

In BoltAI, this concept comes up when you measure or compare results, for example when deciding which model or prompt variant to keep.