Operations

Latency

Latency is the time that elapses between sending a request and receiving a response. For streamed model output it is often split into time-to-first-token and total completion time.

Quick definition

Latency is the time between request and response.

  • Category: Operations
  • Focus: performance and reliability
  • Used in: Reducing time-to-first-token with streaming.

What it means

Lower latency makes chat interfaces feel responsive; the delay before the first token appears is the part users notice most. In operations workflows, latency sits alongside throughput and cost as a core performance and reliability concern.
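
As a rough illustration, here is a minimal Python sketch of measuring end-to-end latency; the generate function is a hypothetical stand-in for any request/response call.

  import time

  def generate(prompt: str) -> str:
      # Stub standing in for a real model or API call.
      time.sleep(0.3)
      return f"response to {prompt!r}"

  def timed_call(fn, *args, **kwargs):
      # Measure end-to-end latency of one call, in milliseconds.
      start = time.perf_counter()
      result = fn(*args, **kwargs)
      elapsed_ms = (time.perf_counter() - start) * 1000
      return result, elapsed_ms

  result, ms = timed_call(generate, "Hello")
  print(f"latency: {ms:.0f} ms")  # about 300 ms for this stub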

How it works

Operations work balances latency, throughput, and cost. Common techniques include caching to skip redundant calls, batching to raise throughput (often at some latency cost), and monitoring to catch regressions before users do.
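
To make the caching idea concrete, here is a sketch in Python; call_model and example-model are placeholders, not a real client API. Identical requests are answered from memory and skip the round-trip entirely.

  import hashlib
  import json

  def call_model(model: str, prompt: str) -> str:
      # Stub standing in for a real API call.
      return f"completion for {prompt!r}"

  _cache: dict[str, str] = {}

  def cached_generate(prompt: str, model: str = "example-model") -> str:
      # Key on everything that affects the output: here, model and prompt.
      key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
      if key not in _cache:
          _cache[key] = call_model(model, prompt)  # only on a cache miss
      return _cache[key]

Real deployments add eviction and expiry, but the latency win is the same: a cache hit costs microseconds instead of a full round-trip.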

Why it matters

Operational choices directly shape cost, latency, and reliability; a slow or flaky system undermines trust even when output quality is high.

Common use cases

  • Reducing time-to-first-token with streaming (see the sketch after this list).
  • Managing costs with token budgets and caching.
  • Tracking usage and errors with logs and metrics.
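
For the streaming case, here is a minimal sketch of measuring time-to-first-token. The stream argument can be any iterable of text chunks, such as the generator a streaming client returns; fake_stream below is only a stand-in.

  import time

  def consume_stream(stream):
      # Report time-to-first-token (TTFT) and total latency while
      # assembling the full response from streamed chunks.
      start = time.perf_counter()
      ttft = None
      chunks = []
      for chunk in stream:
          if ttft is None:
              ttft = time.perf_counter() - start
          chunks.append(chunk)
      total = time.perf_counter() - start
      if ttft is not None:
          print(f"TTFT: {ttft:.2f} s, total: {total:.2f} s")
      return "".join(chunks)

  def fake_stream():
      # Stand-in for a streaming client: chunks trickle in over time.
      for token in ["Hello", ", ", "world", "!"]:
          time.sleep(0.1)
          yield token

  text = consume_stream(fake_stream())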

Example

A common target for chat interfaces is a time-to-first-token under one second; streaming lets users start reading while the rest of the completion is generated.

Pitfalls and tips

Ignoring rate and token limits can cause timeouts, rejected requests, and surprise bills. Set budgets, monitor usage, and back off on rate-limit errors instead of retrying immediately.
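
One way to respect limits is exponential backoff with jitter instead of immediate retries, sketched below; RateLimitError is a hypothetical stand-in for whatever a real client raises on HTTP 429.

  import random
  import time

  class RateLimitError(Exception):
      # Stand-in for the error a real client raises on HTTP 429.
      pass

  def with_backoff(fn, max_retries: int = 5):
      # Retry on rate limits, doubling the wait each attempt and adding
      # jitter so many clients do not retry in lockstep.
      for attempt in range(max_retries):
          try:
              return fn()
          except RateLimitError:
              if attempt == max_retries - 1:
                  raise  # retry budget exhausted; surface the error
              time.sleep(2 ** attempt + random.random())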

In BoltAI

In BoltAI, latency surfaces wherever the app shows performance, logging, or usage information.