Safety

Guardrails

Guardrails are rules that constrain model behavior.

Quick definition

Guardrails are rules that constrain model behavior.

They can restrict topics, formats, or tools for safety. In safety workflows, guardrails often shapes risk reduction.

Safety systems combine policy rules, classifiers, and human feedback to reduce harmful outputs.

Safety concepts reduce harmful outputs and protect users.

Block disallowed content categories.

Over-blocking can frustrate users while under-blocking increases risk. Balance safety with usability.

In BoltAI, this relates to safe outputs and content handling.