Safety
Bias
Quick definition
Bias is systematic unfairness or skew in model outputs.
- Category: Safety
- Focus: risk reduction
- Used in: Filtering sensitive or unsafe requests.
What it means
Bias can arise from imbalanced training data, skewed labels, or modeling choices, and it shows up as outputs that treat comparable inputs differently. In safety workflows, detecting and mitigating bias is a core part of risk reduction, because uneven model behavior can disproportionately harm specific groups.
How it works
Safety systems combine policy rules, classifiers, and human feedback to reduce harmful outputs. Bias is usually surfaced by comparing model behavior across groups or input variations, then mitigated through data rebalancing, fine-tuning, or output filtering.
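A minimal sketch of the layered approach, assuming a hypothetical toxicity classifier with a `score(text)` method; the rules, threshold, and toy model below are illustrative only, not BoltAI's actual implementation:

```python
# Minimal sketch of a layered safety check. The model, rules, and threshold
# are illustrative assumptions, not a real moderation system.
import re

BLOCKED_PATTERNS = [r"\bssn\b", r"\bcredit card\b"]  # example policy rules
TOXICITY_THRESHOLD = 0.8  # assumed cut-off, tuned against human feedback

class KeywordToxicityModel:
    """Toy stand-in for a learned classifier (assumption, not a real model)."""
    def score(self, text: str) -> float:
        return 0.9 if "hate" in text.lower() else 0.1

def is_allowed(text: str, model: KeywordToxicityModel) -> bool:
    # Layer 1: explicit policy rules (cheap, deterministic, auditable).
    if any(re.search(p, text, flags=re.IGNORECASE) for p in BLOCKED_PATTERNS):
        return False
    # Layer 2: learned classifier (catches phrasing the rules miss).
    return model.score(text) < TOXICITY_THRESHOLD

model = KeywordToxicityModel()
print(is_allowed("What is the capital of France?", model))        # True
print(is_allowed("Please read me back my credit card.", model))   # False
```

The rule layer is cheap and auditable, while the classifier covers wording the rules miss; in practice, human feedback is used to tune both layers over time.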
Why it matters
Unchecked bias leads to uneven quality of service and can cause real harm to the groups affected, so addressing it is central to reducing harmful outputs and protecting users.
Common use cases
- Filtering sensitive or unsafe requests.
- Adding guardrails around tools and actions.
- Redacting private information in logs and outputs (see the sketch after this list).
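For the redaction case, a rough sketch of placeholder-based redaction; the regex patterns are illustrative examples, not an exhaustive or production-grade PII detector:

```python
# Minimal PII-redaction sketch for logs and outputs; patterns are examples only.
import re

REDACTIONS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def redact(text: str) -> str:
    # Replace each match with a labeled placeholder so logs stay readable.
    for label, pattern in REDACTIONS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text

print(redact("Contact me at jane@example.com or 555-123-4567."))
# -> "Contact me at [REDACTED EMAIL] or [REDACTED PHONE]."
```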
Example
A model that performs unequally across demographic groups, for example a classifier that is noticeably less accurate for one group than for another.
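One way to surface this kind of skew is a per-group accuracy comparison on held-out data; the sketch below uses made-up example values to show how an aggregate score can hide a subgroup gap:

```python
# Per-group performance check on illustrative (group, prediction, label) data.
from collections import defaultdict

records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 0, 1), ("group_b", 0, 1), ("group_b", 1, 1), ("group_b", 0, 0),
]

hits, totals = defaultdict(int), defaultdict(int)
for group, pred, label in records:
    totals[group] += 1
    hits[group] += int(pred == label)

for group in totals:
    print(f"{group}: accuracy {hits[group] / totals[group]:.2f}")
# group_a 0.75 vs group_b 0.50: a gap the overall accuracy (0.62) would hide.
```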
Pitfalls and tips
Over-blocking can frustrate users, while under-blocking increases risk, so balance safety with usability. For bias in particular, aggregate metrics can hide subgroup gaps; evaluate per group where possible.
In BoltAI
In BoltAI, this relates to safe outputs and content handling.