Safety

Moderation

Moderation detects and filters harmful or unsafe content.

Quick definition

Moderation detects and filters harmful or unsafe content.

It is often applied to both inputs and outputs. In safety workflows, moderation often shapes risk reduction.

Safety systems combine policy rules, classifiers, and human feedback to reduce harmful outputs.

Safety concepts reduce harmful outputs and protect users.

Flag prompts that request prohibited actions.

Over-blocking can frustrate users while under-blocking increases risk. Balance safety with usability.

In BoltAI, this relates to safe outputs and content handling.