Azure Content Safety

Bifrost integrates with Azure AI Content Safety to provide multi-modal content moderation powered by Microsoft’s advanced AI models. This page covers the configuration and capabilities of the Azure Content Safety guardrail provider.

Capabilities

Severity-Based Filtering: 4-level severity classification (Safe, Low, Medium, High)
Multi-Category Detection: Hate, sexual, violence, self-harm content
Prompt Shield: Advanced jailbreak and injection detection
Indirect Attack Detection: Identify hidden malicious instructions
Protected Material: Detect copyrighted content (output only)
Custom Blocklists: Define organization-specific blocked terms

Configuration Fields

Field	Type	Required	Default	Description
`endpoint`	string	Yes	-	Azure Content Safety endpoint URL
`api_key`	string	Yes	-	Azure subscription key
`analyze_enabled`	boolean	No	true	Enable content analysis for Hate, Sexual, Violence, SelfHarm
`analyze_severity_threshold`	enum	No	"medium"	Severity level to trigger: `low`, `medium`, or `high`
`jailbreak_shield_enabled`	boolean	No	false	Enable jailbreak detection (input only)
`indirect_attack_shield_enabled`	boolean	No	false	Enable indirect prompt attack detection (input only)
`copyright_enabled`	boolean	No	false	Enable copyrighted content detection (output only)
`text_blocklist_enabled`	boolean	No	false	Enable custom blocklist filtering
`blocklist_names`	array	No	-	List of Azure blocklist names to apply
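As a minimal sketch, the fields above can be written as a Python dictionary with the documented defaults applied. The helper `build_config`, the blocklist name, and the rule that enabling the blocklist requires at least one name are illustrative assumptions, not part of Bifrost; only the field names, types, and defaults come from the table.

```python
VALID_THRESHOLDS = {"low", "medium", "high"}

# Defaults copied from the Configuration Fields table.
DEFAULTS = {
    "analyze_enabled": True,
    "analyze_severity_threshold": "medium",
    "jailbreak_shield_enabled": False,
    "indirect_attack_shield_enabled": False,
    "copyright_enabled": False,
    "text_blocklist_enabled": False,
}


def build_config(user_config: dict) -> dict:
    """Hypothetical helper: merge user values over the defaults and check them."""
    for required in ("endpoint", "api_key"):
        if not user_config.get(required):
            raise ValueError(f"missing required field: {required}")
    config = {**DEFAULTS, **user_config}
    if config["analyze_severity_threshold"] not in VALID_THRESHOLDS:
        raise ValueError("analyze_severity_threshold must be low, medium, or high")
    # Assumption: a blocklist check is only meaningful with at least one name.
    if config["text_blocklist_enabled"] and not config.get("blocklist_names"):
        raise ValueError("text_blocklist_enabled requires blocklist_names")
    return config


example = build_config({
    "endpoint": "https://xxx-resource.services.ai.azure.com",
    "api_key": "<your-api-key>",
    "jailbreak_shield_enabled": True,
    "text_blocklist_enabled": True,
    "blocklist_names": ["company-blocked-terms"],  # hypothetical blocklist name
})
```

Fields left out of the input, such as `analyze_severity_threshold`, fall back to the defaults listed in the table.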

Collecting your API key and URL

Navigate to the Azure AI Foundry dashboard.

Copy the API key and paste it into the Azure content moderation config form.
Copy the project endpoint and enter only its base URL as the endpoint in the form, e.g. https://xxx-resource.services.ai.azure.com
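If the copied project endpoint includes a path, the base URL is the scheme and host only. The sketch below shows one way to trim it with Python's standard library; the function name `base_url` and the sample path `/api/projects/demo` are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def base_url(project_endpoint: str) -> str:
    """Return scheme://host from a full endpoint URL, dropping path and query."""
    parts = urlsplit(project_endpoint.strip())
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {project_endpoint!r}")
    return f"{parts.scheme}://{parts.netloc}"


# The path below is a hypothetical example of what a project endpoint might contain.
endpoint = base_url("https://xxx-resource.services.ai.azure.com/api/projects/demo")
```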

Severity Threshold Levels

Threshold	Numeric Value	Behavior
`low`	2	Most strict - blocks severity 2 and above
`medium`	4	Balanced - blocks severity 4 and above
`high`	6	Least strict - blocks only severity 6
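The threshold table amounts to a "greater than or equal" comparison against the numeric value. The following sketch illustrates that rule; `should_block` and `blocked_categories` are hypothetical helper names and the sample severities are made up, so this shows the documented behavior rather than Bifrost's actual implementation.

```python
# Numeric values copied from the Severity Threshold Levels table.
THRESHOLD_VALUES = {"low": 2, "medium": 4, "high": 6}


def should_block(severity: int, threshold: str = "medium") -> bool:
    """Block when the reported severity reaches the configured threshold."""
    return severity >= THRESHOLD_VALUES[threshold]


def blocked_categories(severities: dict, threshold: str = "medium") -> list:
    """Return the categories whose severity meets or exceeds the threshold."""
    return [name for name, level in severities.items() if should_block(level, threshold)]


# Made-up per-category severities for illustration.
sample = {"Hate": 0, "Sexual": 2, "Violence": 4, "SelfHarm": 6}
```

With these sample values, `blocked_categories(sample, "low")` flags Sexual, Violence, and SelfHarm, while `"high"` flags only SelfHarm.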

Detection Categories

Hate and fairness
Sexual content
Violence
Self-harm

Input-only features: Jailbreak Shield and Indirect Attack Shield only apply to input validation.
Output-only features: Copyright detection only applies to output validation.
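The direction rules above can be sketched as a small filter over the enabled flags. This assumes that content analysis and blocklist filtering run in both directions, which the note implies but does not state outright; the function `active_checks` is hypothetical.

```python
INPUT_ONLY = {"jailbreak_shield_enabled", "indirect_attack_shield_enabled"}
OUTPUT_ONLY = {"copyright_enabled"}

# Feature flags with their documented defaults, in table order.
FLAG_DEFAULTS = {
    "analyze_enabled": True,
    "jailbreak_shield_enabled": False,
    "indirect_attack_shield_enabled": False,
    "copyright_enabled": False,
    "text_blocklist_enabled": False,
}


def active_checks(config: dict, direction: str) -> list:
    """List the enabled flags that apply to 'input' or 'output' validation."""
    if direction not in ("input", "output"):
        raise ValueError("direction must be 'input' or 'output'")
    active = []
    for flag, default in FLAG_DEFAULTS.items():
        if not config.get(flag, default):
            continue  # feature is switched off
        if direction == "output" and flag in INPUT_ONLY:
            continue  # shields only inspect prompts
        if direction == "input" and flag in OUTPUT_ONLY:
            continue  # copyright detection only inspects responses
        active.append(flag)
    return active


config = {"jailbreak_shield_enabled": True, "copyright_enabled": True}
```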

For provider comparison and information on configuring guardrail rules and profiles, see Guardrails.