Bifrost integrates with Azure AI Content Safety to provide multi-modal content moderation powered by Microsoft’s advanced AI models. This page covers the configuration and capabilities of the Azure Content Safety guardrail provider.

Capabilities

  • Severity-Based Filtering: 4-level severity classification (Safe, Low, Medium, High)
  • Multi-Category Detection: Hate, sexual, violence, self-harm content
  • Prompt Shield: Advanced jailbreak and injection detection
  • Indirect Attack Detection: Identify hidden malicious instructions
  • Protected Material: Detect copyrighted content (output only)
  • Custom Blocklists: Define organization-specific blocked terms

Configuration Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| endpoint | string | Yes | - | Azure Content Safety endpoint URL |
| api_key | string | Yes | - | Azure subscription key |
| analyze_enabled | boolean | No | true | Enable content analysis for Hate, Sexual, Violence, SelfHarm |
| analyze_severity_threshold | enum | No | "medium" | Severity level to trigger: low, medium, or high |
| jailbreak_shield_enabled | boolean | No | false | Enable jailbreak detection (input only) |
| indirect_attack_shield_enabled | boolean | No | false | Enable indirect prompt attack detection (input only) |
| copyright_enabled | boolean | No | false | Enable copyrighted content detection (output only) |
| text_blocklist_enabled | boolean | No | false | Enable custom blocklist filtering |
| blocklist_names | array | No | - | List of Azure blocklist names to apply |
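
As a rough illustration of how the fields above fit together, the following Python sketch merges user-supplied values over the documented defaults and checks the two required fields. The helper name `build_azure_content_safety_config` and the dict-based shape are assumptions made for this example, not Bifrost's actual API or configuration file format. Only the field names, defaults, and allowed threshold values come from the table.

```python
from typing import Any

# Defaults copied from the configuration table.
# blocklist_names has no documented default, so it is not listed here.
DEFAULTS: dict[str, Any] = {
    "analyze_enabled": True,
    "analyze_severity_threshold": "medium",
    "jailbreak_shield_enabled": False,
    "indirect_attack_shield_enabled": False,
    "copyright_enabled": False,
    "text_blocklist_enabled": False,
}
REQUIRED = ("endpoint", "api_key")
OPTIONAL_WITHOUT_DEFAULT = ("blocklist_names",)
THRESHOLDS = ("low", "medium", "high")


def build_azure_content_safety_config(**fields: Any) -> dict[str, Any]:
    """Hypothetical helper: apply documented defaults and validate the result."""
    missing = [name for name in REQUIRED if not fields.get(name)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")

    known = set(DEFAULTS) | set(REQUIRED) | set(OPTIONAL_WITHOUT_DEFAULT)
    unknown = sorted(set(fields) - known)
    if unknown:
        raise ValueError(f"unknown field(s): {', '.join(unknown)}")

    # User-supplied values override the defaults.
    config = {**DEFAULTS, **fields}
    if config["analyze_severity_threshold"] not in THRESHOLDS:
        raise ValueError("analyze_severity_threshold must be low, medium, or high")
    return config


# Placeholder endpoint and key, for illustration only.
config = build_azure_content_safety_config(
    endpoint="https://example.invalid",
    api_key="placeholder-key",
    jailbreak_shield_enabled=True,
)
```

Here only `jailbreak_shield_enabled` is overridden; every other optional field keeps the default listed in the table.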

Severity Threshold Levels

| Threshold | Numeric Value | Behavior |
|---|---|---|
| low | 2 | Most strict - blocks severity 2 and above |
| medium | 4 | Balanced - blocks severity 4 and above |
| high | 6 | Least strict - blocks only severity 6 |
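
The threshold behavior can be sketched as a simple comparison. This is a minimal illustration that assumes each category's severity arrives as an integer on the 0/2/4/6 scale implied by the table. The function name `should_block` is hypothetical and does not come from Bifrost.

```python
# Numeric values taken from the severity threshold table.
THRESHOLD_VALUES = {"low": 2, "medium": 4, "high": 6}


def should_block(severity: int, threshold: str = "medium") -> bool:
    """Return True when a reported severity meets or exceeds the configured threshold."""
    if threshold not in THRESHOLD_VALUES:
        raise ValueError(f"unknown threshold: {threshold!r}")
    return severity >= THRESHOLD_VALUES[threshold]


# A severity of 4 is blocked at "low" and "medium" but allowed at "high".
decisions = {name: should_block(4, name) for name in THRESHOLD_VALUES}
```

A lower threshold name means a lower numeric cutoff, which is why `low` is the strictest setting.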

Detection Categories

  • Hate and fairness
  • Sexual content
  • Violence
  • Self-harm

Input-only features: Jailbreak Shield and Indirect Attack Shield only apply to input validation.

Output-only features: Copyright detection only applies to output validation.
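
To make the input-only and output-only distinction concrete, the sketch below selects which enabled checks would run at each validation stage. The function `active_checks` and the stage names `"input"` and `"output"` are assumptions for illustration. Treating content analysis and blocklists as applying to both stages is also an assumption, since the page only marks the shields as input-only and copyright detection as output-only.

```python
# Flags restricted to one stage, per the notes above.
INPUT_ONLY = {"jailbreak_shield_enabled", "indirect_attack_shield_enabled"}
OUTPUT_ONLY = {"copyright_enabled"}
# Assumption: flags not marked input-only or output-only apply to both stages.
BOTH_STAGES = {"analyze_enabled", "text_blocklist_enabled"}


def active_checks(config: dict, stage: str) -> list[str]:
    """Hypothetical helper: list the enabled checks that apply to a stage."""
    if stage not in ("input", "output"):
        raise ValueError("stage must be 'input' or 'output'")
    skipped = OUTPUT_ONLY if stage == "input" else INPUT_ONLY
    candidates = (INPUT_ONLY | OUTPUT_ONLY | BOTH_STAGES) - skipped
    return sorted(name for name in candidates if config.get(name, False))


example = {
    "analyze_enabled": True,
    "jailbreak_shield_enabled": True,
    "copyright_enabled": True,
}
input_checks = active_checks(example, "input")
output_checks = active_checks(example, "output")
```

With this example configuration, the jailbreak shield appears only in `input_checks` and copyright detection only in `output_checks`, while content analysis appears in both.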

Provider Capabilities Comparison

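| Capability | AWS Bedrock | Azure Content Safety | GraySwan | Patronus AI |
|---|---|---|---|---|
| PII Detection | Yes | No | No | Yes |
| Content Filtering | Yes | Yes | Yes | Yes |
| Prompt Injection | Yes | Yes | Yes | Yes |
| Hallucination Detection | No | No | No | Yes |
| Toxicity Screening | Yes | Yes | Yes | Yes |
| Custom Policies | Yes | Yes | Yes | Yes |
| Custom Natural Language Rules | No | No | Yes | No |
| Image Support | Yes | No | No | No |
| IPI Detection | No | Yes | Yes | No |
| Mutation Detection | No | No | Yes | No |
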
For information on configuring guardrail rules and profiles, see Guardrails.