Bifrost integrates with GraySwan Cygnal Monitor to provide AI safety monitoring with natural language rule definitions and advanced threat detection capabilities. This page covers the configuration and capabilities of the GraySwan Cygnal guardrail provider.

Capabilities

  • Violation Scoring: Continuous 0-1 scale violation detection with configurable thresholds
  • Custom Natural Language Rules: Define safety rules in plain English without code
  • Policy Management: Use pre-built policies from GraySwan platform or create custom ones
  • Indirect Prompt Injection (IPI) Detection: Identify hidden instructions in user inputs
  • Mutation Detection: Detect attempts to manipulate or alter content
  • Reasoning Modes: Choose from fast (“off”), balanced (“hybrid”), or thorough (“thinking”) analysis

Configuration Fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| api_key | string | Yes | - | GraySwan API key |
| violation_threshold | number | No | 0.5 | Score threshold (0-1) for triggering intervention. Lower values are more strict. |
| reasoning_mode | enum | No | "off" | Analysis depth: off (fastest), hybrid (balanced), or thinking (most thorough) |
| policy_id | string | No | - | Single custom policy ID from GraySwan platform |
| policy_ids | array | No | - | Multiple policy IDs for aggregated rule evaluation |
| rules | object | No | - | Custom natural language rules as key-value pairs |
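
To show how these fields fit together, here is a minimal sketch of a provider configuration. It assumes the fields sit side by side in one flat JSON object, as the rules example on this page suggests. The API key value is a placeholder and the rule is illustrative. How this object is nested inside the wider Bifrost configuration is not covered here.

```json
{
  "api_key": "YOUR_GRAYSWAN_API_KEY",
  "violation_threshold": 0.3,
  "reasoning_mode": "hybrid",
  "rules": {
    "no_pii": "Do not allow personally identifiable information"
  }
}
```

Setting violation_threshold to 0.3 rather than the default 0.5 makes the guardrail stricter, since lower values trigger intervention at lower violation scores. Using "hybrid" instead of the default "off" trades some speed for deeper analysis.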

Custom Rules Example

Rules are defined as key-value pairs where the key is the rule name and the value is a natural language description:
{
  "rules": {
    "no_profanity": "Do not allow profanity or vulgar language",
    "no_pii": "Do not allow personally identifiable information",
    "professional_tone": "Ensure all responses maintain a professional tone"
  }
}

Detection Features

  • Real-time violation scoring
  • Multi-rule evaluation
  • IPI attack detection
  • Content mutation monitoring
  • Detailed violation descriptions with rule attribution

Provider Capabilities Comparison

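| Capability | AWS Bedrock | Azure Content Safety | GraySwan | Patronus AI |
|---|---|---|---|---|
| PII Detection | Yes | No | No | Yes |
| Content Filtering | Yes | Yes | Yes | Yes |
| Prompt Injection | Yes | Yes | Yes | Yes |
| Hallucination Detection | No | No | No | Yes |
| Toxicity Screening | Yes | Yes | Yes | Yes |
| Custom Policies | Yes | Yes | Yes | Yes |
| Custom Natural Language Rules | No | No | Yes | No |
| Image Support | Yes | No | No | No |
| IPI Detection | No | Yes | Yes | No |
| Mutation Detection | No | No | Yes | No |
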
For information on configuring guardrail rules and profiles, see Guardrails.