
Overview

Guardrails in Bifrost provide enterprise-grade content safety, security validation, and policy enforcement for LLM requests and responses. The system validates inputs and outputs in real-time against your specified policies, ensuring responsible AI deployment with comprehensive protection against harmful content, prompt injection, PII leakage, and policy violations.
[Screenshot: Guardrails overview showing rules and profiles management]

Core Concepts

Bifrost Guardrails are built around two core concepts that work together to provide flexible and powerful content protection:
| Concept | Description |
| --- | --- |
| Rules | Custom policies defined using CEL (Common Expression Language) that determine what content to validate and when. Rules can apply to inputs, outputs, or both, and can be linked to one or more profiles for evaluation. |
| Profiles | Configurations for external guardrail providers (AWS Bedrock, Azure Content Safety, GraySwan, Patronus AI). Profiles are reusable and can be shared across multiple rules. |
How They Work Together:
  • Profiles define how content is evaluated using external provider capabilities
  • Rules define when and what content gets evaluated using CEL expressions
  • A single rule can use multiple profiles for layered protection
  • Profiles can be reused across different rules for consistency

Key Features

| Feature | Description |
| --- | --- |
| Multi-Provider Support | AWS Bedrock, Azure Content Safety, GraySwan, and Patronus AI integration |
| Dual-Stage Validation | Guard both inputs (prompts) and outputs (responses) |
| Real-Time Processing | Synchronous and asynchronous validation modes |
| CEL-Based Rules | Define custom policies using Common Expression Language |
| Reusable Profiles | Configure providers once, use across multiple rules |
| Sampling Control | Apply rules to a percentage of requests for performance tuning |
| Automatic Remediation | Block, redact, or modify content based on policy |
| Comprehensive Logging | Detailed audit trails for compliance |
Access Guardrails from the Bifrost dashboard:
| Page | Path | Description |
| --- | --- | --- |
| Configuration | Guardrails > Configuration | Manage guardrail rules and their settings |
| Providers | Guardrails > Providers | Configure and manage guardrail profiles |

Architecture

The following diagram illustrates how Rules and Profiles work together to validate LLM requests.
Flow Description:
  1. Incoming Request - LLM request arrives at Bifrost
  2. Input Validation - Applicable rules evaluate the input using linked profiles
  3. LLM Processing - If input passes, request is forwarded to the LLM provider
  4. Output Validation - Response is evaluated by output rules using linked profiles
  5. Response - Validated response is returned (or blocked/modified based on violations)
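The five steps above can be sketched as a two-stage pipeline. This is a minimal illustration, not Bifrost's actual internals: rule and profile names are hypothetical, and profiles are modeled as callables that return True on a violation.

```python
# Sketch of the guardrail request flow (hypothetical model, not Bifrost internals).

def validate(rules, stage, text):
    """Run every enabled rule for a stage; content passes only if no
    linked profile reports a violation."""
    for rule in rules:
        if not rule["enabled"] or rule["apply_to"] not in (stage, "both"):
            continue
        for profile in rule["profiles"]:
            if profile(text):  # profile returns True on violation
                return False
    return True

def handle_request(rules, request_text, call_llm):
    # 1-2. Incoming request, then input validation
    if not validate(rules, "input", request_text):
        return {"error": "blocked by input guardrails"}
    # 3. Input passed: forward to the LLM provider
    response_text = call_llm(request_text)
    # 4. Output validation on the response
    if not validate(rules, "output", response_text):
        return {"error": "blocked by output guardrails"}
    # 5. Return the validated response
    return {"content": response_text}
```

The same `validate` helper runs at both stages; only the `apply_to` filter changes which rules participate.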

Supported Guardrail Providers

Bifrost integrates with leading guardrail providers to offer comprehensive protection:

AWS Bedrock Guardrails

Amazon Bedrock Guardrails provides enterprise-grade content filtering and safety features with deep AWS integration.
[Screenshot: AWS Bedrock Guardrails configuration form]
Capabilities:
  • Content Filters: Hate speech, insults, sexual content, violence, misconduct
  • Denied Topics: Block specific topics or categories
  • Word Filters: Custom profanity and sensitive word blocking
  • PII Protection: Detect and redact 50+ PII entity types
  • Contextual Grounding: Verify responses against source documents
  • Prompt Attack Detection: Identify injection and jailbreak attempts
  • Image Content Support: Analyze images in addition to text (PNG, JPEG)
Configuration Fields:
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| access_key | string | No* | - | AWS Access Key ID |
| secret_key | string | No* | - | AWS Secret Access Key |
| bedrock_api_key | string | No* | - | Alternative Bedrock API key (Bearer token) |
| guardrail_arn | string | Yes | - | ARN of the Bedrock guardrail |
| guardrail_version | string | Yes | - | Version of the guardrail (e.g., "1", "DRAFT") |
| region | string | Yes | - | AWS region |
*Either access_key + secret_key OR bedrock_api_key must be provided for authentication.
Authentication Methods:
Uses AWS SDK with static credentials:
{
  "access_key": "AKIAXXXXXXXXXXXXXXXXXX",
  "secret_key": "your-secret-access-key",
  "guardrail_arn": "arn:aws:bedrock:us-east-1:123456789:guardrail/abc123",
  "guardrail_version": "1",
  "region": "us-east-1"
}
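The alternative Bearer-token method uses bedrock_api_key in place of the access/secret pair. A sketch with placeholder values, assuming the remaining fields are unchanged:

```json
{
  "bedrock_api_key": "your-bedrock-api-key",
  "guardrail_arn": "arn:aws:bedrock:us-east-1:123456789:guardrail/abc123",
  "guardrail_version": "1",
  "region": "us-east-1"
}
```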
Supported AWS Regions:
| Region Code | Region Name |
| --- | --- |
| us-east-1 | US East (N. Virginia) |
| us-east-2 | US East (Ohio) |
| us-west-1 | US West (N. California) |
| us-west-2 | US West (Oregon) |
| ap-south-1 | Asia Pacific (Mumbai) |
| ap-northeast-1 | Asia Pacific (Tokyo) |
| ap-northeast-2 | Asia Pacific (Seoul) |
| ap-southeast-1 | Asia Pacific (Singapore) |
| ap-southeast-2 | Asia Pacific (Sydney) |
| eu-central-1 | Europe (Frankfurt) |
| eu-west-1 | Europe (Ireland) |
| eu-west-2 | Europe (London) |
| eu-west-3 | Europe (Paris) |
Supported Content Types:
  • Text content
  • Images (PNG, JPEG formats)
Usage Metrics Returned:
Bedrock guardrails return detailed usage metrics for cost tracking and monitoring:
| Metric | Description |
| --- | --- |
| content_policy_units | Units consumed by content policy evaluation |
| contextual_grounding_policy_units | Units for grounding checks |
| sensitive_information_policy_units | Units for PII detection |
| topic_policy_units | Units for topic filtering |
| word_policy_units | Units for word filtering |
| automated_reasoning_policy_units | Units for reasoning checks |
| content_policy_image_units | Units for image content analysis |
Supported PII Types:
  • Personal identifiers (SSN, passport, driver’s license)
  • Financial information (credit cards, bank accounts)
  • Contact information (email, phone, address)
  • Medical information (health records, insurance)
  • Device identifiers (IP addresses, MAC addresses)

Azure Content Safety

Azure AI Content Safety provides multi-modal content moderation powered by Microsoft's advanced AI models.
[Screenshot: Azure Content Safety configuration form]
Capabilities:
  • Severity-Based Filtering: 4-level severity classification (Safe, Low, Medium, High)
  • Multi-Category Detection: Hate, sexual, violence, self-harm content
  • Prompt Shield: Advanced jailbreak and injection detection
  • Indirect Attack Detection: Identify hidden malicious instructions
  • Protected Material: Detect copyrighted content (output only)
  • Custom Blocklists: Define organization-specific blocked terms
Configuration Fields:
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| endpoint | string | Yes | - | Azure Content Safety endpoint URL |
| api_key | string | Yes | - | Azure subscription key |
| analyze_enabled | boolean | No | true | Enable content analysis for Hate, Sexual, Violence, SelfHarm |
| analyze_severity_threshold | enum | No | "medium" | Severity level to trigger: low, medium, or high |
| jailbreak_shield_enabled | boolean | No | false | Enable jailbreak detection (input only) |
| indirect_attack_shield_enabled | boolean | No | false | Enable indirect prompt attack detection (input only) |
| copyright_enabled | boolean | No | false | Enable copyrighted content detection (output only) |
| text_blocklist_enabled | boolean | No | false | Enable custom blocklist filtering |
| blocklist_names | array | No | - | List of Azure blocklist names to apply |
Severity Threshold Levels:
| Threshold | Numeric Value | Behavior |
| --- | --- | --- |
| low | 2 | Most strict - blocks severity 2 and above |
| medium | 4 | Balanced - blocks severity 4 and above |
| high | 6 | Least strict - blocks only severity 6 |
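The threshold table reduces to a simple comparison. A sketch of that logic (not the Azure SDK; Azure reports category severities as 0, 2, 4, or 6):

```python
# Map each configurable threshold to the minimum severity it blocks,
# per the table above (illustrative sketch, not the Azure SDK).
THRESHOLDS = {"low": 2, "medium": 4, "high": 6}

def is_blocked(category_severity: int, threshold: str = "medium") -> bool:
    """Block when the reported severity meets or exceeds the threshold."""
    return category_severity >= THRESHOLDS[threshold]
```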
Detection Categories:
  • Hate and fairness
  • Sexual content
  • Violence
  • Self-harm
Input-only features: Jailbreak Shield and Indirect Attack Shield apply only to input validation.
Output-only features: Copyright detection applies only to output validation.

Patronus AI

Patronus AI specializes in LLM security and safety with advanced evaluation capabilities.
Capabilities:
  • Hallucination Detection: Identify factually incorrect responses
  • PII Detection: Comprehensive personal data identification
  • Toxicity Screening: Multi-language toxic content detection
  • Prompt Injection Defense: Advanced attack pattern recognition
  • Custom Evaluators: Build organization-specific safety checks
  • Real-Time Monitoring: Continuous safety validation
Advanced Features:
  • Context-aware evaluation
  • Multi-turn conversation analysis
  • Custom policy templates
  • Integration with existing safety workflows

GraySwan Cygnal

GraySwan Cygnal Monitor provides AI safety monitoring with natural language rule definitions and advanced threat detection capabilities.
[Screenshot: GraySwan configuration form]
Capabilities:
  • Violation Scoring: Continuous 0-1 scale violation detection with configurable thresholds
  • Custom Natural Language Rules: Define safety rules in plain English without code
  • Policy Management: Use pre-built policies from GraySwan platform or create custom ones
  • Indirect Prompt Injection (IPI) Detection: Identify hidden instructions in user inputs
  • Mutation Detection: Detect attempts to manipulate or alter content
  • Reasoning Modes: Choose from fast (“off”), balanced (“hybrid”), or thorough (“thinking”) analysis
Configuration Fields:
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| api_key | string | Yes | - | GraySwan API key |
| violation_threshold | number | No | 0.5 | Score threshold (0-1) for triggering intervention. Lower values are more strict. |
| reasoning_mode | enum | No | "off" | Analysis depth: off (fastest), hybrid (balanced), or thinking (most thorough) |
| policy_id | string | No | - | Single custom policy ID from GraySwan platform |
| policy_ids | array | No | - | Multiple policy IDs for aggregated rule evaluation |
| rules | object | No | - | Custom natural language rules as key-value pairs |
Custom Rules Example:
[Screenshot: GraySwan custom rules]
Rules are defined as key-value pairs where the key is the rule name and the value is a natural language description:
{
  "rules": {
    "no_profanity": "Do not allow profanity or vulgar language",
    "no_pii": "Do not allow personally identifiable information",
    "professional_tone": "Ensure all responses maintain a professional tone"
  }
}
Detection Features:
  • Real-time violation scoring
  • Multi-rule evaluation
  • IPI attack detection
  • Content mutation monitoring
  • Detailed violation descriptions with rule attribution
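Since Cygnal reports a continuous 0-1 violation score, the intervention decision against violation_threshold reduces to a comparison. A sketch (the exact boundary handling at the threshold is an assumption):

```python
def should_intervene(violation_score: float, violation_threshold: float = 0.5) -> bool:
    """Intervene when the score meets or exceeds the threshold.
    Lowering the threshold makes the guardrail stricter (sketch,
    not GraySwan's implementation)."""
    return violation_score >= violation_threshold
```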

Guardrail Rules

Guardrail Rules are custom policies that define when and how content validation occurs. Rules use CEL (Common Expression Language) expressions to evaluate requests and can be linked to one or more profiles for execution.
[Screenshot: Guardrail rules list showing configured rules with status and actions]

Rule Properties

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | integer | Yes | Unique identifier for the rule |
| name | string | Yes | Descriptive name for the rule |
| description | string | No | Explanation of what the rule does |
| enabled | boolean | Yes | Whether the rule is active |
| cel_expression | string | Yes | CEL expression for rule evaluation |
| apply_to | enum | Yes | When to apply: input, output, or both |
| sampling_rate | integer | No | Percentage of requests to evaluate (0-100) |
| timeout | integer | No | Execution timeout in milliseconds |
| provider_config_ids | array | No | IDs of profiles to use for evaluation |
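Put together, a rule object might look like the following sketch (all values are illustrative, not defaults):

```json
{
  "id": 1,
  "name": "Block PII in Prompts",
  "description": "Blocks prompts containing personally identifiable information",
  "enabled": true,
  "cel_expression": "request.messages.exists(m, m.role == \"user\")",
  "apply_to": "input",
  "sampling_rate": 100,
  "timeout": 5000,
  "provider_config_ids": [1, 2]
}
```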

Creating Rules

  1. Navigate to Rules
    • Go to Guardrails > Configuration
    • Click Add Rule
[Screenshot: Guardrail rules list showing configured rules with status and actions]
  2. Configure Rule Settings
Basic Information:
  • Name: Enter a descriptive name (e.g., "Block PII in Prompts")
  • Description: Explain the rule's purpose
  • Enabled: Toggle to activate the rule
Evaluation Settings:
  • Apply To: Select when to apply the rule
    • input - Validate incoming prompts only
    • output - Validate LLM responses only
    • both - Validate both inputs and outputs
  • CEL Expression: Define the validation logic
  • Sampling Rate: Set percentage of requests to evaluate (default: 100%)
  • Timeout: Set maximum execution time in milliseconds
  3. Link Profiles
    • Select one or more profiles to use for evaluation
    • Rules will execute all linked profiles in sequence
  4. Save and Test
    • Click Save Rule
    • Use the Test button to validate with sample content

CEL Expression Examples

CEL (Common Expression Language) provides a powerful way to define rule conditions. Here are common patterns:
Always Apply Rule:
true
Apply to User Messages Only:
request.messages.exists(m, m.role == "user")
Apply to Messages Containing Keywords:
request.messages.exists(m, m.content.contains("confidential"))
Apply Based on Model:
request.model.startsWith("gpt-4")
Apply to Long Prompts:
request.messages.filter(m, m.role == "user").map(m, m.content.size()).sum() > 1000
Combine Multiple Conditions:
request.model.startsWith("gpt-4") && request.messages.exists(m, m.role == "user" && m.content.size() > 500)
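To make the CEL semantics concrete, the long-prompt expression above is equivalent to the following Python over a plain request dict (a sketch for illustration, not the CEL runtime):

```python
def is_long_prompt(request: dict, limit: int = 1000) -> bool:
    """Python equivalent of the CEL expression:
    request.messages.filter(m, m.role == "user")
           .map(m, m.content.size()).sum() > 1000
    """
    user_lengths = [len(m["content"]) for m in request["messages"]
                    if m["role"] == "user"]
    return sum(user_lengths) > limit
```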

Linking Rules to Profiles

Rules can be linked to multiple profiles for comprehensive validation:
[Screenshot: Rule configuration showing linked profiles]
Best Practices:
  • Link PII detection rules to profiles with PII capabilities (Bedrock, Patronus)
  • Link content filtering rules to profiles with content safety features (Azure, Bedrock, GraySwan)
  • Use GraySwan for custom natural language rules when you need flexible, readable policies
  • Use multiple profiles for defense-in-depth (e.g., Bedrock + Patronus for PII, Azure + GraySwan for content)
  • Set appropriate timeouts when using multiple profiles

Managing Profiles

Profiles are reusable configurations for external guardrail providers. Each profile contains provider-specific settings including credentials, endpoints, and detection thresholds.
[Screenshot: Guardrail profiles list showing configured providers]

Profile Properties

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| id | integer | Yes | Unique identifier for the profile |
| provider_name | string | Yes | Provider type: bedrock, azure, grayswan, patronus_ai |
| policy_name | string | Yes | Descriptive name for the policy |
| enabled | boolean | Yes | Whether the profile is active |
| config | object | No | Provider-specific configuration |

Creating Profiles

  1. Navigate to Providers
    • Go to Guardrails > Providers
    • Click Add Profile
[Screenshot: Create guardrail profile form]
  2. Select Provider Type
    • Choose from: AWS Bedrock, Azure Content Safety, GraySwan, or Patronus AI
  3. Configure Provider Settings
    • Enter credentials and endpoint information
    • Configure detection thresholds and actions
    • See provider-specific setup sections above for detailed configuration
  4. Save Profile
    • Click Save Profile
    • The profile is now available for linking to rules

Provider Capabilities

Each provider offers different capabilities. Choose profiles based on your validation needs:
| Capability | AWS Bedrock | Azure Content Safety | GraySwan | Patronus AI |
| --- | --- | --- | --- | --- |
| PII Detection | Yes | No | No | Yes |
| Content Filtering | Yes | Yes | Yes | Yes |
| Prompt Injection | Yes | Yes | Yes | Yes |
| Hallucination Detection | No | No | No | Yes |
| Toxicity Screening | Yes | Yes | Yes | Yes |
| Custom Policies | Yes | Yes | Yes | Yes |
| Custom Natural Language Rules | No | No | Yes | No |
| Image Support | Yes | No | No | No |
| IPI Detection | No | Yes | Yes | No |
| Mutation Detection | No | No | Yes | No |

Best Practices

Profile Organization:
  • Create separate profiles for different use cases (PII, content filtering, etc.)
  • Use descriptive policy names that indicate the profile’s purpose
  • Keep credentials secure using environment variables
Performance Considerations:
  • Enable only the profiles you need to minimize latency
  • Use sampling rates on rules for high-traffic endpoints
  • Set appropriate timeouts to prevent slow requests
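One common way to implement a percentage sampling rate is a deterministic hash on a stable request attribute, so the same request ID always gets the same decision. This is a generic sketch of the technique, not how Bifrost implements sampling:

```python
import hashlib

def should_evaluate(request_id: str, sampling_rate: int) -> bool:
    """Evaluate the rule for roughly `sampling_rate` percent of requests.
    Hashing the request ID keeps the decision deterministic per request
    (illustrative sketch, not Bifrost's implementation)."""
    if sampling_rate >= 100:
        return True
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = digest[0] * 256 + digest[1]  # uniform value in 0..65535
    return bucket % 100 < sampling_rate
```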
Security:
  • Store API keys and credentials in environment variables or secrets managers
  • Regularly rotate credentials
  • Use least-privilege IAM roles for AWS Bedrock

Using Guardrails in Requests

Attaching Guardrails to API Calls

Once configured, attach guardrails to your LLM requests using custom headers.
Single Guardrail:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-guardrail-id: bedrock-prod-guardrail" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Help me with this task"
      }
    ]
  }'
Multiple Guardrails (Sequential):
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-bf-guardrail-ids: bedrock-prod-guardrail,azure-content-safety-001" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Help me with this task"
      }
    ]
  }'
Guardrail Configuration in Request:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Help me with this task"
      }
    ],
    "bifrost_config": {
      "guardrails": {
        "input": ["bedrock-prod-guardrail"],
        "output": ["patronus-ai-001"],
        "async": false
      }
    }
  }'
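The curl examples above translate directly into client code. A minimal Python sketch that builds the headers and body for the multi-guardrail header form (the guardrail IDs are the illustrative ones from the examples):

```python
import json

def build_guardrail_request(content: str, guardrail_ids: list[str]) -> tuple[dict, bytes]:
    """Build the headers and JSON body for a chat completion with
    guardrails attached via the x-bf-guardrail-ids header, mirroring
    the curl examples above (sketch)."""
    headers = {
        "Content-Type": "application/json",
        "x-bf-guardrail-ids": ",".join(guardrail_ids),
    }
    body = json.dumps({
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": content}],
    }).encode()
    return headers, body
```

POST the returned body with those headers to `http://localhost:8080/v1/chat/completions` using any HTTP client.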

Guardrail Response Handling

Successful Validation (200):
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699564800,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'd be happy to help you with your task..."
      },
      "finish_reason": "stop"
    }
  ],
  "extra_fields": {
    "guardrails": {
      "input_validation": {
        "guardrail_id": "bedrock-prod-guardrail",
        "status": "passed",
        "violations": [],
        "processing_time_ms": 245
      },
      "output_validation": {
        "guardrail_id": "patronus-ai-001",
        "status": "passed",
        "violations": [],
        "processing_time_ms": 312
      }
    }
  }
}
Validation Failure - Blocked (446):
{
  "error": {
    "message": "Request blocked by guardrails",
    "type": "guardrail_violation",
    "code": 446,
    "details": {
      "guardrail_id": "bedrock-prod-guardrail",
      "validation_stage": "input",
      "violations": [
        {
          "type": "PII",
          "category": "SSN",
          "severity": "HIGH",
          "action": "block",
          "text_excerpt": "My SSN is ***-**-****"
        },
        {
          "type": "prompt_injection",
          "severity": "CRITICAL",
          "action": "block",
          "confidence": 0.95
        }
      ],
      "processing_time_ms": 198
    }
  }
}
Validation Warning - Logged (246):
{
  "id": "chatcmpl-def456",
  "object": "chat.completion",
  "created": 1699564800,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Response with redacted content..."
      },
      "finish_reason": "stop"
    }
  ],
  "bifrost_metadata": {
    "guardrails": {
      "output_validation": {
        "guardrail_id": "azure-content-safety-001",
        "status": "warning",
        "violations": [
          {
            "type": "profanity",
            "severity": "LOW",
            "action": "redact",
            "modifications": 2
          }
        ],
        "processing_time_ms": 187
      }
    }
  }
}
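Client code can branch on the three response shapes above: 200 (passed), 246 (passed with warnings), and 446 (blocked). A hypothetical helper illustrating that handling (not an official client; field paths follow the 446 example above):

```python
def handle_guardrail_response(status_code: int, body: dict) -> str:
    """Classify a Bifrost response by guardrail outcome, per the
    status codes documented above (sketch)."""
    if status_code == 446:
        violations = body["error"]["details"]["violations"]
        kinds = ", ".join(v["type"] for v in violations)
        return f"blocked: {kinds}"
    if status_code == 246:
        return "passed with warnings (content may have been redacted)"
    if status_code == 200:
        return "passed"
    return f"unexpected status {status_code}"
```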