Datadog - Bifrost

Overview

The Datadog plugin provides native integration with the Datadog observability platform, offering three pillars of observability for your LLM operations:

APM Traces - Distributed tracing via dd-trace-go v2 with W3C Trace Context support for end-to-end request visibility
LLM Observability - Native Datadog LLM Obs integration for AI/ML-specific monitoring
Metrics - Operational metrics via DogStatsD or the Metrics API

Unlike the OTel plugin, which sends generic OpenTelemetry data, the Datadog plugin uses Datadog’s native SDKs for richer integration with Datadog-specific features such as LLM Observability dashboards and ML App grouping.

Deployment Modes

The plugin supports two deployment modes:

Mode	Description	Requirements	Best For
Agent (default)	Sends data through a local Datadog Agent	Datadog Agent running on host	Production deployments with existing agent infrastructure
Agentless	Sends data directly to Datadog APIs	API key only	Serverless, containers, or simplified deployments

Agent Mode

In agent mode, the plugin communicates with a locally running Datadog Agent:

APM Traces → Agent at localhost:8126
Metrics → DogStatsD at localhost:8125

The agent handles batching and retries and provides lower latency. This is the recommended mode for production deployments where the Datadog Agent is already installed.

Agentless Mode

In agentless mode, the plugin sends data directly to Datadog’s intake APIs:

APM Traces → https://trace.agent.{site}
LLM Observability → Direct API submission
Metrics → Datadog Metrics API

This mode requires an API key but simplifies deployment by eliminating the need for a local agent. It is well suited to serverless environments, Kubernetes pods, or quick testing.
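
As a small illustration of the {site} placeholder in the trace intake pattern above, the following Go sketch builds the URL for a given site. The helper name traceIntakeURL is hypothetical and is not part of the plugin; the fallback to datadoghq.com mirrors the documented default for the site field.

```go
package main

import (
	"fmt"
	"strings"
)

// traceIntakeURL fills the {site} placeholder in the documented pattern
// https://trace.agent.{site}. Hypothetical helper, for illustration only.
func traceIntakeURL(site string) string {
	site = strings.TrimSpace(site)
	if site == "" {
		site = "datadoghq.com" // documented default for the site field
	}
	return "https://trace.agent." + site
}

func main() {
	fmt.Println(traceIntakeURL("datadoghq.eu"))
}
```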

Configuration

Required Fields

Field	Type	Required	Default	Description
`service_name`	`string`	No	`bifrost`	Service name displayed in Datadog APM
`ml_app`	`string`	No	(uses `service_name`)	ML application name for LLM Observability grouping
`agent_addr`	`string`	No	`localhost:8126`	Datadog Agent address (agent mode only)
`dogstatsd_addr`	`string`	No	`localhost:8125`	DogStatsD server address (agent mode only)
`env`	`string`	No	-	Environment tag (e.g., `production`, `staging`)
`version`	`string`	No	-	Service version tag
`custom_tags`	`object`	No	-	Additional tags for all traces and metrics
`enable_metrics`	`bool`	No	`true`	Enable metrics emission
`enable_traces`	`bool`	No	`true`	Enable APM traces
`enable_llm_obs`	`bool`	No	`true`	Enable LLM Observability
`agentless`	`bool`	No	`false`	Use agentless mode (direct API)
`api_key`	`EnvVar`	Agentless only	-	Datadog API key (supports `env.VAR_NAME`)
`site`	`string`	No	`datadoghq.com`	Datadog site/region
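
To make the defaults and the single conditional requirement in the table concrete, here is a minimal Go sketch. The pluginConfig type and the withDefaults and validate helpers are hypothetical mirrors of the documented fields, not the plugin's real Config type or its actual validation logic. The boolean enable_* fields are left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// pluginConfig is a hypothetical mirror of some documented fields.
type pluginConfig struct {
	ServiceName   string
	MLApp         string
	AgentAddr     string
	DogStatsDAddr string
	Site          string
	Agentless     bool
	APIKey        string
}

// withDefaults fills empty fields with the defaults listed in the table.
func withDefaults(c pluginConfig) pluginConfig {
	if c.ServiceName == "" {
		c.ServiceName = "bifrost"
	}
	if c.MLApp == "" {
		c.MLApp = c.ServiceName // ml_app falls back to service_name
	}
	if c.AgentAddr == "" {
		c.AgentAddr = "localhost:8126"
	}
	if c.DogStatsDAddr == "" {
		c.DogStatsDAddr = "localhost:8125"
	}
	if c.Site == "" {
		c.Site = "datadoghq.com"
	}
	return c
}

// validate enforces the one conditional rule: api_key is needed in agentless mode.
func validate(c pluginConfig) error {
	if c.Agentless && c.APIKey == "" {
		return errors.New("api_key is required when agentless is true")
	}
	return nil
}

func main() {
	cfg := withDefaults(pluginConfig{ServiceName: "my-llm-gateway", Agentless: true})
	fmt.Printf("%+v\n", cfg)
	fmt.Println("validation error:", validate(cfg))
}
```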

Environment Variable Substitution

The api_key and custom_tags fields support environment variable substitution using the env. prefix:

{
  "api_key": "env.DD_API_KEY",
  "custom_tags": {
    "team": "env.TEAM_NAME",
    "cost_center": "env.COST_CENTER"
  }
}
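
The substitution rule can be sketched in Go as follows. This is an illustration of the env. prefix convention only; resolveEnvValue is a hypothetical helper, and how Bifrost resolves these values internally (including what happens when a variable is unset) is not specified here.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const envPrefix = "env."

// resolveEnvValue returns raw unchanged when it is a literal value. When raw
// has the form "env.NAME", it returns the result of lookup(NAME). The boolean
// is false only when an env reference could not be resolved.
func resolveEnvValue(raw string, lookup func(string) (string, bool)) (string, bool) {
	if !strings.HasPrefix(raw, envPrefix) {
		return raw, true // literal value, nothing to substitute
	}
	return lookup(strings.TrimPrefix(raw, envPrefix))
}

func main() {
	tags := map[string]string{
		"team":        "env.TEAM_NAME",
		"cost_center": "env.COST_CENTER",
		"region":      "eu-west", // literal, passed through as-is
	}
	for key, raw := range tags {
		value, ok := resolveEnvValue(raw, os.LookupEnv)
		fmt.Printf("%s=%q resolved=%v\n", key, value, ok)
	}
}
```

Passing the lookup function as a parameter keeps the helper easy to test without touching the process environment.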

Setup

You can set up the plugin through the Bifrost UI, the Go SDK, or config.json.

Configure the Datadog plugin through the Bifrost UI:

Navigate to Settings → Plugins
Enable the Datadog plugin
Configure the required fields based on your deployment mode

Go SDK

package main

import (
    "context"
    bifrost "github.com/maximhq/bifrost/core"
    "github.com/maximhq/bifrost/core/schemas"
    "github.com/maximhq/bifrost/framework/modelcatalog"
    datadog "github.com/maximhq/bifrost-enterprise/plugins/datadog"
)

func main() {
    ctx := context.Background()
    logger := schemas.NewLogger()
    
    // Initialize model catalog (required for cost calculation)
    modelCatalog := modelcatalog.NewModelCatalog(logger)
    
    // Agent mode configuration
    ddPlugin, err := datadog.Init(ctx, &datadog.Config{
        ServiceName: "my-llm-service",
        Env:         "production",
        Version:     "1.0.0",
        CustomTags: map[string]string{
            "team": "platform",
        },
    }, logger, modelCatalog, "1.0.0")
    if err != nil {
        panic(err)
    }
    
    // Initialize Bifrost with the plugin.
    // yourAccount is a placeholder for your own schemas.Account implementation (not shown).
    client, err := bifrost.Init(ctx, schemas.BifrostConfig{
        Account: &yourAccount,
        Plugins: []schemas.Plugin{ddPlugin},
    })
    if err != nil {
        panic(err)
    }
    defer client.Shutdown()
    
    // All requests are now traced to Datadog
}

For agentless mode:

// Agentless mode configuration
enableAgentless := true
ddPlugin, err := datadog.Init(ctx, &datadog.Config{
    ServiceName: "my-llm-service",
    Env:         "production",
    Agentless:   &enableAgentless,
    APIKey:      &schemas.EnvVar{EnvVarName: "DD_API_KEY"},
    Site:        "datadoghq.com",
}, logger, modelCatalog, "1.0.0")

config.json

Agent Mode (Minimal)

{
  "plugins": [
    {
      "enabled": true,
      "name": "datadog",
      "config": {
        "service_name": "bifrost",
        "env": "production"
      }
    }
  ]
}

Agent Mode (Full Configuration)

{
  "plugins": [
    {
      "enabled": true,
      "name": "datadog",
      "config": {
        "service_name": "my-llm-gateway",
        "ml_app": "my-ml-application",
        "agent_addr": "localhost:8126",
        "dogstatsd_addr": "localhost:8125",
        "env": "production",
        "version": "1.2.3",
        "custom_tags": {
          "team": "platform",
          "cost_center": "env.COST_CENTER"
        },
        "enable_metrics": true,
        "enable_traces": true,
        "enable_llm_obs": true
      }
    }
  ]
}

Agentless Mode

{
  "plugins": [
    {
      "enabled": true,
      "name": "datadog",
      "config": {
        "service_name": "my-llm-gateway",
        "env": "production",
        "agentless": true,
        "api_key": "env.DD_API_KEY",
        "site": "datadoghq.com"
      }
    }
  ]
}

Set the environment variable:

export DD_API_KEY="your-datadog-api-key"

Datadog Sites

The plugin supports all Datadog regional sites. Set the site field to match your Datadog account region:

Site	Region	Value
US1 (default)	United States	`datadoghq.com`
US3	United States	`us3.datadoghq.com`
US5	United States	`us5.datadoghq.com`
EU1	Europe	`datadoghq.eu`
AP1	Asia Pacific (Japan)	`ap1.datadoghq.com`
AP2	Asia Pacific (Australia)	`ap2.datadoghq.com`
US1-FED	US Government	`ddog-gov.com`

Ensure your API key corresponds to the selected site. API keys from one region will not work with another.
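
A site typo is easy to miss, so it can help to check the configured value against the table before starting. The Go sketch below does that with a lookup map copied from the table above; knownSites and siteName are hypothetical names, and the plugin itself may accept or reject values differently.

```go
package main

import "fmt"

// knownSites maps each documented site value to its label in the table.
var knownSites = map[string]string{
	"datadoghq.com":     "US1",
	"us3.datadoghq.com": "US3",
	"us5.datadoghq.com": "US5",
	"datadoghq.eu":      "EU1",
	"ap1.datadoghq.com": "AP1",
	"ap2.datadoghq.com": "AP2",
	"ddog-gov.com":      "US1-FED",
}

// siteName returns the label for a configured site value, or an error
// when the value is not one of the documented sites.
func siteName(site string) (string, error) {
	name, ok := knownSites[site]
	if !ok {
		return "", fmt.Errorf("unknown Datadog site %q", site)
	}
	return name, nil
}

func main() {
	for _, site := range []string{"datadoghq.eu", "datadog.eu"} {
		name, err := siteName(site)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(site, "is", name)
	}
}
```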

LLM Observability

The Datadog plugin integrates with Datadog LLM Observability to provide AI/ML-specific monitoring capabilities.

ML App Grouping

LLM traces are grouped under an ML App in Datadog. By default, this uses your service_name, but you can specify a dedicated ML App name:

{
  "service_name": "bifrost-gateway",
  "ml_app": "customer-support-ai"
}

This allows you to:

Group related LLM operations across multiple services
Track costs and performance by application
Apply ML-specific alerts and dashboards

Session Tracking

The plugin supports session tracking via the x-bf-session-id header. Include this header in your requests to group related LLM calls into a conversation session:

curl -X POST https://your-bifrost-gateway/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "x-bf-session-id: user-123-session-456" \
  -d '{...}'

Sessions appear in Datadog LLM Observability, allowing you to trace entire conversation flows.
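
For callers written in Go, the same request can be built with the standard net/http package. This is a sketch under assumptions: newChatRequest is a hypothetical helper, the gateway URL and key are placeholders taken from the curl example, and the Content-Type header is an addition that the curl example does not show.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newChatRequest builds a chat completion request that carries a session ID
// in the x-bf-session-id header. It only constructs the request; sending it
// is left to the caller's http.Client.
func newChatRequest(baseURL, apiKey, sessionID, body string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json") // assumption: JSON body
	req.Header.Set("x-bf-session-id", sessionID)
	return req, nil
}

func main() {
	req, err := newChatRequest("https://your-bifrost-gateway", "example-key", "user-123-session-456", "{}")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String(), req.Header.Get("x-bf-session-id"))
}
```

Reusing the same session ID for every call in a conversation is what groups those calls into one session.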

W3C Distributed Tracing

The plugin supports W3C Trace Context for distributed tracing across services. When your upstream service sends a traceparent header, Bifrost automatically links its spans as children of the parent trace.

curl -X POST https://your-bifrost-gateway/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01" \
  -d '{...}'

This enables:

End-to-end visibility - See LLM calls in the context of your full application trace
Cross-service correlation - Link frontend requests → backend services → Bifrost → LLM providers
Latency attribution - Understand how LLM latency contributes to overall request time

The traceparent header format follows the W3C standard:

traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}

All Datadog APM spans created by Bifrost will be linked to the parent span, appearing as children in the Datadog trace view.
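
To show what the four fields look like in practice, here is a minimal Go parser for the version 00 layout of the header. It is an illustrative sketch, not the parsing code used by Bifrost or dd-trace-go; the traceParent type and parseTraceParent function are hypothetical names. It is also slightly more lenient than the W3C specification, which requires lowercase hex.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// traceParent holds the four dash-separated fields of a traceparent header.
type traceParent struct {
	Version  string
	TraceID  string
	ParentID string
	Flags    string
	Sampled  bool // lowest bit of trace-flags
}

// parseTraceParent validates field count, field lengths, and hex encoding.
func parseTraceParent(header string) (traceParent, error) {
	parts := strings.Split(strings.TrimSpace(header), "-")
	if len(parts) != 4 {
		return traceParent{}, errors.New("traceparent: expected 4 dash-separated fields")
	}
	wantLen := []int{2, 32, 16, 2} // version, trace-id, parent-id, trace-flags
	for i, p := range parts {
		if len(p) != wantLen[i] {
			return traceParent{}, fmt.Errorf("traceparent: field %d has length %d, want %d", i, len(p), wantLen[i])
		}
		if _, err := hex.DecodeString(p); err != nil {
			return traceParent{}, fmt.Errorf("traceparent: field %d is not hex: %w", i, err)
		}
	}
	// An all-zero trace-id or parent-id is invalid.
	if strings.Trim(parts[1], "0") == "" || strings.Trim(parts[2], "0") == "" {
		return traceParent{}, errors.New("traceparent: all-zero trace-id or parent-id")
	}
	flags, _ := hex.DecodeString(parts[3]) // already validated above
	return traceParent{
		Version:  parts[0],
		TraceID:  parts[1],
		ParentID: parts[2],
		Flags:    parts[3],
		Sampled:  flags[0]&0x01 == 0x01,
	}, nil
}

func main() {
	tp, err := parseTraceParent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", tp)
}
```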

What’s Captured

For each LLM operation, the plugin sends to LLM Observability:

Input/Output Messages - Full conversation history with role attribution
Token Usage - Input, output, and total token counts
Cost - Calculated cost in USD based on model pricing
Latency - Request duration and time-to-first-token for streaming
Model Info - Provider, model name, and request parameters
Tool Calls - Function/tool call details for agentic workflows

Metrics Reference

The plugin emits the following metrics to Datadog:

Metric	Type	Description	Tags
`bifrost.requests.total`	Counter	Total LLM requests	provider, model, request_type
`bifrost.success.total`	Counter	Successful requests	provider, model, request_type
`bifrost.errors.total`	Counter	Failed requests	provider, model, request_type, reason
`bifrost.latency.seconds`	Histogram	Request latency distribution	provider, model, request_type
`bifrost.tokens.input`	Counter	Input/prompt tokens consumed	provider, model
`bifrost.tokens.output`	Counter	Output/completion tokens generated	provider, model
`bifrost.tokens.total`	Counter	Total tokens (input + output)	provider, model
`bifrost.cost.usd`	Gauge	Request cost in USD	provider, model
`bifrost.cache.hits`	Counter	Cache hits	provider, model, cache_type
`bifrost.stream.first_token_latency`	Histogram	Time to first token (streaming)	provider, model
`bifrost.stream.inter_token_latency`	Histogram	Inter-token latency (streaming)	provider, model

Custom Tags

All metrics include your configured custom_tags plus automatic tags for:

provider - LLM provider (openai, anthropic, etc.)
model - Model name
request_type - Type of request (chat, embedding, etc.)
env - Environment from configuration
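
In agent mode these metrics travel to DogStatsD as plain-text datagrams. The Go sketch below renders a counter increment with tags in that wire format, which can be handy when inspecting traffic on port 8125. It is an illustration only: the plugin uses Datadog's own client rather than hand-built strings, counterDatagram is a hypothetical helper, and the tag values are made up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// counterDatagram renders a counter in the DogStatsD text format:
//   <name>:<value>|c|#<key>:<value>,<key>:<value>
// Tags are sorted so the output is deterministic.
func counterDatagram(name string, value int64, tags map[string]string) string {
	pairs := make([]string, 0, len(tags))
	for k, v := range tags {
		pairs = append(pairs, k+":"+v)
	}
	sort.Strings(pairs)

	line := fmt.Sprintf("%s:%d|c", name, value)
	if len(pairs) > 0 {
		line += "|#" + strings.Join(pairs, ",")
	}
	return line
}

func main() {
	fmt.Println(counterDatagram("bifrost.requests.total", 1, map[string]string{
		"provider":     "openai",
		"model":        "example-model", // illustrative value
		"request_type": "chat",
		"env":          "production",
	}))
}
```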

Captured Data

Each APM trace includes comprehensive LLM operation metadata:

Span Attributes

Span Name - Based on request type (genai.chat, genai.embedding, etc.)
Service Info - service.name, service.version, env
Provider & Model - gen_ai.provider.name, gen_ai.request.model

Request Parameters

Temperature, max_tokens, top_p, stop sequences
Presence/frequency penalties
Tool configurations and parallel tool calls
Custom parameters via ExtraParams

Input/Output Data

Complete chat history with role-based messages
Prompt text for completions
Response content with role attribution
Tool calls and results
Reasoning and refusal content (when present)

Performance Metrics

Token usage (prompt, completion, total)
Cost calculations in USD
Latency and timing (start/end timestamps)
Time to first token (streaming)
Error details with status codes

Bifrost Context

Virtual key ID and name
Selected key ID and name
Team ID and name
Customer ID and name
Retry count and fallback index

Supported Request Types

The Datadog plugin captures all Bifrost request types:

Request Type	Span Name	LLM Obs Type
Chat Completion	`genai.chat`	LLM Span
Chat Completion (streaming)	`genai.chat`	LLM Span
Text Completion	`genai.text`	LLM Span
Text Completion (streaming)	`genai.text`	LLM Span
Embeddings	`genai.embedding`	Embedding Span
Speech Generation	`genai.speech`	Task Span
Speech Generation (streaming)	`genai.speech`	Task Span
Transcription	`genai.transcription`	Task Span
Transcription (streaming)	`genai.transcription`	Task Span
Responses API	`genai.responses`	LLM Span
Responses API (streaming)	`genai.responses`	LLM Span

When to Use

Datadog Plugin

Choose the Datadog plugin when you:

Use Datadog as your primary observability platform
Want native LLM Observability integration with ML App grouping
Need seamless correlation with existing Datadog APM traces via W3C distributed tracing
Require Datadog-specific features like notebooks and dashboards
Want session tracking for conversation flows

vs. OTel Plugin

Use the OTel plugin when you:

Need multi-vendor observability (send to multiple backends)
Are using Datadog via an OpenTelemetry Collector
Want vendor flexibility to switch backends without code changes
Prefer standardized OpenTelemetry semantic conventions

You can use both plugins simultaneously if needed. The Datadog plugin provides native integration while OTel can send to additional backends.

vs. Built-in Observability

Use Built-in Observability when you:

Are developing or testing locally
Run a simple self-hosted deployment
Want to avoid external dependencies
Need direct database access to logs

Troubleshooting

Agent Connectivity Issues

Verify the Datadog Agent is running and accessible:

# Check agent status
datadog-agent status

# Test APM endpoint
curl -v http://localhost:8126/info

# Test DogStatsD (should accept UDP packets)
echo "test.metric:1|c" | nc -u -w1 localhost 8125

Agentless Mode Not Working

Verify your API key is valid (for sites other than US1, replace datadoghq.com in the URL with your site value):

curl -X GET "https://api.datadoghq.com/api/v1/validate" \
  -H "DD-API-KEY: $DD_API_KEY"

Ensure the site matches your API key’s region
Check that the API key environment variable is set:

echo $DD_API_KEY
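
The validation call above can also be built per site in Go. This sketch assumes the API host is "api." followed by the site value, which matches the US1 URL in the curl command; newValidateRequest is a hypothetical helper, and the example only constructs the request rather than sending it.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// newValidateRequest builds the API key validation request for a site.
// Assumption: the API host is "api." + site, as in the US1 example.
func newValidateRequest(site, apiKey string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, "https://api."+site+"/api/v1/validate", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("DD-API-KEY", apiKey)
	return req, nil
}

func main() {
	req, err := newValidateRequest("datadoghq.eu", os.Getenv("DD_API_KEY"))
	if err != nil {
		panic(err)
	}
	// Print the target only; never log the key itself.
	fmt.Println(req.Method, req.URL.String())
}
```

Pass the request to http.DefaultClient.Do to run the check against the selected site.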

Missing Traces

Enable debug logging in Bifrost:

bifrost-http --log-level debug

Verify traces are enabled in your configuration:

{
  "enable_traces": true,
  "enable_llm_obs": true
}

Check for errors in the Bifrost logs related to the Datadog plugin

Missing Metrics

Verify DogStatsD is running (agent mode):

datadog-agent status | grep DogStatsD

Ensure metrics are enabled:

{
  "enable_metrics": true
}

For agentless mode, verify your API key has metrics submission permissions

LLM Observability Not Appearing

Confirm that enable_llm_obs is set to true (the default)
Verify your Datadog plan includes LLM Observability
Check the ML App name in Datadog under LLM Observability → Applications

Next Steps

OTel Plugin - OpenTelemetry integration for multi-vendor observability
Built-in Observability - Local logging for development
Telemetry - Prometheus metrics and dashboards
