Overview
Ollama is a local-first, OpenAI-compatible inference engine for running large language models on personal computers or servers. Bifrost delegates to the OpenAI implementation while supporting Ollama's unique configuration requirements. Key characteristics:
- Local-first deployment - Run models locally or on private infrastructure
- OpenAI API compatibility - Identical request/response format
- Full feature support - Chat, text, embeddings, and streaming
- Tool calling - Complete function definition and execution
- Self-hosted - No external API dependency required
Supported Operations
| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v1/chat/completions |
| Responses API | ✅ | ✅ | /v1/chat/completions |
| Text Completions | ✅ | ✅ | /v1/completions |
| Embeddings | ✅ | - | /v1/embeddings |
| List Models | ✅ | - | /v1/models |
| Image Generation | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
Unsupported Operations (❌): Speech, Transcriptions, Files, and Batch are not supported by the upstream Ollama API. These return UnsupportedOperationError.

Ollama is self-hosted. Ensure you have an Ollama instance running and configured with the correct BaseURL (e.g., http://localhost:11434).

1. Chat Completions
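A minimal chat completion request body follows the standard OpenAI shape; the sketch below builds one in Python (the model name is an example — substitute any model you have pulled locally), to be POSTed to <BaseURL>/v1/chat/completions:

```python
import json

# OpenAI-compatible chat completion payload for an Ollama-served model.
# "llama3.1:latest" is an example model name, not a requirement.
payload = {
    "model": "llama3.1:latest",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
    "temperature": 0.7,
    "stream": False,
}

# This JSON string is what gets POSTed to <BaseURL>/v1/chat/completions.
body = json.dumps(payload)
print(body)
```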
Request Parameters
Ollama supports all standard OpenAI chat completion parameters. For full parameter reference and behavior, see OpenAI Chat Completions.

Filtered Parameters
Removed for Ollama compatibility:
- prompt_cache_key - Not supported
- verbosity - Anthropic-specific
- store - Not supported
- service_tier - Not supported
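The filtering step can be pictured as a simple key-drop pass over the outgoing request — a behavioral sketch only, not Bifrost's actual filterOpenAISpecificParameters implementation:

```python
# Parameters stripped before forwarding to Ollama, per the list above.
FILTERED_PARAMS = {"prompt_cache_key", "verbosity", "store", "service_tier"}

def filter_openai_specific_params(request: dict) -> dict:
    """Return a copy of the request without Ollama-incompatible keys."""
    return {k: v for k, v in request.items() if k not in FILTERED_PARAMS}

req = {
    "model": "llama3.1:latest",
    "temperature": 0.2,
    "store": True,
    "service_tier": "auto",
}
print(filter_openai_specific_params(req))
```

Supported parameters pass through untouched; only the keys above are dropped.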
2. Responses API
Converted internally to Chat Completions.

3. Text Completions
Ollama supports the legacy text completion format:

| Parameter | Mapping |
|---|---|
| prompt | Direct pass-through |
| max_tokens | max_tokens |
| temperature, top_p | Direct pass-through |
| stop | Stop sequences |
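A legacy request carrying those parameters might look like the following sketch (the model name is an example), POSTed to <BaseURL>/v1/completions:

```python
import json

# Legacy text-completion payload; parameters map through as in the
# table above. "mistral:latest" is an example model name.
payload = {
    "model": "mistral:latest",
    "prompt": "Once upon a time",
    "max_tokens": 64,
    "temperature": 0.8,
    "stop": ["\n\n"],
}

# POST this JSON to <BaseURL>/v1/completions.
print(json.dumps(payload))
```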
4. Embeddings
Ollama supports text embeddings:

| Parameter | Notes |
|---|---|
input | Text or array of texts |
model | Embedding model name |
| encoding_format | "float" or "base64" |
dimensions | Custom output dimensions (optional) |
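Putting those parameters together, an embeddings request body might look like this sketch ("nomic-embed-text" is an example embedding model name), POSTed to <BaseURL>/v1/embeddings:

```python
import json

# OpenAI-compatible embeddings payload for an Ollama embedding model.
# "nomic-embed-text" is an example model name.
payload = {
    "model": "nomic-embed-text",
    "input": ["first document", "second document"],
    "encoding_format": "float",
}

# POST this JSON to <BaseURL>/v1/embeddings.
print(json.dumps(payload))
```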
5. List Models
Lists models currently loaded in Ollama with capabilities and context information.

Unsupported Features
| Feature | Reason |
|---|---|
| Speech/TTS | Not offered by Ollama API |
| Transcription/STT | Not offered by Ollama API |
| Batch Operations | Not offered by Ollama API |
| File Management | Not offered by Ollama API |
Ollama follows the OpenAI API specification for request format and error handling. Authentication is optional and depends on deployment (no authentication required for local access, optional Bearer token for protected instances).

Critical: BaseURL must be explicitly configured pointing to your Ollama instance (e.g., http://localhost:11434 for local, https://ollama.example.com for remote).

Configuration
- Install Ollama from https://ollama.ai
- Pull a model, e.g. ollama pull llama3.1
- Start the Ollama server: ollama serve
- Verify it's running: curl http://localhost:11434/api/tags
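For the Gateway, the provider is pointed at your instance in the configuration file. The fragment below is a sketch of the shape only — the field names are illustrative assumptions; consult the Bifrost configuration reference for the exact schema:

```json
{
  "providers": {
    "ollama": {
      "base_url": "http://localhost:11434"
    }
  }
}
```

Whatever the exact schema, the base URL must be set explicitly; there is no default.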
Performance Considerations
Streaming for Large Models: For a better user experience with large models, use streaming. Typical context windows:
- Llama 3.1 70B: 128K tokens
- Mistral 7B: 32K tokens
- Neural Chat 7B: 8K tokens
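Streamed responses arrive as standard OpenAI-style server-sent events. The sketch below shows how content deltas can be extracted from a captured stream (the chunk shapes follow the OpenAI streaming format; the sample data is fabricated for illustration):

```python
import json

def collect_stream_content(sse_text: str) -> str:
    """Concatenate content deltas from an OpenAI-style SSE stream."""
    parts = []
    for line in sse_text.splitlines():
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":  # end-of-stream sentinel
            break
        delta = json.loads(data)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

# Example: two content chunks followed by the [DONE] sentinel.
sample = (
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n\n'
    "data: [DONE]\n\n"
)
print(collect_stream_content(sample))  # -> Hello
```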
Popular Models
| Model | Size | Context | Speed |
|---|---|---|---|
| llama3.1:latest | Varies | 128K | Fast |
| mistral:latest | 7B | 32K | Very Fast |
| neural-chat:latest | 7B | 8K | Very Fast |
| orca-mini:latest | 3B | 3K | Very Fast |
| openchat:latest | 7B | 8K | Very Fast |
Caveats
BaseURL Configuration Required
Severity: High
Behavior: BaseURL must be explicitly configured - no default
Impact: Requests fail without proper configuration
Code: NewOllamaProvider validates BaseURL is set
Cache Control Stripped
Severity: Low
Behavior: Cache control directives are removed from messages
Impact: Prompt caching features don’t work
Code: Stripped during JSON marshaling
Parameter Filtering
Severity: Low
Behavior: OpenAI-specific parameters filtered out
Impact: prompt_cache_key, verbosity, store removed
Code: filterOpenAISpecificParameters
User Field Size Limit
Severity: Low
Behavior: User field > 64 characters silently dropped
Impact: Longer user identifiers are lost
Code: SanitizeUserField enforces 64-char max
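The user-field caveat can be sketched as follows — illustrative Python only; Bifrost's actual check lives in the Go SanitizeUserField helper:

```python
MAX_USER_FIELD_LEN = 64  # limit described in the caveat above

def sanitize_user_field(user):
    """Drop the user identifier entirely if it exceeds 64 characters."""
    if user is not None and len(user) > MAX_USER_FIELD_LEN:
        return None  # silently dropped, per the caveat
    return user

print(sanitize_user_field("alice"))    # short identifier is kept
print(sanitize_user_field("x" * 65))   # over the limit: dropped (None)
```

Note the identifier is dropped, not truncated, so longer values are lost without warning.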

