About
An HTTP filter plugin that parses OpenAI Chat Completions API requests and responses, populating Envoy's dynamic filter metadata with structured information extracted from both bodies.
This lets downstream filters and route configurations make decisions based on the contents of the LLM request and response without re-parsing the bodies themselves.
Attribute Naming
Metadata keys follow the OpenInference Semantic Conventions.
List attributes are flattened using indexed dot notation as described in the
LLM Spans attribute flattening spec
(e.g. llm.input_messages.0.message.role).
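To make the flattening scheme concrete, here is a small illustrative sketch (not the filter's actual implementation) that reduces a nested structure to indexed dot-notation keys:

```python
def flatten(value, prefix=""):
    """Flatten nested dicts/lists into OpenInference-style dot-notation keys,
    using the list index as the path segment for list elements."""
    if isinstance(value, dict):
        out = {}
        for key, sub in value.items():
            out.update(flatten(sub, f"{prefix}{key}."))
        return out
    if isinstance(value, list):
        out = {}
        for i, sub in enumerate(value):
            out.update(flatten(sub, f"{prefix}{i}."))
        return out
    return {prefix.rstrip("."): value}

request = {
    "llm": {
        "input_messages": [
            {"message": {"role": "system", "content": "You are a helpful assistant."}},
            {"message": {"role": "user", "content": "What is the weather today?"}},
        ]
    }
}
for key, val in flatten(request).items():
    print(f"{key} = {val!r}")
# llm.input_messages.0.message.role = 'system'
# llm.input_messages.0.message.content = 'You are a helpful assistant.'
# llm.input_messages.1.message.role = 'user'
# llm.input_messages.1.message.content = 'What is the weather today?'
```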
Configuration Reference
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `metadata_namespace` | string | no | `openai` | The filter metadata namespace for the decoded fields |
Usage Examples
Basic usage (default namespace)
Decode incoming OpenAI Chat Completion requests and expose metadata under the
default openai namespace using OpenInference semantic conventions.
Downstream filters can access openai.llm.model_name,
openai.llm.input_messages.0.message.role, etc.
boe run --extension chat-completions-decoder \
--test-upstream-host api.openai.com
# Send a chat completion request
curl -X POST http://localhost:10000/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_APIKEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the weather today?"}
]
}'
# Envoy filter metadata will contain (namespace: "openai"):
# llm.model_name = "gpt-4o"
# llm.system = "openai"
# llm.input_messages.count = 2
# llm.input_messages.0.message.role = "system"
# llm.input_messages.0.message.content = "You are a helpful assistant."
# llm.input_messages.1.message.role = "user"
# llm.input_messages.1.message.content = "What is the weather today?"
# llm.tools.count = 0
Custom metadata namespace
Use a custom namespace to avoid conflicts with other filters that also write to filter metadata.
boe run --extension chat-completions-decoder \
--config '{"metadata_namespace": "llm-request"}' \
--test-upstream-host api.openai.com
# Metadata will now be under the "llm-request" namespace:
# llm.model_name = "gpt-4o"
# llm.system = "openai"
# llm.input_messages.count = 1
# llm.input_messages.0.message.role = "user"
# llm.input_messages.0.message.content = "..."
# llm.tools.count = 0
Request with tools
When the request includes tool definitions, each tool is stored under
llm.tools.N.tool.json_schema as a JSON string, following the OpenInference spec.
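The serialization can be pictured with the following sketch (illustrative only; the variable names are hypothetical, not the filter's internals). Each tool definition is kept intact and stored as a compact JSON string under its indexed key:

```python
import json

tools = [
    {"type": "function", "function": {"name": "book_flight", "description": "Book a flight"}},
    {"type": "function", "function": {"name": "cancel_flight", "description": "Cancel a flight"}},
]

metadata = {"llm.tools.count": len(tools)}
for i, tool in enumerate(tools):
    # The whole tool object is serialized verbatim, not flattened field by field.
    metadata[f"llm.tools.{i}.tool.json_schema"] = json.dumps(tool, separators=(",", ":"))
```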
boe run --extension chat-completions-decoder \
--test-upstream-host api.openai.com
curl -X POST http://localhost:10000/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_APIKEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Book a flight to NYC"}],
"tools": [
{"type": "function", "function": {"name": "book_flight", "description": "Book a flight"}},
{"type": "function", "function": {"name": "cancel_flight", "description": "Cancel a flight"}}
]
}'
# llm.tools.count = 2
# llm.tools.0.tool.json_schema = '{"type":"function","function":{"name":"book_flight","description":"Book a flight"}}'
# llm.tools.1.tool.json_schema = '{"type":"function","function":{"name":"cancel_flight","description":"Cancel a flight"}}'
Tool call in conversation with response metadata
When a multi-turn conversation includes an assistant message with a tool call, the filter captures the tool call details from the request. The response metadata includes the assistant reply and token usage.
boe run --extension chat-completions-decoder \
--test-upstream-host api.openai.com
curl -X POST http://localhost:10000/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_APIKEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "What is the weather in NYC?"},
{"role": "assistant", "content": null, "tool_calls": [
{"id": "call_abc", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\":\"NYC\"}"}}
]},
{"role": "tool", "content": "Sunny, 72F"}
]
}'
# Request metadata (namespace: "openai"):
# llm.model_name = "gpt-4o"
# llm.system = "openai"
# llm.input_messages.count = 3
# llm.input_messages.0.message.role = "user"
# llm.input_messages.0.message.content = "What is the weather in NYC?"
# llm.input_messages.1.message.role = "assistant"
# llm.input_messages.1.message.tool_calls.count = 1
# llm.input_messages.1.message.tool_calls.0.tool_call.id = "call_abc"
# llm.input_messages.1.message.tool_calls.0.tool_call.function.name = "get_weather"
# llm.input_messages.1.message.tool_calls.0.tool_call.function.arguments = '{"location":"NYC"}'
# llm.input_messages.2.message.role = "tool"
# llm.input_messages.2.message.content = "Sunny, 72F"
# llm.tools.count = 0
# Response metadata (when the model sends its final reply):
# llm.output_messages.count = 1
# llm.output_messages.0.message.role = "assistant"
# llm.output_messages.0.message.content = "The weather in NYC is sunny and 72F."
# llm.token_count.prompt = 85
# llm.token_count.completion = 14
# llm.token_count.total = 99
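The response-side attributes above map directly onto the choices and usage objects of a Chat Completions response body. A sketch of that mapping (illustrative, not the filter's code; the trimmed response body reuses values from the example above):

```python
import json

# A trimmed Chat Completions response body.
response_body = json.dumps({
    "choices": [
        {"message": {"role": "assistant",
                     "content": "The weather in NYC is sunny and 72F."}}
    ],
    "usage": {"prompt_tokens": 85, "completion_tokens": 14, "total_tokens": 99},
})

body = json.loads(response_body)
metadata = {"llm.output_messages.count": len(body["choices"])}
for i, choice in enumerate(body["choices"]):
    # Each choice becomes one indexed output message.
    msg = choice["message"]
    metadata[f"llm.output_messages.{i}.message.role"] = msg["role"]
    metadata[f"llm.output_messages.{i}.message.content"] = msg["content"]

# Token usage is copied straight from the response's usage object.
usage = body.get("usage", {})
metadata["llm.token_count.prompt"] = usage.get("prompt_tokens")
metadata["llm.token_count.completion"] = usage.get("completion_tokens")
metadata["llm.token_count.total"] = usage.get("total_tokens")
```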