## About
An HTTP filter plugin that inspects incoming requests against a set of configured path-matcher rules to identify the LLM provider API in use (OpenAI Chat Completions, Anthropic Messages, or a custom OpenAI-compatible API). Once a rule matches, the filter:
- Parses the request body to extract the model name and streaming flag, then writes them to Envoy's dynamic filter metadata.
- Parses the response body (JSON for non-streaming, SSE for streaming) to extract token-usage information and writes it to filter metadata.
- Records Envoy metrics (counters and histograms) for request counts, token usage, time-to-first-token (TTFT), and time-per-output-token (TPOT).
Requests whose path does not match any rule are passed through without modification.
If no rule is explicitly configured for OpenAI or Anthropic, the filter automatically
adds default suffix-matcher rules for /v1/chat/completions (OpenAI) and
/v1/messages (Anthropic), so it works out of the box with no configuration.
## Metadata Keys
All keys are written under the configured metadata_namespace
(default: io.builtonenvoy.llm-proxy).
| Key | Type | Description |
|---|---|---|
| kind | string | API kind: "openai", "anthropic", or "custom" |
| model | string | Model name extracted from the request body |
| is_stream | bool | Whether the request asks for a streaming (SSE) response |
| input_tokens | uint32 | Input / prompt token count from the response |
| output_tokens | uint32 | Output / completion token count from the response |
| total_tokens | uint32 | Total token count from the response |
| request_ttft | int64 | Time to first token in milliseconds |
| request_tpot | int64 | Average time per output token in milliseconds |
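The metadata can be consumed by other Envoy components. For example, an access-log format string can print the extracted fields with Envoy's `%DYNAMIC_METADATA(namespace:key)%` command operator. A minimal sketch, assuming the default namespace and a stdout access logger:

```yaml
access_log:
- name: envoy.access_loggers.stdout
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
    log_format:
      text_format_source:
        inline_string: >
          model=%DYNAMIC_METADATA(io.builtonenvoy.llm-proxy:model)%
          input_tokens=%DYNAMIC_METADATA(io.builtonenvoy.llm-proxy:input_tokens)%
          output_tokens=%DYNAMIC_METADATA(io.builtonenvoy.llm-proxy:output_tokens)%
```

Unset keys (e.g. token counts for a request that failed to parse) are logged as `-` by Envoy.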
## Metrics
All metrics are tagged with kind and model labels.
| Metric | Type | Description |
|---|---|---|
| llm_proxy_request_total | counter | Successfully parsed LLM requests |
| llm_proxy_request_error | counter | Requests that failed to parse |
| llm_proxy_input_tokens | counter | Accumulated input token counts |
| llm_proxy_output_tokens | counter | Accumulated output token counts |
| llm_proxy_total_tokens | counter | Accumulated total token counts |
| llm_proxy_request_ttft | histogram | Time to first token in milliseconds |
| llm_proxy_request_tpot | histogram | Average time per output token in milliseconds |
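Once the filter is serving traffic, these counters and histograms appear in Envoy's regular stats output, so any stats sink (Prometheus, StatsD, etc.) picks them up. A quick way to spot-check them is the Envoy admin endpoint; the admin port here (9901) is an assumption and depends on your deployment:

```shell
# List all llm-proxy stats from a running Envoy (admin port assumed to be 9901)
curl -s http://localhost:9901/stats | grep llm_proxy
```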
## Configuration Reference
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| llm_configs | array | no | auto (default OpenAI/Anthropic rules) | Ordered list of path-matcher rules; first match wins |
| llm_configs[].matcher | object | yes | — | Path matcher: set exactly one of prefix, suffix, or regex |
| llm_configs[].kind | string | yes | — | "openai", "anthropic", or "custom" |
| metadata_namespace | string | no | io.builtonenvoy.llm-proxy | Filter metadata namespace |
| llm_model_header | string | no | "" | If set, the extracted model name is written to this request header |
| clear_route_cache | bool | no | false | Clear the route cache after request parsing so Envoy can re-select the route based on updated metadata |
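Besides prefix, the matcher also accepts suffix and regex forms. For instance, a custom OpenAI-compatible backend served under a provider-specific path could be matched with a suffix rule (the path below is illustrative):

```shell
boe run --extension llm-proxy \
  --config '{
    "llm_configs": [
      {"matcher": {"suffix": "/chat/completions"}, "kind": "custom"}
    ]
  }'
```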
## Usage Examples
### Zero-config default rules
With no configuration the filter automatically matches /v1/chat/completions
(OpenAI) and /v1/messages (Anthropic) and writes metadata under the default
namespace.
```shell
boe run --extension llm-proxy

# After an OpenAI request the following metadata will be set
# (namespace: "io.builtonenvoy.llm-proxy"):
#   kind          = "openai"
#   model         = "gpt-4o"
#   is_stream     = false
#   input_tokens  = 42
#   output_tokens = 18
#   total_tokens  = 60
```

### Explicit rules for OpenAI and Anthropic
Configure explicit prefix rules for both providers. The first matching rule wins.
```shell
boe run --extension llm-proxy \
  --config '{
    "llm_configs": [
      {"matcher": {"prefix": "/v1/chat/completions"}, "kind": "openai"},
      {"matcher": {"prefix": "/v1/messages"}, "kind": "anthropic"}
    ]
  }'
```

### Custom metadata namespace
Write metadata under a custom namespace to avoid conflicts with other filters.
```shell
boe run --extension llm-proxy \
  --config '{
    "metadata_namespace": "my-llm-ns",
    "llm_configs": [
      {"matcher": {"prefix": "/v1/chat/completions"}, "kind": "openai"}
    ]
  }'
```

### Route to different clusters based on model name
Use llm_model_header to inject the extracted model name as a request header,
then configure an Envoy route to select a cluster based on that header.
Enable clear_route_cache so Envoy re-evaluates the route after the header is set.
```shell
boe run --extension llm-proxy \
  --config '{
    "llm_model_header": "x-llm-model",
    "clear_route_cache": true
  }'
```
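On the Envoy side, a route can then match on that header to pick a per-model cluster. A minimal sketch of such a route table; the header value, cluster names, and fallback route are illustrative assumptions:

```yaml
routes:
# Requests whose extracted model is gpt-4o go to a dedicated cluster
- match:
    prefix: "/"
    headers:
    - name: x-llm-model
      string_match:
        exact: gpt-4o
  route:
    cluster: openai_gpt4o
# Everything else falls through to a default cluster
- match:
    prefix: "/"
  route:
    cluster: default_llm
```

Because clear_route_cache is enabled, Envoy re-evaluates this route table after the filter injects the header, so the header match sees the extracted model name.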