LLM Gateway
Workspace-scoped LLM virtual keys and LiteLLM proxy usage.
Audience: App developers building AI features
Base path: /v1/workspaces/{workspace_id}/llm
Auth: Keycloak bearer token + workspace permissions
Inference path: App → LiteLLM directly (Control Plane is not in the data path)
Control Plane mints workspace virtual keys, exposes the model catalog, and records Compute Unit (CU) usage via webhooks. Your app calls LiteLLM with an OpenAI-compatible client.
Architecture
1. Admin: POST /v1/workspaces/{id}/llm/virtual-key (administer)
2. Store key in Secrets as LITELLM_API_KEY
3. App: POST {LITELLM_URL}/v1/chat/completions
Authorization: Bearer {LITELLM_API_KEY}
4. LiteLLM → webhook → Control Plane → billing events
Control Plane endpoints
| Method | Path | Permission | Purpose |
|---|---|---|---|
| GET | /v1/workspaces/{id}/llm/models | read | Groundfloor catalog + BYO models |
| GET | /v1/workspaces/{id}/llm/usage | read | Recent CU rollup |
| POST | /v1/workspaces/{id}/llm/virtual-key | administer | Mint or rotate virtual key (shown once) |
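The table above can be mirrored in a small typed helper so that paths and required permissions live in one place. This is a minimal sketch: `llmPath` and `requiredPermission` are hypothetical names chosen for illustration and are not part of any Control Plane SDK.

```typescript
// Hypothetical helper: the three endpoint names come from the table above.
type LlmEndpoint = "models" | "usage" | "virtual-key";
type WorkspacePermission = "read" | "administer";

// Permission each endpoint requires, as listed in the table.
const requiredPermission: Record<LlmEndpoint, WorkspacePermission> = {
  models: "read",
  usage: "read",
  "virtual-key": "administer",
};

// Builds the path relative to the Control Plane origin.
function llmPath(workspaceId: string, endpoint: LlmEndpoint): string {
  // encodeURIComponent guards against ids that contain reserved characters.
  return `/v1/workspaces/${encodeURIComponent(workspaceId)}/llm/${endpoint}`;
}
```

Keeping the permission map next to the path builder makes it easy to check a caller's role before issuing a request that would otherwise fail with 403.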
Step 1 — Mint a virtual key
POST /v1/workspaces/{workspace_id}/llm/virtual-key
Authorization: Bearer {keycloak_access_token}
Response:
{ "virtual_key": "sk-…" }
The key is returned once. Store it immediately:
PUT /v1/workspaces/{workspace_id}/secrets/LITELLM_API_KEY
Authorization: Bearer {keycloak_access_token}
Content-Type: application/json
{ "value": "sk-…", "description": "LiteLLM workspace virtual key" }
Requires administer on the workspace (workspace owner / admin role).
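The mint-then-store sequence in Step 1 could be scripted roughly as follows. This is a sketch under stated assumptions: `controlPlaneUrl` (the Control Plane origin) and the function name are hypothetical, and the HTTP function is injected so the flow can be exercised without a network. The request paths, headers, and JSON bodies are the ones shown above.

```typescript
// Minimal HTTP shape so the sketch can run against a fake in tests.
type HttpResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type HttpFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<HttpResponse>;

// Mints a virtual key and immediately stores it as the LITELLM_API_KEY secret.
// The key is never returned or logged, since it is shown only once.
async function mintAndStoreVirtualKey(
  http: HttpFn,
  controlPlaneUrl: string, // assumption: origin of the Control Plane API
  workspaceId: string,
  accessToken: string, // Keycloak access token with administer on the workspace
): Promise<void> {
  const base = `${controlPlaneUrl}/v1/workspaces/${encodeURIComponent(workspaceId)}`;
  const auth = { Authorization: `Bearer ${accessToken}` };

  const minted = await http(`${base}/llm/virtual-key`, { method: "POST", headers: auth });
  if (!minted.ok) throw new Error(`Minting failed with status ${minted.status}`);

  const body = (await minted.json()) as { virtual_key?: unknown };
  if (typeof body.virtual_key !== "string") {
    throw new Error("Mint response did not contain virtual_key");
  }

  const stored = await http(`${base}/secrets/LITELLM_API_KEY`, {
    method: "PUT",
    headers: { ...auth, "Content-Type": "application/json" },
    body: JSON.stringify({
      value: body.virtual_key,
      description: "LiteLLM workspace virtual key",
    }),
  });
  if (!stored.ok) throw new Error(`Storing the key failed with status ${stored.status}`);
}
```

On Node 18 or later the global `fetch` should satisfy `HttpFn`, so it can be passed in directly as the first argument.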
Step 2 — List models
GET /v1/workspaces/{workspace_id}/llm/models
Authorization: Bearer {keycloak_access_token}
Returns Groundfloor default catalog entries plus BYO models discovered from LiteLLM when provider keys exist in Secrets.
Default Groundfloor model ids (use these in LiteLLM calls):
| gf_id | Role | Upstream (via LiteLLM) |
|---|---|---|
| gf-chat-default | chat | openai/gpt-4o-mini |
| gf-chat-pro | chat | anthropic/claude-sonnet-4-5 |
| gf-code-default | code | anthropic/claude-sonnet-4-5 |
| gf-embed-default | embedding | openai/text-embedding-3-small |
| gf-vision-default | vision | openai/gpt-4o |
Pass gf_id as the model field when calling LiteLLM — routing resolves to the upstream provider.
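For compile-time checking of model ids, the default catalog above can be captured as a constant. This is only a sketch: `GF_MODELS` and `modelsForRole` are hypothetical names, and a hardcoded copy can drift from the live catalog, so the models endpoint remains the source of truth.

```typescript
// Snapshot of the default Groundfloor catalog from the table above.
const GF_MODELS = {
  "gf-chat-default": { role: "chat", upstream: "openai/gpt-4o-mini" },
  "gf-chat-pro": { role: "chat", upstream: "anthropic/claude-sonnet-4-5" },
  "gf-code-default": { role: "code", upstream: "anthropic/claude-sonnet-4-5" },
  "gf-embed-default": { role: "embedding", upstream: "openai/text-embedding-3-small" },
  "gf-vision-default": { role: "vision", upstream: "openai/gpt-4o" },
} as const;

type GfModelId = keyof typeof GF_MODELS;
type GfRole = (typeof GF_MODELS)[GfModelId]["role"];

// Returns the gf_ids registered for a role, in table order.
function modelsForRole(role: GfRole): GfModelId[] {
  return (Object.keys(GF_MODELS) as GfModelId[]).filter(
    (id) => GF_MODELS[id].role === role,
  );
}
```

Typing the `model` argument as `GfModelId` lets the compiler reject a misspelled id before a request ever reaches LiteLLM.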
Step 3 — Call LiteLLM (OpenAI-compatible)
Local dev (deploy/PHASE2-DEPS.md):
LITELLM_URL=http://localhost:4000
curl -s "${LITELLM_URL}/v1/chat/completions" \
-H "Authorization: Bearer ${LITELLM_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"model": "gf-chat-default",
"messages": [{ "role": "user", "content": "Hello" }]
}'
TypeScript (any OpenAI SDK):
import OpenAI from "openai";
const client = new OpenAI({
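// LITELLM_URL is the host root (as in the curl example); the SDK base URL needs the /v1 suffix.
baseURL: `${process.env.LITELLM_URL ?? "http://localhost:4000"}/v1`,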
apiKey: process.env.LITELLM_API_KEY, // from Secrets, server-side only
});
const completion = await client.chat.completions.create({
model: "gf-chat-default",
messages: [{ role: "user", content: "Hello" }],
});
Production LITELLM_URL is environment-specific — inject via Shell/host env, not the federated bundle.
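The curl example appends /v1 to LITELLM_URL, while the OpenAI SDK expects a base URL that already ends in /v1. One way to keep the two conventions from drifting is a small normalizer. This is a sketch; `openAiBaseUrl` is a hypothetical helper name, and the localhost fallback is the local dev value shown above.

```typescript
// Turns a LiteLLM host root (or a URL that already ends in /v1) into the
// base URL an OpenAI-compatible SDK expects.
function openAiBaseUrl(litellmUrl: string | undefined): string {
  // Fall back to the local dev address and strip trailing slashes.
  const root = (litellmUrl ?? "http://localhost:4000").replace(/\/+$/, "");
  // Avoid producing /v1/v1 when the suffix is already present.
  return root.endsWith("/v1") ? root : `${root}/v1`;
}
```

The result can be passed as `baseURL` when constructing the client, for example `openAiBaseUrl(process.env.LITELLM_URL)`.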
BYO provider keys
Customers can add their own provider keys to Secrets (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY). LiteLLM routes to those providers and additional models appear in GET …/llm/models with "origin": "byo".
Always route through LiteLLM — do not call OpenAI/Anthropic directly with hardcoded keys. Direct calls bypass workspace spend caps and CU attribution.
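To tell BYO models apart from catalog models on the client, entries can be filtered on the `origin` field mentioned above. This is a sketch that assumes the models response has already been parsed into an array of objects; only `origin` comes from this page, and every other field is left untyped on purpose because the full response shape is not documented here.

```typescript
// Keeps only entries whose origin is "byo". The generic parameter preserves
// whatever other fields the caller's entry type carries.
function byoModels<T extends { origin?: string }>(entries: readonly T[]): T[] {
  return entries.filter((entry) => entry.origin === "byo");
}
```

An empty result simply means no provider keys have produced BYO models for the workspace yet.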
Usage and billing
GET /v1/workspaces/{workspace_id}/llm/usage?since_seconds=86400&limit=100
Authorization: Bearer {keycloak_access_token}
Returns token counts and CU estimates per completion. Full account billing: GET /v1/accounts/{id}/billing/summary.
LiteLLM enforces max_budget per virtual key (spend cap at workspace level).
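The usage request above takes its window and page size as query parameters. A minimal sketch of building that path is shown below. `usagePath` is a hypothetical helper, and the positive-integer validation is an assumption about what the endpoint accepts rather than documented behavior.

```typescript
type UsageQuery = { sinceSeconds: number; limit: number };

// Assumption: both parameters must be positive integers.
function assertPositiveInteger(name: string, value: number): void {
  if (!Number.isInteger(value) || value <= 0) {
    throw new RangeError(`${name} must be a positive integer`);
  }
}

// Builds the usage path relative to the Control Plane origin.
function usagePath(workspaceId: string, query: UsageQuery): string {
  assertPositiveInteger("since_seconds", query.sinceSeconds);
  assertPositiveInteger("limit", query.limit);
  const base = `/v1/workspaces/${encodeURIComponent(workspaceId)}/llm/usage`;
  return `${base}?since_seconds=${query.sinceSeconds}&limit=${query.limit}`;
}
```

For the last 24 hours, `usagePath(workspaceId, { sinceSeconds: 86400, limit: 100 })` reproduces the request shown above.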
Environment variables
| Variable | Where | Purpose |
|---|---|---|
| LITELLM_URL | Control Plane API .env | Admin client for mint/models |
| LITELLM_MASTER_KEY | Control Plane API .env | LiteLLM admin API |
| LITELLM_API_KEY | Workspace secret | App inference (per workspace) |
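On the app side, a fail-fast check at startup can surface missing configuration before the first inference call. This is a sketch: `loadAppLlmConfig` is a hypothetical name, and it reads only the two values an app needs for inference. It deliberately ignores LITELLM_MASTER_KEY, which the table places in the Control Plane .env.

```typescript
type AppLlmConfig = { litellmUrl: string; apiKey: string };

// Reads inference settings from an env-like object and throws a single
// error naming every missing variable.
function loadAppLlmConfig(env: Record<string, string | undefined>): AppLlmConfig {
  const litellmUrl = env["LITELLM_URL"];
  const apiKey = env["LITELLM_API_KEY"];

  const missing: string[] = [];
  if (!litellmUrl) missing.push("LITELLM_URL");
  if (!apiKey) missing.push("LITELLM_API_KEY");

  if (!litellmUrl || !apiKey) {
    throw new Error(`Missing LLM configuration: ${missing.join(", ")}`);
  }
  return { litellmUrl, apiKey };
}
```

In a Node server this would typically be called once as `loadAppLlmConfig(process.env)`.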
Bring up LiteLLM locally:
docker compose -f deploy/docker-compose.phase2-deps.yml --profile litellm up -d
curl -s http://localhost:4000/health/liveness
Provider keys in Secrets are required before models respond (empty model list until keys exist).
Errors
| Status | Meaning |
|---|---|
| 403 | Need administer to mint virtual key |
| 502 | LiteLLM unreachable from Control Plane |
| LiteLLM 401 | Invalid or expired virtual key |
| LiteLLM empty/error | Missing provider keys in Secrets |
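The error table above can be turned into a lookup that produces an actionable message. This is a sketch: `explainLlmError` and the `ErrorSource` labels are hypothetical, and any status not listed in the table falls through to a generic hint instead of a guessed cause.

```typescript
// Which service returned the status code.
type ErrorSource = "control-plane" | "litellm";

function explainLlmError(source: ErrorSource, status: number): string {
  if (source === "control-plane") {
    if (status === 403) return "Caller needs administer on the workspace to mint a virtual key.";
    if (status === 502) return "Control Plane could not reach LiteLLM.";
    return `Unmapped Control Plane status ${status}.`;
  }
  if (status === 401) return "Virtual key is invalid or expired.";
  // The table attributes other LiteLLM failures to missing provider keys.
  return `LiteLLM returned ${status}; check that provider keys exist in Secrets.`;
}
```

Logging the returned message next to the raw status keeps the original code available for debugging.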
Related
- 06-secrets.md — store LITELLM_API_KEY
- 03-authentication.md — Keycloak for Control Plane calls
- 11-local-dev-recipes.md — end-to-end curl flow