LLM Gateway

Audience: App developers building AI features
Base path: /v1/workspaces/{workspace_id}/llm
Auth: Keycloak bearer token + workspace permissions
Inference path: App → LiteLLM directly (Control Plane is not in the data path)

Control Plane mints workspace virtual keys, exposes the model catalog, and records Compute Unit (CU) usage via webhooks. Your app calls LiteLLM with an OpenAI-compatible client.

Architecture

1. Admin: POST /v1/workspaces/{id}/llm/virtual-key  (administer)
2. Store key in Secrets as LITELLM_API_KEY
3. App: POST {LITELLM_URL}/v1/chat/completions
         Authorization: Bearer {LITELLM_API_KEY}
4. LiteLLM → webhook → Control Plane → billing events

Control Plane endpoints

Method	Path	Permission	Purpose
`GET`	`/v1/workspaces/{id}/llm/models`	`read`	Groundfloor catalog + BYO models
`GET`	`/v1/workspaces/{id}/llm/usage`	`read`	Recent CU rollup
`POST`	`/v1/workspaces/{id}/llm/virtual-key`	`administer`	Mint or rotate virtual key (shown once)

Step 1 — Mint a virtual key

POST /v1/workspaces/{workspace_id}/llm/virtual-key
Authorization: Bearer {keycloak_access_token}

{ "virtual_key": "sk-…" }

The key is returned once. Store it immediately:

PUT /v1/workspaces/{workspace_id}/secrets/LITELLM_API_KEY
Authorization: Bearer {keycloak_access_token}
Content-Type: application/json

{ "value": "sk-…", "description": "LiteLLM workspace virtual key" }

Minting requires the administer permission on the workspace (workspace owner or admin role).
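The two calls above can be chained from a server-side script. The following is a minimal sketch, not the platform's own client: the function names, the `FetchLike` type, and the `baseUrl` argument (your Control Plane origin) are assumptions made for illustration. The paths, headers, and body fields are taken from the requests shown above.

```typescript
// Sketch only: all names here are illustrative, not part of the Control Plane API.
type HttpRequest = {
  method: string;
  url: string;
  headers: Record<string, string>;
  body?: string;
};
type HttpResponse = { ok: boolean; status: number; json(): Promise<unknown> };
// Minimal fetch-shaped function so the sketch can run with the global fetch or a test double.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<HttpResponse>;

// Step 1: request that mints (or rotates) the workspace virtual key.
function mintKeyRequest(baseUrl: string, workspaceId: string, token: string): HttpRequest {
  return {
    method: "POST",
    url: `${baseUrl}/v1/workspaces/${encodeURIComponent(workspaceId)}/llm/virtual-key`,
    headers: { Authorization: `Bearer ${token}` },
  };
}

// Request that stores the key in Secrets as LITELLM_API_KEY.
function storeKeyRequest(
  baseUrl: string,
  workspaceId: string,
  token: string,
  virtualKey: string,
): HttpRequest {
  return {
    method: "PUT",
    url: `${baseUrl}/v1/workspaces/${encodeURIComponent(workspaceId)}/secrets/LITELLM_API_KEY`,
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify({ value: virtualKey, description: "LiteLLM workspace virtual key" }),
  };
}

// Send a request and fail loudly on any non-2xx status.
async function send(fetchImpl: FetchLike, req: HttpRequest): Promise<HttpResponse> {
  const res = await fetchImpl(req.url, { method: req.method, headers: req.headers, body: req.body });
  if (!res.ok) throw new Error(`${req.method} ${req.url} failed with status ${res.status}`);
  return res;
}

// Mint the key and store it right away, since the key is only returned once.
async function mintAndStoreKey(
  fetchImpl: FetchLike,
  baseUrl: string,
  workspaceId: string,
  token: string,
): Promise<void> {
  const minted = await send(fetchImpl, mintKeyRequest(baseUrl, workspaceId, token));
  const { virtual_key } = (await minted.json()) as { virtual_key: string };
  await send(fetchImpl, storeKeyRequest(baseUrl, workspaceId, token, virtual_key));
}
```

In Node 18 or later, the global `fetch` can be passed as `fetchImpl`. The key is never logged or returned, so it only exists in memory between the two calls.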

Step 2 — List models

GET /v1/workspaces/{workspace_id}/llm/models
Authorization: Bearer {keycloak_access_token}

Returns Groundfloor default catalog entries plus BYO models discovered from LiteLLM when provider keys exist in Secrets.

Default Groundfloor model ids (use these in LiteLLM calls):

`gf_id`	Role	Upstream (via LiteLLM)
`gf-chat-default`	chat	`openai/gpt-4o-mini`
`gf-chat-pro`	chat	`anthropic/claude-sonnet-4-5`
`gf-code-default`	code	`anthropic/claude-sonnet-4-5`
`gf-embed-default`	embedding	`openai/text-embedding-3-small`
`gf-vision-default`	vision	`openai/gpt-4o`

Pass gf_id as the model field when calling LiteLLM — routing resolves to the upstream provider.
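To avoid scattering model id strings through application code, the default ids can be kept in one typed map. This is a small sketch: the ids come from the table above, while `GF_MODELS`, `ChatRole`, and `chatRequestBody` are hypothetical names introduced for illustration.

```typescript
// Default Groundfloor ids, copied from the catalog table above.
const GF_MODELS = {
  chat: "gf-chat-default",
  chatPro: "gf-chat-pro",
  code: "gf-code-default",
  embedding: "gf-embed-default",
  vision: "gf-vision-default",
} as const;

// Roles that make sense for a chat completion call (embedding is excluded).
type ChatRole = Exclude<keyof typeof GF_MODELS, "embedding">;

// Build an OpenAI-style chat completion body; LiteLLM resolves the gf_id upstream.
function chatRequestBody(role: ChatRole, prompt: string) {
  return {
    model: GF_MODELS[role],
    messages: [{ role: "user" as const, content: prompt }],
  };
}
```

The returned object can be used as the JSON body of a POST to the chat completions endpoint, or passed to an OpenAI-compatible SDK call.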

Step 3 — Call LiteLLM (OpenAI-compatible)

Local dev (deploy/PHASE2-DEPS.md):

LITELLM_URL=http://localhost:4000

curl -s "${LITELLM_URL}/v1/chat/completions" \
  -H "Authorization: Bearer ${LITELLM_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gf-chat-default",
    "messages": [{ "role": "user", "content": "Hello" }]
  }'

TypeScript (any OpenAI-compatible SDK works the same way):

import OpenAI from "openai";

const client = new OpenAI({
  // LITELLM_URL has no /v1 suffix (see the curl example), so append it here.
  baseURL: `${process.env.LITELLM_URL ?? "http://localhost:4000"}/v1`,
  apiKey: process.env.LITELLM_API_KEY, // from Secrets, server-side only
});

const completion = await client.chat.completions.create({
  model: "gf-chat-default",
  messages: [{ role: "user", content: "Hello" }],
});

Production LITELLM_URL is environment-specific. Inject it via the Shell/host environment, not the federated bundle.
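Because LITELLM_URL is injected per environment, its exact form (trailing slash, with or without /v1) may vary between deployments. The sketch below normalizes it to an SDK base URL. `litellmBaseUrl` is a hypothetical helper, and the localhost fallback is only appropriate for local development.

```typescript
// Normalize an injected LITELLM_URL into a base URL that ends in /v1.
function litellmBaseUrl(raw?: string): string {
  // Fall back to the local dev address when nothing is injected.
  const value = raw?.trim() || "http://localhost:4000";
  // Drop trailing slashes so the suffix check below is reliable.
  const root = value.replace(/\/+$/, "");
  return root.endsWith("/v1") ? root : `${root}/v1`;
}
```

The result can be passed as `baseURL` when constructing the client, for example `litellmBaseUrl(process.env.LITELLM_URL)`.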

BYO provider keys

Customers can add their own provider keys to Secrets (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY). LiteLLM then routes to those providers, and the additional models appear in GET …/llm/models with "origin": "byo".

Always route through LiteLLM — do not call OpenAI/Anthropic directly with hardcoded keys. Direct calls bypass workspace spend caps and CU attribution.
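An app that shows a model picker may want to separate BYO models from the default catalog. The sketch below does this on an already-parsed model list. Only the "origin" field is documented on this page; the `id` field and the overall entry shape are assumptions, so check the real response of GET …/llm/models before relying on them.

```typescript
// Assumed entry shape: "origin" is documented above, "id" is a guess for illustration.
type ModelEntry = { id: string; origin?: string };

// Split a model list into BYO entries and everything else.
function partitionByOrigin(models: ModelEntry[]): { byo: ModelEntry[]; catalog: ModelEntry[] } {
  const byo = models.filter((m) => m.origin === "byo");
  const catalog = models.filter((m) => m.origin !== "byo");
  return { byo, catalog };
}
```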

Usage and billing

GET /v1/workspaces/{workspace_id}/llm/usage?since_seconds=86400&limit=100
Authorization: Bearer {keycloak_access_token}

Returns token counts and CU estimates per completion. Full account billing: GET /v1/accounts/{id}/billing/summary.

LiteLLM enforces max_budget on each virtual key, which acts as the spend cap at workspace level.
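The usage query takes its window and page size as query parameters. As a rough sketch, a small helper can build the URL. `usageUrl`, its option names, and its defaults (copied from the example request above) are illustrative assumptions, not documented server defaults.

```typescript
// Build the usage rollup URL for a workspace.
// Defaults mirror the example request above (last 24 hours, 100 rows).
function usageUrl(
  baseUrl: string,
  workspaceId: string,
  opts: { sinceSeconds?: number; limit?: number } = {},
): string {
  const sinceSeconds = opts.sinceSeconds ?? 86400;
  const limit = opts.limit ?? 100;
  const path = `/v1/workspaces/${encodeURIComponent(workspaceId)}/llm/usage`;
  return `${baseUrl}${path}?since_seconds=${sinceSeconds}&limit=${limit}`;
}
```

The request itself still needs the Authorization: Bearer {keycloak_access_token} header shown above.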

Environment variables

Variable	Where	Purpose
`LITELLM_URL`	Control Plane API `.env`	Admin client for mint/models
`LITELLM_MASTER_KEY`	Control Plane API `.env`	LiteLLM admin API
`LITELLM_API_KEY`	Workspace secret	App inference (per workspace)

Bring up LiteLLM locally:

docker compose -f deploy/docker-compose.phase2-deps.yml --profile litellm up -d
curl -s http://localhost:4000/health/liveness

Provider keys must exist in Secrets before models respond; until then, the model list is empty.

Errors

Status	Meaning
`403`	Need `administer` to mint virtual key
`502`	LiteLLM unreachable from Control Plane
LiteLLM `401`	Invalid or expired virtual key
LiteLLM empty/error	Missing provider keys in Secrets
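Application code can turn the table above into actionable messages. This is a minimal sketch: `explainStatus` and the `ErrorSource` type are hypothetical, and the messages only restate the table rows rather than any documented error body.

```typescript
// Which service returned the status: the Control Plane API or the LiteLLM proxy.
type ErrorSource = "control-plane" | "litellm";

// Map a failing status to the likely cause listed in the errors table.
function explainStatus(source: ErrorSource, status: number): string {
  if (source === "control-plane" && status === 403) {
    return "Caller lacks administer on the workspace; minting a virtual key requires it.";
  }
  if (source === "control-plane" && status === 502) {
    return "Control Plane could not reach LiteLLM.";
  }
  if (source === "litellm" && status === 401) {
    return "Virtual key is invalid or expired; rotate it and update LITELLM_API_KEY.";
  }
  if (source === "litellm") {
    return "LiteLLM returned an error; check that provider keys exist in Secrets.";
  }
  return `Unexpected status ${status} from ${source}.`;
}
```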

06-secrets.md — store LITELLM_API_KEY
03-authentication.md — Keycloak for Control Plane calls
11-local-dev-recipes.md — end-to-end curl flow
