API Documentation
Kallavy is an AI broker: your app talks to a single
OpenAI-compatible endpoint and we route to the world's best models — billed in
BRL via PIX, with a Brazilian invoice and local support.
Already using the OpenAI SDK? Swap base_url and
api_key; no other code changes are needed.
Introduction
This reference covers the developer-facing endpoints. Everything is HTTPS with JSON bodies and follows the same contract as OpenAI's Chat Completions API. Account management — keys, balance, usage, governance and invoices — lives in the dashboard, not the API.
Privacy (LGPD): we only store usage metadata (model, tokens, cost). The content of your prompts and responses is never stored.
Base URL
All calls use the dedicated broker domain:
https://api.kallavy.com/v1
Authentication
Every request needs the Authorization header with your API key
as a Bearer token. Create and manage keys in the
dashboard → API Keys.
Authorization: Bearer sk-...your-key
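If you call the endpoint without an SDK, the header can be built by hand. This is a minimal sketch: the helper name auth_headers and the environment variable KALLAVY_API_KEY are assumptions made for this example, not names defined by the API.

```python
import os


def auth_headers(api_key: str) -> dict[str, str]:
    """Build headers for a JSON request authenticated with a Bearer key."""
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# KALLAVY_API_KEY is a hypothetical variable name chosen for this example.
# Reading the key from the environment keeps it out of source control.
headers = auth_headers(os.environ.get("KALLAVY_API_KEY", "sk-...your-key"))
```

Pass the resulting dictionary as the headers argument of whichever HTTP client you use.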
Quickstart
A full chat call with the OpenAI Python SDK:
# pip install openai
from openai import OpenAI

# Point the standard OpenAI client at the Kallavy endpoint.
client = OpenAI(
    api_key="sk-...your-key",
    base_url="https://api.kallavy.com/v1",
)

resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize this contract in 3 lines."}],
)
print(resp.choices[0].message.content)
/v1/models
Lists the models available to your account. Use this as the source of truth — the catalog changes as new providers come online and as your account governance evolves. Same shape as OpenAI.
{
  "object": "list",
  "data": [
    { "id": "deepseek-chat", "object": "model", "owned_by": "kallavy" },
    { "id": "claude-sonnet", "object": "model", "owned_by": "kallavy" },
    // ...
  ]
}
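Since the catalog is the source of truth, it can help to check a model ID against it before sending a request. The sketch below works on an already-decoded response body shaped like the example above; the function name model_ids is an assumption for illustration. With the OpenAI Python SDK, client.models.list() should return the same data, assuming the endpoint matches OpenAI's shape as stated.

```python
import json


def model_ids(payload: dict) -> list[str]:
    """Return the model IDs from a decoded /v1/models response body."""
    if payload.get("object") != "list":
        raise ValueError("unexpected response shape")
    return [entry["id"] for entry in payload.get("data", [])]


# Sample body copied from the example response (without the elided entries).
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "deepseek-chat", "object": "model", "owned_by": "kallavy"},
    {"id": "claude-sonnet", "object": "model", "owned_by": "kallavy"}
  ]
}
""")

available = model_ids(sample)
if "deepseek-chat" not in available:
    raise SystemExit("deepseek-chat is not enabled for this account")
```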
/v1/chat/completions
Generates a chat response. Accepts the same fields as OpenAI; Kallavy authenticates, meters usage and forwards to the real provider behind the chosen model.
Body parameters
| Field | Type | Description |
|---|---|---|
| model * | string | Model ID (e.g. deepseek-chat). See /v1/models. |
| messages * | array | List of { "role", "content" } messages (system, user, assistant). |
| stream | boolean | If true, streams tokens via SSE. Default false. |
| temperature, max_tokens, top_p… | various | Other standard OpenAI params are forwarded to the provider. |
* required
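One way to assemble a request body from the table above is sketched here. The helper build_chat_body is hypothetical, and the role check uses only the three roles listed in the table, which may be narrower than what the API actually accepts.

```python
import json

# Roles taken from the parameter table; treat this set as an assumption.
ALLOWED_ROLES = {"system", "user", "assistant"}


def build_chat_body(model: str, messages: list[dict], **optional) -> dict:
    """Validate the required fields and merge in optional parameters."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required and must not be empty")
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES or "content" not in message:
            raise ValueError(f"invalid message: {message!r}")
    body = {"model": model, "messages": messages}
    # Drop unset options so provider defaults apply.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


body = build_chat_body(
    "deepseek-chat",
    [{"role": "user", "content": "Summarize this contract in 3 lines."}],
    temperature=0.2,
    max_tokens=None,  # omitted from the body because it is None
)
print(json.dumps(body, indent=2))
```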
Response
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "deepseek-chat",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "..." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 42, "completion_tokens": 88, "total_tokens": 130 }
}
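The fields most callers need are the first choice's text, its finish_reason, and the usage counts. The sketch below reads them from a decoded response body; summarize_completion is a hypothetical helper, and treating finish_reason "length" as truncation follows OpenAI's convention, which this API is assumed to share.

```python
def summarize_completion(resp: dict) -> dict:
    """Pull the text, finish reason and token counts out of a response body."""
    choice = resp["choices"][0]
    usage = resp.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # In OpenAI's contract, "length" means the output hit max_tokens.
        "truncated": choice["finish_reason"] == "length",
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
    }


# Sample body mirroring the example response.
sample_response = {
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "model": "deepseek-chat",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "..."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 42, "completion_tokens": 88, "total_tokens": 130},
}

summary = summarize_completion(sample_response)
```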
Streaming
With stream: true, the response arrives as Server-Sent
Events chunks (data: {...}), ending with data: [DONE] — identical to OpenAI.
The SDKs handle this for you.
# Reuses the client from the Quickstart; each chunk carries a partial delta.
for chunk in client.chat.completions.create(
    model="claude-sonnet",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=True,
):
    print(chunk.choices[0].delta.content or "", end="")
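If you consume the stream without an SDK, the framing described above can be parsed by hand. This is a rough sketch that assumes each event arrives as a single "data:" line and that chunks follow OpenAI's choices[].delta.content layout; iter_sse_content is a hypothetical helper, and the sample lines are made up for the demonstration.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Illustrative stream: a role-only chunk, two content chunks, then the sentinel.
sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_sse_content(sample_stream)))
```

In a real client, the lines would come from a streaming HTTP response body rather than a list.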
Errors
Errors follow standard HTTP status codes with a descriptive JSON body.
| Code | Meaning |
|---|---|
| 401 | Missing, invalid or revoked key. |
| 402 | Insufficient balance. Top up via PIX in the dashboard. |
| 403 | Blocked by account governance (model/provider not allowed, or spend cap reached). |
| 429 | Rate limit exceeded. Back off and retry. |
| 5xx | Upstream provider failure. Retry. |
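The table suggests a simple policy: retry 429 and 5xx with a growing delay, and fail immediately on 401, 402 and 403, since repeating the call cannot fix them. The sketch below is one way to express that. The names, the attempt count and the delay values are assumptions for illustration, and send stands for any hypothetical callable that performs the request and returns a (status, body) pair. The OpenAI Python SDK also accepts a max_retries option on the client, which may be enough on its own.

```python
import random
import time
from typing import Callable, Tuple


def should_retry(status: int) -> bool:
    """429 and 5xx are transient per the table; 401, 402 and 403 are not."""
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with full jitter; attempt counts from 0."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send() until it succeeds, fails permanently, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status < 400:
            return body
        last_try = attempt == max_attempts - 1
        if last_try or not should_retry(status):
            raise RuntimeError(f"request failed with HTTP {status}: {body}")
        sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

The sleep parameter is injectable so the policy can be exercised in tests without real waiting.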
Limits & billing
- Usage is metered by input and output tokens, per model, and debited from your BRL balance.
- Rate limits apply per key (tunable on your plan). On overflow the API returns 429.
- Your company can set governance: allow or block models and providers and enforce spend caps, all in the dashboard.
- Top-ups via PIX with NF-e invoicing. Balance and history live in the dashboard.
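To illustrate how per-token metering turns into a BRL debit, here is a small arithmetic sketch. The prices, the per-million-token unit and the model name "example-model" are all invented for this example; real prices are not given in this reference, so check the dashboard for actual values.

```python
from decimal import Decimal

# HYPOTHETICAL prices in BRL per 1,000,000 tokens, invented for illustration.
PRICES_BRL_PER_MTOK = {
    "example-model": {"input": Decimal("2.00"), "output": Decimal("8.00")},
}


def estimate_cost_brl(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the debit for one call from its usage counts."""
    price = PRICES_BRL_PER_MTOK[model]
    total = price["input"] * prompt_tokens + price["output"] * completion_tokens
    return total / Decimal(1_000_000)


# Token counts taken from the usage block of the example response.
cost = estimate_cost_brl("example-model", prompt_tokens=42, completion_tokens=88)
print(f"Estimated cost: R$ {cost}")
```

Decimal is used instead of float so that small per-call amounts add up without rounding drift.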
Support
Integration questions? Reach us at support@kallavy.com.