OpenAI-API compatible

API Documentation

Kallavy is an AI broker: your app talks to a single OpenAI-compatible endpoint and we route to the world's best models, billed in BRL via PIX, with a Brazilian invoice and local support. Already using the OpenAI SDK? Just swap base_url and api_key. No other code changes are needed.

Introduction

This reference covers the developer-facing endpoints. Everything is HTTPS with JSON bodies and follows the same contract as OpenAI's Chat Completions API. Account management — keys, balance, usage, governance and invoices — lives in the dashboard, not the API.

Privacy (LGPD): we only store usage metadata (model, tokens, cost). The content of your prompts and responses is never stored.

Base URL

All calls use the dedicated broker domain:

endpoint

https://api.kallavy.com/v1

Authentication

Every request needs the Authorization header with your API key as a Bearer token. Create and manage keys in the dashboard → API Keys.

header

Authorization: Bearer sk-...your-key

Keep your key secret. It grants access to your account balance. Never expose it in front-end code or public repos. Compromised? Revoke it in the dashboard and rotate.
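One way to keep the key out of source code is to read it from the environment when building the header. A minimal sketch, assuming the key is stored in a variable named KALLAVY_API_KEY (a name chosen here for illustration, not something the API requires):

```python
import os


def auth_headers(env=os.environ):
    """Build request headers from an API key held in the environment."""
    # KALLAVY_API_KEY is a hypothetical variable name used for this sketch.
    key = env.get("KALLAVY_API_KEY", "").strip()
    if not key:
        raise RuntimeError("KALLAVY_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Pass the returned dict as the headers of any HTTP client. Failing fast on a missing key avoids sending an empty Bearer token and getting a 401 back.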

Quickstart

A full chat call in Python, Node.js and curl:

# pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="sk-...your-key",
    base_url="https://api.kallavy.com/v1",
)

resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize this contract in 3 lines."}],
)
print(resp.choices[0].message.content)

// npm install openai
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "sk-...your-key",
  baseURL: "https://api.kallavy.com/v1",
});

const resp = await client.chat.completions.create({
  model: "deepseek-chat",
  messages: [{ role: "user", content: "Summarize this contract in 3 lines." }],
});
console.log(resp.choices[0].message.content);

curl https://api.kallavy.com/v1/chat/completions \
  -H "Authorization: Bearer sk-...your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Summarize this contract in 3 lines."}]
  }'

GET /v1/models

Lists the models available to your account. Use this as the source of truth: the catalog changes as new providers come online and as your account governance evolves. The response has the same shape as OpenAI's model list.

response · 200

{
  "object": "list",
  "data": [
    { "id": "deepseek-chat", "object": "model", "owned_by": "kallavy" },
    { "id": "claude-sonnet", "object": "model", "owned_by": "kallavy" },
    // ...
  ]
}
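Because the catalog can change, it may help to check the list before hard-coding a model ID. The sketch below works on the already-decoded response body; the helper names available_model_ids and pick_model are illustrative, not part of any SDK:

```python
def available_model_ids(payload):
    """Return the model IDs from a decoded /v1/models response body."""
    if payload.get("object") != "list":
        raise ValueError("unexpected response shape")
    return [m["id"] for m in payload.get("data", []) if m.get("object") == "model"]


def pick_model(payload, preferred):
    """Return the first preferred ID the account can use, or None."""
    ids = set(available_model_ids(payload))
    for name in preferred:
        if name in ids:
            return name
    return None
```

For example, pick_model(body, ["claude-sonnet", "deepseek-chat"]) returns whichever of the two appears first in your preference order and is present in the catalog.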

POST /v1/chat/completions

Generates a chat response. Accepts the same fields as OpenAI; Kallavy authenticates, meters usage and forwards to the real provider behind the chosen model.

Body parameters

Field	Type	Description
model *	string	Model ID (e.g. `deepseek-chat`). See /v1/models.
messages *	array	List of `{ "role", "content" }` messages (`system`, `user`, `assistant`).
stream	boolean	If `true`, streams tokens via SSE. Default `false`.
temperature, max_tokens, top_p…	various	Other standard OpenAI params are forwarded to the provider.

* required
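If you are not using an SDK, a small builder can catch a missing required field before the request leaves your app. This is a sketch of client-side checks based only on the table above; build_chat_body is a hypothetical helper, and the server remains the authority on what is valid:

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def build_chat_body(model, messages, **params):
    """Assemble a JSON-serializable body for POST /v1/chat/completions."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    for msg in messages:
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {msg.get('role')!r}")
        if "content" not in msg:
            raise ValueError("each message needs a content field")
    body = {"model": model, "messages": list(messages)}
    # Optional params (stream, temperature, max_tokens, ...) pass through
    # unchanged; None values are dropped so provider defaults apply.
    body.update({k: v for k, v in params.items() if v is not None})
    return body
```

Serialize the result with json.dumps and send it with the Authorization header described earlier.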

Response

response · 200

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "deepseek-chat",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "..." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 42, "completion_tokens": 88, "total_tokens": 130 }
}
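When working with the raw JSON rather than an SDK object, the fields you usually need are the assistant text, the finish reason and the token counts. A minimal sketch over the decoded body; summarize_completion is an illustrative name, and the "length" check assumes the provider reports truncation the way OpenAI does:

```python
def summarize_completion(resp):
    """Pull the commonly used fields out of a chat.completion body."""
    choice = resp["choices"][0]
    usage = resp.get("usage") or {}
    finish = choice.get("finish_reason")
    return {
        "text": choice["message"]["content"],
        "finish_reason": finish,
        # In OpenAI's contract, "length" means max_tokens cut the output short.
        "truncated": finish == "length",
        "total_tokens": usage.get("total_tokens"),
    }
```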

Streaming

With stream: true, the response arrives as Server-Sent Events chunks (data: {...}), ending with data: [DONE] — identical to OpenAI. The SDKs handle this for you.

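# Reuses the `client` created in the Quickstart.
stream = client.chat.completions.create(
    model="claude-sonnet",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=True,
)
for chunk in stream:
    # Skip chunks that carry no choices or no text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)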
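If you read the stream without an SDK, each event is a line starting with data: that holds one JSON chunk, and the stream ends at data: [DONE]. The sketch below parses such lines from any iterable of strings (for example, the decoded lines of a streaming HTTP response). iter_sse_text is a hypothetical helper, and the choices[].delta.content chunk shape is assumed to match OpenAI's:

```python
import json


def iter_sse_text(lines):
    """Yield text fragments from chat-completion SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Joining the yielded fragments reproduces the full message, so "".join(iter_sse_text(lines)) gives the same text a non-streaming call would return in message.content.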

Errors

Errors follow standard HTTP status codes with a descriptive JSON body.

Code	Meaning
401	Missing, invalid or revoked key.
402	Insufficient balance — top up via PIX in the dashboard.
403	Blocked by account governance (model/provider not allowed, or spend cap reached).
429	Rate limit exceeded. Back off and retry.
5xx	Upstream provider failure. Retry.
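The table splits errors into two groups: 429 and 5xx are worth retrying, while 401, 402 and 403 need a change on your side first. One possible retry loop with exponential backoff and jitter is sketched below. The send callable, the attempt count and the delays are all assumptions made for this example, not documented limits:

```python
import random
import time


def is_retryable(status):
    """429 and 5xx may succeed on retry; 401/402/403 will not."""
    return status == 429 or 500 <= status <= 599


def call_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it succeeds, hits a non-retryable error, or runs out.

    send is a caller-supplied function returning (status_code, body).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status < 400 or not is_retryable(status) or attempt == max_attempts:
            return status, body
        # Exponential backoff: 0.5s, 1s, 2s, ... plus up to 50% random jitter.
        delay = base_delay * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, delay / 2))
```

Wrap your HTTP call in a function that returns the status code and decoded body, then pass it as send. The caller still has to handle the final status, since the loop returns the last response even when every attempt failed.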

Limits & billing

Usage is metered by input and output tokens, per model, and debited from your BRL balance.
Rate limits apply per key (tunable on your plan). On overflow the API returns 429.
Your company can set governance: allow/block models and providers and enforce spend caps — all in the dashboard.
Top-ups via PIX with NF-e invoicing. Balance and history live in the dashboard.
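Since metering is per model and split into input and output tokens, you can keep a matching tally on your side from the usage object of each response. A minimal sketch, assuming you collect the decoded response bodies yourself; tally_usage is an illustrative helper, and the dashboard remains the authoritative record:

```python
from collections import defaultdict


def tally_usage(responses):
    """Sum prompt and completion tokens per model across response bodies."""
    totals = defaultdict(lambda: {"prompt_tokens": 0, "completion_tokens": 0})
    for resp in responses:
        usage = resp.get("usage") or {}
        entry = totals[resp["model"]]
        entry["prompt_tokens"] += usage.get("prompt_tokens", 0)
        entry["completion_tokens"] += usage.get("completion_tokens", 0)
    return dict(totals)
```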

Support

Integration questions? Reach us at support@kallavy.com.

Ready for your first call?

Create an account, grab an API key, and start in minutes.
