API Reference
AI Gateway (chat & generate)
Call FloopFloop's managed LLM gateway from your deployed project — chat completions, single-prompt generation, streaming SSE, automatic model routing.
AI Gateway
The AI Gateway lets your deployed FloopFloop project call a managed LLM endpoint without you holding any third-party credentials. FloopFloop routes the request to the right provider, handles retries and circuit-breaking, deducts from the project owner's credits, and logs the usage for the dashboard.
This is what powers the AI features inside projects you build on FloopFloop. You can call it directly from your project's server-side code.
Most projects don't need to use this raw HTTP API. Every generated project ships with the @floopfloop/ai SDK pre-installed, which wraps everything documented below:
import { FloopAI } from "@floopfloop/ai";
const ai = new FloopAI({ apiKey: process.env.FLOOPFLOOP_AI_KEY! });
const reply = await ai.chat({ messages, model: "smart" });

Use this reference when you need to call the gateway from outside a FloopFloop project (a custom backend, a debugging script, or a non-Node runtime), or when you want the exact wire format for a custom client.
Authentication: project AI keys
The gateway uses a project-scoped key with the prefix flp_sk_, separate from the user-level flp_ keys used by the rest of the API. One active key exists per project and is provisioned automatically when the project is created.
- Find or rotate your key in the dashboard at Project settings → AI. Rotation generates a new flp_sk_ value, revokes the old one, and triggers a redeploy so the running project picks it up.
- FloopFloop bakes the active key into your project's build bundle as the environment variable FLOOPFLOOP_AI_KEY, so server-side code can read it via process.env.FLOOPFLOOP_AI_KEY without a secret round-trip.
- All requests use the standard Bearer scheme: Authorization: Bearer flp_sk_… (see the sketch after this list).
- Server-side only. Never embed the key in client-side code — anyone with the page source can then drain the project's credits.
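A minimal sketch of reading the key server-side and building the auth header in TypeScript; the gatewayHeaders helper name is illustrative, not part of the @floopfloop/ai SDK:

```ts
// Sketch only: the key is read from the environment variable FloopFloop
// injects at deploy time. Never ship this code (or the key) to the browser.
function gatewayHeaders(): Record<string, string> {
  const key = process.env.FLOOPFLOOP_AI_KEY; // "flp_sk_…" project AI key
  if (!key) throw new Error("FLOOPFLOOP_AI_KEY is not set");
  return {
    Authorization: `Bearer ${key}`,
    "Content-Type": "application/json",
  };
}
```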
Chat completions
POST /api/v1/ai/chat

OpenAI/Anthropic-style chat with a structured message list.
Request body:
{
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Summarise this in one sentence: ..." }
],
"model": "auto", // optional alias; default "auto" picks the best fit
"system": "...", // optional, alternative to a system role message
"temperature": 0.7, // optional, 0-2
"max_tokens": 1024, // optional, clamped to plan limit
"stream": false // optional, default false (see streaming below)
}

Response (200, non-streaming):
{
"content": "...",
"model": "auto",
"usage": {
"input_tokens": 123,
"output_tokens": 45,
"total_tokens": 168,
"credits_used": 0.21,
"credits_remaining": 4837.79
},
"finishReason": "end_turn"
}

When model is "auto", the gateway classifies request complexity and picks an appropriately-sized model. The other supported aliases are:
| Alias | Use case |
|---|---|
"auto" | Default — platform picks based on request complexity |
"fast" | Simple tasks, low latency (translations, summaries, classification) |
"smart" | Complex tasks (code generation, analysis, reasoning) |
"reason" | Multi-step reasoning, planning, deep analysis (extended thinking) |
Pinning to an alias is preferred over hard-coding a provider model id — the platform reroutes through aliases as providers come and go, but a hard-coded id will start failing the day that model is sunset upstream.
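As a concrete illustration, here is a hedged sketch of a non-streaming /chat call from a custom TypeScript backend. The GATEWAY_BASE constant is a placeholder (this page does not specify the gateway host), and the ChatResponse type simply mirrors the response shape shown above:

```ts
// Sketch only: a non-streaming chat completion against the gateway.
// GATEWAY_BASE is an assumed placeholder; substitute your actual gateway host.
const GATEWAY_BASE = "https://<your-gateway-host>";

interface ChatResponse {
  content: string;
  model: string;
  usage: {
    input_tokens: number;
    output_tokens: number;
    total_tokens: number;
    credits_used: number;
    credits_remaining: number;
  };
  finishReason: string;
}

async function chat(messages: { role: string; content: string }[]): Promise<ChatResponse> {
  const res = await fetch(`${GATEWAY_BASE}/api/v1/ai/chat`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.FLOOPFLOOP_AI_KEY}`,
      "Content-Type": "application/json",
    },
    // Pin to an alias ("smart") rather than a provider model id.
    body: JSON.stringify({ messages, model: "smart", max_tokens: 1024 }),
  });
  if (!res.ok) throw new Error(`Gateway returned ${res.status}`);
  return (await res.json()) as ChatResponse;
}
```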
Single-prompt generation
POST /api/v1/ai/generate

Same model routing as /chat, but takes a single plain-text prompt instead of a messages array. Convenient for completion-style use cases.
{
"prompt": "Write a haiku about the moon", // required, ≤ 500_000 chars
"system": "...", // optional
"model": "auto", // optional
"temperature": 0.7, // optional
"max_tokens": 1024, // optional
"stream": false // optional
}

The response shape matches /chat.
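For instance, a short sketch of a /generate call, under the same assumptions about host and key as the chat example above:

```ts
// Sketch: single-prompt generation; the response shape matches /chat.
const GATEWAY_BASE = "https://<your-gateway-host>"; // placeholder host

const res = await fetch(`${GATEWAY_BASE}/api/v1/ai/generate`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.FLOOPFLOOP_AI_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ prompt: "Write a haiku about the moon", model: "fast" }),
});
const { content, usage } = await res.json();
```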
Streaming (SSE)
Pass "stream": true on either endpoint to receive an text/event-stream response. Each frame is JSON in a data: line:
data: {"text": "Once "}
data: {"text": "upon "}
data: {"text": "a time"}
data: {"usage": { "input_tokens": 14, "output_tokens": 47, "total_tokens": 61, "credits_used": 0.07, "credits_remaining": 4837.93 }, "finishReason": "end_turn"}
data: [DONE]

The final usage frame is sent before [DONE] so callers can record cost without a separate request. If the upstream errors mid-stream, the stream emits data: {"error": "..."} followed by [DONE] — tokens already produced are still billed.
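A hedged sketch of consuming the stream from TypeScript with fetch and a ReadableStream reader; the frame parsing follows the format shown above, and GATEWAY_BASE is again a placeholder host:

```ts
// Sketch: reading the SSE stream frame by frame. Assumes Node 18+ (or a browser)
// with streaming fetch; GATEWAY_BASE stands in for your gateway host.
const GATEWAY_BASE = "https://<your-gateway-host>";

async function streamChat(
  messages: { role: string; content: string }[],
  onText: (t: string) => void,
) {
  const res = await fetch(`${GATEWAY_BASE}/api/v1/ai/chat`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.FLOOPFLOOP_AI_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ messages, model: "auto", stream: true }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Frames arrive as newline-separated "data: ..." lines.
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep any partial line for the next chunk

    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const payload = line.slice("data: ".length);
      if (payload === "[DONE]") return;

      const frame = JSON.parse(payload);
      if (frame.text) onText(frame.text);            // incremental text
      if (frame.usage) console.log(frame.usage);     // final usage frame before [DONE]
      if (frame.error) throw new Error(frame.error); // upstream error mid-stream
    }
  }
}
```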
Embeddings (not yet available)
POST /api/v1/ai/embed

Returns 501 NOT_IMPLEMENTED today. The endpoint is reserved so SDKs can stub the method ahead of the gateway shipping; do not depend on it yet.
Limits and budgets
Each project AI key has two layers of throttling on top of the credit balance:
- Requests per minute — configurable in Project settings → AI, defaults to 10 RPM. Excess returns 429 RATE_LIMITED with a Retry-After header.
- Daily token budget — resets at UTC midnight, defaults to 10 000 tokens/day. Excess returns 429 BUDGET_EXCEEDED.
- Per-request input size is capped at the plan's context limit. Oversize inputs return 400 INPUT_TOO_LARGE with the cap in the message.
On top of those, every call deducts credits priced per (input + output) token at the model's configured rate. Once the project owner's credit balance reaches zero, requests return 402 INSUFFICIENT_CREDITS.
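One way a custom client might react to these responses; a sketch only, assuming the Retry-After value is in seconds (as the 503 example above suggests), with the callWithBackoff helper name being illustrative:

```ts
// Sketch: reacting to throttling and credit exhaustion on gateway responses.
// The retry policy here is illustrative, not prescribed by the gateway.
async function callWithBackoff(
  doRequest: () => Promise<Response>,
  maxAttempts = 3,
): Promise<Response> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await doRequest();

    if (res.status === 429) {
      // RATE_LIMITED or BUDGET_EXCEEDED: honour Retry-After (seconds) if present.
      const wait = Number(res.headers.get("Retry-After") ?? 5);
      await new Promise((resolve) => setTimeout(resolve, wait * 1000));
      continue;
    }
    if (res.status === 402) {
      // INSUFFICIENT_CREDITS: retrying will not help until the owner tops up.
      throw new Error("Project owner is out of credits");
    }
    return res;
  }
  throw new Error("Gateway still throttling after retries");
}
```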
Error codes specific to the gateway
| HTTP | Code | Meaning |
|---|---|---|
| 400 | INVALID_BODY | Body is not valid JSON |
| 400 | VALIDATION_ERROR | Field is missing, wrong type, or out of range |
| 400 | INPUT_TOO_LARGE | Estimated input tokens exceed plan limit |
| 400 | INVALID_MODEL | Unknown model alias |
| 402 | INSUFFICIENT_CREDITS | Project owner is out of credits |
| 429 | RATE_LIMITED | Per-key RPM exceeded |
| 429 | BUDGET_EXCEEDED | Daily token budget exhausted |
| 501 | NOT_IMPLEMENTED | Endpoint reserved (currently only embed) |
| 502 | PROVIDER_ERROR | Upstream LLM provider failed; retry later |
| 503 | SERVICE_UNAVAILABLE | All providers tripped; Retry-After: 60 |
Every gateway response — success or failure — carries an X-Request-Id header. Quote it when reporting issues so support can find the trace in the per-project AI usage log.
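For example, a small hedged helper that surfaces the request id when a call fails, so it can be pasted straight into a support ticket:

```ts
// Sketch: include X-Request-Id in error logs so support can locate the trace.
function logGatewayFailure(res: Response): void {
  const requestId = res.headers.get("X-Request-Id");
  console.error(`Gateway call failed: ${res.status} (X-Request-Id: ${requestId ?? "missing"})`);
}
```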