API Reference

AI Gateway (chat & generate)

Call FloopFloop's managed LLM gateway from your deployed project — chat completions, single-prompt generation, streaming SSE, automatic model routing.

AI Gateway

The AI Gateway lets your deployed FloopFloop project call a managed LLM endpoint without you holding any third-party credentials. FloopFloop routes the request to the right provider, handles retries and circuit-breaking, deducts from the project owner's credits, and logs the usage for the dashboard.

This is what powers the AI features inside projects you build on FloopFloop. You can call it directly from your project's server-side code.

Most projects don't need to use this raw HTTP API. Every generated project ships with the @floopfloop/ai SDK pre-installed, which wraps everything documented below:

import { FloopAI } from "@floopfloop/ai";

const ai = new FloopAI({ apiKey: process.env.FLOOPFLOOP_AI_KEY! });

const reply = await ai.chat({ messages, model: "smart" });

Use this reference when you need to call the gateway from outside a FloopFloop project (a custom backend, a debugging script, or a non-Node runtime), or when you want the exact wire format for a custom client.
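From outside the SDK, a call is just an authenticated POST with a JSON body. A minimal sketch of building that request in TypeScript — the `BASE_URL` placeholder is an assumption (the doc specifies only paths, not a host), so substitute your gateway host:

```typescript
// Placeholder host -- the reference documents paths only.
const BASE_URL = "https://example.invalid";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Assemble the wire-format request for POST /api/v1/ai/chat.
function buildChatRequest(apiKey: string, messages: ChatMessage[], model = "auto") {
  return {
    url: `${BASE_URL}/api/v1/ai/chat`,
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${apiKey}`,   // project-scoped flp_sk_ key
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ messages, model }),
  };
}

// Sending it is then one line with any fetch-capable runtime:
// const res = await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```

Keeping request assembly separate from transport makes the same helper usable from a debugging script or a non-Node runtime's HTTP client.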

Authentication: project AI keys

The gateway uses a project-scoped key with the prefix flp_sk_, separate from the user-level flp_ keys used by the rest of the API. One active key exists per project and is provisioned automatically when the project is created.

  • Find or rotate your key in the dashboard at Project settings → AI. Rotation generates a new flp_sk_ value, revokes the old one, and triggers a redeploy so the running project picks it up.
  • FloopFloop bakes the active key into your project's build bundle as the environment variable FLOOPFLOOP_AI_KEY, so server-side code can read it via process.env.FLOOPFLOOP_AI_KEY without a secret round-trip.
  • All requests use the standard Bearer scheme:
    Authorization: Bearer flp_sk_…
  • Server-side only. Never embed the key in client-side code — anyone with the page source can then drain the project's credits.

Chat completions

POST /api/v1/ai/chat

OpenAI/Anthropic-style chat with a structured message list.

Request body:

{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user",   "content": "Summarise this in one sentence: ..." }
  ],
  "model":       "auto",   // optional alias; default "auto" picks the best fit
  "system":      "...",    // optional, alternative to a system role message
  "temperature": 0.7,      // optional, 0-2
  "max_tokens":  1024,     // optional, clamped to plan limit
  "stream":      false     // optional, default false (see streaming below)
}

Response (200, non-streaming):

{
  "content": "...",
  "model":   "auto",
  "usage": {
    "input_tokens":      123,
    "output_tokens":     45,
    "total_tokens":      168,
    "credits_used":      0.21,
    "credits_remaining": 4837.79
  },
  "finishReason": "end_turn"
}
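For a typed client, the two payloads above can be mirrored directly. A sketch — field names and ranges are taken from the examples in this section, nothing beyond them, and the runtime guard is a hypothetical helper:

```typescript
type ModelAlias = "auto" | "fast" | "smart" | "reason";

interface ChatRequest {
  messages: { role: string; content: string }[];
  model?: ModelAlias;   // default "auto"
  system?: string;      // alternative to a system-role message
  temperature?: number; // 0-2
  max_tokens?: number;  // clamped to plan limit
  stream?: boolean;     // default false
}

interface Usage {
  input_tokens: number;
  output_tokens: number;
  total_tokens: number;
  credits_used: number;
  credits_remaining: number;
}

interface ChatResponse {
  content: string;
  model: string;
  usage: Usage;
  finishReason: string;
}

// Hypothetical narrowing guard for untyped JSON responses.
function isChatResponse(v: any): v is ChatResponse {
  return typeof v?.content === "string"
    && typeof v?.usage?.total_tokens === "number"
    && typeof v?.finishReason === "string";
}
```

The guard lets a caller validate `await res.json()` before touching `usage`, instead of trusting the cast.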

When model is "auto", the gateway classifies request complexity and picks an appropriately sized model. The other supported aliases are:

| Alias | Use case |
| --- | --- |
| "auto" | Default — platform picks based on request complexity |
| "fast" | Simple tasks, low latency (translations, summaries, classification) |
| "smart" | Complex tasks (code generation, analysis, reasoning) |
| "reason" | Multi-step reasoning, planning, deep analysis (extended thinking) |

Pinning to an alias is preferred over hard-coding a provider model id — the platform reroutes through aliases as providers come and go, but a hard-coded id will start failing the day that model is sunset upstream.

Single-prompt generation

POST /api/v1/ai/generate

Same model routing as /chat, but takes a single plain-text prompt instead of a messages array. Convenient for completion-style use cases.

{
  "prompt":      "Write a haiku about the moon",   // required, ≤ 500_000 chars
  "system":      "...",                            // optional
  "model":       "auto",                           // optional
  "temperature": 0.7,                              // optional
  "max_tokens":  1024,                             // optional
  "stream":      false                             // optional
}

The response shape matches /chat.

Streaming (SSE)

Pass "stream": true on either endpoint to receive an text/event-stream response. Each frame is JSON in a data: line:

data: {"text": "Once "}

data: {"text": "upon "}

data: {"text": "a time"}

data: {"usage": { "input_tokens": 14, "output_tokens": 47, "total_tokens": 61, "credits_used": 0.07, "credits_remaining": 4837.93 }, "finishReason": "end_turn"}

data: [DONE]

The final usage frame is sent before [DONE] so callers can record cost without a separate request. If the upstream errors mid-stream, the stream emits data: {"error": "..."} followed by [DONE] — tokens already produced are still billed.
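A consumer has to split the stream on data: lines, accumulate text frames, and watch for the usage frame and the [DONE] sentinel. A minimal frame handler, sketched under the assumption that each frame arrives on its own data: line exactly as shown above:

```typescript
interface StreamResult {
  text: string;
  usage?: Record<string, number>;
  finishReason?: string;
  error?: string;
}

// Fold raw SSE lines into the assembled text plus the final
// usage frame. Non-data lines (keep-alives, blanks) are skipped.
function collectFrames(lines: string[]): StreamResult {
  const out: StreamResult = { text: "" };
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") break;            // end of stream
    const frame = JSON.parse(payload);
    if (typeof frame.text === "string") out.text += frame.text;
    if (frame.usage) {
      out.usage = frame.usage;                  // arrives before [DONE]
      out.finishReason = frame.finishReason;
    }
    if (frame.error) out.error = frame.error;   // mid-stream upstream failure
  }
  return out;
}
```

In a real client the lines would come from a ReadableStream reader with a buffer for partial lines; the folding logic stays the same.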

Embeddings (not yet available)

POST /api/v1/ai/embed

Returns 501 NOT_IMPLEMENTED today. The endpoint is reserved so SDKs can stub the method ahead of the gateway shipping; do not depend on it yet.

Limits and budgets

Each project AI key has two layers of throttling on top of the credit balance, plus a per-request size cap:

  • Requests per minute — configurable in Project settings → AI, defaults to 10 RPM. Excess returns 429 RATE_LIMITED with a Retry-After header.
  • Daily token budget — resets at UTC midnight, defaults to 10 000 tokens/day. Excess returns 429 BUDGET_EXCEEDED.
  • Per-request input size is capped at the plan's context limit. Oversize inputs return 400 INPUT_TOO_LARGE with the cap in the message.

On top of those, every call deducts credits priced per (input + output) token at the model's configured rate. Once the project owner's credit balance reaches zero, requests return 402 INSUFFICIENT_CREDITS.
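The deduction itself is simple arithmetic. A sketch — the rate here is hypothetical (real rates are per model and set by the platform), chosen only so the numbers line up with the usage example earlier in this page:

```typescript
// Credits charged for one call: (input + output) tokens times the
// model's per-token rate. The rate is illustrative, not the real one.
function creditsUsed(inputTokens: number, outputTokens: number, ratePerToken: number): number {
  return (inputTokens + outputTokens) * ratePerToken;
}

// With a hypothetical rate of 0.00125 credits/token, the example
// response's 123 + 45 tokens would cost about 0.21 credits.
```

The response's usage.credits_used field reports the authoritative value, so clients should record that rather than recompute it.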

Error codes specific to the gateway

| HTTP | Code | Meaning |
| --- | --- | --- |
| 400 | INVALID_BODY | Body is not valid JSON |
| 400 | VALIDATION_ERROR | Field is missing, wrong type, or out of range |
| 400 | INPUT_TOO_LARGE | Estimated input tokens exceed plan limit |
| 400 | INVALID_MODEL | Unknown model alias |
| 402 | INSUFFICIENT_CREDITS | Project owner is out of credits |
| 429 | RATE_LIMITED | Per-key RPM exceeded |
| 429 | BUDGET_EXCEEDED | Daily token budget exhausted |
| 501 | NOT_IMPLEMENTED | Endpoint reserved (currently only embed) |
| 502 | PROVIDER_ERROR | Upstream LLM provider failed; retry later |
| 503 | SERVICE_UNAVAILABLE | All providers tripped; Retry-After: 60 |
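Of these, 429 and the 5xx codes are the ones worth retrying, and two of them carry a Retry-After header. A sketch of a retry-delay decision — the backoff policy here is my own choice, not platform guidance:

```typescript
// Decide how long to wait (in seconds) before retrying a failed call.
// Returns null for statuses that should not be retried (e.g. 400, 402).
function retryDelay(status: number, retryAfterHeader: string | null, attempt: number): number | null {
  const retryable = status === 429 || status === 502 || status === 503;
  if (!retryable) return null;
  if (retryAfterHeader !== null) {
    const secs = Number(retryAfterHeader);
    if (Number.isFinite(secs)) return secs;   // honour the server's hint
  }
  // Fallback exponential backoff: 1s, 2s, 4s, ... capped at 30s.
  return Math.min(2 ** attempt, 30);
}
```

Note that 402 INSUFFICIENT_CREDITS is deliberately not retryable: no amount of waiting fixes an empty credit balance.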

Every gateway response — success or failure — carries an X-Request-Id header. Quote it when reporting issues so support can find the trace in the per-project AI usage log.