1. Cap retry attempts
Retries consume request budget and can make overload worse. Set a small maximum attempt count and use jittered backoff.
Reliability pattern
Use bounded exponential backoff for transient failures and a deliberate fallback model list when your application can tolerate route changes.
const retryable = new Set([408, 409, 429, 500, 502, 503, 504]);
async function withBackoff(run, attempts = 4) {
let lastError;
for (let i = 0; i < attempts; i += 1) {
try {
return await run();
} catch (error) {
lastError = error;
const status = error.status ?? error.response?.status;
if (!retryable.has(status) || i === attempts - 1) throw error;
const delay = Math.min(8000, 500 * 2 ** i) + Math.random() * 250;
await new Promise(resolve => setTimeout(resolve, delay));
}
}
throw lastError;
}
Keep the fallback list short and explicit. A lower-cost or faster fallback can be useful, but it may change output quality, supported features or latency.
const models = [
process.env.TKEN_PRIMARY_MODEL,
process.env.TKEN_FALLBACK_MODEL
].filter(Boolean);
for (const model of models) {
try {
const response = await withBackoff(() =>
client.chat.completions.create({
model,
messages: [{ role: "user", content: "Summarize this request." }]
})
);
return response.choices[0].message.content;
} catch (error) {
console.warn("route failed", { model, status: error.status });
}
}
Retries consume request budget and can make overload worse. Set a small maximum attempt count and use jittered backoff.
Do not replay checkout, account or database writes just because a model request failed. Retry only the model call.
Track fallback frequency, response time, error class and user satisfaction before increasing fallback traffic.
Add bounded retries to https://www.tken.shop/v1 calls, then test fallback behavior with real prompts before production routing.