LLM Model & Router Chooser
Compute monthly API usage costs, evaluate speed/quality tradeoffs, and generate smart gateway routing logic.
1. Workload Parameters
2. Priority Weights
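A rough sketch of how priority weights might be combined into a match score. The tool's actual formula is not shown, so everything here is an assumption: three weights (`cost`, `speed`, `quality`), min-max normalization across the candidates, and a weighted average. The function names `normalize` and `rankModels` are hypothetical, and the figures are taken from three rows of the recommendations table.

```javascript
// Hypothetical scoring: min-max normalize each metric, then take a weighted average.
function normalize(value, min, max, higherIsBetter) {
  if (max === min) return 1; // all candidates tie on this metric
  const scaled = (value - min) / (max - min);
  return higherIsBetter ? scaled : 1 - scaled;
}

// Assumes at least one weight is greater than zero.
function rankModels(models, weights) {
  const range = (key) => {
    const values = models.map((m) => m[key]);
    return [Math.min(...values), Math.max(...values)];
  };
  const [minCost, maxCost] = range('cost');
  const [minSpeed, maxSpeed] = range('speed');
  const [minQuality, maxQuality] = range('quality');
  const totalWeight = weights.cost + weights.speed + weights.quality;

  return models
    .map((m) => ({
      name: m.name,
      score:
        (weights.cost * normalize(m.cost, minCost, maxCost, false) +
          weights.speed * normalize(m.speed, minSpeed, maxSpeed, true) +
          weights.quality * normalize(m.quality, minQuality, maxQuality, true)) /
        totalWeight,
    }))
    .sort((a, b) => b.score - a.score); // best match first
}

const candidates = [
  { name: 'Llama 3.1 70B', cost: 151.7, speed: 280, quality: 80 },
  { name: 'DeepSeek V3', cost: 43.4, speed: 70, quality: 84 },
  { name: 'Claude 3.5 Sonnet', cost: 1650, speed: 85, quality: 92 },
];

// Example weights (assumed): speed counts twice as much as cost or quality.
const ranked = rankModels(candidates, { cost: 1, speed: 2, quality: 1 });
console.log(ranked.map((r) => `${r.name}: ${(r.score * 100).toFixed(0)}%`));
```

With a scheme like this, raising the speed weight favors the fastest model, while a quality-only weighting favors the highest-rated one. The percentages it prints are not expected to match the tool's Match Score column.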
Ranked Model Recommendations
| Rank | Model | Provider | Match Score | Est. Monthly Cost | Tokens/sec | Quality | Context (tokens) |
|---|---|---|---|---|---|---|---|
| 1 | Llama 3.1 70B | Meta (Groq) | 76% | $151.70 | 280 | 80/100 | 128k |
| 2 | DeepSeek V3 | DeepSeek | 56% | $43.40 | 70 | 84/100 | 64k |
| 3 | Gemini 1.5 Flash | Google | 50% | $35.25 | 140 | 75/100 | 1M |
| 4 | GPT-4o Mini | OpenAI | 49% | $70.50 | 120 | 76/100 | 128k |
| 5 | Gemini 1.5 Pro | Google | 46% | $587.50 | 65 | 86/100 | 2M |
| 6 | Claude 3.5 Haiku | Anthropic | 44% | $440.00 | 130 | 78/100 | 200k |
| 7 | GPT-4o | OpenAI | 38% | $1,175.00 | 75 | 89/100 | 128k |
| 8 | Claude 3.5 Sonnet | Anthropic | 33% | $1,650.00 | 85 | 92/100 | 200k |
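The Est. Monthly Cost figures are consistent with ordinary per-token pricing: tokens consumed, divided by one million, times the price per million, summed over input and output. A minimal sketch under stated assumptions follows. The helper `estimateMonthlyCost` is hypothetical, the $0.59 input price comes from the router note, and the $0.79 output price and the 150M/80M monthly token volumes are assumed values (the workload inputs are not shown); together they happen to reproduce the $151.70 listed for Llama 3.1 70B.

```javascript
// Hypothetical helper: estimates monthly spend from token volume and per-million prices.
function estimateMonthlyCost({ inputTokens, outputTokens, inputPricePerM, outputPricePerM }) {
  const inputCost = (inputTokens / 1e6) * inputPricePerM;
  const outputCost = (outputTokens / 1e6) * outputPricePerM;
  // Round to cents so floating-point noise does not leak into the figure.
  return Math.round((inputCost + outputCost) * 100) / 100;
}

const llamaCost = estimateMonthlyCost({
  inputTokens: 150e6,    // assumed monthly input volume
  outputTokens: 80e6,    // assumed monthly output volume
  inputPricePerM: 0.59,  // USD per 1M input tokens, from the router note
  outputPricePerM: 0.79, // USD per 1M output tokens, assumed
});

console.log(`$${llamaCost.toFixed(2)}`); // prints "$151.70"
```

Because output tokens are usually priced higher than input tokens, the input/output split of a workload can shift the cost ranking as much as total volume does.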
Generated Router Function
Requests with a complexity rating below the threshold route to Llama 3.1 70B ($0.59 per 1M input tokens), while requests at or above it route to the higher-quality fallback model.
// Router Chooser Gateway Logic (Node.js / Edge Function)
async function routeLLMRequest(prompt, complexityRating) {
  // Complexity threshold (scaled 0-100); ratings at or above it use the fallback model
  const THRESHOLD = 70;

  const payload = {
    messages: [{ role: 'user', content: prompt }],
    temperature: 0.2
  };

  if (complexityRating >= THRESHOLD) {
    // Route to high-quality fallback model: DeepSeek V3
    console.log("Routing to fallback model: DeepSeek V3");
    return callProvider('deepseek', 'deepseek-v3', payload);
  } else {
    // Route to primary (best-match) model: Llama 3.1 70B
    console.log("Routing to primary model: Llama 3.1 70B");
    return callProvider('meta (groq)', 'llama-70b', payload);
  }
}

async function callProvider(provider, model, payload) {
  // API call implementation ...
  return { status: 200, model, routed: true };
}
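The router expects a `complexityRating` on a 0-100 scale but does not say where it comes from. One simple option is a heuristic over the prompt text. The sketch below is an assumption, not part of the generated logic: `estimateComplexity`, its keyword list, and its point values are all hypothetical and would need tuning against real traffic.

```javascript
// Hypothetical heuristic: scores a prompt from 0 to 100 using length and keywords.
function estimateComplexity(prompt) {
  // Longer prompts score higher: 1 point per 40 characters, capped at 60.
  const lengthScore = Math.min(60, Math.floor(prompt.length / 40));

  // Each reasoning-style keyword found adds 10 points, capped at 40.
  const keywords = ['prove', 'derive', 'refactor', 'analyze', 'step by step'];
  const lower = prompt.toLowerCase();
  const hits = keywords.filter((k) => lower.includes(k)).length;
  const keywordScore = Math.min(40, hits * 10);

  return lengthScore + keywordScore;
}

console.log(estimateComplexity('Hi'));
console.log(estimateComplexity('Prove and derive this step by step'));
```

Passing the result as the second argument, for example `routeLLMRequest(prompt, estimateComplexity(prompt))`, would send only prompts scoring 70 or more to the fallback model. A heuristic this crude will misjudge short but hard prompts, so a small classifier or explicit caller-supplied ratings may be a better fit in practice.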