LLM API Pricing Comparison: How to Estimate Real Costs Across OpenAI, DeepSeek, Qwen, Kimi, GLM, and Doubao

LLM API Pricing · AI Cost Optimization · Token Pricing · AI API Gateway

LLM API pricing looks simple until your product reaches real usage. Most providers publish input and output token prices, but your actual bill depends on much more: prompt size, output length, retries, model selection, long context, failed requests, caching, user behavior, and routing strategy.

This guide explains how to compare LLM API pricing in a way that actually helps production teams. It applies whether you are evaluating OpenAI, Anthropic, Google, DeepSeek, Qwen, Kimi, GLM, Doubao, or a mix of providers behind an AI API gateway.

The basic LLM pricing formula

Most text-generation APIs are priced by tokens. A simplified formula looks like this:

cost = input_tokens * input_price + output_tokens * output_price
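
As a minimal sketch in Python, with prices quoted in dollars per million tokens (the numbers below are illustrative, not any provider's actual rates):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost of one request; prices are dollars per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Example: 2,000 input tokens and 500 output tokens at
# hypothetical prices of $1 and $4 per million tokens.
print(request_cost(2_000, 500, input_price=1.0, output_price=4.0))  # 0.004
```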

In reality, you may also need to account for:

  • cached input tokens
  • reasoning tokens
  • image or audio tokens
  • tool call overhead
  • failed requests
  • retries
  • minimum billing increments
  • provider-specific billing rules

Still, input and output tokens are the foundation.
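
Cached input is a common example of these extras: several providers bill cached prompt tokens at a discounted rate. A sketch under that assumption (the discount fraction is illustrative; check your provider's billing rules):

```python
def request_cost_with_cache(uncached_tokens: int, cached_tokens: int,
                            output_tokens: int, input_price: float,
                            output_price: float,
                            cache_discount: float = 0.1) -> float:
    """Prices in dollars per million tokens; cache_discount is the assumed
    fraction of the input price charged for cached prompt tokens."""
    return (uncached_tokens * input_price
            + cached_tokens * input_price * cache_discount
            + output_tokens * output_price) / 1_000_000
```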

Input tokens vs output tokens

Input tokens are the text you send to the model. This includes:

  • system prompt
  • developer instructions
  • user message
  • conversation history
  • retrieved documents
  • tool definitions
  • examples

Output tokens are the model's response.

Output tokens are often more expensive than input tokens. That means verbose answers, long reasoning, and unnecessary formatting can increase cost quickly.
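
For example, at hypothetical prices of $1 per million input tokens and $4 per million output tokens, a request with 2,000 input tokens and 1,000 output tokens costs $0.002 for the input and $0.004 for the output: the response is half the size of the prompt but twice the cost.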

Why headline token price is misleading

A model with cheap token pricing is not always cheaper in production. Total cost per successful task is the better metric.

For example:

  • Model A is cheap but fails often, requiring retries.
  • Model B is more expensive but solves the task in one request.
  • Model C has low input cost but produces long outputs.
  • Model D is excellent but too slow for your user experience.

The winner depends on cost per successful task and user-experience constraints, not just the listed price.

Calculate cost per task

Instead of asking "which model is cheapest per token?", ask:

How much does it cost to complete this workflow successfully?

For each workload, measure:

  • average input tokens
  • average output tokens
  • success rate
  • retry rate
  • latency
  • human correction rate
  • fallback rate

A simple production estimate:

cost_per_success =
  average_request_cost * average_attempts_per_success

If a model costs $0.002 per request but averages two attempts per success, its effective cost is $0.004 per successful task.
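
The same estimate as a sketch in Python, where success_rate stands in for whatever acceptance check your workload uses (under independent retries, average attempts per success is 1 / success_rate):

```python
def cost_per_success(average_request_cost: float, success_rate: float) -> float:
    """Expected cost per successful task, assuming retries are independent."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return average_request_cost / success_rate

print(cost_per_success(0.002, success_rate=0.5))  # 0.004
```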

Common workloads and cost drivers

Chatbots

Chatbots often become expensive because conversation history grows over time. Each turn may include previous messages.

Cost controls:

  • summarize older messages
  • limit history window (see the sketch after this list)
  • use smaller models for simple turns
  • route complex turns to stronger models
  • cap output length
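
For the history window, a minimal sketch (count_tokens is a placeholder for your tokenizer, e.g. tiktoken for OpenAI-style models):

```python
def trim_history(messages: list[dict], max_tokens: int, count_tokens) -> list[dict]:
    """Keep only the most recent messages that fit a fixed token budget."""
    kept, used = [], 0
    for message in reversed(messages):       # walk newest-first
        tokens = count_tokens(message["content"])
        if used + tokens > max_tokens:
            break
        kept.append(message)
        used += tokens
    return list(reversed(kept))              # restore chronological order
```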

Document Q&A

Document workflows can have large input costs because retrieved context may be long.

Cost controls:

  • improve retrieval quality
  • send fewer chunks
  • deduplicate context (see the sketch after this list)
  • compress documents
  • use long-context models only when needed
  • test whether smaller context performs just as well
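
For deduplication, a minimal sketch that drops exact duplicates before they reach the prompt (production pipelines often use fuzzier matching):

```python
def dedupe_chunks(chunks: list[str]) -> list[str]:
    """Drop retrieved chunks whose normalized text has already been seen."""
    seen, unique = set(), []
    for chunk in chunks:
        key = " ".join(chunk.split()).lower()   # normalize whitespace and case
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique
```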

Coding assistants

Coding tasks can produce long outputs and may require retries.

Cost controls:

  • ask for focused diffs
  • limit scope
  • use repository-aware retrieval
  • validate with tests
  • route simple explanations to cheaper models

Structured extraction

Extraction tasks often need strict JSON. Cheap models may fail schema validation more often.

Cost controls:

  • use schema validation
  • retry only invalid outputs (see the sketch after this list)
  • keep prompts short
  • choose models with reliable formatting
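
A minimal sketch of validate-then-retry, where call_model and is_valid are placeholders for your provider client and schema check (e.g. the jsonschema package):

```python
import json

def extract(call_model, prompt: str, is_valid, max_attempts: int = 3) -> dict:
    """Retry only when the model's output fails to parse or validate."""
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue                  # malformed JSON: retry
        if is_valid(data):
            return data               # schema passed: stop paying for retries
    raise ValueError("no valid output within the retry budget")
```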

Comparing Chinese LLM API pricing

When evaluating Chinese LLM APIs such as DeepSeek, Qwen, Kimi, GLM, and Doubao, compare them on workload fit.

General guidance:

  • evaluate DeepSeek for reasoning and coding workloads
  • evaluate Qwen for broad model coverage and flexible model tiers
  • evaluate Kimi for long-context document workflows
  • evaluate GLM for enterprise Chinese-language applications
  • evaluate Doubao for ByteDance cloud scenarios and general chat

Do not assume the lowest listed token price wins. Run the same prompt set across providers and measure total cost per accepted result.

How routing reduces LLM API cost

The easiest way to waste money is sending every request to your strongest model.

A better strategy is model routing:

| Request type | Cost strategy |
|---|---|
| Simple classification | Small, cheap model |
| Basic rewriting | Low-cost general model |
| Complex reasoning | Strong reasoning model |
| Long document Q&A | Long-context model |
| High-value enterprise request | Premium model |
| Failed request | Fallback provider |

An AI API gateway can apply these rules centrally, so your application does not need provider-specific routing code everywhere.
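
A gateway usually expresses these rules as configuration; as a rough Python sketch (request types and model names are illustrative, not real identifiers):

```python
ROUTES = {
    "classification": "small-cheap-model",
    "rewriting":      "low-cost-general-model",
    "reasoning":      "strong-reasoning-model",
    "long_doc_qa":    "long-context-model",
    "enterprise":     "premium-model",
}

def pick_model(request_type: str, previous_attempt_failed: bool = False) -> str:
    """Route by request type; send retries after a failure to a fallback."""
    if previous_attempt_failed:
        return "fallback-provider-model"
    return ROUTES.get(request_type, "low-cost-general-model")
```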

Cost controls every AI product needs

Production AI systems should include:

  • user-level quotas
  • team-level quotas
  • monthly budget limits
  • model access controls
  • max token limits
  • request logging
  • cost dashboards
  • alerts for unusual usage
  • fallback rules
  • per-feature usage tracking

Without these controls, a small product change or abusive user can create a large bill.
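
As one example, a per-user monthly budget check might look like this sketch (the in-memory usage dict stands in for whatever your gateway or database tracks):

```python
class BudgetExceeded(Exception):
    pass

def charge(user_id: str, estimated_cost: float,
           usage: dict[str, float], monthly_limit: float) -> None:
    """Reject a request that would push the user past their monthly budget."""
    spent = usage.get(user_id, 0.0)
    if spent + estimated_cost > monthly_limit:
        raise BudgetExceeded(f"{user_id} would exceed ${monthly_limit:.2f}")
    usage[user_id] = spent + estimated_cost
```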

Example cost analysis workflow

1. Collect 100 real prompts from your product.
2. Group them by workload type.
3. Run each group across candidate models.
4. Record input tokens, output tokens, latency, and success.
5. Validate structured outputs automatically.
6. Have humans review subjective tasks.
7. Calculate cost per successful task.
8. Create routing rules.
9. Monitor production logs.
10. Re-evaluate monthly.

Model prices and model quality change quickly. Your routing should evolve with them.
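
Steps 3 through 7 can be a short script. A hedged sketch, where run is a placeholder for your provider call and is_correct for your acceptance check:

```python
def evaluate(models, prompts, run, is_correct):
    """Compare candidate models by cost per successful task."""
    results = {}
    for model in models:
        total_cost, successes = 0.0, 0
        for prompt in prompts:
            output, cost = run(model, prompt)    # returns (text, dollar cost)
            total_cost += cost
            successes += bool(is_correct(prompt, output))
        results[model] = total_cost / max(successes, 1)   # guard zero successes
    return results   # lower cost per success is better
```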

FAQ

What is the biggest hidden LLM API cost?

Long prompts and conversation history are common hidden costs. Retries and failed outputs are also easy to underestimate.

Are cheaper models always better for high-volume apps?

No. A cheaper model is better only if it completes the task reliably. If it fails often, a more expensive model can be cheaper per successful result.

How do I reduce LLM API costs quickly?

Reduce prompt length, cap output length, route simple tasks to smaller models, add caching, and track cost by feature.

Should I compare providers manually or use a gateway?

Manual testing is fine at first. A gateway is better once you need ongoing routing, logging, quotas, and fallback across multiple providers.

Final thoughts

LLM API pricing is not a spreadsheet exercise. It is an operational discipline. The right question is not "which model has the lowest token price?" It is "which model completes this task at the lowest reliable cost?"

Measure real workloads, track cost per successful task, and use routing to match each request to the most efficient model.