Compare 30+ LLMs by price, speed and real-task performance
A developer decision table for model cost, time to first token (TTFT), context window, Chinese-language coverage and task-specific quality. Phase 1 ships the usable interface and the data pipeline contract.
Model price and latency table
A clean pricing table, extended into a real decision surface.
| Model | Provider | Type | Input ($/1M tokens) | Output ($/1M tokens) | TTFT (ms) | Context | Value score | Updated |
|---|---|---|---|---|---|---|---|---|
| DeepSeek V3 | DeepSeek | closed | $0.14 | $0.28 | 124 | 128K | 96 | 2026-06-09 |
| Qwen 2.5 72B | Alibaba Cloud | open | $0.35 | $0.70 | 156 | 128K | 91 | 2026-06-09 |
| GPT-4o | OpenAI | closed | $2.50 | $10.00 | 89 | 128K | 73 | 2026-06-09 |
| Claude 3.5 Sonnet | Anthropic | closed | $3.00 | $15.00 | 95 | 200K | 70 | 2026-06-09 |
| Gemini 2.0 Flash | Google | closed | $0.10 | $0.40 | 112 | 1M | 94 | 2026-06-09 |
| Doubao Pro | Volcano Engine | closed | $0.11 | $0.22 | 141 | 128K | 95 | 2026-06-09 |
| Kimi K2 | Moonshot AI | closed | $0.18 | $0.72 | 168 | 200K | 88 | 2026-06-09 |
| GLM-4 Plus | Zhipu AI | closed | $0.80 | $0.80 | 184 | 128K | 78 | 2026-06-09 |
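Because input and output tokens are priced separately, the cost of a single request depends on both token counts. A minimal sketch of that arithmetic, using three of the sample rows above; the `ModelPrice` shape and `requestCostUsd` helper are hypothetical names for illustration.

```typescript
// Prices in USD per 1M tokens, copied from the sample rows in the table.
interface ModelPrice {
  model: string;
  inputPerM: number; // USD per 1M input tokens
  outputPerM: number; // USD per 1M output tokens
}

const prices: ModelPrice[] = [
  { model: "DeepSeek V3", inputPerM: 0.14, outputPerM: 0.28 },
  { model: "GPT-4o", inputPerM: 2.5, outputPerM: 10.0 },
  { model: "Gemini 2.0 Flash", inputPerM: 0.1, outputPerM: 0.4 },
];

// Request cost = input tokens * input rate + output tokens * output rate,
// with both rates converted from "per 1M tokens" to "per token".
function requestCostUsd(p: ModelPrice, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1_000_000;
}

for (const p of prices) {
  console.log(`${p.model}: $${requestCostUsd(p, 2000, 500).toFixed(6)}`);
}
```

At these sample prices, a request with 2,000 input and 500 output tokens costs about $0.00042 on DeepSeek V3, $0.0004 on Gemini 2.0 Flash and $0.01 on GPT-4o.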
Task bucket leaders
Ten task buckets are populated with sample scores to validate the routing and ranking logic; six are shown here.
| Task | Scope | Leader | Sample score |
|---|---|---|---|
| Coding | Repository edits, bug fixes, unit-test reasoning and API usage. | DeepSeek V3 | 85 |
| Writing | Product copy, long-form prose, editing and tone control. | DeepSeek V3 | 86 |
| Translation | Bidirectional English/Chinese translation with terminology consistency. | DeepSeek V3 | 86 |
| Math | Word problems, algebraic reasoning and structured calculation. | DeepSeek V3 | 85 |
| Tool Calling | JSON output, function selection and multi-step tool use. | DeepSeek V3 | 85 |
| Chinese | Chinese writing, knowledge, instruction following and domestic model coverage. | DeepSeek V3 | 89 |
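Selecting a bucket leader from sample scores can be sketched as an argmax per bucket. The `BucketScore` shape and `bucketLeaders` helper are hypothetical, the tie-breaking rule is an assumption (the page does not say how ties are resolved), and `placeholder-model` with its score is made up purely for illustration.

```typescript
// Hypothetical shape of one sample score row.
interface BucketScore {
  bucket: string;
  model: string;
  score: number; // 0-100 sample score
}

// Return the highest-scoring row per bucket.
// Assumption: on a tie, the first row seen for that bucket is kept.
function bucketLeaders(scores: BucketScore[]): Map<string, BucketScore> {
  const leaders = new Map<string, BucketScore>();
  for (const s of scores) {
    const current = leaders.get(s.bucket);
    if (current === undefined || s.score > current.score) {
      leaders.set(s.bucket, s);
    }
  }
  return leaders;
}

const sampleScores: BucketScore[] = [
  { bucket: "Coding", model: "placeholder-model", score: 80 }, // illustrative only
  { bucket: "Coding", model: "DeepSeek V3", score: 85 },
  { bucket: "Chinese", model: "DeepSeek V3", score: 89 },
];

console.log(bucketLeaders(sampleScores).get("Coding")?.model);
```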
Phase-1 core flows
The homepage leads directly to routing, comparison, alerts and methodology.
Router
Paste a prompt and get Top 3 recommendations by cost, speed and quality.
Compare
Compare price, TTFT, context, task scores and provenance side by side.
Get price alerts
In phase 1, the form returns a confirmation from a local API; email storage is not yet connected.
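How the Router combines cost, speed and quality is not specified here. A minimal sketch, assuming each candidate has a blended price, a TTFT and a task-bucket quality score, and that ranking is a weighted sum of min-max-scaled values; `Candidate`, `Weights`, `topK` and the `model-a` to `model-d` rows are all hypothetical.

```typescript
interface Candidate {
  model: string;
  blendedPricePerM: number; // USD per 1M tokens, e.g. a weighted mix of input/output price
  ttftMs: number;
  quality: number; // task-bucket score, 0-100
}

interface Weights {
  cost: number;
  speed: number;
  quality: number;
}

// Min-max scale to [0, 1], flipping when lower raw values are better.
// If every value is equal, all candidates get the best score (1).
function scale(values: number[], lowerIsBetter: boolean): number[] {
  const min = Math.min(...values);
  const max = Math.max(...values);
  if (max === min) return values.map(() => 1);
  return values.map((v) => {
    const t = (v - min) / (max - min);
    return lowerIsBetter ? 1 - t : t;
  });
}

// Score every candidate as a weighted sum and return the best k.
function topK(cands: Candidate[], w: Weights, k = 3): Candidate[] {
  const cost = scale(cands.map((c) => c.blendedPricePerM), true);
  const speed = scale(cands.map((c) => c.ttftMs), true);
  const qual = scale(cands.map((c) => c.quality), false);
  return cands
    .map((c, i) => ({ c, s: w.cost * cost[i] + w.speed * speed[i] + w.quality * qual[i] }))
    .sort((a, b) => b.s - a.s)
    .slice(0, k)
    .map((x) => x.c);
}

// Illustrative candidates, not real measurements.
const candidates: Candidate[] = [
  { model: "model-a", blendedPricePerM: 1, ttftMs: 100, quality: 90 },
  { model: "model-b", blendedPricePerM: 10, ttftMs: 100, quality: 95 },
  { model: "model-c", blendedPricePerM: 0.5, ttftMs: 300, quality: 70 },
  { model: "model-d", blendedPricePerM: 5, ttftMs: 200, quality: 60 },
];

console.log(topK(candidates, { cost: 1, speed: 1, quality: 1 }).map((c) => c.model));
```

With equal weights these candidates rank model-a, model-b, model-c; weighting only quality puts model-b first. Min-max scaling keeps price (dollars) and TTFT (milliseconds) on a comparable 0-1 range before they are summed.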
Provider coverage
Chinese and global providers are shown in one surface to avoid English-only benchmark bias.
| Provider | Models | Value score |
|---|---|---|
| DeepSeek | 2 | 90 |
| Alibaba Cloud | 3 | 93 |
| OpenAI | 3 | 87 |
| Anthropic | 3 | 67 |
| Google | 3 | 87 |
| Volcano Engine | 3 | 94 |
| Moonshot AI | 2 | 89 |
| Zhipu AI | 3 | 84 |
| Meta | 3 | 86 |
| Mistral AI | 3 | 82 |
| xAI | 1 | 69 |
| Cohere | 2 | 75 |
Trust layer before growth tricks
Every row is designed to carry update time, source type, benchmark version and methodology links before this becomes a production data service.
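A data point that carries its own provenance could look like the record below, with freshness derived from the update time. This is a sketch under assumptions: the `DataPoint` field names, the `SourceType` values and the `isStale` helper are hypothetical, not a published schema.

```typescript
// Assumed source categories; the real list may differ.
type SourceType = "provider-pricing-page" | "own-benchmark" | "third-party-benchmark";

interface DataPoint<T> {
  value: T;
  updatedAt: string; // ISO 8601 date, e.g. "2026-06-09"
  sourceType: SourceType;
  benchmarkVersion?: string; // only for benchmark-derived values
  methodologyUrl?: string; // link to the task bucket's methodology page
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when the value is older than maxAgeDays relative to `now`.
function isStale(p: DataPoint<unknown>, now: Date, maxAgeDays: number): boolean {
  const ageMs = now.getTime() - new Date(p.updatedAt).getTime();
  return ageMs > maxAgeDays * MS_PER_DAY;
}

const deepSeekInputPrice: DataPoint<number> = {
  value: 0.14,
  updatedAt: "2026-06-09",
  sourceType: "provider-pricing-page",
};

console.log(isStale(deepSeekInputPrice, new Date("2026-06-20"), 7));
```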
Open methodology
Each task bucket describes prompts, sampling, scoring and update cadence.
Visible freshness
Every data point carries update time, source type and benchmark version.
Cost control
Benchmark jobs reserve fields for budget caps, caching, sampling and monthly cost.
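A budget cap and a result cache can work together so that repeated runs cost nothing and new runs stop before the monthly limit. A minimal synchronous sketch; `BenchmarkBudget`, its `evaluate` method and the cache key format are hypothetical, and a real job runner would make the model call asynchronously.

```typescript
class BenchmarkBudget {
  private spentUsd = 0;
  private readonly cache = new Map<string, string>();
  private readonly monthlyCapUsd: number;

  constructor(monthlyCapUsd: number) {
    this.monthlyCapUsd = monthlyCapUsd;
  }

  get spent(): number {
    return this.spentUsd;
  }

  // Returns a cached output if one exists; otherwise runs the job only when the
  // estimated cost still fits under the cap. Returns null when the job is skipped.
  evaluate(
    key: string,
    estimatedCostUsd: number,
    run: () => { output: string; costUsd: number },
  ): string | null {
    const cached = this.cache.get(key);
    if (cached !== undefined) return cached;
    if (this.spentUsd + estimatedCostUsd > this.monthlyCapUsd) return null;
    const { output, costUsd } = run();
    this.spentUsd += costUsd; // record actual, not estimated, spend
    this.cache.set(key, output);
    return output;
  }
}

const budget = new BenchmarkBudget(1.0);
console.log(budget.evaluate("model-a:prompt-1", 0.6, () => ({ output: "ok", costUsd: 0.6 })));
```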
Reports
Initial content focuses on methodology, Chinese model selection and routing cost.
June sample ranking notes
How the phase-1 sample snapshot is shaped and what still needs real benchmarking.
Guide: Selecting Chinese LLMs for product teams
A practical comparison frame for Qwen, DeepSeek, GLM, Doubao and Kimi.
Cost: A starter playbook for reducing LLM API bills
Route simple, latency-sensitive and hard reasoning tasks to different model tiers.
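Tier routing can start as a simple classification step in front of the model call. A minimal sketch, assuming the caller states whether a task needs multi-step reasoning and what latency budget it has; the 150 ms threshold, `classify`, `tierFor` and the tier names are hypothetical placeholders, and which models fill each tier is left open.

```typescript
type TaskClass = "simple" | "latency-sensitive" | "hard-reasoning";

interface RouteRequest {
  needsMultiStepReasoning: boolean; // assumed to come from the caller or a classifier
  maxTtftMs?: number; // latency budget, if any
}

// Hard reasoning wins over latency; tight latency budgets go to the fast tier.
function classify(r: RouteRequest): TaskClass {
  if (r.needsMultiStepReasoning) return "hard-reasoning";
  if (r.maxTtftMs !== undefined && r.maxTtftMs < 150) return "latency-sensitive";
  return "simple";
}

// Placeholder tier names.
const tierFor: Record<TaskClass, string> = {
  "simple": "low-cost-tier",
  "latency-sensitive": "low-ttft-tier",
  "hard-reasoning": "frontier-tier",
};

console.log(tierFor[classify({ needsMultiStepReasoning: false, maxTtftMs: 100 })]);
```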