Sample data: Phase-1 snapshot. Official crawling and weekly benchmark jobs are not connected yet. All price, latency, and score values exist only to validate the product structure and must be replaced by traceable production data before launch.

Model comparison

The `models=a,b,c` URL parameter already drives the comparison page; selectors and saved comparisons come next.
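As a rough illustration of how a `models=a,b,c` parameter could be turned into a list of models to compare, here is a minimal sketch. The function name `parse_models_param`, the `limit` of four models, the slug format, and the example URL are all assumptions made for illustration; the page's actual parsing rules are not documented here.

```python
from urllib.parse import parse_qs, urlsplit


def parse_models_param(url: str, limit: int = 4) -> list[str]:
    """Return the ordered, de-duplicated model slugs from a comparison URL.

    Hypothetical helper: the cap (`limit`) and the lowercase normalisation
    are assumptions, not documented behaviour of the comparison page.
    """
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each key to a list of values; use the first `models` value.
    raw = query.get("models", [""])[0]

    slugs: list[str] = []
    for slug in raw.split(","):
        slug = slug.strip().lower()
        # Skip empty entries (e.g. "a,,b") and repeated models.
        if slug and slug not in slugs:
            slugs.append(slug)
    return slugs[:limit]


if __name__ == "__main__":
    # Illustrative URL and slugs only.
    print(parse_models_param(
        "https://example.com/compare?models=qwen-2.5-72b,deepseek-v3"
    ))
```

Keeping the order from the URL means a shared link would show the models in the same columns for every reader, which is one reason to de-duplicate without sorting.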

| Model | Input | Output | TTFT | Context | Value | Updated |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen 2.5 72B (Alibaba Cloud · open) | $0.35/1M | $0.70/1M | 156ms | 128K | 91 | 2026-06-09 |
| DeepSeek V3 (DeepSeek · closed) | $0.14/1M | $0.28/1M | 124ms | 128K | 96 | 2026-06-09 |

Qwen 2.5 72B

Alibaba Cloud · Chinese-language performance is weighted higher in the Chinese task bucket.

Quality: 84 · Chinese: 95

DeepSeek V3

DeepSeek · Strong value baseline for coding and Chinese tasks in the sample set.

Quality: 86 · Chinese: 93