Coding plan comparisons based on actual usage
Measuring AI coding plans vs API pricing. Codex is subsidized ~27×, most others ~8×, and Claude Pro still costs ~10× more per token than the rest.
Coding plans are now the default way to use frontier models for a lot of people, and the gap between frontier and open-weight models is narrowing each week: Kimi 2.6 and MiMo Pro score 54 on the Artificial Analysis Index, against 57 for Opus 4.7 and 60 for GPT-5.5.
Coding plans are plainly the cheapest way to access the most intelligent models. But how does the pricing actually compare across plans?
Blended subscription rate*
| Provider | $ / M blended tokens |
|---|---|
| MiniMax 2.7 | $0.004 |
| Kimi 2.6 | $0.047 |
| GLM 5.1 (Lite) | $0.065 |
| Codex (GPT-5.5) | $0.080 |
| MiMo V2.5-Pro | $0.141 |
| Claude Pro (Opus 4.7) | $0.744 |
* Subscription cost ÷ monthly tokens delivered, on a blended Claude-Code workload. See the methodology note at the bottom.
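To make the footnote concrete, the rate is a single division; a trivial sketch using the Claude Pro row (the token count comes from the proxy measurements described below):

```python
def blended_rate(sub_cost: float, monthly_tokens: float) -> float:
    """Blended subscription rate in $ per million tokens actually delivered."""
    return sub_cost / (monthly_tokens / 1e6)

# Claude Pro: $20/mo for ~26.9 M measured tokens
print(blended_rate(20.00, 26.9e6))  # ~0.74 $/Mtok; the table's $0.744 uses unrounded token counts
```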
The usage these plans provide is intentionally obfuscated and likely tuned up and down with supply and demand. Here I measure and snapshot what you actually get on each plan. This data is from May 1st, 2026.
I proxy each request through a logging server, measure input, thinking, and output tokens, and calculate what the same traffic would cost directly through the API.
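A minimal sketch of that accounting, assuming Anthropic-style `usage` blocks on each response; the field names and rates below are placeholders, and some providers report thinking tokens as a separate field while others fold them into output:

```python
# Accumulate per-axis token counts from proxied responses, then price the
# traffic at direct API rates. RATES are placeholders, not real list prices.
RATES = {  # $/Mtok
    "input_tokens": 3.00,
    "cache_read_input_tokens": 0.30,
    "output_tokens": 15.00,
}
totals = {field: 0 for field in RATES}

def log_usage(response_json: dict) -> None:
    """Called by the proxy on every completed request."""
    usage = response_json.get("usage", {})
    for field in totals:
        totals[field] += usage.get(field, 0)

def api_equivalent_cost() -> float:
    """What the same traffic would cost pay-as-you-go."""
    return sum(totals[f] / 1e6 * RATES[f] for f in RATES)
```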
Two notable absences: DeepSeek v4 doesn’t offer a subscription plan — only pay-as-you-go API access. Gemini has yet to launch a serious coding plan; Code Assist on the OAuth path is request-capped at the free tier and meters by request count rather than tokens.
| Subscription | $/mo | 5h cap (API-$) | Weekly cap (API-$) | 5h sessions/wk | Monthly cap (API-$) | API-$ / sub-$ | Monthly tokens | Tokens / sub-$ |
|---|---|---|---|---|---|---|---|---|
| Claude Pro — Opus 4.7 | 20.00 | $4.75 | $38 | 8.0 | $152 | 7.6× | 26.9 M | 1.35 M |
| Codex (GPT-5.5) | 20.00 | $22 | $134 | 6.0 | $536 | 26.8× | 250 M | 12.5 M |
| Kimi 2.6 (for Coding) | 20.00 | $8.50 | $40 | 4.7 | $160 | 8.0× | 423 M | 21.1 M |
| GLM 5.1 (Lite) | 18.00 | $7.70 | $34.60 | 4.5 | $138 | 7.7× | 275 M | 15.3 M |
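All the derived columns follow mechanically from the caps and the measured tokens; the Claude Pro row, worked out:

```python
# Claude Pro row, derived step by step (all caps in API-$).
price, five_hour_cap, weekly_cap = 20.00, 4.75, 38.00
monthly_tokens = 26.9e6  # measured through the proxy

sessions_per_week = weekly_cap / five_hour_cap   # 8.0 full 5h sessions
monthly_cap = weekly_cap * 4                     # $152
subsidy = monthly_cap / price                    # 7.6x API-$ per sub-$
tokens_per_sub_dollar = monthly_tokens / price   # ~1.35 M tokens/$
```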
Two more plans don’t fit the rolling-$-cap pattern, so I split them out. MiniMax meters request count instead of tokens, and MiMo gives a flat monthly credit pool with no rolling 5h or weekly windows.
| Subscription | $/mo | Weekly cap | Monthly cap | API-$ / sub-$ | Monthly tokens | Tokens / sub-$ |
|---|---|---|---|---|---|---|
| MiniMax 2.7 | 20.00 | 45,000 req | $675 (at 30K tok/req) | 33.8× | 5,400 M | 270 M |
| MiMo V2.5-Pro (token-plan-sgp) | 14.08 | — | 200 M credits / $14.08 | 1.0× | 100 M | 7.1 M |
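MiniMax's row needs a conversion first, since it meters requests rather than tokens; a sketch using the 30K tokens/request figure from the table:

```python
# Convert MiniMax's request cap into token terms
# (30K tok/req is the average assumed in the table above).
req_per_week, tok_per_req, price = 45_000, 30_000, 20.00

monthly_tokens = req_per_week * tok_per_req * 4   # 5,400 M
tokens_per_sub_dollar = monthly_tokens / price    # 270 M
```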
Speed
Opus 4.7 on the Claude plan is the most expensive subscription option, but possibly the one that costs the least in user frustration. I personally reach for it again and again for two reasons:
- It understands my intent and the direction I want to go, and gets to work on it.
- It is also the fastest to execute, as the table below shows.
| Provider | n (reqs) | Avg TTFT (ms) | Avg output tokens | Avg TPS | Max TPS |
|---|---|---|---|---|---|
| anthropic | 23 | 2244 | 1306 | 82.3 | 159.6 |
| zai-glm | 324 | 5097 | 5659 | 72.9 | 107.5 |
| xiaomi-mimo | 10 | 2791 | 1182 | 62.9 | 82.6 |
| minimax | 26 | 2048 | 787 | 53.9 | 88.1 |
| kimi-coding | 148 | 3848 | 863 | 50.9 | 89.8 |
| openai-chatgpt | 159 | 1566 | 815 | 46.1 | 54.5 |
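TTFT and TPS come from simple stream timing. A minimal sketch of one way to measure them, assuming TPS is output tokens over the time elapsed after the first token:

```python
import time

def stream_stats(token_stream):
    """Return (TTFT in ms, decode TPS) for one streamed response.
    TPS here is output tokens divided by time elapsed after the first token."""
    start = time.monotonic()
    first = last = None
    n_tokens = 0
    for _ in token_stream:
        last = time.monotonic()
        if first is None:
            first = last
        n_tokens += 1
    if first is None or last == first:
        raise ValueError("need at least two streamed tokens")
    return (first - start) * 1000, (n_tokens - 1) / (last - first)
```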
A quick practical tip to maximize usage. I used Claude Code as the harness, and the Kimi and MiniMax APIs don't offer a faster, smaller model to pair with the main one, so I instead routed every Haiku-level call to deepseek-v4-flash. I loaded $2 for all of these experiments and still have most of it.
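One way to wire this up is a tiny router in front of Claude Code (pointed at via ANTHROPIC_BASE_URL) that sends Haiku-level calls to the cheap model and everything else to the plan provider. A minimal sketch, assuming both upstreams expose Anthropic-compatible Messages endpoints; the URLs, keys, and paths are illustrative:

```python
# Routing proxy sketch. Run it, then: ANTHROPIC_BASE_URL=http://localhost:8080
from flask import Flask, request, Response
import requests

app = Flask(__name__)

UPSTREAMS = {
    # Haiku-level background calls go to the cheap fast model.
    "small": ("https://api.deepseek.com/anthropic", "sk-...", "deepseek-v4-flash"),
    # Everything else goes to the plan provider being measured.
    "main":  ("https://api.moonshot.ai/anthropic", "sk-...", None),
}

@app.post("/v1/messages")
def route():
    body = request.get_json()
    tier = "small" if "haiku" in body.get("model", "").lower() else "main"
    base, key, override = UPSTREAMS[tier]
    if override:
        body["model"] = override  # rewrite the model name for the cheap upstream
    upstream = requests.post(
        f"{base}/v1/messages",
        json=body,
        headers={"x-api-key": key, "anthropic-version": "2023-06-01"},
        stream=True,
    )
    return Response(upstream.iter_content(chunk_size=None),
                    status=upstream.status_code,
                    content_type=upstream.headers.get("content-type"))

if __name__ == "__main__":
    app.run(port=8080)
```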
A caveat on scope. I haven't tested these subscriptions outside of coding harnesses. Most of them support being used in OpenClaw/Hermes setups beyond Claude Code. And while I was able to use all the Chinese providers' APIs for testing purposes, your mileage may vary: they may ban your account.
A note on blended usage. From proxy logs, an average Claude Code call breaks down as 92.4% cache_read · 5.2% output · 2.4% fresh input. Each provider prices these axes differently, so I apply each one’s per-axis pricing to the same workload — all the numbers above sit on identical request shapes.
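Concretely, a blended price is just a weighted sum of the three per-axis prices over that mix; a sketch with placeholder prices (not any provider's real rates):

```python
# Measured Claude Code request shape, as fractions of total tokens.
MIX = {"cache_read": 0.924, "output": 0.052, "fresh_input": 0.024}

def blended_price(per_axis: dict) -> float:
    """Blend per-axis $/Mtok prices over the measured token mix."""
    return sum(MIX[axis] * per_axis[axis] for axis in MIX)

# Hypothetical provider at $0.30 cache-read / $3 input / $15 output per Mtok:
print(blended_price({"cache_read": 0.30, "fresh_input": 3.00, "output": 15.00}))
# 0.924*0.30 + 0.024*3.00 + 0.052*15.00 = 1.1292 $/Mtok
```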