
Moonshot Kimi K2.6 lands open-source, scales to 300 sub-agents and 4,000 coordinated steps

Michael Ouroumis · 3 min read

Moonshot AI dropped the "Preview" label on its newest Kimi model this week and made Kimi K2.6 generally available on April 20, 2026, publishing weights to Hugging Face under a Modified MIT License and rolling the model out across Kimi.com, the Kimi App, the official API, and the Kimi Code CLI. The release lands eight days after beta testers first ran K2.6 Code Preview — and it arrives with benchmark numbers that, if they hold up in independent testing, put an open-weight Chinese model ahead of the top frontier closed models on agentic coding.

A trillion-parameter MoE tuned for long-horizon work

K2.6 keeps the 1-trillion-parameter Mixture-of-Experts backbone that has defined the K-line since mid-2025: 32 billion active parameters per token, 384 experts with eight activated plus one shared expert per step, 61 layers (including one dense layer), 64 attention heads with Multi-head Latent Attention, and a 160K-token vocabulary. The context window is 256,000 tokens, with automatic compression that summarizes earlier turns so marathon sessions don't degrade into lossy recall. A 400M-parameter vision encoder called MoonViT handles multimodal inputs.
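The sparsity those figures imply is easy to work out. The sketch below simply restates Moonshot's reported spec as a config dict and computes what fraction of experts and parameters are touched per token (illustrative only; the dict mirrors the published numbers, nothing here comes from the model itself):

```python
# Reported K2.6 architecture figures, per Moonshot's published spec.
K26 = {
    "total_params": 1_000_000_000_000,   # 1T total parameters
    "active_params": 32_000_000_000,     # 32B active per token
    "experts": 384,
    "experts_per_token": 8,              # routed experts
    "shared_experts": 1,                 # always-on shared expert
    "layers": 61,
    "attention_heads": 64,
    "context_window": 256_000,
}

def sparsity_summary(cfg: dict) -> dict:
    """Fraction of experts and of total parameters used per token."""
    active_experts = cfg["experts_per_token"] + cfg["shared_experts"]
    return {
        "expert_fraction": active_experts / cfg["experts"],
        "param_fraction": cfg["active_params"] / cfg["total_params"],
    }

s = sparsity_summary(K26)
print(f"{s['expert_fraction']:.1%} of experts, "
      f"{s['param_fraction']:.1%} of parameters active per token")
# → 2.3% of experts, 3.2% of parameters active per token
```

That roughly 3 percent activation ratio is what lets a trillion-parameter model run at the serving cost of a ~32B dense model.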

On published benchmarks, Moonshot reports K2.6 at 58.6 on SWE-Bench Pro, compared to 57.7 for GPT-5.4, 53.4 for Claude Opus 4.6 at max effort, and 50.7 for K2.5. On Humanity's Last Exam with tools, K2.6 posts 54.0 against 52.1 for GPT-5.4 and 53.0 for Opus 4.6. On DeepSearchQA, Moonshot claims 92.5 F1 against 78.6 for GPT-5.4.

300-agent swarms and days-long autonomy

The headline capability is agentic: K2.6 pushes the Agent Swarm cap to 300 sub-agents with up to 4,000 coordinated steps, up from 100 sub-agents and roughly 1,500 steps in K2.5. Moonshot documented two long-horizon case studies — a 12-hour optimization run on a Zig codebase that moved a throughput metric from 15 to 193 tokens per second, and a 13-hour financial-engine overhaul that boosted median throughput 185 percent, from 0.43 to 1.24 MT/s. The company also says proactive agents can run autonomously for up to five days.
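A quick arithmetic check on those case-study numbers (computed from the rounded before/after figures in the release notes; the rounded inputs give about 188 percent on the financial engine, consistent with the roughly 185 percent Moonshot cites before rounding):

```python
# Sanity-check the reported long-horizon gains from the rounded figures.
runs = {
    "Zig codebase throughput (tokens/s)": (15, 193),
    "Financial engine throughput (MT/s)": (0.43, 1.24),
}

for name, (before, after) in runs.items():
    gain = (after - before) / before   # relative improvement
    print(f"{name}: {before} -> {after}  (+{gain:.0%}, {after / before:.1f}x)")
# → Zig codebase throughput (tokens/s): 15 -> 193  (+1187%, 12.9x)
# → Financial engine throughput (MT/s): 0.43 -> 1.24  (+188%, 2.9x)
```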

K2.6 ships with a dual inference profile — a slower "Thinking mode" for chain-of-thought work and an "Instant mode" tuned for low-latency front-end tasks — plus a Skills feature that turns PDFs, spreadsheets, and slide decks into reusable task templates, and "Claw Groups" for mixed human-agent collaboration across devices.

Why this release matters

The release targets, in Moonshot's words, "practical deployment scenarios: long-running coding agents, front-end generation from natural language, massively parallel agent swarms coordinating hundreds of specialized sub-agents simultaneously." Translated: Moonshot is pitching K2.6 as the production-grade option for developer teams that want an open-weight alternative to Anthropic's Claude Code stack and OpenAI's Codex for overnight, autonomous engineering work.

The strategic wrinkle is licensing. K2.6 ships under a Modified MIT License with weights on Hugging Face — meaning enterprises can self-host it, fine-tune it, and avoid per-token exposure to U.S. cloud providers. Combined with Z.ai's GLM-5.1 release earlier this month, it's another sign that the open-weight gap to frontier Western labs on agentic coding is effectively closed on reported benchmarks. The real test comes next: whether third-party evaluations, and long-running customer deployments, confirm the 58.6 SWE-Bench Pro score outside Moonshot's own harness.

