Stratechery, May 3, 2026

The Agentic Inflection

For the past two years, the conventional wisdom about LLMs has been that they are tools — clever predictors of the next token, useful for drafting, summarizing, and the occasional code completion. That framing is decisively outdated.

What changed

What changed is not the model itself — though Kimi K2 and the Claude 4.7 family are clearly more capable — but the harness that surrounds it. Tool use, persistent memory, and long-running agents have moved from research demos into production primitives.

The model is the kernel. The harness is the operating system. The next decade of AI is an operating-system war.
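
To make the kernel/operating-system split concrete, here is a minimal sketch of what a harness does around a model. Everything in it is an assumption for illustration: `fake_model` is a hypothetical stand-in for a real LLM call, `TOOLS` is a made-up tool registry, and `run_harness` is not any vendor's API. The point is only that tool dispatch, memory, and the loop live outside the model.

```python
from typing import Callable

# Hypothetical stand-in for a model call: a pure function from context to an action.
# A real harness would call an LLM API here; this stub is an assumption.
def fake_model(context: list[str]) -> dict:
    if not any(line.startswith("tool:add") for line in context):
        return {"type": "tool", "name": "add", "args": (2, 3)}
    return {"type": "final", "text": context[-1]}

# Hypothetical tool registry owned by the harness, not by the model.
TOOLS: dict[str, Callable[..., object]] = {"add": lambda a, b: a + b}

def run_harness(model, task: str, max_steps: int = 5) -> str:
    memory = [f"task: {task}"]              # context carried across steps
    for _ in range(max_steps):              # agent loop, bounded for safety
        action = model(memory)
        if action["type"] == "final":
            return action["text"]
        result = TOOLS[action["name"]](*action["args"])    # tool use
        memory.append(f"tool:{action['name']} -> {result}")
    return "gave up: step limit reached"
```

Swapping `fake_model` for a stronger model changes nothing about the loop, which is the sense in which the harness, not the model, defines the product surface.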

Three signals worth tracking

· Latency budgets: agents that loop ten times still need to feel snappy

· Cost per task: not cost per token; the meaningful unit is task completion

· Failure isolation: a single bad sub-step should not poison the whole run

These three are the load-bearing constraints of the next generation of agentic products. Underweight any one of them and you ship a demo, not a product.
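
A rough sketch of how the three constraints might show up in a harness, under stated assumptions: `TaskReport`, `run_task`, `cost_per_completed_task`, and the `PRICE_PER_1K_TOKENS` figure are all hypothetical names and numbers invented for illustration, and each step is assumed to be a callable returning `(output, tokens_used)`. The latency budget is a wall-clock check before each step, cost is accumulated per task rather than per token, and each step runs inside its own `try` so one failure does not end the run.

```python
import time
from dataclasses import dataclass, field

PRICE_PER_1K_TOKENS = 0.01  # assumed price in USD, for illustration only

@dataclass
class TaskReport:
    completed: bool = False
    cost_usd: float = 0.0
    elapsed_s: float = 0.0
    failures: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)

def run_task(steps, latency_budget_s: float = 2.0) -> TaskReport:
    """Run (name, step) pairs; each step() returns (output, tokens_used)."""
    report = TaskReport()
    start = time.monotonic()
    for name, step in steps:
        # Latency budget: stop starting new steps once the budget is spent.
        if time.monotonic() - start > latency_budget_s:
            report.failures.append(f"{name}: skipped, latency budget exhausted")
            break
        try:
            output, tokens = step()
        except Exception as exc:
            # Failure isolation: record the bad sub-step and keep going.
            report.failures.append(f"{name}: {exc}")
            continue
        report.outputs.append(output)
        report.cost_usd += tokens / 1000 * PRICE_PER_1K_TOKENS
    else:
        # Reached only if the loop was not cut short by the budget check.
        # "Completed" here means: within budget and at least one step succeeded.
        report.completed = len(report.outputs) > 0
    report.elapsed_s = time.monotonic() - start
    return report

def cost_per_completed_task(reports: list[TaskReport]) -> float:
    """Total spend divided by completed tasks, the unit the list above argues for."""
    done = sum(r.completed for r in reports)
    total = sum(r.cost_usd for r in reports)
    return total / done if done else float("inf")
```

Note that `cost_per_completed_task` charges failed runs against the successful ones: a cheap model that completes half its tasks can cost more per task than an expensive one that completes nearly all of them.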
