The Agentic Inflection
For the past two years, the conventional wisdom about LLMs has been that they are tools — clever predictors of the next token, useful for drafting, summarizing, and the occasional code completion. That framing is decisively outdated.
What changed
What changed is not the model itself — though Kimi K2 and the Claude 4.7 family are clearly more capable — but the harness that surrounds it. Tool use, persistent memory, and long-running agents have moved from research demos into production primitives.
The model is the kernel. The harness is the operating system. The next decade of AI is an operating-system war.
Three signals worth tracking
·Latency budgets — agents that loop ten times still need to feel snappy
·Cost per task — not cost per token; the meaningful unit is task completion
·Failure isolation — a single bad sub-step should not poison the whole run
These three are the load-bearing constraints of the next generation of agentic products. Underweight any one of them and you ship a demo, not a product.
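To make the three signals concrete, here is a minimal sketch of how a harness might account for them at the level of a single task. Everything in it is an assumption made for illustration: `TaskReport`, `run_task`, the convention that each step is a callable returning its own cost in dollars, and the idea of checking the latency budget between steps. It is not drawn from any particular agent framework.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class TaskReport:
    """Per-task accounting: the unit is the whole task, not a single token."""
    cost_usd: float = 0.0
    elapsed_s: float = 0.0
    completed_steps: List[str] = field(default_factory=list)
    failed_steps: List[str] = field(default_factory=list)
    stopped_early: bool = False


def run_task(
    steps: List[Tuple[str, Callable[[], float]]],
    latency_budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> TaskReport:
    """Run named steps in order (hypothetical harness loop).

    Each step is assumed to return the cost it incurred, in dollars.
    """
    report = TaskReport()
    start = clock()
    for name, step in steps:
        # Latency budget: stop looping once the task has used its allowance.
        if clock() - start > latency_budget_s:
            report.stopped_early = True
            break
        try:
            # Cost per task: accumulate across steps rather than per token.
            report.cost_usd += step()
            report.completed_steps.append(name)
        except Exception:
            # Failure isolation: record the bad step and keep the run alive.
            report.failed_steps.append(name)
    report.elapsed_s = clock() - start
    return report


if __name__ == "__main__":
    def cheap_step() -> float:
        return 0.25

    def broken_step() -> float:
        raise RuntimeError("tool call failed")

    result = run_task(
        [("search", cheap_step), ("parse", broken_step), ("summarize", cheap_step)],
        latency_budget_s=5.0,
    )
    print(result)
```

In this sketch the failing `parse` step is recorded but does not abort the run, and the report carries a single cost and latency figure for the whole task. A real harness would likely also want per-step timeouts and retries, which are left out here to keep the idea visible.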