A top-five consumer goods company showed us a slide last quarter they were proud of: forty-two AI tools in production. Copilots for every function. Retrieval across every document store. A generative-AI centre of excellence with a real budget and a real headcount plan.
We asked one question: how many decisions has your company made differently this year because of any of it?
The room went quiet. The honest answer was "we don't know." The real answer, which everyone in the room suspected, was "almost none."
Three years into the enterprise-AI wave, the pattern is now unmistakable. The Global 2000 has spent tens of billions of dollars on tools that help knowledge workers retrieve, summarise, and draft. Not one of those tools has changed how any of those companies actually make a decision. The board is starting to notice. The budget is starting to tighten. And a question is forming in every AI governance committee from São Paulo to Seoul: what were we actually buying?
The answer is that enterprise AI has been selling the wrong category. The first wave gave every knowledge worker a copilot. The next wave — the one that compounds, the one that produces ROI a CFO can defend — does not help the knowledge worker. It is the knowledge worker.
It takes the decision the human used to make and delivers the decision itself, priced against the headcount it replaces, governed by the expertise it encodes, and architected so that enterprise data never leaves the enterprise boundary.
This is not a better copilot. It is a different category. It has a different architecture, a different buyer, different unit economics, and a different moat.
We have been building it in stealth for two years. We call the platform CalvinBall, the products playbooks, and the moat The CPG Brain — two years of industry context, co-created with companies like Unilever, LVMH, and Colgate, and governed by a council of a dozen CPG industry captains that we call the Architects. The Architects anchor our domain expertise and also form the core of the angel capital we have raised to date. The Lead Architects’ personal references are published alongside this essay.
This piece is the public version of what we have learned. The fourteen-page whitepaper with the architectural spec and the case evidence is linked at the bottom.
I. Why the copilot wave stalled
The first wave of enterprise AI was built on a seductive but wrong assumption: that a sufficiently capable foundation model, wired into enterprise data, would eventually be able to do the work. It cannot. Not because the models are weak. Because enterprise work is not a knowledge problem.
A brand manager at a consumer goods company does not need a better way to find the Q3 promotion deck. She needs a system that tells her, at 8 a.m. Monday, unprompted, that the promotion she launched in New York last week is cannibalising her hero SKU in Boston, and here are the three reallocations that would recover 4.2 margin points by quarter-end. The first is retrieval. The second is a decision. The market has sold forty-two of the first kind. It needs one of the second.
The structural reason is that for thirty years enterprise software has lived under a contract: the software holds the data, the human holds the logic. ERP, CRM, data warehouses, BI dashboards — all of them are elaborate machines for putting information in front of a human who then applies judgement. The judgement lives in people's heads, in tribal knowledge, in the seventeen-year category manager who knows which promotions work in Ramadan.
LLMs broke this contract in exactly one direction. They made the retrieval of information conversational. They did not capture the logic.
A foundation model knows the entire internet and nothing about how your company decides whether to accept a trade-spend increase from a distributor in Jakarta.
That judgement — the reasoning that turns information into a decision — is the work. Everything else is overhead. A horizontal copilot will never carry it, because it is not text that can be retrieved. It is domain structure that has to be built, industry by industry, by the people who have actually run the industry.
II. The horizontal ceiling, and what Harvey already proved
The obvious objection to what we are saying is: won't the foundation models eventually do this themselves?
The answer is no, and the reason is not technical. It is economic.
A horizontal copilot, to capture industry-specific decision logic, would have to build the context graph for every industry it serves. Law. Consumer goods. Pharma. Energy. Financial services. Healthcare. That is not a product roadmap; it is a fifty-year research programme. Horizontal players will not do this, because the moment they commit to depth in one industry they lose the scale economics that make them horizontal. Their business model forbids it.
Harvey recognised this first, for law. They did not build a better legal-research copilot. They built the thing that replaces the junior associate, priced it against the junior associate's salary, and sold it into firms whose partners had already run out of patience with generalist tools. They won the category not because they had better models — they did not — but because they had the depth a horizontal cannot afford to build.
The question for every other industry is the same: who builds the depth first, and how long is the window? The firms that still believe a horizontal copilot will eventually understand their industry will look, in five years, like the firms that in 2015 still believed their in-house data team could keep up with a cloud data warehouse.
We have chosen consumer goods as the first industry, deliberately, for reasons we will come to.
III. What we built: the platform
CalvinBall is a platform, not a product. Three tiers, each of which does a thing the current enterprise stack cannot.
IRIS is the ingestion tier. It takes in the messy middle of enterprise data — the PDFs, the Excel sheets, the category manager’s working file, the distributor’s WhatsApp screenshot — alongside the structured feeds from ERP, DMS, Google, Meta, Nielsen, Kantar. It produces a usable representation of the enterprise that the rest of the stack can reason over. Today, most of a $1 trillion global spend on “human middleware” — the analysts, consultants, and systems integrators who bridge raw data and executive decisions — exists to do, badly, what IRIS does natively.
MISL — the Machine Intelligence Semantic Layer — is the reasoning tier. It is the custodian of truth for the enterprise: defined metrics, named dimensions, governed business logic, all built once and served deterministically to every downstream engine. MISL is deliberately hybrid. Symbolic where decisions must be auditable; neural where pattern-matching adds value. At the core of MISL is a Federated Supervisor — the component that enforces a complete architectural separation between your enterprise data and the system's own learning. The system becomes smarter over time, but it does so on anonymised interaction patterns, not on your data. Your data stays within your enterprise boundary, by design, not by promise.
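To make the idea of a governed, deterministic semantic layer concrete, here is a minimal sketch in the spirit of MISL. All names and the API shape are hypothetical illustrations, not CalvinBall's actual schema: a metric is defined once, with a versioned formula and named dimensions, and every downstream engine gets the same answer for the same input.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of a governed metric registry. The metric names,
# fields, and formulas below are illustrative only.

@dataclass(frozen=True)
class Metric:
    name: str
    formula: Callable[[dict], float]   # deterministic business logic
    dimensions: tuple[str, ...]        # named dimensions it may be sliced by
    version: str                       # logic is versioned, hence auditable

class SemanticLayer:
    def __init__(self) -> None:
        self._metrics: dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        # Logic is built once; changing it means publishing a new version,
        # not silently redefining the metric under downstream consumers.
        if metric.name in self._metrics:
            raise ValueError(f"{metric.name} already governed; bump version instead")
        self._metrics[metric.name] = metric

    def evaluate(self, name: str, row: dict) -> float:
        # Served deterministically: same input, same answer, every engine.
        return self._metrics[name].formula(row)

layer = SemanticLayer()
layer.register(Metric(
    name="trade_margin_pct",
    formula=lambda r: 100 * (r["net_sales"] - r["trade_spend"]) / r["net_sales"],
    dimensions=("sku", "region", "period"),
    version="1.0",
))

print(layer.evaluate("trade_margin_pct", {"net_sales": 200.0, "trade_spend": 30.0}))  # 85.0
```

The design point is the single registration path: symbolic, inspectable logic is the custodian of truth, and anything neural sits downstream of it.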
Tapestry is the orchestration tier and the deterministic interface to the entire platform. Every decision Tapestry produces is traceable, versioned, and defensible to an auditor. It binds IRIS and MISL into deployable playbooks and composes those playbooks into end-to-end decision flows. New-product launch touches demand planning, trade promotion, pricing, and supply. Tapestry routes the context and the logic across all four, handles the human-in-the-loop escalations, reconciles conflicts, and enforces the policies the organisation has chosen to keep human.
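The orchestration pattern described above can be sketched in a few lines. This is a hedged illustration, not Tapestry's actual API: step names, the confidence threshold, and the escalation policy are all hypothetical, but it shows the shape of the claim — playbook steps compose into a flow, every step leaves an audit entry, and outcomes below a policy threshold route to a human.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch of playbook orchestration with human-in-the-loop
# escalation and a traceable audit log. Names are illustrative only.

@dataclass
class Decision:
    action: str
    confidence: float
    trace: list[str] = field(default_factory=list)  # every step appends an audit entry

def flow(steps: list[Callable[[Decision], Decision]],
         escalate_below: float = 0.8) -> Callable[[Decision], Decision]:
    """Chain playbook steps; route low-confidence outcomes to a human."""
    def run(d: Decision) -> Decision:
        for step in steps:
            d = step(d)
            d.trace.append(f"{step.__name__}: {d.action} ({d.confidence:.2f})")
        if d.confidence < escalate_below:
            # A policy the organisation has chosen to keep human.
            d.action = "ESCALATE_TO_HUMAN"
            d.trace.append("policy: confidence below threshold, human review required")
        return d
    return run

# Two illustrative playbook steps from a new-product-launch flow.
def demand_plan(d: Decision) -> Decision:
    return Decision("shift_stock_to_boston", 0.9, d.trace)

def pricing(d: Decision) -> Decision:
    return Decision("hold_price", 0.7, d.trace)

launch_flow = flow([demand_plan, pricing])
result = launch_flow(Decision("start", 1.0))
print(result.action)      # ESCALATE_TO_HUMAN (pricing confidence 0.70 < 0.80)
print(len(result.trace))  # 3 audit entries
```

The trace is the point: every decision the flow produces can be reconstructed, entry by entry, after the fact.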
The three tiers together are the platform. They are what enables us to build the next industry in months rather than years, and they are why every additional customer makes the next one cheaper and faster to deploy — the economics of software, not of services.
IV. Why private-by-architecture changes the conversation
Every CAIO we have worked with in the last eighteen months has the same private worry. They have read enough fine print to know that "private endpoint" and "will not train on your data" are two very different claims, and that the distance between them is where careers end. They are right to worry. Most enterprise AI deployments, even behind a private endpoint, leak enterprise context into the vendor's training pipeline in ways the procurement team will not see until an audit surfaces it eighteen months later.
The Federated Supervisor is our architectural answer to this. It is not a policy. It is not a contractual promise. It is the piece of the system that makes the leak technically impossible.
Trust is not a promise the vendor makes. Trust is a property of the architecture.
The reinforcement-learning loop that makes CalvinBall smarter runs on anonymised patterns of how the system was used — not on the data itself, not on the decisions, not on the customer-specific logic graphs. The compounding asset we are building — the thing that makes CalvinBall more valuable to every customer, every month — is governed by the Supervisor and lives outside any individual enterprise's data boundary.
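The separation being claimed here can be illustrated with a small sketch. This is our hypothetical rendering of the idea, not the Federated Supervisor's implementation: the learning loop receives only schema-level usage signals, and the enterprise payload is structurally excluded before anything crosses the boundary.

```python
import hashlib

# Hypothetical sketch of boundary enforcement: only anonymised interaction
# patterns leave the enterprise; the payload never appears in the output.
# Field names are illustrative only.

def interaction_pattern(event: dict) -> dict:
    """Strip enterprise content; keep only schema-level usage signals."""
    return {
        # Which kind of decision was asked for, and whether it was accepted —
        # not what the underlying data said.
        "playbook": event["playbook"],
        "accepted": event["accepted"],
        # One-way hash so repeat usage can be counted without identifying
        # the tenant from the pattern alone.
        "tenant": hashlib.sha256(event["tenant_id"].encode()).hexdigest()[:12],
    }

event = {
    "playbook": "why",
    "accepted": True,
    "tenant_id": "acme-foods",
    "payload": {"sku": "X-42", "net_sales": 1.2e6},  # never leaves the boundary
}

pattern = interaction_pattern(event)
assert "payload" not in pattern and "tenant_id" not in pattern
```

Because the exclusion is done by construction — the output dict simply has no field that could carry the payload — it is the kind of property a security team can verify by reading code rather than reading a contract.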
Two practical consequences follow. First, the compliance conversation gets shorter. A regulated-industry CFO can approve CalvinBall in a timeline that is measurably faster than the equivalent procurement cycle for a horizontal copilot of equivalent scope. Second, and more importantly, the question of trust stops being a question. The architecture answers it, and the answer is verifiable at the code level by any enterprise security team that wants to audit it.
A horizontal copilot will argue, forever, about its data posture. We have taken the argument off the table by design.
V. The moat: The CPG Brain
The platform is not the moat. The platform is the minimum competitive ticket.
The moat is two years of industry context no horizontal player will ever afford to build. Call it The CPG Brain. It has two components, and we built them in stealth on purpose.
The first component is the decision grammar we co-created with three of the world’s largest consumer goods companies — Unilever, LVMH, Colgate. Embedded work. Not a pilot, not a proof-of-concept: the clients’ own category managers, insight leads, and commercial heads helped define, version, and stress-test the logic graphs, built with our domain experts, that now sit in MISL. No amount of LLM capability substitutes for a category director (the user) sitting with an engineer and explaining, for the fourth time, why the Ramadan promotional cycle breaks three of the assumptions the graph was designed around.
The second component is the Architects — Fortune 500 CPG leaders who have run the industry from the inside. Each has contributed personal expertise to the design of the platform. Among the Architects: Dharnesh Gordhon. Antoine de Carbonnel. James Lafferty. David Tan. Four Forbes Top 100 global CEOs from P&G, BAT, and Coca-Cola. They meet monthly as a council. They open doors — which is why CalvinBall’s fastest enterprise close is four months, against an industry average of eighteen. And, critically, they put their own names and their own capital behind the thesis. The Lead Architects’ personal references, describing in their own words why they invested, are published alongside this essay. If you are unsure whether the category is real, read them before you read us.
Together, the co-created decision grammar and the Architects are The CPG Brain. A horizontal player cannot build this without choosing to become us. Their business model will not let them.
VI. What ships on top: the playbooks
CalvinBall’s platform products are not features. They are playbooks — units of industry expertise packaged as software, sold against the human headcount they replace. Four ship today. Each is named for the outcome it delivers, not the function it performs.
Brief — the executive briefing. The morning pre-read for the leadership team. Replaces the weekly work of the BI analyst who manually assembles the deck. A global CPG giant eliminated the manual reporting burden of a 17-person team and regained the strategic focus that team had lost.
Ask — the concierge. Natural-language access to every signal in the business, disciplined by MISL's schema so it cannot hallucinate relationships that do not exist. A leading Indonesian CPG compressed a seven-day analyst queue into a one-hour conversation, saving 4,000+ human hours per month across its sales intelligence function.
Why — the diagnostic. When something moves, “Why” answers the question every CEO eventually has to ask. Root-cause reasoning across channel, SKU, region, and period, traced back to the data and the logic that produced it. India’s #2 FMCG saw insight generation collapse from seven days to one hour, with a 4.62× ROI projected in the first year.
Signal — the early-warning system. Always-on monitoring of the metrics that actually matter, tuned by the client's own logic, surfacing anomalies before they become quarter-end problems. In one deployment, Signal converted quarterly distribution interventions into a real-time early-warning system worth $10M of value creation over three years.
Brief, Ask, Why, Signal. That is the rhythm of an executive's day — the morning brief, the daytime question, the diagnostic when something breaks, the always-on alert. The product is the outcome. The feature work disappears behind it.
The business model follows. We are not priced against a software budget. We are priced against the human work we displace. Sequoia called this services-as-software: the software doesn't assist the service, it becomes the service. The category is priced that way, measured that way, and it compounds that way.
VII. Why CPG first, and what comes next
Every horizontal AI company is chasing "enterprise." We started with consumer goods on purpose.
CPG is the hardest tractable problem in applied AI. Thousands of SKUs. Tens of thousands of outlets. Daily promotional decisions. Volatile demand. Physical goods that cannot be undone. Feedback loops measured in weeks, not seconds. If the architecture works in CPG — with razor margins, fragmented channels, and distributors who communicate by WhatsApp — it works everywhere downstream.
The roadmap reflects this. CPG is the beachhead. The Architects council in CPG is the template; in the near future we will stand up the equivalent in two more industries — candidates include pharma commercial operations and industrial distribution. The next Brain gets built the way the first one did: deep embeds with a named anchor client, governed by an industry-specific Architects council, on top of the platform that already exists.
VIII. The diagnostic, for the reader who wants to test us
Before you close this tab, take ninety seconds to test your own AI portfolio against ours.
- Name five decisions. Can you name five decisions your company made differently in the last twelve months because of any AI tool you own? Not five deployments. Not five satisfaction scores. Five decisions whose outcome changed. If not, you have a knowledge portfolio, not a decision portfolio.
- Point to the reasoning. For any AI tool in your stack that claims to support a decision, can you point to the chain of reasoning it used — and can a business owner edit that reasoning without calling engineering? If not, it is not a decision tool regardless of how it was sold.
- Test composability. Take two related decision classes in your company — trade promotion and pricing exception, credit approval and vendor onboarding. Can the reasoning from the first be reused in the second? If every decision class requires a ground-up rebuild, you are not building an asset; you are running a project portfolio.
- Check ownership. What percentage of your AI tools are owned and maintained by business users rather than engineering, central IT, or an outside vendor? Under 30%? Your AI investment is not compounding. Every new decision class is a new project.
- Audit the audit trail. Pull a decision your AI supported in the last 30 days and reconstruct, from the tool's outputs alone, why it recommended what it did. If this takes more than 15 minutes, the tool is not production-grade for any decision that matters.
If the honest answers make you uncomfortable, the problem is architectural. Not vendor-specific, not model-specific. Architectural. We would not be publishing this essay if we did not believe we were building the architecture that replaces the current one.
IX. What we're asking for
We are not looking for five hundred readers. We are looking for the first fifty who recognise what we are describing.
If you are a CPG leader whose own diagnostic just came back uncomfortable, reach out. We are opening a small number of Architect-introduced conversations over the next quarter. The Brain is not a demo; it is an eighteen-to-twenty-four-month partnership. We are selective about whom we build with, and the Architects help us decide.
If you are an institutional investor focused on enterprise infrastructure or applied AI, the whitepaper is the document worth reading. The case evidence, the architectural spec, and the moat analysis are in it. Institutional conversations are open; we are being deliberate about who we build the next twelve months with.
If you are a builder working on the vertical-AI thesis in another industry, we would like to meet you. There is a Harvey-sized opportunity in at least a dozen industries. The companies that catch this turn will compound for a decade. The ones that do not will be explaining to their boards why their copilot portfolio was the wrong bet.
— Gurnoor Dhillon, Sandeep Ramesh, Axel Wehr, Emeka Kalu-Uma. CalvinBall. May 2026.
CalvinBall takes its name from the beloved Calvin & Hobbes comic strip: a game in which the only rule is that the rules keep changing — which is, as it happens, also what enterprise AI has looked like for three years. We believe the next chapter has rules. We are writing them.