The case for the data layer: the part of AI most firms forget to build

Written by Jo Goodwin | May 26, 2026 3:22:16 PM

This is the second in a three-part series. Start here: Where does proprietary value live in a frontier AI world?

Four reasons the data layer has to come first

Most conversations about AI in private equity start in the wrong place. They start with the model: which one to use, how capable it is, how quickly it can reason over data. But any AI output is only as good as the context it works from. And in private equity, the most valuable context is not available on the web, not contained in any data provider's feed, and not something a frontier model can reason its way to.

It exists in the accumulated history of how your firm invests.

1. The problem is fragmentation, not lack of data

Most firms do not have a data problem in the conventional sense. They have more data than they can usefully process. What they have is a fragmentation problem. Signals sit in CRMs that were never designed to capture investment thinking. Decisions live in email threads and meeting notes that no one can easily retrieve. Market intelligence is subscribed to, used once, and forgotten. Junior analysts leave and take their context with them. The firm keeps moving, but its knowledge does not compound.

A well-built data layer addresses this directly. It ingests information from multiple sources, maps it to a common structure, and makes it retrievable in a way that is useful rather than merely present. With that layer in place, an analyst researching a company works from everything the firm has ever known about that company, not just what they can find themselves in the time available.
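To make the ingest, map and retrieve steps concrete, here is a minimal Python sketch. It is a hypothetical illustration, not a description of any particular product: the `FirmRecord` schema, the source field names (`account`, `subject_company` and so on) and the crude name normalization are all assumptions chosen for brevity. It shows two differently shaped sources being mapped into one common record type and then retrieved together by company.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FirmRecord:
    """Common structure every source is mapped into (hypothetical schema)."""
    company: str    # normalized company name, used as the join key
    source: str     # where the record came from, e.g. "crm" or "meeting_notes"
    recorded: date
    text: str


def normalize_company(name: str) -> str:
    # Crude normalization so "Acme Ltd." and "ACME LTD" map to the same key.
    return " ".join(name.lower().replace(".", "").split())


def from_crm(row: dict) -> FirmRecord:
    # Hypothetical CRM export fields: "account", "last_touch", "notes".
    return FirmRecord(normalize_company(row["account"]), "crm",
                      date.fromisoformat(row["last_touch"]), row["notes"])


def from_meeting_note(note: dict) -> FirmRecord:
    # Hypothetical meeting-note fields: "subject_company", "date", "body".
    return FirmRecord(normalize_company(note["subject_company"]), "meeting_notes",
                      date.fromisoformat(note["date"]), note["body"])


class DataLayer:
    """In-memory stand-in for a store that indexes every record by company."""

    def __init__(self) -> None:
        self._by_company: dict[str, list[FirmRecord]] = defaultdict(list)

    def ingest(self, record: FirmRecord) -> None:
        self._by_company[record.company].append(record)

    def history(self, company: str) -> list[FirmRecord]:
        # Everything held on a company across all sources, oldest first.
        return sorted(self._by_company.get(normalize_company(company), []),
                      key=lambda r: r.recorded)


if __name__ == "__main__":
    layer = DataLayer()
    layer.ingest(from_crm({"account": "Acme Ltd.", "last_touch": "2023-04-02",
                           "notes": "Intro call with CFO"}))
    layer.ingest(from_meeting_note({"subject_company": "ACME LTD", "date": "2023-06-19",
                                    "body": "IC passed: customer concentration too high"}))
    for r in layer.history("acme ltd"):
        print(r.recorded, r.source, r.text)
```

A real system would need proper entity resolution rather than string normalization, but the shape is the same: each source is translated once into a shared structure, so retrieval no longer depends on where a record originally lived.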

2. A structured data layer changes what AI can do

A frontier model connected to a governed, structured data layer behaves differently from one working from a blank slate. It can surface prior analysis. It can flag that a colleague has already reviewed a business. It can note that the firm passed on a comparable company three years ago and recall why. The model's reasoning does not change, but its starting point does, and that changes what it can usefully tell you.

Direct integrations between AI models and data providers are genuinely useful. But they are not a substitute for institutional memory. A model that can access a data provider's feed in real time still does not know what your firm has done with that data — which signals you acted on, which companies you passed on and why, or what your partners concluded in their last portfolio review. That knowledge does not live in any vendor's system. It lives in yours, if you have built somewhere for it to live.

3. Governance is not optional when AI is making decisions

When AI agents begin operating autonomously across deal workflows — accessing data sources, generating analysis, surfacing recommendations — the question of what they knew, when, and why they acted becomes a compliance requirement, not just a technical preference. A firm that cannot answer those questions is exposed. A firm that has built a governed data layer, where access is mediated, decisions are logged, and sensitivity tiers are enforced in infrastructure rather than prompts, is not.
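The difference between enforcing sensitivity tiers in infrastructure and enforcing them in prompts can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the tier names, agent names and the `Gateway` class are hypothetical, invented for the example. Agents can only read data through one mediating function, which compares the document's tier with the agent's clearance in code and appends an audit entry for every request, including refusals. No instruction to the model is involved, so the check does not depend on the model following its prompt.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum


class Tier(IntEnum):
    # Hypothetical sensitivity tiers; a higher value means more sensitive.
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2


@dataclass(frozen=True)
class Document:
    doc_id: str
    tier: Tier
    content: str


@dataclass(frozen=True)
class AuditEntry:
    at: datetime
    agent: str
    doc_id: str
    purpose: str
    allowed: bool


class AccessDenied(Exception):
    pass


@dataclass
class Gateway:
    """Single mediated path through which agents read data (hypothetical design)."""
    documents: dict[str, Document]
    clearances: dict[str, Tier]  # agent name -> highest tier it may read
    audit_log: list[AuditEntry] = field(default_factory=list)

    def fetch(self, agent: str, doc_id: str, purpose: str) -> str:
        doc = self.documents[doc_id]
        # Agents without a registered clearance are treated as PUBLIC-only.
        allowed = doc.tier <= self.clearances.get(agent, Tier.PUBLIC)
        # Every request for an existing document is logged, including denials,
        # before any content is returned.
        self.audit_log.append(
            AuditEntry(datetime.now(timezone.utc), agent, doc_id, purpose, allowed))
        if not allowed:
            raise AccessDenied(f"{agent} may not read {doc_id} ({doc.tier.name})")
        return doc.content
```

In practice the audit log would go to durable, append-only storage rather than an in-memory list; the list here only keeps the example self-contained.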

4. The data layer is not the whole answer — but nothing works without it

The data layer is the foundation everything else stands on. Without it, the rest of the architecture has nothing to work from.

Next in the series: 'The case for the orchestration layer: why data alone isn't enough'
