Edition No. 052 / June 13, 2026 / 4 min read

Incrementality testing without a data team

Incrementality has a reputation as a problem for companies with a data science function and a nine-figure budget. The cheap version answers the only question that matters, and any team can run it next quarter.

April Y., Partner, Performance & Connections

Most growth teams have never run an incrementality test, and the reason they give is always some version of the same thing: we do not have the data scientists, the spend, or the tooling for that. It sounds like a resource constraint. It is actually a misunderstanding of what incrementality is for.

Incrementality answers one question — if we turned this spend off, how much of the revenue would still happen? The expensive version of that answer requires modeling. The cheap version requires only the discipline to withhold something and the patience to measure what you lost. Any team running more than a token budget can do the second one inside a single quarter.

What platform-reported numbers actually count

The dashboards are not lying, exactly. They are answering a different question than the one you are asking. A platform reports the conversions it can attribute to an ad someone saw or clicked. It cannot report the conversions that would have happened anyway — the customer who was going to buy, searched your brand, clicked the ad sitting on top of the organic result, and got counted as a paid acquisition.

That gap between reported conversions and incremental conversions is not a rounding error. On brand search, on retargeting, and on any audience already deep in the funnel, it can be most of the reported number. A program can look like it is returning four dollars for every one spent and be returning closer to one, because three of those four dollars were going to arrive without the ad. Incrementality is the only measurement that separates the work the spend is doing from the credit the spend is taking.
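The four-dollars-versus-one example can be written down as a minimal sketch. The function name `incremental_roas` and the `baseline_revenue` input are illustrative assumptions: no platform reports a baseline, and estimating it is exactly what a holdout test is for.

```python
def incremental_roas(spend, reported_revenue, baseline_revenue):
    """Return (reported ROAS, incremental ROAS).

    baseline_revenue is the share of reported revenue that would have
    arrived without the ad. It is an assumed input here; in practice it
    has to be estimated from a holdout.
    """
    if spend <= 0:
        raise ValueError("spend must be positive")
    incremental_revenue = reported_revenue - baseline_revenue
    return reported_revenue / spend, incremental_revenue / spend


# The example from the text: $4 reported per $1 spent, $3 of which
# would have arrived anyway.
reported, incremental = incremental_roas(
    spend=1.0, reported_revenue=4.0, baseline_revenue=3.0
)
print(f"reported ROAS: {reported:.1f}, incremental ROAS: {incremental:.1f}")
```

The two numbers share a denominator and differ only in how much revenue the spend is allowed to claim.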

The three tests a small team can actually run

None of these require a data science function. They require an agreement to give up some attributed revenue for a few weeks in exchange for knowing the truth.

The geo holdout: Split comparable markets into a test group that keeps spending and a control group that goes dark. Run it long enough to clear the purchase cycle, then compare total revenue, not platform-reported revenue, between the groups. The difference is incremental lift. Designated market areas make this clean for national advertisers; for local businesses, matched cities or regions work.
The scaled-back holdout: When going fully dark is too frightening, cut spend on a channel by a large, deliberate amount in part of the footprint and hold it flat elsewhere. You lose resolution but keep most of the signal, and the political cost inside the company is lower.
The platform conversion-lift test: The ad platforms will run a randomized holdout for you, showing the ad to a test group and withholding it from a control group, then reporting the measured lift. It is the least work and the most conflicted (the platform is grading its own homework), so treat it as one input, not the verdict.
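As a rough sketch of how a geo holdout readout might be computed in a spreadsheet-sized script: the helper `geo_holdout_lift` below is hypothetical, and it goes one step beyond a raw comparison by scaling the dark markets to the test markets using a pre-test period, on the assumption that the two groups are not identical in size. Every figure is invented for illustration.

```python
from statistics import mean


def geo_holdout_lift(test_pre, test_during, control_pre, control_during):
    """Estimate weekly incremental revenue from a geo holdout.

    test_*    : weekly TOTAL revenue in markets that kept spending
    control_* : weekly TOTAL revenue in markets that went dark
    *_pre     : weeks before the test, used only to size the groups
    """
    # How much bigger (or smaller) the test markets normally are.
    scale = mean(test_pre) / mean(control_pre)
    # What the test markets would likely have earned with no ads,
    # assuming they would have moved in step with the dark markets.
    expected_without_ads = mean(control_during) * scale
    lift_per_week = mean(test_during) - expected_without_ads
    return lift_per_week, lift_per_week / expected_without_ads


# Invented weekly revenue, in thousands of dollars.
lift, relative = geo_holdout_lift(
    test_pre=[100.0, 104.0, 96.0, 100.0],
    test_during=[102.0, 98.0, 100.0, 100.0],
    control_pre=[80.0, 82.0, 78.0, 80.0],
    control_during=[72.0, 70.0, 74.0, 72.0],
)
print(f"incremental revenue per week: {lift:.1f}k ({relative:.1%} lift)")
```

Dividing the weekly lift by the weekly spend in the test markets gives an incremental return that can sit next to the platform-reported one. The same readout works for a scaled-back holdout if the spend that was cut is used as the denominator.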

Designing the test so the answer is usable

The usual failure mode is not skipping the test. It is running it badly enough that the result is ambiguous, which lets everyone keep believing what they believed before.

A usable test holds the geographies comparable, runs long enough to clear the lag between exposure and purchase, and is sized so the expected lift is larger than the week-to-week noise in the business. A brand with smooth, high-volume sales can detect a small lift quickly. A brand with lumpy, seasonal sales needs a longer window and a bigger spend swing, or the signal drowns in the variance. The honest move is to decide, before the test starts, what size of effect would change the decision — and to make the test big enough to see an effect that size.
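One rough way to check the sizing before launch is the textbook two-group sample-size approximation, sketched below. The function `weeks_needed` is hypothetical. It assumes weeks are independent, noise is roughly normal, and both groups are equally noisy. Seasonal businesses break all three assumptions, so read the result as an optimistic floor rather than a plan.

```python
import math
from statistics import NormalDist, stdev


def weeks_needed(weekly_revenue, expected_lift, alpha=0.05, power=0.8):
    """Approximate weeks per group needed to detect expected_lift.

    weekly_revenue : historical weekly totals, used to estimate noise
    expected_lift  : smallest weekly revenue change that would alter
                     the budget decision, in the same units
    """
    if expected_lift <= 0:
        raise ValueError("expected_lift must be positive")
    sigma = stdev(weekly_revenue)                  # week-to-week noise
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sigma / expected_lift) ** 2
    return math.ceil(n)


# Invented histories with the same average and very different variance.
smooth = [100, 102, 98, 101, 99, 100, 103, 97]
lumpy = [100, 140, 60, 120, 80, 150, 50, 100]

print("smooth brand:", weeks_needed(smooth, expected_lift=5), "weeks")
print("lumpy brand: ", weeks_needed(lumpy, expected_lift=5), "weeks")
```

When the lumpy figure comes back absurdly long, that is the signal to widen the spend swing or pool more markets, not to run the test anyway and hope.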

A measurement you would not act on is not measurement. It is reassurance with a chart attached.

What the first test usually reveals

Teams that run their first geo holdout tend to learn two things at once. The channels they were proudest of — the ones with the gaudiest reported return — are frequently the least incremental, because attribution rewards channels that sit closest to the purchase. And a channel someone had been threatening to cut turns out to be doing quiet upper-funnel work that only shows up when you withhold it and watch demand soften a month later.

That reordering is the entire point. The reported numbers rank your channels by how good they are at claiming credit. Incrementality ranks them by how much they actually cause. Those two rankings are rarely the same, and the budget should follow the second one.
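A small sketch makes the reordering concrete. The channel names and every return figure below are made up for illustration and are not results from any test; the point is only that sorting the same channels by two different columns gives two different budgets.

```python
# Hypothetical channels with invented returns per dollar spent.
channels = {
    "brand_search": {"reported_roas": 8.0, "incremental_roas": 1.2},
    "retargeting": {"reported_roas": 6.0, "incremental_roas": 0.9},
    "paid_social_prospecting": {"reported_roas": 2.0, "incremental_roas": 1.8},
    "online_video": {"reported_roas": 0.8, "incremental_roas": 2.1},
}


def rank_by(metric):
    """Return channel names ordered from highest to lowest on metric."""
    return sorted(channels, key=lambda name: channels[name][metric], reverse=True)


by_credit = rank_by("reported_roas")    # how good each is at claiming credit
by_cause = rank_by("incremental_roas")  # how much each actually causes

for position, (credit, cause) in enumerate(zip(by_credit, by_cause), start=1):
    print(f"{position}. reported: {credit:<26} incremental: {cause}")
```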

Closing

The barrier to incrementality was never the math or the headcount. It is the willingness to give up a few weeks of attributed revenue to learn what the spend is really doing, and the discipline to design the test so the answer is unambiguous enough to act on. A team that runs one honest holdout per quarter — rotating across its major channels over a year — will allocate capital better than a team with ten times the tooling and no appetite to ever turn anything off.

Written by

April Y.

Partner, Performance & Connections

Leads paid media, growth intelligence, and connection planning. Builds the LTV models, MMM rebuilds, and incrementality frameworks that anchor AYMI's measurement work. Writes about the finance literacy gap in marketing.
