
You Cannot Backtest the Future

Typhoon Sinlaku, April 2026 — Mariana Islands

Alt-data assumes the future will look like the past. When that assumption breaks, the signals fail — exactly when funds need them most.

Two weeks ago at the Alternative Data Conference in London, a commodities quant said a line I have not been able to shake: "You cannot backtest something when there was no past." I wrote it up here.

It is the clearest explanation I have heard of why much of alternative data, after 15 years of running the table, has run out of road.

Card spend, satellite imagery, foot traffic: every classic alt-data signal rests on one assumption, that the future will look enough like the past for the historical relationship to hold. That used to be a safe bet. It is not anymore. Market-altering events are arriving faster than markets can react to them, and historical data goes blind in the gap, exactly when funds need it most.

So what do you do instead?

Synthetic research. It is the only category of data that lets funds react to market-altering events in real time.

Why better AI doesn't solve the alt-data problem

The default reaction from hedge-fund buyers I have spoken to in the last month is "better AI." Build a smarter in-house assistant. Ingest more transcripts. Pipe in faster filings. Make the existing data layer work harder.

It is the wrong frame, because the constraint is not the AI. It is the corpus.

What in-house AI assistants do is synthesise information you already have access to. Transcripts, filings, brokerage research, internal memos. They make the existing corpus faster and more searchable. They are productivity tools wrapped around a closed library.

Synthetic research does something else. It generates deliberation data on questions where no transcript or filing exists. The expert you wish you could call before the call. The consumer who has not been surveyed because the survey window is 12 weeks too late. The buyer who has not yet adjusted to the new tariff that dropped 48 hours ago.

Ask both systems the same question and the difference becomes obvious. "How will US consumers respond if coffee prices double in the next 90 days?"

The in-house AI assistant returns a literature review. Historical price-elasticity studies, the 1973 oil shock parallel, a McKinsey report from 2019. It will be a good literature review. It will also be exactly the data your competitors already have.

A synthetic population returns something else: a spread of responses across the income distribution, substitution patterns from people who drink pour-over and won't switch versus people who drink free office coffee, resentment language from the price-sensitive segment, brand loyalty bending in the high-income segment. The texture of an actual response, calibrated against an actual population.

It is a different category of data, not a better version of the same one.

How synthetic research delivers real-time alternative data

Freshness is the mechanism. FishDog's model is post-trained every four hours on news, weather, regulatory updates, and current market context. Maximum staleness window: four hours. That is the gap between something happening in the world and the synthetic population being able to respond to it.
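To make the four-hour window concrete, here is a minimal sketch of the arithmetic. It assumes refreshes land on a fixed four-hour grid measured from some anchor time, which is an illustrative assumption and not a description of FishDog's actual schedule. The names next_refresh and staleness are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Four-hour cycle, as stated in the article. The fixed-grid schedule
# below is an assumption made for illustration only.
REFRESH_INTERVAL = timedelta(hours=4)


def next_refresh(event_time: datetime, anchor: datetime) -> datetime:
    """Return the first refresh at or after event_time.

    Refreshes are assumed to happen at anchor + k * REFRESH_INTERVAL.
    """
    elapsed = event_time - anchor
    # Ceiling division: timedelta // timedelta floors, so negate twice.
    cycles = -(-elapsed // REFRESH_INTERVAL)
    return anchor + cycles * REFRESH_INTERVAL


def staleness(event_time: datetime, anchor: datetime) -> timedelta:
    """How long an event waits before the next refresh can include it."""
    return next_refresh(event_time, anchor) - event_time
```

Under this assumption, an event at 05:30 UTC on a grid anchored at midnight waits until the 08:00 refresh, a delay of two and a half hours. The delay can never exceed the interval itself, which is what "maximum staleness window" means here.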

Why does that matter in practice?

A war breaks out in a tea-exporting region. Tea importers panic; coffee buyers see the substitution pressure coming. Commodity quants on both sides need a signal fast — how will end-buyers adjust, what will they tolerate on price, how does brand loyalty bend under supply shock?

A survey panel cannot help inside that window. The decisions are being made now. The survey takes six weeks. By the time the data lands, the shock has played out and the alpha is gone.

The synthetic population can. It has been post-trained on the last four hours of the actual news. The personas know the war happened. They know what the coffee aisle looked like yesterday. They have the same context the actual buyer does. Their answers are not extrapolations from the past. They are deliberations about the present.

Different sector, same shape. A new tariff drops on imported white goods — washing machines, dryers, refrigerators. The price floor moves overnight. Retailers decide whether to absorb, pass through, or thin the assortment. Manufacturers decide which SKUs to defend. Every traditional alt-data signal is backward-looking by definition. The foot-traffic data from last week was generated before the tariff existed.

The synthetic population is inside the new world within four hours of the announcement. You can ask: how does a middle-income two-earner household in suburban Atlanta respond if the basic Whirlpool washer goes from $649 to $810 next week? Does the family delay the purchase, switch to a secondary brand, accept the price, or move down a tier? You get a distribution of answers calibrated against an actual population, while the decision window is still open.
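As a rough sketch of what "a distribution of answers" could look like in practice, the snippet below computes the size of the price shock in the washer example and turns a list of persona answers into shares. The function names and the answer labels are hypothetical, and the sample answers in the usage note are invented placeholders, not real output.

```python
from collections import Counter


def pct_increase(old_price: float, new_price: float) -> float:
    """Percentage change from old_price to new_price."""
    return (new_price - old_price) / old_price * 100


def response_shares(responses: list[str]) -> dict[str, float]:
    """Share of respondents choosing each option (shares sum to 1)."""
    counts = Counter(responses)
    total = len(responses)
    return {option: n / total for option, n in counts.items()}


# The washer in the article moves from $649 to $810.
shock = pct_increase(649, 810)  # about 24.8 percent
```

Calling response_shares on a placeholder list such as ["delay", "delay", "switch_brand", "accept"] gives half the cohort delaying and a quarter each switching brand or accepting the price. A real run would feed in the synthetic population's answers and break the shares out by cohort.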

Historical alt-data tells you what already happened. Current synthetic data tells you how the world is about to react to it.

How synthetic research fits next to alt-data and expert networks

The question I get from buyers right after the freshness pitch: where does this slot in?

Not on top of your alt-data subscriptions. Next to them. Alt-data tells you what happened; synthetic research tells you what people will do next about what happened. The two answer different questions and feed the same thesis.

Expert networks change shape too. The standard workflow is to spend $1,500 on a 45-minute call with one named expert and hope the questions you brought were the right ones. Run the same questions through a synthetic population first and you learn which questions actually produce variance, which ones surface the cohort splits and the resistance patterns. The expert call then targets the questions doing the real work, not the ones that sounded clever at brief.
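One hedged way to picture "which questions actually produce variance": score each question by how spread out the synthetic answers are, then take the top of the list to the expert call. The sketch below uses Shannon entropy as the spread measure, which is my substitution for categorical answers and not something the workflow above specifies. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def answer_entropy(answers: list[str]) -> float:
    """Shannon entropy (in bits) of a list of categorical answers.

    Zero means everyone gave the same answer; higher means more disagreement.
    """
    counts = Counter(answers)
    total = len(answers)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def rank_questions(responses: dict[str, list[str]]) -> list[str]:
    """Order questions from most to least disagreement among respondents."""
    return sorted(
        responses,
        key=lambda question: answer_entropy(responses[question]),
        reverse=True,
    )
```

A question where every persona answers the same way scores zero and is probably not worth expert time. A question that splits the population scores high and is a candidate for the call.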

Your in-house AI assistant gets fed, not displaced. The synthetic-research output becomes part of the corpus once you run it. The assistant synthesises it alongside the transcripts and filings already in the library. The two layers compound.

The funds getting the most out of this are running it inside the workflow they already have. They are not building a new workflow around it.

Which hedge fund strategies benefit from synthetic research

The audience cut matters, so let me be direct. The last two weeks of demos taught me something useful: this is style-specific.

This works for you if your strategy depends on modelling shocks, discontinuities, or market-altering events. Event-driven traders. Macro discretionary. Commodity quants with global supply-chain exposure. Anyone whose alpha lives in the first 48 hours after the news breaks.

This does not work for you if your strategy is value-fundamental over multi-quarter horizons, or if you package research ideas for larger funds and your edge is depth rather than speed. The four-hour window is not your wedge. A buyer earlier this week made the cut for me: "not really relevant for the investing style we do. The investing style we do is not event-driven real-time stuff." He was right. We are not for every desk.

If your strategy lives in the gap between something happening and the rest of the market figuring out what it means, read on. If it lives somewhere else, we have other pieces that may fit better.

The next 18 months in alternative data

The quant from London was right. Historical signals are table stakes now. Every serious fund has them. The marginal advantage they used to provide has been arbitraged into the floor of the market.

The next layer up is not faster alt-data or cleaner alt-data. It is data on the questions historical signals cannot answer at all.

My read on the next 18 months in the alt-data category: vendors who built their business on better-calibrated versions of the historical signal will keep competing with each other on a flattening curve. The marginal return on the next freshness tweak or the next geographic expansion is shrinking, and the buyers know it. That is why the conversation in London turned so quickly to "what else?"

The vendors who matter in the next cycle will be the ones who answer the questions the historical signal cannot reach. Not because they run faster. Because they are looking at a different problem — what the world is about to do tomorrow about what happened in the last four hours, before the panel can be recruited and the survey can be fielded.

We are building that layer.

If you are at a fund whose strategy depends on getting to the new world before the survey could be run, Thesis Lab is built for you. Long enough to run the questions that matter, short enough to know whether the data class actually shifts your hit rate.

Frequently Asked Questions

What is synthetic research and how does it differ from alternative data?

Synthetic research generates deliberation data on questions where no transcript or filing exists — for example, how a specific consumer cohort will respond to a tariff that dropped 48 hours ago. Historical alternative data tells you what already happened; synthetic research tells you how the world is about to react. The two are complementary inputs, not replacements.

How fresh is FishDog's synthetic research data?

FishDog's model is post-trained every four hours on news, weather, regulatory updates, and current market context. The maximum staleness window is four hours from a real-world event to the synthetic population being able to respond to it.

Which hedge fund strategies benefit from real-time synthetic research?

Event-driven traders, macro discretionary funds, and commodity quants with global supply-chain exposure benefit most, as does any strategy whose alpha lives in the first 48 hours after a market-altering event. Value-fundamental strategies, and shops that package research ideas for larger funds and compete on depth rather than speed, are not the target.

Doesn't an in-house AI assistant solve the same problem?

No. The constraint is not the AI model — it is the corpus. In-house assistants synthesise information your firm already has access to: transcripts, filings, brokerage research. Synthetic research generates a different category of data on questions where no transcript or filing exists yet. The two layers compound rather than compete.

How does synthetic research integrate with existing fund workflows?

It sits next to alt-data subscriptions (answering different questions in the same thesis), upstream of expert networks (testing which questions actually produce variance before booking a $1,500 expert call), and as input to in-house AI assistants (the synthetic-research output becomes part of the corpus the assistant can synthesise).
