How to Test Product Concepts with Claude Code and Ditto

Seventy-two per cent of product features ship without any form of customer validation. The number comes from Pendo's 2023 State of Product report, and it has not improved since.

The Most Expensive Mistake in Product Development

The statistic bears repeating, because the implications are genuinely staggering. Nearly three quarters of the features that engineering teams spend weeks or months building have never been tested against the people who will supposedly use them. Not tested in a rigorous way. Not tested in a casual way. Not tested at all. The feature was conceived in a planning meeting, debated in a prioritisation session, specced in a document, built across several sprints, shipped to production, and then -- only then -- subjected to its first real experiment: whether customers actually want it.

The cost of this pattern is not merely the engineering hours wasted on features that nobody adopts. It is the opportunity cost of features that were never built because the roadmap was occupied by untested assumptions. It is the support cost of explaining features that customers misunderstand. It is the marketing cost of positioning a product around capabilities that do not resonate. And it is the strategic cost of building confidence in a roadmap that is, statistically, wrong about seven out of every ten items on it.

Traditional concept testing exists, of course. Focus groups cost $15,000 to $50,000 per round, require four to eight weeks of logistics, and produce findings that are heavily influenced by the moderator's framing and the loudest voice in the room. Surveys are cheaper, but they are still slow to field and they suffer from stated-preference bias: people describe what they think they would do, which is a notoriously poor predictor of what they actually do. Beta testing is the most common alternative, but it arrives too late. By the time you have a working beta, you have already committed engineering resources to a particular implementation. The sunk cost fallacy ensures that unfavourable beta feedback gets rationalised rather than acted upon.

The gap in the market is not for more research. It is for research that arrives early enough to change decisions and fast enough to keep pace with the development cycle. That is the gap that synthetic consumer panels fill. Ditto recruits ten AI-generated personas matched to your target customer profile, asks them seven carefully designed questions about your product concept, and returns prioritised insights in under an hour. The concept has not been built. No engineering time has been committed. The only investment is the time it took to describe the idea clearly enough to test it.

This article explains how to run a product concept and feature testing study using Claude Code and Ditto: the questions to ask, the deliverables to extract, and the decisions the output should inform. It complements the product launch article, which covers validation of the complete product. This article focuses upstream, on the individual features and concepts that should be tested before they reach the roadmap.

Why Concept Testing Is the Highest-ROI Research You Can Do

The economics of concept testing are asymmetric in a way that most product teams underappreciate. The cost of testing a concept before building it is small: a few hours of time, a modest research budget, and the willingness to accept that you might be wrong. The cost of not testing is potentially enormous: weeks of engineering time, a launch that underperforms, a positioning narrative that does not land, and the demoralising experience of watching adoption metrics flatline for a feature that everyone internally was excited about.

Consider the decision tree. A product team has four possible outcomes:

  1. Test and build. The concept tests well. The team builds it with confidence, armed with customer language for positioning, clear success criteria for measurement, and an understanding of which aspects matter most. This is the best outcome.

  2. Test and skip. The concept tests poorly. The team redirects resources to something more promising. This feels like a loss but is actually the second-best outcome, because it prevents a much larger loss downstream.

  3. Skip testing and build successfully. The team gets lucky. The feature resonates without validation. This happens, but it happens less often than most product leaders believe, and each success reinforces the dangerous habit of shipping without evidence.

  4. Skip testing and build unsuccessfully. The team ships a feature that fails to gain adoption. This is the most common outcome, and it is the most expensive, because the cost is not just the wasted engineering but the eroded trust in the product team's judgement.

The maths is straightforward. If concept testing costs $500 and an hour of time, and an average feature costs $50,000 to $200,000 in engineering resources, the testing needs to prevent only one bad feature per year to deliver a hundredfold return. In practice, it prevents considerably more than one.
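
As a rough sanity check, the arithmetic can be made explicit. The figures below are the illustrative ones from this section, plus a hypothetical cadence of twenty studies per year; they are not benchmarks.

```python
# Back-of-envelope ROI for concept testing, using this article's example figures.
cost_per_study = 500          # cost of one synthetic concept test
feature_cost_low = 50_000     # low end of engineering cost for an average feature
studies_per_year = 20         # hypothetical cadence: one study per significant feature

annual_testing_spend = cost_per_study * studies_per_year            # $10,000
return_per_prevented_feature = feature_cost_low / cost_per_study    # 100x one study

print(f"Annual spend: ${annual_testing_spend:,}; "
      f"one prevented feature returns {return_per_prevented_feature:.0f}x the cost of a study.")
```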

The reason most teams skip this step is not economic. It is logistical. Traditional concept testing takes too long. By the time the focus group report arrives, the planning cycle has moved on. The feature has already been committed to in a quarterly roadmap review. The research becomes a post-hoc rationalisation exercise rather than a genuine decision input. Synthetic panels eliminate the logistical barrier. When a study takes forty-five minutes instead of six weeks, it becomes possible to test every significant feature before committing to build it.

The Seven Questions That Validate a Product Concept

The Ditto concept testing study uses seven questions, each designed to validate a specific dimension of the feature concept. The sequence matters. It moves from open comprehension (does the customer even understand what this is?) through prioritisation and tradeoffs to the ultimate test: would this change their behaviour?

Question 1: Comprehension and Use Case

"We're considering adding [Feature A] to our product. When you hear that, what do you imagine? How would you use it?"

This is the most important question in the study, and it must come first. If customers cannot articulate what the feature does and how they would use it after hearing a brief description, no amount of engineering excellence will rescue it. The comprehension question also reveals something subtler: the mental model that customers apply to the concept. They may understand it perfectly but imagine a use case that differs entirely from what the product team intended. That divergence is itself a finding, and often the most valuable one.

Question 2: Feature Prioritisation

"Rate these potential features from most to least valuable for your work: [Feature A], [Feature B], [Feature C], [Feature D]. Explain your ranking."

Asking about a single feature in isolation produces inflated importance scores. Everything sounds useful when it is the only thing on the table. Forcing a ranking against three or four alternatives introduces the constraint that actually governs product decisions: we cannot build everything, so what matters most? The explanations behind the ranking are often more valuable than the ranking itself, because they reveal the criteria that customers use to evaluate features, which is precisely the language your positioning should adopt.

Question 3: Time-to-Value and Trust

"You currently use [existing workaround] to do [task]. If [Feature A] could do this automatically, how much time would that save you? Would you trust it?"

This question anchors the feature against the customer's current reality. It forces a comparison with the workaround they already use, which accomplishes two things. First, it quantifies the potential value in terms the customer can estimate (time saved). Second, it surfaces the trust dimension that is often invisible in product planning. Many features offer theoretical efficiency gains that customers will not adopt because they do not trust the output. Automated report generation, for instance, may save hours, but if the user reviews every generated report line by line, the net time saving is negative.

Question 4: White Space Discovery

"What's one feature you've always wished [product category] tools would have? Something nobody offers but you'd love?"

This question is deliberately open-ended and deliberately positioned in the middle of the study, after the participant has been thinking about features and their own workflows. It captures the ideas that customers have been carrying around but that no vendor has asked about. These responses frequently identify opportunities that were not on the roadmap at all, which is precisely why they are valuable. The product team cannot prioritise what it has not considered.

Question 5: Tradeoff Tolerance

"If [Feature A] worked perfectly but meant [tradeoff, e.g., slower speed, higher price, more complexity], would that be worth it?"

Every feature involves tradeoffs, and customers are generally more realistic about this than product teams give them credit for. The tradeoff question reveals the boundaries of acceptability. A customer who says "I would accept slower speed for better accuracy" is telling you something essential about which quality dimension to optimise. A customer who says "Absolutely not, speed is everything" is telling you something equally essential. The aggregate response pattern creates a tradeoff map that directly informs engineering priorities.

Question 6: Success Criteria Definition

"Imagine you just used [Feature A] for the first time. What would success look like? What result would make you say 'wow, this was worth it'?"

Product teams define success in terms of usage metrics: daily active users, feature adoption rate, time in feature. Customers define success in terms of outcomes: the report was accurate, the recommendation was useful, the process was faster than before. This question captures the customer's definition, which should inform both the quality assurance criteria (how do we know the feature is working?) and the customer success narrative (what should we help users achieve?). The gap between internal and external definitions of success is almost always larger than expected.

Question 7: Adoption and Advocacy Impact

"Would [Feature A] change how often you use the product? Would you recommend it to others because of this feature specifically?"

The final question tests the two hardest outcomes to achieve: increased engagement and word-of-mouth advocacy. A feature that customers value but would not recommend is a hygiene feature: necessary but not differentiating. A feature that customers would specifically recommend is a growth feature: it creates organic acquisition. The distinction matters enormously for marketing investment. Growth features deserve launch campaigns, demo videos, and prominent positioning. Hygiene features deserve release notes and documentation updates.

The Six Deliverables: What Claude Code Produces

The raw responses from ten personas across seven questions amount to roughly seventy individual data points. Useful, but unwieldy. Claude Code synthesises these into six structured deliverables that map directly to product team decisions.

1. Feature Priority Scorecard

A ranked list of the features tested, with each ranking supported by aggregated evidence from the persona responses. The scorecard includes a demand score (how many personas ranked this feature first or second), a clarity score (how well personas understood the concept), and a confidence score (how consistent the ranking was across the panel). Features with high demand but low clarity need better positioning before they need engineering. Features with high clarity but low demand need to be reconsidered entirely.
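
As a sketch of how these three scores could be computed from raw panel responses, the snippet below works from a simplified schema. The field names and the consistency measure are illustrative assumptions, not Ditto's actual output format.

```python
from dataclasses import dataclass
from statistics import mode

@dataclass
class PersonaResponse:
    """One persona's reaction to one feature. Illustrative schema, not Ditto's."""
    rank: int          # 1 = ranked first among the features tested
    understood: bool   # did the persona describe the intended use case?

def scorecard(responses: list[PersonaResponse]) -> dict[str, float]:
    n = len(responses)
    ranks = [r.rank for r in responses]
    return {
        "demand": sum(r.rank <= 2 for r in responses) / n,    # share ranking it top two
        "clarity": sum(r.understood for r in responses) / n,  # share with the right mental model
        "confidence": ranks.count(mode(ranks)) / n,           # share agreeing with the modal rank
    }
```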

2. Build vs. Skip Recommendation

A binary recommendation for each feature, supported by the evidence. This is the deliverable that product teams find most directly useful, because it translates nuanced research into the language of roadmap planning. The recommendation is not a decree. It is a data-backed argument. A "skip" recommendation does not mean the feature is bad. It means the evidence does not support building it now, and the report explains why.

3. Feature Positioning Guide

For each feature that receives a "build" recommendation, the study produces a positioning guide based on the language that resonated with personas. This includes the primary value proposition (the single sentence that explains why this feature matters), the supporting evidence (the use cases and outcomes that personas described), and the objection responses (the concerns that emerged and how they can be addressed). The positioning guide is written in customer language, not product language, because the entire point is that the customer's frame of reference should drive the narrative.

4. Success Metrics

Customer-defined criteria for what "working well" looks like for each feature. These are distinct from the engagement metrics that product analytics will track. They are outcome metrics: the results that customers expect and against which they will judge the feature. Product teams that define success only in terms of adoption rates miss the point. A feature can be widely adopted and still fail if it does not deliver the outcome customers expected.

5. Tradeoff Analysis

A map of acceptable and unacceptable tradeoffs for each feature, derived from Question 5. This deliverable is particularly useful for engineering leads, who must make implementation decisions that involve tradeoffs between speed, accuracy, complexity, and cost. The tradeoff analysis converts a subjective judgement ("we think users prefer speed over accuracy") into an evidence-based position ("seven of ten personas explicitly preferred accuracy over speed").

6. White Space Opportunities

A catalogue of features and capabilities that personas requested but that are not currently on the roadmap. These are organised by frequency (how many personas mentioned similar ideas), feasibility (a rough assessment based on the product's current architecture), and strategic alignment (how well the idea fits the product's positioning). White space opportunities are the most speculative deliverable, but they are also the most strategically interesting, because they represent unmet demand that competitors have also failed to address.

The Claude Code Workflow: A Practical Walkthrough

The operational sequence is straightforward. Claude Code handles the API orchestration, the polling, and the synthesis. The product marketer's job is to define the concept clearly and to review the output critically.

Step 1: Define the concept.

Write a clear, jargon-free description of the feature or product concept you want to test. Include what it does, who it is for, and what problem it solves. This description becomes the context that Ditto's personas receive before answering questions. Ambiguous descriptions produce ambiguous responses. Invest the time to be precise.
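
For illustration, a concept description in the spirit of this step might look like the block below. The feature and product are hypothetical; the point is the structure: what it does, who it is for, and the problem it solves, in plain language.

```
Concept: Automated weekly status digest
What it does: Every Friday, the tool drafts a one-page summary of project
progress, risks, and blockers from existing task data.
Who it is for: Project managers who currently assemble status reports by hand.
Problem it solves: Manual reporting takes one to two hours a week and is often
out of date by the time anyone reads it.
```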

Step 2: Recruit the research group.

Claude Code calls the Ditto API to recruit a panel of ten personas matched to your target customer profile. The recruitment filters include country, age range, professional context, and category familiarity. For a B2B feature, you would recruit personas with relevant job titles and industry experience. For a consumer feature, you would recruit by demographics and product category usage.

```
Recruit 10 personas: product managers, 28-45, US and UK, familiar with project management tools
```

The recruitment takes approximately two minutes. Ditto generates ten synthetic personas with names, backgrounds, occupations, and behavioural profiles. These are not random characters. They are constructed to represent the diversity of your target segment: different experience levels, different company sizes, different priorities.
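
Behind that prompt, Claude Code issues the recruitment request against Ditto's API. The sketch below shows what such a call might look like in Python; the base URL, endpoint path, payload fields, and response shape are placeholders for illustration, not Ditto's documented API.

```python
import os
import requests

# Hypothetical endpoint and payload; the real Ditto API may differ.
DITTO_BASE = "https://api.example-ditto.com/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['DITTO_API_KEY']}"}

panel = requests.post(
    f"{DITTO_BASE}/panels",
    headers=HEADERS,
    json={
        "size": 10,
        "filters": {
            "occupation": "product manager",
            "age_range": [28, 45],
            "countries": ["US", "GB"],
            "category_familiarity": "project management tools",
        },
    },
    timeout=30,
)
panel.raise_for_status()
panel_id = panel.json()["id"]  # assumed response field
```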

Step 3: Create the study and ask seven questions.

Claude Code creates a study with the concept description as context, then submits the seven questions sequentially. Each question receives individual responses from all ten personas. The study takes approximately thirty to forty minutes to complete, during which Claude Code polls the API at intervals to check for completion.
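
Continuing the same hypothetical client from the recruitment sketch, creating the study, submitting the seven questions, and polling for completion might look like the following. Endpoint paths and field names remain placeholders; the question templates are the ones described earlier.

```python
import time
import requests

# Assumes DITTO_BASE, HEADERS and panel_id from the recruitment sketch above.
QUESTIONS = [
    "We're considering adding [Feature A] to our product. When you hear that, what do you imagine? How would you use it?",
    "Rate these potential features from most to least valuable for your work: [A], [B], [C], [D]. Explain your ranking.",
    "You currently use [existing workaround] to do [task]. If [Feature A] could do this automatically, how much time would that save you? Would you trust it?",
    "What's one feature you've always wished [product category] tools would have? Something nobody offers but you'd love?",
    "If [Feature A] worked perfectly but meant [tradeoff], would that be worth it?",
    "Imagine you just used [Feature A] for the first time. What would success look like? What result would make you say 'wow, this was worth it'?",
    "Would [Feature A] change how often you use the product? Would you recommend it to others because of this feature specifically?",
]

# Create the study with the concept description as shared context (hypothetical endpoint).
study_id = requests.post(
    f"{DITTO_BASE}/studies",
    headers=HEADERS,
    json={"panel_id": panel_id, "context": open("concept.txt").read()},
    timeout=30,
).json()["id"]

# Submit the seven questions in sequence.
for question in QUESTIONS:
    requests.post(
        f"{DITTO_BASE}/studies/{study_id}/questions",
        headers=HEADERS,
        json={"text": question},
        timeout=30,
    ).raise_for_status()

# Poll until every persona has answered every question (typically 30-40 minutes).
while True:
    state = requests.get(f"{DITTO_BASE}/studies/{study_id}", headers=HEADERS, timeout=30).json()
    if state.get("status") == "complete":
        break
    time.sleep(60)
```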

Step 4: Complete the study and extract insights.

Once all responses are in, Claude Code triggers Ditto's completion analysis, which produces an AI-generated summary of themes, consensus points, and divergences across the panel. Claude Code then synthesises the raw responses and the completion summary into the six deliverables described above.

Step 5: Generate the share link.

Ditto produces a shareable link to the live study, which can be included in stakeholder presentations, roadmap reviews, or product briefs. The share link allows anyone to explore the individual persona responses, which is considerably more persuasive than a summary document. Seeing ten distinct individuals explain why they would or would not use a feature has a rhetorical force that aggregated data lacks.
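
Closing out the same sketch, triggering the completion analysis and retrieving the share link might look like this; the endpoints remain illustrative placeholders rather than documented Ditto routes.

```python
# Trigger Ditto's completion analysis and fetch the shareable link
# (hypothetical endpoints, continuing the sketch above).
summary = requests.post(
    f"{DITTO_BASE}/studies/{study_id}/complete", headers=HEADERS, timeout=120
).json()

share = requests.get(
    f"{DITTO_BASE}/studies/{study_id}/share-link", headers=HEADERS, timeout=30
).json()

print("Key themes:", summary.get("themes"))
print("Share link for stakeholders:", share.get("url"))
```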

The entire workflow, from concept definition to completed deliverables, takes approximately sixty to ninety minutes. Traditional concept testing takes four to eight weeks. The speed difference is not incremental. It is categorical. It means concept testing can happen for every significant feature, not just the ones that are important enough to justify a six-figure research budget.

Reading the Results: What Good Looks Like

Raw output requires interpretation. Here is how to read the six deliverables and translate them into product decisions.

Strong build signal. Seven or more personas rank the feature in their top two. Comprehension is high (personas accurately describe the use case without prompting). The tradeoff tolerance is broad (customers accept reasonable compromises). Success criteria are specific and measurable. This feature should be built, and the positioning guide provides the language for the launch.

Weak build signal with repositioning potential. Five or six personas rank the feature highly, but comprehension is mixed. Some personas understand the concept correctly; others describe a different use case entirely. This pattern suggests the feature itself may be sound, but the way it is being described needs work. The fix is not engineering. It is messaging. Rewrite the concept description using the language that resonated and test again.

Skip signal. Fewer than four personas rank the feature in their top two. Comprehension is low or inconsistent. Tradeoff tolerance is narrow (customers will not accept any compromise for this feature). Advocacy impact is negligible. This feature should be deprioritised. The skip recommendation should include the specific reasons, because product teams understandably resist removing features from a roadmap, and data-backed reasoning is considerably more persuasive than opinion.

White space signal. Three or more personas independently describe a similar unmet need in Question 4. This is the strongest possible indicator of latent demand, because it emerged without any prompting. White space signals should be treated as new concept candidates and fed back into the testing cycle.

The discipline of concept testing is not to build what tests well and skip what tests poorly. It is to build what tests well, investigate what tests ambiguously, skip what tests poorly, and explore what emerges unexpectedly. The seven-question framework captures all four signals in a single study.
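
These thresholds lend themselves to a simple triage rule. The sketch below encodes the cut-offs described above for a ten-persona panel; the comprehension threshold of eight is an assumed value, and all of the numbers should be tuned to your own panel size.

```python
def read_signal(top_two: int, understood: int) -> str:
    """Triage one feature from a 10-persona panel using the cut-offs described above.
    The comprehension threshold (8 of 10) is an assumed value, not from the article."""
    if top_two >= 7 and understood >= 8:
        return "strong build: build it and use the positioning guide for launch"
    if 5 <= top_two <= 6:
        return "weak build: repositioning potential, rewrite the description and retest"
    if top_two < 4:
        return "skip: deprioritise and document the reasons"
    return "ambiguous: investigate before committing"

# White space asks are handled separately: three or more personas independently
# describing a similar unmet need in Question 4 becomes a new concept candidate.
```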

When Not to Use Synthetic Concept Testing

Intellectual honesty requires acknowledging the boundaries. Synthetic concept testing works well for most product features and concepts, but there are categories where it is insufficient or inappropriate.

Genuinely novel categories with no reference frame. If the concept is so new that no existing mental model applies, synthetic personas will struggle in the same way that real consumers would struggle in a survey. The first iPhone would have tested poorly against a feature phone panel, because the panel had no framework for evaluating a device that combined a phone, an iPod, and an internet browser. If your concept requires the customer to imagine a world that does not yet exist, synthetic testing will underestimate its potential. In these cases, prototype testing with real users is essential.

Deep emotional or sensory products. Taste, texture, scent, physical comfort, and aesthetic beauty are dimensions that synthetic personas cannot evaluate from a text description. A new fragrance, a novel food product, or a piece of furniture that must be sat in to be judged will not produce reliable results from any form of concept testing that relies on description rather than experience. The same applies to products where the emotional response is the entire value proposition: luxury goods, art, entertainment.

Regulatory or compliance-driven features. Features that exist because of legal requirements rather than customer demand will test poorly regardless of methodology. Customers do not value compliance features, because the value accrues to the company rather than the user. Testing these features against a consumer panel will produce a predictable "skip" recommendation that would be wrong to follow.

Price-sensitive decisions at extreme price points. The pricing article in this series covers pricing research in depth. For concept testing specifically, be cautious about features where the pricing dimension dominates the value assessment. A feature that costs $500 per month will produce different reactions depending on whether the respondent is at a startup or an enterprise, and synthetic personas may not fully replicate the organisational politics that govern purchasing decisions at high price points.

In all of these cases, synthetic concept testing is not useless. It can still surface comprehension issues, prioritisation signals, and language patterns. But it should be supplemented with real-world testing methods, not treated as sufficient on its own.

The Compound Effect: Testing as a Product Culture

The most significant benefit of fast, inexpensive concept testing is not any individual study. It is the cultural shift that occurs when testing becomes the default rather than the exception.

Product teams that test concepts regularly develop three habits that compound over time. First, they become better at describing concepts clearly, because the act of writing a description for a research panel forces a precision that internal discussions do not require. Second, they become more comfortable with negative results, because frequent testing normalises the experience of learning that an idea does not resonate. Third, they build an evidence base that makes roadmap debates less political and more empirical. When every feature on the roadmap has been tested, prioritisation conversations shift from "I believe this is important" to "the evidence suggests this is important."

The seven-question framework described in this article is not the only way to test a product concept. It is a starting point. Teams that adopt synthetic testing tend to evolve their question sets over time, adding questions specific to their product category, removing questions that consistently produce low-value responses, and experimenting with different panel compositions. The framework is designed to be adapted, not followed rigidly.

The underlying principle is simpler than any specific methodology. Every significant feature should be tested before it is built. The test should be fast enough that it does not delay the development cycle. The results should be structured enough that they inform specific decisions. And the process should be cheap enough that no one needs to justify the expense.

For seventy-two per cent of features, none of this happens. That is the opportunity.

The Claude Code and Ditto for Product Marketing Series

This is part of a series exploring how AI agents handle the core disciplines of product marketing. Each article covers one function of the PMM stack, explains the methodology, and links to a companion Claude Code guide you can run yourself.

Frequently Asked Questions

What is synthetic product concept testing?

Synthetic product concept testing uses AI-generated personas matched to your target customer profile to evaluate feature ideas before they are built. Platforms like Ditto recruit 10 synthetic respondents who answer 7 structured questions about comprehension, prioritisation, tradeoffs, and adoption likelihood, delivering results in under an hour versus 4-8 weeks for traditional focus groups.

How much does product concept testing cost?

Traditional concept testing via focus groups costs $15,000-$50,000 per round. Synthetic concept testing through Ditto and Claude Code costs a fraction of that and completes in under an hour. Since an average feature costs $50,000-$200,000 in engineering resources, preventing even one poorly-validated feature per year delivers outsized returns on the research investment.

What questions should a concept test ask?

An effective concept test uses 7 questions: comprehension and use case (do customers understand it?), feature prioritisation (how does it rank against alternatives?), time-to-value and trust (would they trust automated output?), white space discovery (what unmet needs exist?), tradeoff tolerance (what compromises are acceptable?), success criteria (what outcome would impress them?), and adoption/advocacy impact (would they recommend it?).

When should you test a product concept?

Product concepts should be tested before any engineering resources are committed. The ideal timing is during the planning or prioritisation phase, when the cost of testing is minimal and the ability to redirect resources is maximal. Testing after a beta is built introduces sunk cost bias that causes teams to rationalise unfavourable feedback rather than act on it.

What are the limitations of synthetic concept testing?

Synthetic concept testing is less reliable for genuinely novel categories with no existing reference frame, deep emotional or sensory products (taste, texture, scent), regulatory compliance features where value accrues to the company not the user, and decisions dominated by extreme price points where organisational politics govern purchasing. In these cases, it should supplement rather than replace real-world testing.
