Stop Testing Hunches. The Hypothesis Format That Actually Works

Most A/B tests are guesses in a fancy outfit. A real hypothesis names the change, the segment, the metric, the lift, and the evidence. Here's the format.

Most A/B Tests Are Guesses

You run an A/B test and hope something sticks. You change the button color from blue to green. You shorten the hero headline. You add social proof to the product page. You shuffle the checkout fields. Then you wait two weeks for results.

When the test ends, you look at the uplift. If the variant wins by 3%, you ship it. If it loses, you try something else next week. You never actually know why it worked or why it didn't. You're not testing. You're throwing darts at a board with your eyes closed.

The problem is simple. You don't have a hypothesis. You have a hunch. And hunches collapse because they skip the one thing that separates a real test from a guess: evidence.

A hypothesis is a hunch with proof. It names the change, the segment, the metric, the expected lift, and the evidence that the change will work.

The Hypothesis Format

Here's the structure. Write it down before you code:

We believe changing [X] for [user segment] will lift [metric] by [N%] because [evidence].

That's it. Five pieces. Each one forces specificity. Without specificity, you're guessing.
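To make the five pieces concrete, here is a minimal Python sketch that refuses to render a hypothesis until every slot is filled. The `Hypothesis` class and its field names are hypothetical, not part of any testing tool, and the example values borrow the free-shipping and survey examples used elsewhere in this article.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One test hypothesis. Every field must be filled before the test is built."""
    change: str           # the single, isolated change [X]
    segment: str          # the user segment the change targets
    metric: str           # the one primary metric
    expected_lift: float  # expected relative lift in percent, e.g. 10.0
    evidence: str         # the proof: VoC, replays, funnel data, teardowns

    def missing_parts(self) -> list[str]:
        """Names of empty text fields, plus expected_lift if it isn't positive."""
        missing = [name for name in ("change", "segment", "metric", "evidence")
                   if not getattr(self, name).strip()]
        if self.expected_lift <= 0:
            missing.append("expected_lift")
        return missing

    def statement(self) -> str:
        """Render the one-sentence format, or raise if any piece is missing."""
        missing = self.missing_parts()
        if missing:
            raise ValueError(f"incomplete hypothesis, missing: {missing}")
        return (f"We believe changing {self.change} for {self.segment} "
                f"will lift {self.metric} by {self.expected_lift:g}% "
                f"because {self.evidence}.")


h = Hypothesis(
    change="the placement of the 'Free Shipping Over $50' callout",
    segment="mobile first-time visitors",
    metric="conversion rate",
    expected_lift=10.0,
    evidence="12 of 30 surveyed visitors said shipping cost surprised them at checkout",
)
print(h.statement())
```

The point of raising on a missing piece is the same as the format's: a hunch with no evidence can't even be written down.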

Part 1: The Change (Make It Specific)

Don't say "improve the homepage." That's not a change, it's a direction. A real change is a single, isolated alteration you can ship and measure.

Good change: "Move the 'Free Shipping Over $50' callout from the footer to above the fold on the homepage hero." This is isolated. You can code it. You can measure its impact.

Bad change: "Redesign the homepage." That's 20 changes at once. You won't know which one moved the needle. If it loses, you'll wonder whether your design direction was wrong or just the font weight.

Specific changes are testable. Multi-part changes are noise.

Part 2: The Segment (Not "Everyone")

Most tests say "for all visitors." That's lazy. Different segments have different problems.

A repeat customer who's been buying for six months doesn't need the same friction removal as a first-time visitor. A mobile user won't benefit from the same layout change as a desktop user. A cart abandoner has a different problem from a browser who never adds to cart.

Segment by intent: first-time vs. repeat. Segment by device: mobile vs. desktop. Segment by behavior: viewed 5+ products vs. viewed 1. Segment by traffic source: paid ads vs. organic.
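If your analytics export gives you per-visitor data, tagging each visitor with these four dimensions takes only a few lines. This is a sketch under assumptions: the `Visitor` fields are hypothetical names you would map to your own export, and the "viewed 2-4" bucket is added only so every visitor lands somewhere.

```python
from dataclasses import dataclass


@dataclass
class Visitor:
    # Hypothetical fields: rename them to match your analytics export.
    visit_count: int      # sessions so far, including this one
    device: str           # "mobile" or "desktop"
    products_viewed: int  # product pages viewed this session
    source: str           # traffic source, e.g. "paid" or "organic"


def segments(v: Visitor) -> dict[str, str]:
    """Tag one visitor by intent, device, behavior, and traffic source."""
    if v.products_viewed >= 5:
        behavior = "viewed 5+"
    elif v.products_viewed <= 1:
        behavior = "viewed 0-1"
    else:
        behavior = "viewed 2-4"
    return {
        "intent": "first-time" if v.visit_count <= 1 else "repeat",
        "device": v.device,
        "behavior": behavior,
        "source": v.source,
    }


def is_mobile_first_time(v: Visitor) -> bool:
    """A hypothesis targets one combination, e.g. mobile first-time visitors."""
    tags = segments(v)
    return tags["device"] == "mobile" and tags["intent"] == "first-time"
```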

The more precisely the segment matches the problem, the bigger the lift you can expect inside it. If you're testing for "everyone" and the change wins by 1%, you can't tell it from noise. If you're testing for mobile first-time visitors and the change wins by 8%, that's signal, as long as that segment gets enough traffic to fill your sample.

Part 3: The Metric (One, Not Three)

Pick the metric that matters. Usually that's conversion rate or average order value (AOV).

Don't say "we'll measure engagement and bounce rate and scroll depth and session duration." You'll find something moved in one direction or another and convince yourself it's a win. Pick one primary metric. If a secondary metric gets worse, note it and move on.

Conversion rate is usually the right call. It's the metric most closely tied to revenue.

Part 4: The Lift (Be Realistic)

Most winning tests lift 5-15%. Not 40%. Not 100%. Five to fifteen.

If you're expecting a 40% lift from changing your CTA button copy, you're hallucinating. A 40% lift happens when you unlock a completely new traffic source or fix a showstopper bug that's killing half your conversions.

Set your bar at 5% minimum. Anything smaller is usually too small to separate from noise at typical store traffic. Expect 10% and you'll be realistic. Hope for 15%.
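The lift figures here read most naturally as relative lift: the change expressed as a share of the control's conversion rate. The article doesn't say so explicitly, so treat that as an assumption. A minimal sketch of the arithmetic, with hypothetical function names:

```python
MIN_LIFT_PCT = 5.0  # the minimum bar from the text, in percent


def relative_lift(control_rate: float, variant_rate: float) -> float:
    """Relative lift of the variant over control, in percent."""
    if control_rate <= 0:
        raise ValueError("control conversion rate must be positive")
    return (variant_rate - control_rate) / control_rate * 100


def clears_bar(control_rate: float, variant_rate: float,
               bar: float = MIN_LIFT_PCT) -> bool:
    """True if the observed relative lift meets the minimum bar."""
    return relative_lift(control_rate, variant_rate) >= bar


# 2.0% -> 2.2% conversion: only 0.2 percentage points, but a 10% relative lift.
lift = relative_lift(0.020, 0.022)
```

Clearing the bar only says the lift is big enough to matter. Whether it's real depends on how many visitors saw each variant.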

Part 5: The Evidence (This Is Where It Breaks Down)

This is the section that separates real testing from guessing. Where's your proof?

Evidence comes from four places. Voice of Customer data. Session replay patterns. Funnel data. Competitor teardowns.

Voice of Customer: You asked 30 visitors why they didn't buy. 12 of them said "shipping cost surprised me at checkout." That's evidence. Test showing the shipping cost earlier in the funnel so it stops being a surprise.
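Turning open-ended answers into evidence means coding each one into a short label and counting. The sketch below assumes that coding is already done. Apart from the shipping-cost answer, the labels and counts in the sample data are made up for illustration.

```python
from collections import Counter


def top_objections(responses: list[str], top_n: int = 3) -> list[tuple[str, int, float]]:
    """Count coded survey answers; return (label, count, share in %) for the top N."""
    counts = Counter(responses)
    total = len(responses)
    return [(label, n, 100 * n / total) for label, n in counts.most_common(top_n)]


# 30 coded answers: the 12 shipping-cost answers match the example above,
# the remaining labels and counts are hypothetical.
responses = (["shipping cost surprise"] * 12 + ["just browsing"] * 8
             + ["price too high"] * 6 + ["other"] * 4)
print(top_objections(responses))
```

Twelve of thirty is 40% of respondents, which is why the shipping-cost objection earns a test ahead of the rest.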

Session replays: You watched 20 recordings of visitors on your product page. 18 of them scrolled straight to the reviews section. Three left partway through the reviews, on a product with only two ratings. That's evidence. Test moving the reviews above the fold.

Funnel data: Your Google Analytics shows that 60% of visitors reach your homepage but only 2% reach a product page. That 58-percentage-point drop happens on the category pages. That's evidence that navigation or category discovery is broken. Test improving the category page layout.
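Finding where a funnel leaks is a matter of subtracting each step's share from the one before it. In this sketch the homepage and product-page figures come from the example above; the 40% category-page share is hypothetical, added only so there is a middle step to compare. Function names are illustrative.

```python
def step_drops(funnel: list[tuple[str, float]]) -> list[tuple[str, str, float]]:
    """Percentage-point drop between each pair of consecutive funnel steps.

    `funnel` is ordered top to bottom as (step name, % of visitors reaching it).
    """
    return [(step, next_step, round(share - next_share, 1))
            for (step, share), (next_step, next_share) in zip(funnel, funnel[1:])]


def biggest_leak(funnel: list[tuple[str, float]]) -> tuple[str, str, float]:
    """The consecutive pair of steps with the largest drop."""
    return max(step_drops(funnel), key=lambda drop: drop[2])


# Category-page share is hypothetical; the other two match the example.
funnel = [("homepage", 60.0), ("category page", 40.0), ("product page", 2.0)]
print(biggest_leak(funnel))
```

Keep points and percent apart: falling from 60% to 2% is a 58-point drop, but it means losing about 97% of the visitors who reached the homepage.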

Competitor teardowns: Your top competitor has a "free shipping" badge above the product image. Your site doesn't. On its own, that's weak evidence. Combined with VoC data showing that shipping cost is a concern, it's strong evidence to test it.

Without evidence, you're guessing. With evidence, you're testing.

Sample Size and Duration

Run your test long enough. Two weeks minimum. Longer if your traffic is low.

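Treat 1,000 visitors per variant as a bare minimum, not a target. The number you actually need depends on your baseline conversion rate and the smallest lift you want to detect: at a 2% conversion rate, reliably detecting a 10% relative lift takes tens of thousands of visitors per variant. If you're testing on 100 visitors per variant, your results are noise. Your tool (VWO, Optimizely, Convert, Shopify's built-in) may still declare a winner, but it's likely a false positive.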

Why two weeks? One full week covers every day of the week, so Monday shoppers and Friday shoppers both show up in the data. A second week smooths out one-off daily swings. Run for a month if you can afford to wait.
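Whether two weeks is enough depends on your numbers. The sketch below uses the standard normal-approximation formula for comparing two conversion rates, with the common defaults of 5% two-sided significance and 80% power. Those defaults, the function names, and the 5,000 daily visitors entering the test are assumptions. Testing tools may use other methods (sequential or Bayesian statistics, for instance), so their calculators can give somewhat different numbers.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH variant to detect `relative_lift` over `baseline`.

    Two-sided test for two proportions, normal approximation.
    baseline: control conversion rate, e.g. 0.02 for 2%
    relative_lift: lift to detect, e.g. 0.10 for +10%
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def weeks_needed(per_variant: int, daily_visitors: int, variants: int = 2,
                 min_weeks: int = 2) -> int:
    """Whole weeks to reach the sample, never fewer than `min_weeks`."""
    days = math.ceil(per_variant * variants / daily_visitors)
    return max(min_weeks, math.ceil(days / 7))


n = visitors_per_variant(0.02, 0.10)           # on the order of 80,000
weeks = weeks_needed(n, daily_visitors=5_000)  # hypothetical traffic
```

At a 2% baseline, a 10% relative lift needs roughly 80,000 visitors per variant, which at 5,000 visitors a day entering the test means about five weeks. Larger expected lifts or higher baselines shrink the sample quickly.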

Tools: Pick One and Ship

VWO, Optimizely, and Convert are built for this. They let you write a hypothesis, set sample size, lock in your expected lift, and run tests for the duration. Shopify's built-in A/B testing works too if you're on Shopify.

The tool doesn't matter. The hypothesis does. If you walk into VWO with a guess, it's still a guess.

Free Funnel Audit

Build Your First Hypothesis-First Test

Get a free funnel audit. I'll map your leakiest steps and build five hypothesis-first tests sized for your traffic.

Get Free Audit