The expensive part of A/B testing is rarely the tool subscription. The real bill comes from testing things that never deserved a test, while the changes that did deserve one go live blind.

I see both mistakes weekly. A bug sitting in a testing queue for three weeks. A brand-new offer page shipped on a gut feeling, because it "obviously" had to win.

After thousands of tests across 190+ Shopify brands, almost every bad testing decision traces back to skipping one of three questions. The whole filter takes thirty seconds. Here it is.

Question 1: is something broken?

A bug. A typo. A template that renders wrong on mobile.

Fix it today.

A defect has no hypothesis. There is no version of the story where the broken size chart wins. Testing a defect is paying to learn what you already know.

I still watch teams do this. Someone finds a dead review widget, adds it to the experimentation backlog, and waits weeks for significance on something a developer could have fixed before lunch. That's a queue pretending to be rigor.

The fix still deserves a ticket and a check afterwards. It just doesn't deserve a control group.

Question 2: does it change what people see, believe or decide?

If the answer is no, ship it. A refactor. A faster script. Swapping an app behind the scenes. Invisible to the customer, so no verdict needed. Waiting for significance on something the customer can't perceive is just slow shipping.

If the answer is yes, you've found the category that earns a test. A removed doubt. A new promise. A different way of presenting the price. An offer change.

This is where most brands flip the logic. They test button colors and ship the new offer page untested, because the offer page felt like a guaranteed win.

Even when it feels obvious. Especially when it feels obvious. We've watched clean, minimal redesigns lose to the cluttered pages they replaced. We've watched "stronger" headlines lose to the ones they were meant to retire. Obvious has cost us money more than once.

Your team does not get the final vote. Cold traffic does, and cold traffic votes differently than people who look at the site every day.

Question 3: do 500+ orders a month pass through what you're changing?

This is the exception that keeps the filter honest, because a test is only as good as the data that feeds it.

Below roughly 250 orders per variation, a test almost never reaches a verdict. With two variations, that means 500 monthly orders to get there within a month. Run it on less and you'll stare at a dashboard for six weeks, call it at 70% probability, and pretend that's knowledge. That's statistical theater.

So before any test, do the ten-second math. Take the monthly orders that actually pass through the thing you're touching. Changing a section lower on the page? Check the scroll percentage first. If only a third of visitors ever reach that section, only a third of the page's volume counts.
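Here is a minimal sketch of that ten-second math in Python. The function names and the 1,200-order page are made up for illustration. The 500-order bar is the one from the question above, and the sketch assumes visitors who reach the section order at the page's average rate.

```python
def orders_through_change(page_monthly_orders: float, reach_share: float = 1.0) -> float:
    """Estimate the monthly orders that pass through the element being changed.

    reach_share is the fraction of page visitors who actually see the element,
    for example the scroll percentage for a section lower on the page.
    Assumption: visitors who reach it order at the page's average rate.
    """
    if not 0.0 <= reach_share <= 1.0:
        raise ValueError("reach_share must be between 0 and 1")
    return page_monthly_orders * reach_share


def clears_volume_bar(page_monthly_orders: float, reach_share: float = 1.0,
                      threshold: float = 500) -> bool:
    """True when enough orders pass through the change to earn a test."""
    return orders_through_change(page_monthly_orders, reach_share) >= threshold


# Hypothetical page: 1,200 orders a month, a third of visitors reach the section.
print(orders_through_change(1200, 1 / 3))  # roughly 400 orders count
print(clears_volume_bar(1200, 1 / 3))      # False: under the 500 bar
print(clears_volume_bar(1200))             # True: a change at the top of the page
```

The same page can clear the bar for a hero change and miss it for a footer change. The volume belongs to the element you touch, and the page total only sets the ceiling.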

Clear 500 and you test. Land under it and you skip the test, without skipping the discipline:

Ship the change on research. Review mining, session recordings, patterns proven across brands like yours. Then compare revenue per visitor for the week after against the week before. It's a rougher signal than a test, but it's a real one, and it doesn't cost you six weeks of waiting.
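The before-and-after comparison can be sketched in a few lines of Python. The names and the weekly figures below are invented for illustration. Treat the result as the rough signal it is: a week-over-week difference also picks up seasonality, promotions and traffic mix.

```python
def revenue_per_visitor(revenue: float, visitors: int) -> float:
    """Revenue divided by visitors for one period."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return revenue / visitors


def rpv_change(before: tuple[float, int], after: tuple[float, int]) -> float:
    """Relative change in revenue per visitor.

    Each argument is (revenue, visitors) for one full week.
    """
    rpv_before = revenue_per_visitor(*before)
    rpv_after = revenue_per_visitor(*after)
    if rpv_before == 0:
        raise ValueError("no revenue in the 'before' week, nothing to compare against")
    return (rpv_after - rpv_before) / rpv_before


# Hypothetical weeks: 10,000 revenue from 5,000 visitors, then 11,550 from 5,500.
change = rpv_change(before=(10_000.0, 5_000), after=(11_550.0, 5_500))
print(f"{change:+.1%}")  # +5.0%
```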

Then park the idea. What lacks the volume today can be tested at scale next year. Volume changes. The filter doesn't.

Revenue per visitor is the deciding metric here for the same reason it decides every AOV play: conversion up with order value down is a loss wearing a green number. I broke that down in the average order value article.

The same filter at €1M and at €100M

Run the filter at a €1M brand and most ideas route to fix or ship. Almost no page clears 500 orders a month. That is the answer: you're not a testing operation yet, and pretending to be one burns months you should spend on your offer and your traffic.

Run it at a €100M brand and nearly everything customer-facing clears question 3. So nearly everything customer-facing earns a test, including the obvious wins. At that volume a 1% swing is board-meeting money, and shipping a quiet loser costs more than any test ever will.

Same three questions. Your order volume answers them for you.
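The whole filter fits in one small function. This is a sketch, assuming you can answer the first two questions with a yes or no and have already done the volume math. `Route` and `route_idea` are illustrative names of my choosing.

```python
from enum import Enum


class Route(Enum):
    FIX_TODAY = "fix today"
    SHIP = "ship"
    TEST = "test"
    SHIP_ON_RESEARCH = "ship on research, then park the test"


def route_idea(is_broken: bool, changes_what_people_see: bool,
               monthly_orders_through_change: float,
               threshold: float = 500) -> Route:
    """Run one backlog idea through the three questions, in order."""
    # Question 1: a defect has no hypothesis, so it never enters the queue.
    if is_broken:
        return Route.FIX_TODAY
    # Question 2: invisible to the customer means no verdict is needed.
    if not changes_what_people_see:
        return Route.SHIP
    # Question 3: only test when the volume can deliver a verdict.
    if monthly_orders_through_change >= threshold:
        return Route.TEST
    return Route.SHIP_ON_RESEARCH


# Hypothetical backlog items.
print(route_idea(True, True, 4000).value)    # fix today
print(route_idea(False, False, 4000).value)  # ship
print(route_idea(False, True, 4000).value)   # test
print(route_idea(False, True, 300).value)    # ship on research, then park the test
```

The order of the checks matters. A broken element on a high-volume page still routes to a fix, because volume only comes into play once the first two questions have been answered.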

FAQ: when to A/B test (and when to skip it)

How much traffic do you need for an A/B test? Think in orders instead of visitors. You want 500+ monthly orders passing through the element you're changing, so each variation collects roughly 250+ orders within a month. The visitor number that requires depends on your conversion rate: at 2%, that's about 25,000 monthly visitors on that page.
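The conversion from orders to visitors is a single division. A quick Python sketch, with a function name of my choosing:

```python
def monthly_visitors_needed(monthly_orders: int, conversion_rate: float) -> int:
    """Visitors needed on the page to produce the given number of orders.

    conversion_rate is a fraction, so 2% is 0.02.
    """
    if not 0.0 < conversion_rate <= 1.0:
        raise ValueError("conversion_rate must be a fraction between 0 and 1")
    return round(monthly_orders / conversion_rate)


print(monthly_visitors_needed(500, 0.02))  # 25000
print(monthly_visitors_needed(500, 0.04))  # 12500
```

Double the conversion rate and the visitor requirement halves. That is why the order count is the better number to plan around.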

What if my whole store does fewer than 500 orders a month? Then testing is not your growth lever yet. Ship research-backed improvements, judge them on revenue per visitor week over week, and put your energy into the offer. The testing phase comes later, and it comes faster this way.

Why revenue per visitor instead of conversion rate? Because the two best levers fight each other. A change can lift conversion while shrinking order value, or the reverse. Revenue per visitor is the one number that catches both.
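Revenue per visitor is conversion rate multiplied by average order value, which is why it catches both moves at once. A small sketch with invented numbers: a variant that converts 10% better but sells smaller baskets.

```python
def rpv(conversion_rate: float, average_order_value: float) -> float:
    """Revenue per visitor = orders per visitor x revenue per order."""
    return conversion_rate * average_order_value


# Hypothetical control and variant.
control = rpv(conversion_rate=0.020, average_order_value=80.0)
variant = rpv(conversion_rate=0.022, average_order_value=70.0)

print(f"control RPV: {control:.2f}")  # control RPV: 1.60
print(f"variant RPV: {variant:.2f}")  # variant RPV: 1.54
print(variant > control)              # False: conversion is up, revenue per visitor is down
```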

How long should a test run once it clears the filter? Full weeks, always, so every weekday and weekend is represented. Keep it running until each variation has the volume, and never call it early because the graph looks good on day four. Day-four graphs lie.
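To plan the run length up front, round the days you need up to whole weeks. The sketch below assumes an even split between variations, a steady order flow and a 30-day month. Those are simplifications of mine, so read the output as a floor for the run length.

```python
import math


def weeks_to_run(monthly_orders: float, variations: int = 2,
                 orders_per_variation: int = 250, days_per_month: int = 30) -> int:
    """Full weeks needed for every variation to collect its orders."""
    if monthly_orders <= 0:
        raise ValueError("monthly_orders must be positive")
    if variations < 1:
        raise ValueError("variations must be at least 1")
    total_orders_needed = orders_per_variation * variations
    days_needed = math.ceil(total_orders_needed * days_per_month / monthly_orders)
    # Round up to whole weeks so every weekday and weekend is represented.
    return max(1, math.ceil(days_needed / 7))


print(weeks_to_run(500))   # 5
print(weeks_to_run(1000))  # 3
print(weeks_to_run(2000))  # 2
```

At exactly 500 monthly orders the 30 days round up to five full weeks, which shows how little slack there is at the threshold.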

Take the card into your next website meeting

Run every idea on the backlog through the three questions. Half the list will die in the first two. What survives is where the money is.

Broken gets fixed today. Invisible gets shipped. Everything that touches what people see, believe or decide gets tested, when the volume is there to give you a verdict.
