Amendment: this post has been edited to fix typos. Thanks, Carlo!

Unless you’re a mathematically-minded person, it’s easy to get muddled up when people talk about conversion rates and conversion rate “uplift”. Here’s a quick explanation:

- If you had 1000 users visit your site yesterday, and 1 of them made a purchase, you’d have a conversion rate of 1:1000 (also referred to as “0.1%” or “0.001”).
- If today you had 1000 visitors and 3 of them made a purchase, what’s the uplift? It’s 200%. In other words, the uplift is the increase expressed as a percentage of the original figure, not the increase itself (which is 2:1000 more people).
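
To make the arithmetic in the two bullets above concrete, here is a minimal sketch in Python. The function names are my own, chosen for illustration; they don't come from any analytics tool.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Fraction of visitors who converted (0.001 is the same as 0.1%)."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return conversions / visitors


def absolute_increase(rate_before: float, rate_after: float) -> float:
    """The increase itself, in conversion-rate terms."""
    return rate_after - rate_before


def relative_uplift(rate_before: float, rate_after: float) -> float:
    """The increase as a fraction of the starting rate (2.0 is the same as 200%)."""
    if rate_before == 0:
        raise ValueError("uplift is undefined when the starting rate is zero")
    return (rate_after - rate_before) / rate_before


yesterday = conversion_rate(1, 1000)  # 1 purchase from 1000 visitors
today = conversion_rate(3, 1000)      # 3 purchases from 1000 visitors

print(f"Absolute increase: {absolute_increase(yesterday, today):.1%}")  # 0.2%
print(f"Relative uplift:   {relative_uplift(yesterday, today):.0%}")    # 200%
```

The same two days give either "0.2%" or "200%", depending on which of the two figures you choose to quote.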

200% sounds HUGE, doesn’t it? Companies like to talk about it that way for two reasons. Firstly, if the underlying numbers are poor (as in this example), it makes them sound impressive. It doesn’t sound quite so good to say “2:1000 more users made a purchase today”. Secondly, they might simply want to hide the details from their competitors. They could have great figures, but don’t want their competitors to know exactly how they are doing. This is understandable – it’s often very sensitive information.

Here are some questions to consider when you’re shown percentage uplift figures within your own organization:

- **Establish whether AB / Multivariate testing was carried out.** Some people talk about uplift when comparing two time periods, e.g. last month vs. this month. The problem with that is time-based variation. For example, you will sell more ice-cream in the summer months, regardless of your sales pitch.
- **Establish the period of the test.** AB / Multivariate testing randomises the samples on the fly, so it removes the effect of “before and after” time-based variations. However, the test itself is still time-bound: if your test was run on Christmas Day alone, then you should have doubts about whether the findings will generalize to every day of the coming year. Why? Because you will be getting a different sample of users who will be behaving differently to normal.
- **Get access to the real numbers, not just the uplift percentages.** Ideally, take a look at the reports produced by the testing platform itself. For example, GWO produces unintimidating reports.
- **Find out what the statistical significance is.** If you toss a coin twice today and twice again tomorrow, you might get one head today and two tomorrow: a 100% uplift in the incidence of heads. On the face of it, you might be tempted to conclude that your tossing technique improved overnight. Of course, that’s plain wrong – you just need a much larger sample size. (Check out User Effect’s split test calculator)
- **Find out how the conversion goals are defined.** For example, is the goal “the number of people who enter the checkout process” or “the number of people who actually complete a purchase”? There’s a big difference.
- **Find out what the research objectives were.** Why did they run the study? What were the hypotheses? Were the alternative designs based upon any prior research or theory? Have the findings confirmed expectations or confused them? Answers to questions like this are solid gold – they will extend your organization’s understanding of the relationship between design, psychology and user behaviour within the unique context of your own offering.
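
To illustrate the statistical significance question, here is a rough sketch of one common check: a pooled two-proportion z-test, written with only Python’s standard library. This is my own illustration, not the method used by any particular testing platform or calculator, and the function name and the larger sample figures are assumptions made up for the example. The normal approximation it relies on is also shaky when the conversion counts are as tiny as 1 and 3, which is rather the point.

```python
import math


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a pooled two-proportion z-test.

    conv_a / n_a is the control, conv_b / n_b is the variation.
    Uses a normal approximation, so treat it as a rough guide only.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no conversions (or all conversions) in both groups
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# The same 200% uplift at two very different (hypothetical) sample sizes.
p_small = two_proportion_p_value(1, 1000, 3, 1000)
p_large = two_proportion_p_value(100, 100_000, 300, 100_000)

print(f"1 vs 3 conversions per 1,000 visitors:       p = {p_small:.3f}")
print(f"100 vs 300 conversions per 100,000 visitors: p = {p_large:.3g}")
```

With 1,000 visitors per variation the p-value comes out well above the conventional 0.05 threshold, so the 200% uplift could easily be noise. With 100,000 visitors per variation the same uplift is overwhelmingly unlikely to be chance.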

Let’s look at an example: 37signals reportedly achieved a 200% uplift in an A/B test on Highrise registrations when they tested a call-to-action that read “Sign up for a Free Trial” against an alternative wording: “See Plans and Pricing”. (Read more on Ryan Carson’s blog.) Because it’s 37signals, we can assume they’ve got ample traffic and good conversion rates to start off with, which implies that their uplift is probably very impressive. But hang on a minute – if we put our scientist hats back on, we’ll remember that this is only an assumption. We don’t know the answers to many of the questions above. We haven’t been given enough information to evaluate the meaning of the percentage given. It’s intriguing enough for a press release, but it won’t wash when communicating within your own organization.

Oh, and by the way, just because they got a 200% uplift for that copy change, it doesn’t mean you will. You can’t reliably generalize AB test results between sites – you have different users, in different states of mind, looking at different page designs. It might even have the opposite effect in your context, so it’s worth doing some user testing of your own. Don’t forget to report it properly, though!

I forgot to mention – many thanks to James Page of FeraLabs and Andy Baker of Ixxy for reviewing the draft of this post!

I’m reading Bad Science by Ben Goldacre, in which the author demonstrates how the media like to manufacture health scares by using misleading stats. It’s frighteningly easy to manipulate numbers to give the desired impression.

And you’re right – it’s always worth looking behind the headline to see what the numbers really mean.

I think I might buy myself a copy of that, thanks! User Experience design – and web design in general these days – sits at the intersection of science and “craftsmanship”. Even if you’re trained in both, it’s easy to forget to wear both hats simultaneously.

Isn’t the increase = 100% instead of 200%?

Considering that you compare

Day 1: number of users = 1

and

Day 2: number of users = 2

The difference is = 1

and 1 is exactly the total number of users on day 1, i.e. 100% of day 1

On day 2 we have double the users buying, but the increase itself compared to day 1 is 100%.

Or am I suddenly dumb? :)

rgrds

Carlo

Carlo, you’re absolutely right – there was a typo in the example. I’ve edited the post!

You’d have a conversion rate of 1/1000 but a conversion ratio of 1:999. For 1 conversion, there were 999 non-converts.

From ratio to %: 1:999 = 1/(999+1) = 1/1000 = 0.1%

Otherwise you’d get 1:1000 = 1/1001 = 0.0999001%

It doesn’t make a big difference here but consider 200:800 = 1:4 = 20% vs 200:1000 = 1:5 = 16.7%,

or 500:500 = 50% vs 500:1000 = 33%. That’s enough to perceive a false statistical significance if one of the before-and-after numbers isn’t calculated correctly.

Just nitpicking; very solid article.
