198. Intro to AB-testing

It’s March 2018. Bindle has built a pretty solid mobile app and now they want to try something new. They want not only to ship new features and improvements to the app but also to learn whether each release had any impact on the product and the business.

There are several ways to measure the impact of new app releases. In this chapter we’ll focus on the most scientific of them: AB-testing.

Where it all began

In the very beginning, Bindle was very similar to most companies out there. Their goal was to ship meaningful features fast and get the product out there asap. I’d call this approach “Let’s ship it and hope for the best”.

The company wasn’t really interested in measuring metrics, because if you haven’t shipped your product, there’s nothing to measure.

Unfortunately, a lot of companies stay in this phase for a long time. The only numbers they look at are vanity metrics, like the total number of users, and tax statements.

Before/after tests

Bindle was lucky to have great founders and advisors on board. They understood the value data provides and invested early in a proper data analytics setup (web/mobile analytics systems, data warehouse, mobile attribution tracking, etc).

Even a basic data warehouse with just production database tables allows you to approach new releases more mindfully. If you measure AARRR metrics and keep an eye on Unit Economics, you can compare these metrics before and after each new release. This approach is called before/after testing.
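To make the idea concrete, here is a minimal sketch of a before/after comparison. The release date, the signup records, and the choice of purchase conversion as the metric are all made up for illustration; they are not Bindle’s real data.

```python
from datetime import date

# Hypothetical release date and signup records: (signup date, purchased?).
RELEASE_DATE = date(2018, 3, 1)
signups = [
    (date(2018, 2, 26), False),
    (date(2018, 2, 27), True),
    (date(2018, 2, 28), False),
    (date(2018, 2, 28), False),
    (date(2018, 3, 1), True),
    (date(2018, 3, 2), False),
    (date(2018, 3, 3), True),
    (date(2018, 3, 3), False),
]


def conversion_rate(records):
    """Share of signups that ended in a purchase (0.0 for an empty list)."""
    if not records:
        return 0.0
    return sum(1 for _, purchased in records if purchased) / len(records)


# Split users into cohorts by whether they signed up before the release.
before = [r for r in signups if r[0] < RELEASE_DATE]
after = [r for r in signups if r[0] >= RELEASE_DATE]

before_rate = conversion_rate(before)
after_rate = conversion_rate(after)
print(f"before: {before_rate:.0%}, after: {after_rate:.0%}")
```

The only thing separating the two cohorts here is the signup date, so anything else that changed around the release ends up mixed into the comparison.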

It’s definitely a huge improvement over the previous approach. Of course, it has its own drawbacks. You can’t really compare retention metrics, because the before cohort has a lot of data (the old version was live for some time) while the after cohort (the new release) needs time to gain new users and traction.

Another drawback is that the cohorts could contain completely different users. Imagine you shipped a new feature and immediately started a brand new marketing campaign with brand new targeting. The new campaign or the new targeting could actually be the main cause of an increase or a decrease in metrics. This is why one should be very careful about before/after tests.

As you can see, before/after testing is not ideal, but it’s much better than nothing. It’ll definitely help you catch dramatic drops in metrics, which is already a big win.

AB-tests

How can we address the drawbacks of before/after tests? Wouldn’t it be great to have both the before and the after version of a feature live at the same time, with some people using the before variation and some the after one?

Such an approach is called AB-testing, and it’s clear that it requires some technical setup. We need a brand new system that:

allows both the before version (referred to as A) and the after version (B) of the same feature to co-exist in our app/website.
splits users between A and B. The different versions of the same feature are usually referred to as variations.
makes sure that the split between variations is consistent. If users refresh a webpage or log in to the app again, they’ll see the same variation. Otherwise, our AB-test data will be screwed. Shit in, shit out, remember?
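One common way to get a consistent split is to derive the variation from a hash of the user ID instead of picking it at random on every visit. The sketch below is an illustration under that assumption, not a description of Bindle’s actual system; the function name `assign_variation` and the experiment name are hypothetical.

```python
import hashlib


def assign_variation(user_id, experiment, variations=("A", "B")):
    """Deterministically map a user to a variation of an experiment.

    The same (experiment, user_id) pair always hashes to the same bucket,
    so a user keeps seeing the same variation across visits.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variations)
    return variations[bucket]


# The assignment is stable: asking twice gives the same answer.
first = assign_variation("user-42", "new-onboarding")
second = assign_variation("user-42", "new-onboarding")
print(first, second, first == second)

# Across many users the split should come out close to 50/50.
users = [f"user-{i}" for i in range(10_000)]
share_a = sum(assign_variation(u, "new-onboarding") == "A" for u in users) / len(users)
print(f"share of users in A: {share_a:.1%}")
```

Including the experiment name in the hashed key means a user who lands in A for one test is not automatically in A for every other test.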

Example of an AB-test

The classic example of an AB-test is testing the color of a button. 50% of a website’s visitors see a red button, 50% see a blue one. We measure the CTR of the button for both variations and determine a winner: the color of the button with the highest CTR. Once the AB-test is finished, we ship the winning color as the new default version for all visitors. PROFIT.

The example with a button color is easy to understand but it’s a bit misleading IMO. We can AB-test anything: how pages look, or even entire flows in the app. AB-testing allows us to understand our audience better and thus make a better product and build a better business. That should be the goal.

AB-testing button colors rarely brings a significant uplift. IMO it’s an example of a low effort/low impact AB-test. We definitely want to ship high impact AB-tests, which often require high effort.
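Here is a minimal sketch of the button example, with made-up view and click counts. It simply picks the variation with the highest CTR, as described above, and does not check whether the difference is statistically meaningful.

```python
# Hypothetical counts per variation; the numbers are invented for illustration.
results = {
    "red": {"views": 5000, "clicks": 400},
    "blue": {"views": 5000, "clicks": 450},
}


def ctr(stats):
    """Click-through rate: clicks divided by views (0.0 if there are no views)."""
    return stats["clicks"] / stats["views"] if stats["views"] else 0.0


ctr_by_variation = {name: ctr(stats) for name, stats in results.items()}

# The winner is the variation with the highest CTR.
winner = max(ctr_by_variation, key=ctr_by_variation.get)

for name, value in ctr_by_variation.items():
    print(f"{name}: {value:.1%}")
print("winner:", winner)
```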

In the upcoming lessons we’ll continue talking about AB-testing and how data and SQL can help us simplify the analysis of AB-tests.

About SQL Habit

Hi, it’s Anatoli, the author of SQL Habit.

SQL Habit is a course (or, as some of the students say, a “business simulator”). It’s based on the story of a fictional startup called Bindle. You’ll play the role of their Data Analyst and solve real-life challenges from Business, Marketing, and Product Management.

The SQL Habit course is made of bite-sized lessons (you’re looking at one atm) and exercises. They always have a real-life setting and detailed explanations. You can immediately apply everything you’ve learned at work.
