The Invisible Power of Covariates: How to Overcome Selection Bias in A/B Tests

The Problem Nobody Wants to See

Imagine a large e-commerce company rolls out a new banner design and measures average session duration. A first look at the data is encouraging: an increase of 0.56 minutes (about 34 seconds) per session. Sounds promising, right? This is where the real statistical work begins.

The dilemma: how confident can we be that the banner is really the cause of this improvement? What if long-standing, tech-savvy users systematically see the new banner more often than new customers? That question leads to a classic problem in empirical research: selection bias.

T-Test vs. Linear Regression: The Wrong Duel

The classic t-test gives a quick answer: the difference between the control and treatment group means is 0.56 minutes, and that seems to settle it. A common mistake, however, is to assume that linear regression only matters for more complex scenarios. That assumption is false.

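What happens if we run a linear regression instead, with session duration as the outcome and banner status (1 = visible, 0 = not visible) as the independent variable? We get the same treatment coefficient: 0.56 minutes. This is no coincidence. With a single 0/1 regressor, the OLS fitted values are simply the two group means, so the coefficient equals the difference in means, and its classic (homoskedastic) standard error reproduces the pooled-variance Student's t-test exactly. Both approaches test the same null hypothesis with the same statistic.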
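A minimal sketch of this equivalence on simulated data, using only the Python standard library. The helper names (`pooled_t_test`, `ols_on_dummy`) and the data-generating numbers are hypothetical and chosen for illustration. Note that the exact match holds for Student's pooled-variance t-test, not for Welch's unequal-variance version.

```python
import math
import random
from statistics import mean


def pooled_t_test(control, treated):
    """Student's two-sample t-test with pooled (equal) variance."""
    n0, n1 = len(control), len(treated)
    m0, m1 = mean(control), mean(treated)
    ss0 = sum((y - m0) ** 2 for y in control)
    ss1 = sum((y - m1) ** 2 for y in treated)
    pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
    se = math.sqrt(pooled_var * (1 / n0 + 1 / n1))
    diff = m1 - m0
    return diff, diff / se


def ols_on_dummy(t, y):
    """OLS of y on an intercept and a 0/1 dummy; returns (slope, t-stat)."""
    n = len(y)
    t_bar, y_bar = mean(t), mean(y)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    residuals = [yi - intercept - slope * ti for ti, yi in zip(t, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # classic OLS variance
    se = math.sqrt(sigma2 / sxx)
    return slope, slope / se


random.seed(42)
# Hypothetical session durations in minutes (illustrative numbers only).
t = [random.randint(0, 1) for _ in range(2000)]
y = [5.0 + 0.5 * ti + random.gauss(0, 3) for ti in t]
control = [yi for ti, yi in zip(t, y) if ti == 0]
treated = [yi for ti, yi in zip(t, y) if ti == 1]

diff, t_stat = pooled_t_test(control, treated)
coef, t_coef = ols_on_dummy(t, y)
print(f"difference in means = {diff:.4f}, t = {t_stat:.4f}")
print(f"OLS coefficient     = {coef:.4f}, t = {t_coef:.4f}")
```

Both lines print the same estimate and the same t-statistic, up to floating-point rounding.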

However, the R-squared reveals a problem: at only 0.008, the model explains less than 1% of the variance in session duration. It ignores many other factors that influence how long users stay on the page.
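Why is R-squared so low? With a single 0/1 regressor, the fitted values are the two group means, so R-squared is simply the between-group share of the total variation. The sketch below assumes hypothetical data in which user-to-user noise (standard deviation of 3 minutes) dwarfs a 0.5-minute banner effect; the function name `r_squared_dummy` is illustrative.

```python
import random
from statistics import mean


def r_squared_dummy(t, y):
    """R^2 of OLS of y on an intercept and a 0/1 dummy t.

    The fitted values are the two group means, so
    R^2 = 1 - (within-group sum of squares) / (total sum of squares).
    """
    y_bar = mean(y)
    groups = {0: [], 1: []}
    for ti, yi in zip(t, y):
        groups[ti].append(yi)
    group_means = {g: mean(vals) for g, vals in groups.items()}
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - group_means[ti]) ** 2 for ti, yi in zip(t, y))
    return 1 - ss_resid / ss_total


random.seed(7)
n = 5000
# Hypothetical: most variation comes from user habits, not from the banner.
t = [random.randint(0, 1) for _ in range(n)]
y = [6.0 + 0.5 * ti + random.gauss(0, 3) for ti in t]
r2 = r_squared_dummy(t, y)
print(f"R^2 = {r2:.4f}")
```

A tiny R-squared here says nothing about bias in the treatment coefficient; it only shows that the banner explains little of the user-to-user spread.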

The Game-Changer: Adding Covariates

This is where linear regression shows its real strength. When we add a covariate, for example each user's average session duration before the experiment, the picture changes substantially.

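The model improves immediately: R-squared jumps to 0.86, so 86% of the variance is now explained. More importantly, the estimated treatment effect drops to 0.47 minutes. Why? The pre-experiment covariate exposes a kind of “snowball effect”: users who already had long sessions before the test tend to keep having long sessions, so any pre-existing difference between the groups carries straight through into the outcome. Once the regression accounts for that history, the part of the raw difference it explains is no longer credited to the banner.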
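The effect of adding a pre-experiment covariate can be reproduced in a small simulation. This is a sketch under stated assumptions, not the article's data: assignment is deliberately biased toward users with longer pre-period sessions, the true banner effect is set to a hypothetical 0.5 minutes, and the OLS helpers (`ols_one_regressor`, `ols_two_regressors`) are written out by hand with the standard library rather than taken from a statistics package.

```python
import random
from statistics import mean


def ols_one_regressor(t, y):
    """OLS of y on intercept + t. Returns (slope, r_squared)."""
    tb, yb = mean(t), mean(y)
    stt = sum((a - tb) ** 2 for a in t)
    syy = sum((c - yb) ** 2 for c in y)
    sty = sum((a - tb) * (c - yb) for a, c in zip(t, y))
    return sty / stt, sty * sty / (stt * syy)


def ols_two_regressors(t, x, y):
    """OLS of y on intercept + treatment t + covariate x.

    Solves the 2x2 normal equations on mean-centred data.
    Returns (tau, beta, r_squared).
    """
    tb, xb, yb = mean(t), mean(x), mean(y)
    tc = [v - tb for v in t]
    xc = [v - xb for v in x]
    yc = [v - yb for v in y]
    stt = sum(a * a for a in tc)
    sxx = sum(a * a for a in xc)
    stx = sum(a * b for a, b in zip(tc, xc))
    sty = sum(a * b for a, b in zip(tc, yc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    det = stt * sxx - stx * stx
    tau = (sxx * sty - stx * sxy) / det
    beta = (stt * sxy - stx * sty) / det
    ss_resid = sum((c - tau * a - beta * b) ** 2 for a, b, c in zip(tc, xc, yc))
    ss_total = sum(c * c for c in yc)
    return tau, beta, 1 - ss_resid / ss_total


random.seed(2024)
n = 20_000
TRUE_EFFECT = 0.5  # hypothetical, chosen for illustration
pre = [random.gauss(6.0, 3.0) for _ in range(n)]  # pre-experiment minutes
# Selection: users with long pre-period sessions see the banner more often.
t = [1 if random.random() < (0.55 if p > 6.0 else 0.45) else 0 for p in pre]
post = [1.0 + 0.9 * p + TRUE_EFFECT * ti + random.gauss(0.0, 1.0)
        for p, ti in zip(pre, t)]

naive, r2_short = ols_one_regressor(t, post)
tau, beta, r2_long = ols_two_regressors(t, pre, post)
print(f"naive:    {naive:.3f}  (R^2 {r2_short:.3f})")
print(f"adjusted: {tau:.3f}  (R^2 {r2_long:.3f})")
```

In this setup the naive coefficient absorbs the pre-period gap between the groups and lands well above 0.5, while the adjusted coefficient comes out close to 0.5 and R-squared rises sharply once the covariate is in the model.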

This insight is crucial: the original estimate of 0.56 was partly inflated by selection bias. Users with naturally longer sessions were not evenly distributed between the groups; they were over-represented in the treatment group.
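One way to check for such an imbalance is to compare the pre-experiment covariate across groups before looking at outcomes. The sketch below uses the standardized mean difference (SMD), a common balance diagnostic; values above roughly 0.1 are often treated as a warning sign, though that threshold is a rule of thumb rather than a formal test. The data and the helper name `standardized_mean_difference` are hypothetical.

```python
import random
from statistics import mean, variance


def standardized_mean_difference(x, t):
    """(mean_treated - mean_control) / pooled standard deviation."""
    treated = [xi for xi, ti in zip(x, t) if ti == 1]
    control = [xi for xi, ti in zip(x, t) if ti == 0]
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    return (mean(treated) - mean(control)) / pooled_sd


random.seed(11)
n = 10_000
pre = [random.gauss(6.0, 3.0) for _ in range(n)]  # pre-period minutes
# Proper randomization versus assignment that favours long-session users.
t_random = [random.randint(0, 1) for _ in range(n)]
t_selected = [1 if random.random() < (0.6 if p > 6.0 else 0.4) else 0 for p in pre]

smd_random = standardized_mean_difference(pre, t_random)
smd_selected = standardized_mean_difference(pre, t_selected)
print(f"SMD with randomization: {smd_random:.3f}")
print(f"SMD with selection:     {smd_selected:.3f}")
```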

The Mathematical Truth: ATE, ATT, and SB

To express this formally:

  • ATE (Average Treatment Effect): the average effect of the treatment across all users, also called ACE (Average Causal Effect)
  • ATT (Average Treatment effect on the Treated): the effect on the users who actually received the treatment
  • SB (Selection Bias): the difference the groups would show even without any treatment, which distorts the naive comparison
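To make the ATE/ATT distinction concrete, here is a toy example with a hypothetical table listing both potential outcomes for every user. Real data never provide this, since each user is observed either with or without the banner; the numbers and function names are illustrative.

```python
from statistics import mean

# Hypothetical "god's-eye" table: minutes without (y0) and with (y1) the
# banner for each user. Here the banner helps the treated users more.
users = [
    {"y0": 2.0, "y1": 2.5, "treated": False},
    {"y0": 3.0, "y1": 3.5, "treated": False},
    {"y0": 6.0, "y1": 7.0, "treated": True},
    {"y0": 7.0, "y1": 8.0, "treated": True},
]


def ate(rows):
    """Average effect over everyone."""
    return mean(r["y1"] - r["y0"] for r in rows)


def att(rows):
    """Average effect over the users who actually saw the banner."""
    return mean(r["y1"] - r["y0"] for r in rows if r["treated"])


print(ate(users))  # 0.75
print(att(users))  # 1.0
```

When the effect differs between users, as here, ATT and ATE diverge; with a constant effect they coincide.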

The naive difference between group means mixes these quantities:

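Naive estimate = ATT + SB, where SB = E[Y(0) | treated] - E[Y(0) | control]

Here Y(0) is the session duration a user would have had without the banner. If the banner has the same effect on every user, ATT equals ATE and the decomposition becomes Naive estimate = ATE + SB.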
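The decomposition can be checked numerically on simulated potential outcomes. The identity that holds exactly in any sample is naive difference = ATT + SB, with SB the gap in untreated outcomes Y(0) between the groups; under the constant effect assumed here, ATT also equals ATE. All numbers are hypothetical, and assignment is deliberately tilted toward users with high Y(0).

```python
import random
from statistics import mean

random.seed(3)
n = 10_000
rows = []
for _ in range(n):
    y0 = random.gauss(6.0, 3.0)  # minutes without the banner (hypothetical)
    y1 = y0 + 0.5                # assumed constant effect of 0.5 minutes
    treated = random.random() < (0.6 if y0 > 6.0 else 0.4)  # selection on y0
    rows.append((y0, y1, treated))

treated_rows = [r for r in rows if r[2]]
control_rows = [r for r in rows if not r[2]]

# Observed comparison: treated users' y1 versus control users' y0.
naive = mean(r[1] for r in treated_rows) - mean(r[0] for r in control_rows)
# Effect among the treated (needs the unobservable y0 of treated users).
att = mean(r[1] - r[0] for r in treated_rows)
# Selection bias: how the groups differ even without treatment.
sb = mean(r[0] for r in treated_rows) - mean(r[0] for r in control_rows)

print(f"naive={naive:.3f}  ATT={att:.3f}  SB={sb:.3f}  ATT+SB={att + sb:.3f}")
```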

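Adding covariates mitigates this bias. If the groups differ in an observed characteristic such as pre-experiment session duration, controlling for it removes the portion of the naive difference that this characteristic explains and brings the estimate closer to the true effect. Differences in unobserved characteristics, however, remain.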
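The drop from 0.56 to 0.47 minutes can be made precise with the omitted-variable-bias identity for OLS, which holds exactly in any sample. In the sketch below, X is the pre-experiment session duration and δ̂ is the difference in its group means; the article does not report β̂ or δ̂ separately, so only their product can be inferred.

```latex
\begin{aligned}
\text{short:}\quad & Y_i = a + \hat\tau_{\text{short}}\, T_i + u_i \\
\text{long:}\quad  & Y_i = \alpha + \hat\tau\, T_i + \hat\beta\, X_i + \varepsilon_i \\
\text{auxiliary:}\quad & X_i = c + \hat\delta\, T_i + v_i,
  \qquad \hat\delta = \bar X_{T=1} - \bar X_{T=0} \\[4pt]
& \hat\tau_{\text{short}} = \hat\tau + \hat\beta\,\hat\delta
\end{aligned}
```

With the article's figures, β̂δ̂ = 0.56 - 0.47 = 0.09 minutes. A positive product is consistent with the selection story: the covariate predicts longer sessions (β̂ > 0) and is higher in the treatment group (δ̂ > 0).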

Validation through Simulation

In a simulated experiment where the true effect is known (0.5 minutes), the results show:

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.