A systematic approach to measuring how coupons and promotions affect your Average Order Value (AOV)
All users who have ordered at least once in the past 3 months.
Dormant users who haven't ordered in the last 30 days.
High-value users only (top 10% spenders).
Proper segmentation ensures your test measures lift in the most relevant population.
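The three segment definitions above can be sketched as simple filters. This is a minimal illustration, not a prescribed implementation: the `User` record and its `last_order` and `total_spend` fields are hypothetical, and "3 months" is approximated as 90 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class User:
    user_id: str
    last_order: date    # date of the most recent order (hypothetical field)
    total_spend: float  # spend used to rank high-value users (hypothetical field)

def active_users(users, today, days=90):
    """Users who ordered at least once in the past ~3 months (90 days assumed)."""
    cutoff = today - timedelta(days=days)
    return [u for u in users if u.last_order >= cutoff]

def dormant_users(users, today, days=30):
    """Users with no order in the last 30 days."""
    cutoff = today - timedelta(days=days)
    return [u for u in users if u.last_order < cutoff]

def high_value_users(users, top_fraction=0.10):
    """Top spenders by total_spend (top 10% by default, at least one user)."""
    ranked = sorted(users, key=lambda u: u.total_spend, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return ranked[:k]
```

Whichever filter is chosen defines the population that is later split into Control and Experiment groups.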
A sample size calculation ensures there are enough users to detect a given lift with statistical significance.
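One common way to estimate the required sample size is the two-sample formula for comparing means under a normal approximation. The sketch below assumes a two-sided test at 5% significance and 80% power. The function name, its parameters, and the example inputs are illustrative and not taken from the original.

```python
import math
from statistics import NormalDist

def sample_size_per_group(baseline_aov, aov_std, relative_lift,
                          alpha=0.05, power=0.80):
    """Approximate users needed per group to detect a relative AOV lift.

    Uses n = 2 * ((z_alpha/2 + z_beta) * sigma / delta)^2, where delta is
    the absolute AOV difference to detect. Assumes both groups share the
    same standard deviation (aov_std).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    delta = baseline_aov * relative_lift           # absolute lift to detect
    n = 2 * ((z_alpha + z_beta) * aov_std / delta) ** 2
    return math.ceil(n)

# Hypothetical inputs: baseline AOV of $50, standard deviation of $25, 10% lift.
n_per_group = sample_size_per_group(50.0, 25.0, 0.10)
```

The required size grows with the square of the ratio between AOV variability and the lift to detect, so noisier order values or smaller target lifts need many more users.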
Randomly assign users to Control and Experiment groups.
Why: Balances user characteristics; works well with large samples.
Divide users by baseline AOV range and assign randomly within each bucket.
Why: Ensures similar baseline metrics.
Pair similar users and randomly assign one to each group.
Why: Most precise for small user pools.
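The three assignment methods above can be sketched as follows. This is an illustration under stated assumptions: user IDs and baseline AOVs are passed in as plain Python collections, the `bucket_width` of $20 is arbitrary, and matched pairs are formed by simply pairing neighbours after sorting by baseline AOV. All function names are hypothetical.

```python
import random
from collections import defaultdict

def random_assignment(user_ids, seed=42):
    """Shuffle all users and split them in half."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "experiment": shuffled[half:]}

def stratified_assignment(baseline_aov, bucket_width=20.0, seed=42):
    """Bucket users by baseline AOV range, then split randomly within each bucket.

    baseline_aov maps user_id -> baseline AOV.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for uid, aov in baseline_aov.items():
        buckets[int(aov // bucket_width)].append(uid)
    groups = {"control": [], "experiment": []}
    for key in sorted(buckets):
        members = buckets[key]
        rng.shuffle(members)
        half = len(members) // 2  # odd buckets give the extra user to experiment
        groups["control"].extend(members[:half])
        groups["experiment"].extend(members[half:])
    return groups

def matched_pair_assignment(baseline_aov, seed=42):
    """Pair users with adjacent baseline AOVs and randomly assign one to each group."""
    rng = random.Random(seed)
    ranked = sorted(baseline_aov, key=baseline_aov.get)
    groups = {"control": [], "experiment": []}
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)
        groups["control"].append(pair[0])
        groups["experiment"].append(pair[1])
    return groups  # with an odd number of users, the last one is left unassigned
```

A fixed seed keeps the assignment reproducible. In practice the seed would be recorded alongside the experiment so the split can be audited later.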
Both groups should have similar baseline AOV for valid results.
Steps:
1. Calculate the mean and standard deviation of AOV for both groups.
2. Perform a t-test (or alternative) on the baseline AOV data.
3. If the p-value is above the threshold (0.05), there is no significant baseline difference, and the groups are comparable for valid testing.
4. If the p-value is at or below the threshold, re-run assignments or use stratification.
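The baseline check can be sketched with the standard library alone. Note the assumption: this computes a Welch-style t statistic but takes the p-value from a normal approximation, which is only reasonable for large groups. For small groups a proper t distribution (for example from a statistics package) would be needed. The function names are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev

def welch_test(a, b):
    """Return (t statistic, approximate two-sided p-value) for two samples.

    Assumes each sample has at least two values and non-zero variance.
    The p-value uses a normal approximation, suitable for large samples only.
    """
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    t = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

def baseline_is_balanced(control, experiment, alpha=0.05):
    """True when the baseline AOV difference is not significant (p > alpha)."""
    _, p = welch_test(control, experiment)
    return p > alpha
```

If `baseline_is_balanced` returns `False`, the groups differ before any coupon is shown, which is the case where re-running the assignment or switching to stratification applies.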
Control group: no new coupon; standard experience.
Experiment group: receives the new promotional offer.
Perform a t-test (or alternative) to confirm that the difference in AOV between the two groups is statistically significant.
Consider:
Mathematically, Lift (%) = (Experiment AOV - Control AOV) / Control AOV × 100.
For example, if Control AOV = $50 and Experiment AOV = $60, then: Lift = (60 - 50) / 50 × 100 = +20%.
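Working the $50 and $60 example through in code, and taking lift to mean the relative change in AOV versus Control expressed as a percentage (the usual definition, assumed here), a minimal sketch looks like this. The function name is illustrative.

```python
def aov_lift_percent(control_aov, experiment_aov):
    """Relative AOV lift of Experiment over Control, in percent."""
    return (experiment_aov - control_aov) * 100 / control_aov

lift = aov_lift_percent(50.0, 60.0)
print(f"{lift:+.2f}% lift")  # prints "+20.00% lift"
```

A positive value means the Experiment group spent more per order than Control. Whether that difference is real or noise is what the significance test decides.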
All User Segments (dormant, high-value, etc.)
1,600 users to detect a 10% lift
Random, Stratified, or Matched assignment for balanced groups
+10.00% lift, statistically significant
Proper group selection is essential.
Similar baselines validate accuracy.
Statistical significance confirms non-random lift.
Meaningful lift guides final decisions.