[Talk Summary 12] A/B Testing at Scale

December 09, 2016

Dr. Pavel Dmitriev, a Principal Data Scientist on Microsoft's Analysis and Experimentation team, gave a talk titled "A/B Testing at Scale" on Thursday, 2016/12/08. The talk covered an introduction to controlled experiments, four real experiments that Microsoft had run, and five challenges of testing at scale.

Dr. Dmitriev started the talk with a brief introduction to controlled experiments, aka A/B tests. A/B testing is a method of comparing two versions of a webpage or app against each other to determine which one performs better. It is also used to evaluate a new feature of an application. If the feature has a real effect on users, the experiment is expected to show a statistically significant difference between the two groups (conventionally p < 0.05); the hypothesis that there is no difference is called the null hypothesis.
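To make the significance check concrete, here is a minimal sketch of one common way to compare two variants: a two-sided two-proportion z-test on conversion counts. This is an illustration, not the method presented in the talk. The function name `two_proportion_z_test` and the sample counts are hypothetical, and the normal approximation assumes large samples with at least some conversions and some non-conversions.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a, conv_b: number of converting users in variants A and B.
    n_a, n_b: total number of users in variants A and B.
    Returns (z, p_value). Assumes the pooled rate is strictly
    between 0 and 1, otherwise the standard error is zero.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under the null hypothesis both variants share one conversion rate,
    # so the standard error is computed from the pooled rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 200 of 10,000 users convert on A, 260 of 10,000 on B.
    z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
    verdict = "reject" if p < 0.05 else "fail to reject"
    print(f"z = {z:.3f}, p = {p:.4f}: {verdict} the null hypothesis")
```

With these made-up counts the p-value falls below 0.05, so the null hypothesis of no difference would be rejected. With identical counts in both variants, z is 0 and the p-value is 1.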

Dr. Dmitriev then presented the motivation for A/B testing in light of how the product development process has evolved. In classical software development, a product is designed, developed, tested, and then released. In customer-driven development, by contrast, the process cycles from "build" to "measure" to "learn" (continuous deployment cycles), because we are poor at assessing the value of ideas. Running experiments and collecting data helps us evaluate that value. To present four real experiments that Microsoft had run, he showed each one and asked the attendees to guess which design, A or B, would win. He then showed statistics on the difference between the two groups to indicate whether the difference between the two designs was significant.

Finally, Dr. Dmitriev argued that while the theory of experimentation is well established, scaling experimentation to millions of users, devices, platforms, websites, apps, social networks, etc. presents new challenges for A/B testing:

Challenge 1: trustworthiness
Challenge 2: protecting the users
Challenge 3: the OEC (Overall Evaluation Criterion)
Challenge 4: violations of classical assumptions of a controlled experiment
Challenge 5: analysis of results

NHST = Null Hypothesis Significance Testing
Heterogeneous treatment effect

12/08/2016
Information Sciences Building, 3rd Floor
University of Pittsburgh

HUNG CHAU's blog