AC18: Experiment Design & Analysis

The purpose of this activity is to learn about designing and analyzing controlled experiments on user interfaces. You'll need a web browser for this activity. The workbench for the activity is located at http://courses.csail.mit.edu/6.831/handouts/ac18/experiment.html.

1 Warmup

The workbench is initially configured for a simple measurement: how long it takes the user to press A or B when prompted. Each trial will randomly display either A or B in the middle of the window, and the user should press the same letter on the keyboard as quickly as possible. The warmup is configured to run 10 trials each time, and statistics are displayed on the right.

Run the measurement on one of your group members and record the results (count, mean, stddev, and stderr) below. (You can edit the results textbox manually: to exclude wrong answers or extreme outliers, delete those measurements or comment them out, and the statistics will update.) State whether your recorded results include wrong answers and outliers.
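If you want to double-check the numbers in the statistics panel, you can recompute them offline. The Python sketch below assumes the panel reports the sample standard deviation (n - 1 denominator) and stderr = stddev / √n; the workbench's exact conventions aren't stated in this handout, and `summarize` and `keep_correct` are hypothetical helpers, not part of the workbench.

```python
import math
import statistics
from typing import NamedTuple, Sequence


class Summary(NamedTuple):
    count: int
    mean: float
    stddev: float
    stderr: float


def keep_correct(trials: Sequence[tuple[float, bool]]) -> list[float]:
    """Drop wrong answers; each trial is (time_ms, answered_correctly)."""
    return [time_ms for time_ms, correct in trials if correct]


def summarize(times_ms: Sequence[float]) -> Summary:
    """Count, mean, stddev, and stderr of reaction times in milliseconds.

    Assumes the sample standard deviation (n - 1 denominator); the
    workbench may use a different convention.
    """
    n = len(times_ms)
    if n < 2:
        raise ValueError("need at least two trials to estimate spread")
    mean = statistics.mean(times_ms)
    sd = statistics.stdev(times_ms)   # sample standard deviation
    se = sd / math.sqrt(n)            # standard error of the mean
    return Summary(n, mean, sd, se)
```

For example, `summarize(keep_correct(trials))` reports statistics with wrong answers excluded, which is one of the choices the warmup asks you to state.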

2 Hypothesis

Suppose you're designing an email client, and you have the following design question: how should unread messages be distinguished from already-read messages in the inbox? You have two design proposals: (1) use bold font for unread messages and plain font for the others; or (2) use a serif font for unread messages and a sans-serif font for the others. You're willing to ignore external consistency for now; the real question is which approach would be better if email clients were redesigned from scratch.

Formulate a testable hypothesis that will help you decide which design proposal is better, and write your hypothesis below.

3 Experiment Design

Design an experiment to test your hypothesis using the workbench. Some issues to think about:

External validity: You want your results to be relevant to an email client inbox. What should each trial look like? What should the user's task be in each trial?
Internal validity: What are the independent variable(s) you'll be comparing? What dependent variable(s) will you be measuring? What variables will you be controlling, and what will you be randomizing? How will you avoid selection and learning effects?
Reliability: How many trials will each user do? How many users will you need? Will the design be between-subjects or within-subjects?

Describe your experiment below, in enough detail that you can actually run it. (For example, the warmup experiment had 10 trials, each randomly displaying either A or B, and measured the time to press the letter.) Don't agonize over each choice; just pick something reasonable that will let you run the experiment right now, in class.
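One way to handle randomization and learning effects in a within-subjects design is to give every participant the same number of trials per condition and shuffle them together, so that practice and fatigue fall on both conditions roughly equally. The Python sketch below assumes a hypothetical trial in which the user sees a mock inbox and must find the one unread message; `Trial`, `make_trials`, and the condition labels are illustrative names, and the workbench's nextTask() would need equivalent logic written in its own code.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical labels for the two design proposals.
CONDITIONS = ("bold", "serif")


@dataclass(frozen=True)
class Trial:
    condition: str   # how unread messages are marked in this trial
    unread_row: int  # which inbox row holds the unread message (the target)


def make_trials(per_condition: int, rows: int, seed: Optional[int] = None) -> List[Trial]:
    """Balanced trial list: equal counts per condition, random target rows,
    conditions interleaved in random order."""
    rng = random.Random(seed)
    trials = [
        Trial(condition, rng.randrange(rows))  # randomize target position
        for condition in CONDITIONS
        for _ in range(per_condition)
    ]
    rng.shuffle(trials)  # spread learning and fatigue across both conditions
    return trials
```

For example, `make_trials(per_condition=10, rows=8, seed=1)` yields 20 trials, 10 of each condition, in random order. A blocked alternative (all bold trials, then all serif trials) is simpler to run, but then the block order should alternate across participants to counterbalance order effects.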

4 Experiment Analysis

Implement your experiment design in the workbench. You may have to change nextTask() and handleAnswer() substantially. Run your experiment and record its statistics below (count, mean, stddev, stderr for each condition). Looking at the standard error, does the data support your hypothesis?
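A quick way to read the standard errors is to check whether the intervals mean ± 2·stderr for the two conditions overlap. This is a rough heuristic, not a formal significance test: non-overlapping intervals suggest a real difference, while overlapping ones leave the question open (a t-test would be the more careful check). The Python sketch assumes the dependent variable is a time where lower is better; `interval` and `compare` are hypothetical helper names.

```python
import math
import statistics
from typing import Sequence, Tuple


def interval(times: Sequence[float], z: float = 2.0) -> Tuple[float, float]:
    """Return (mean - z*stderr, mean + z*stderr) using the sample stddev."""
    m = statistics.mean(times)
    se = statistics.stdev(times) / math.sqrt(len(times))
    return (m - z * se, m + z * se)


def compare(a: Sequence[float], b: Sequence[float], z: float = 2.0) -> str:
    """Rough verdict on which condition is faster, assuming lower times are better."""
    lo_a, hi_a = interval(a, z)
    lo_b, hi_b = interval(b, z)
    if hi_a < lo_b:
        return "a faster"
    if hi_b < lo_a:
        return "b faster"
    return "inconclusive"  # intervals overlap
```

Passing the bold-condition times as `a` and the serif-condition times as `b` gives a first answer to whether your data support the hypothesis.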

5 Reflection

What threats to internal validity and external validity are present in your experiment design? What new threats did you discover in the course of running the experiment?

6.831 • User Interface Design and Implementation
