Last Update: 3 Oct 2018

This chapter is incomplete for the moment. Please also refer to the slides discussed in the course.

12.1 Definition

A definition of usability from ISO 9241-11:

[...] the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use.

12.2 Designing a user study

12.2.1 Task and Trial

In a systematic user study you usually define one or more tasks. Each completion of a task is called a trial. A session is the time span in which a single participant completes all of his/her trials, usually between 15 minutes and one hour. The session also includes briefing, training and debriefing.

For example, in a typical Fitts' Law study the user is presented with a dot on a screen and has to drag this dot (with his/her finger) to a highlighted target area as quickly as possible. This is the only task, but there are many trials with different target positions and sizes.
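To make the task/trial distinction concrete, here is a minimal Python sketch that generates the trials of such a study. It assumes that the target distance D and width W (hypothetical values in pixels) are the manipulated factors, and it labels each trial with the Shannon formulation of Fitts' index of difficulty, ID = log2(D/W + 1), which is one common variant among several. The function names are illustrative only.

```python
import itertools
import math
import random


def index_of_difficulty(distance: float, width: float) -> float:
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return math.log2(distance / width + 1)


def make_trials(distances, widths, repetitions, seed=0):
    """One task, many trials: every distance/width combination, repeated and shuffled."""
    trials = [
        {"distance": d, "width": w, "id_bits": index_of_difficulty(d, w)}
        for d, w in itertools.product(distances, widths)
        for _ in range(repetitions)
    ]
    # Shuffle so participants cannot anticipate the next target.
    random.Random(seed).shuffle(trials)
    return trials


# Hypothetical design: 3 distances x 3 widths x 5 repetitions = 45 trials.
trials = make_trials(distances=[128, 256, 512], widths=[16, 32, 64], repetitions=5)
print(len(trials))
```

The fixed seed only makes the sketch reproducible; in a real study each participant would typically get a differently shuffled trial order.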

Another example is evaluating a new web shop interface. Here, the user can have a series of (possibly consecutive) tasks, e.g. "Find the blue handbag of brand X and put it into your shopping cart" or "Complete your purchase".

It is important to clearly define the end state of a task, e.g. to decide when to stop measuring the time.
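As an illustration for the web shop example, the following sketch computes time-on-task from a logged event stream. The `Event` record and the event names are hypothetical; the point is that the end state is an explicit predicate (here: the correct item has been added to the cart), so the moment the clock stops is decided before the study rather than during analysis.

```python
from dataclasses import dataclass


@dataclass
class Event:
    t: float   # seconds since the start of the session
    name: str  # hypothetical event name logged by the prototype


def time_on_task(events, start_event, end_state):
    """Time from the first start_event to the first later event satisfying end_state.

    Returns None if the task never started or the end state was never reached.
    """
    start = next((e.t for e in events if e.name == start_event), None)
    if start is None:
        return None
    end = next((e.t for e in events if e.t >= start and end_state(e)), None)
    return None if end is None else end - start


log = [
    Event(0.0, "task_start"),
    Event(4.2, "search"),
    Event(9.8, "open_product:blue_handbag_X"),
    Event(12.5, "add_to_cart:blue_handbag_X"),
    Event(20.1, "checkout"),
]

# End state: the correct item is in the cart (not, e.g., when checkout begins).
print(time_on_task(log, "task_start", lambda e: e.name == "add_to_cart:blue_handbag_X"))
```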

12.2.2 Planning a Session

A session should not take too long, to avoid negative effects on motivation and concentration that can distort the resulting data. You can include breaks in which the user can recover and in which you can remind him/her to perform the tasks as quickly/precisely etc. as possible.

A single session has the following steps, each of which you should carefully plan beforehand. Any deviation from your plan may make your experiment invalid.

  1. Arrival
  2. Briefing (instructions)
  3. Pre-session questionnaire [optional]
  4. Training [optional]
  5. Task execution (can have multiple rounds, includes breaks and intermediate questionnaires)
  6. Post-session questionnaire [optional]
  7. Debriefing (e.g. payment)

Arrival

Welcome the user and make sure that the environment is prepared and without distractions.

Briefing/Instructions

Some points to consider for the instructions:

Pre-session questionnaire

Usually, it is important to ask the user for background information:

Part of this information must be reported in a publication (e.g. the gender proportion and the age range). Other information may be used in the analysis to exclude participants (e.g. those with too little or too much prior experience) or to perform a comparative analysis (female vs. male, experts vs. novices).
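A minimal sketch of how the pre-session answers might feed into exclusion and reporting. The participant records and the exclusion threshold `MAX_EXPERIENCE_YEARS` are invented for illustration; real exclusion criteria should be fixed before the data is collected.

```python
from collections import Counter

# Hypothetical pre-session questionnaire answers.
participants = [
    {"id": "P01", "gender": "female", "age": 24, "prior_experience_years": 1},
    {"id": "P02", "gender": "male", "age": 31, "prior_experience_years": 12},
    {"id": "P03", "gender": "female", "age": 27, "prior_experience_years": 0},
    {"id": "P04", "gender": "male", "age": 22, "prior_experience_years": 3},
]

MAX_EXPERIENCE_YEARS = 10  # hypothetical exclusion criterion


def summarize(ps):
    """Numbers typically reported in a publication: n, gender proportion, age range."""
    ages = [p["age"] for p in ps]
    return {
        "n": len(ps),
        "gender": dict(Counter(p["gender"] for p in ps)),
        "age_range": (min(ages), max(ages)),
    }


included = [p for p in participants if p["prior_experience_years"] <= MAX_EXPERIENCE_YEARS]
print(summarize(included))
```

Reporting the summary of the included participants (and stating how many were excluded and why) keeps the numbers consistent with the analysed data.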

Training

Depending on the complexity of your interface, it can make sense to give the user some time to get used to the system before you start measuring performance. Obviously, this does not apply if your interface is quite common (e.g. website testing) or if the whole point of your study is learnability.

12.2.3 Dependent vs. independent variables

Dependent variable:
what you measure

Independent variable:
what you manipulate (what you compare)
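The distinction shows up directly in how a trial is recorded. In this hypothetical sketch, the interface variant ("A" vs. "B") is the independent variable, while completion time and error count are dependent variables; the numbers are made up.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TrialRecord:
    participant: str
    interface: str  # independent variable: what we manipulate ("A" vs. "B")
    time_s: float   # dependent variable: what we measure
    errors: int     # a second dependent variable


records = [
    TrialRecord("P01", "A", 12.0, 1),
    TrialRecord("P01", "B", 9.0, 0),
    TrialRecord("P02", "A", 14.0, 2),
    TrialRecord("P02", "B", 10.0, 1),
]


def mean_by_condition(records, dv):
    """Mean of dependent variable `dv` for each level of the independent variable."""
    levels = sorted({r.interface for r in records})
    return {lvl: mean(getattr(r, dv) for r in records if r.interface == lvl) for lvl in levels}


print(mean_by_condition(records, "time_s"))
```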

12.2.4 Types of data

12.2.5 Within-subjects vs. between-subjects

Within-subjects study:

Between-subjects study:
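The difference between the two designs lies in how participants are assigned to conditions. The sketch below, with hypothetical participant and condition labels, gives every participant all conditions in the within-subjects case and exactly one condition in the between-subjects case. Shuffling before the round-robin split is one simple way to randomize group membership while keeping the groups equally sized.

```python
import random


def assign_within(participants, conditions):
    """Within-subjects: every participant performs every condition."""
    return {p: list(conditions) for p in participants}


def assign_between(participants, conditions, seed=0):
    """Between-subjects: every participant performs exactly one condition."""
    order = list(participants)
    random.Random(seed).shuffle(order)  # randomize who ends up in which group
    # Round-robin over the shuffled list keeps group sizes as equal as possible.
    return {p: [conditions[i % len(conditions)]] for i, p in enumerate(order)}


people = ["P1", "P2", "P3", "P4"]
print(assign_within(people, ["A", "B"]))
print(assign_between(people, ["A", "B"]))
```

In the within-subjects case, the order in which each participant meets the conditions still matters and has to be counterbalanced.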

12.2.6 Counterbalancing

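Problem: The order in which tasks or conditions are performed can impact the results (e.g. through learning or fatigue effects).

Solution: Counterbalance the order across participants.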
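One common way to counterbalance is a balanced Latin square: every condition appears once in each position, and every condition directly follows every other condition equally often. The following Python sketch builds one, with condition indices 0..n-1 and one row per participant order. For an odd number of conditions it appends the reversed rows, because a single n-by-n square cannot be balanced in that case. This is one standard construction, not the only valid one.

```python
def balanced_latin_square(n):
    """Return a list of condition orders (rows) forming a balanced Latin square."""
    # First row follows the pattern 0, 1, n-1, 2, n-2, 3, ...
    first, low, high = [0], 1, n - 1
    for i in range(1, n):
        if i % 2 == 1:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
    # Each further row shifts every entry by one (mod n).
    rows = [[(c + shift) % n for c in first] for shift in range(n)]
    if n % 2 == 1:
        # Odd n: add the mirrored rows to balance first-order carryover effects.
        rows += [row[::-1] for row in rows]
    return rows


conditions = ["A", "B", "C", "D"]
for row in balanced_latin_square(len(conditions)):
    print([conditions[i] for i in row])
```

Participant i would then receive the order `rows[i % len(rows)]`, so the number of participants should ideally be a multiple of the number of rows.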

12.3 Objective Measures

Here are five frequently used metrics to measure the usability of a system:

12.3.1 Task success (effectiveness)

Can the task be achieved at all?

12.3.2 Time-on-task

How long does it take?
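Task success and time-on-task can be computed directly from per-trial records. In this sketch the tuple layout and the data are hypothetical. Computing time only over successful trials is a common choice, not a rule, and reporting the median next to the mean helps because task times are often skewed.

```python
from statistics import mean, median

# Hypothetical trials for one task: (participant, succeeded, seconds)
trials = [
    ("P01", True, 41.0),
    ("P02", True, 35.0),
    ("P03", False, 120.0),  # gave up or hit the time limit
    ("P04", True, 50.0),
]


def success_rate(trials):
    """Share of trials in which the task was completed."""
    return sum(1 for _, ok, _ in trials if ok) / len(trials)


def time_on_task_stats(trials, successful_only=True):
    """Mean and median completion time, by default over successful trials only."""
    times = [t for _, ok, t in trials if ok or not successful_only]
    return {"mean": mean(times), "median": median(times)}


print(success_rate(trials))
print(time_on_task_stats(trials))
```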

12.3.3 Errors

How many errors occur? How safe is the interface?

12.3.4 Efficiency

How often per time unit does one succeed?
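For the error and efficiency questions, here is a minimal sketch that assumes each trial records success, duration and an error count (hypothetical data). Efficiency is computed as successful completions per minute of total time spent, which is one direct reading of "how often per time unit does one succeed"; other definitions exist.

```python
# Hypothetical per-trial data: (succeeded, seconds, error_count)
trials = [(True, 30.0, 0), (True, 45.0, 2), (False, 60.0, 3), (True, 45.0, 1)]


def errors_per_trial(trials):
    """Average number of errors per trial."""
    return sum(e for _, _, e in trials) / len(trials)


def efficiency_per_minute(trials):
    """Successful completions per minute of total time (successful and failed trials)."""
    total_minutes = sum(t for _, t, _ in trials) / 60
    return sum(1 for ok, _, _ in trials if ok) / total_minutes


print(errors_per_trial(trials))
print(efficiency_per_minute(trials))
```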

12.3.5 Learnability

How easily/quickly do I learn to use the interface?
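One way to quantify learnability is to look at how task time drops over repeated trials. The sketch below assumes the power law of practice, T_n = T_1 * n^(-a), and fits it by least squares on the log-log data; the fitted exponent a then serves as a learning-rate indicator (larger means faster improvement). The model choice is an assumption; simply comparing early and late trials is a valid alternative.

```python
import math


def fit_power_law(times):
    """Fit T_n = T_1 * n**(-a) to trial times (n = 1, 2, ...); needs >= 2 positive times.

    Returns (T_1, a).
    """
    xs = [math.log(n) for n in range(1, len(times) + 1)]
    ys = [math.log(t) for t in times]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Ordinary least squares slope and intercept of log T against log n.
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), -slope


# Hypothetical completion times (seconds) of one participant over 6 trials.
t1, a = fit_power_law([20.0, 15.1, 12.8, 11.5, 10.4, 9.9])
print(round(t1, 1), round(a, 2))
```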

12.4 Subjective User Experience

It is hard to measure the user's subjective impression of a system: how intuitive or natural the interface was, or how much fun he or she had using the system.

What methods are there to elicit this kind of information in such a way that we can analyze the resulting data?

12.4.1 Questionnaire

How to ask

Rating with a Likert scale
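Likert items are ordinal, so a minimal analysis reports medians (or full answer distributions) rather than means. This sketch assumes a 5-point agreement scale and two hypothetical statements, one of them negatively worded and therefore reverse-coded before aggregation.

```python
from statistics import median

SCALE_MIN, SCALE_MAX = 1, 5  # assumed: 1 = strongly disagree ... 5 = strongly agree


def reverse(score):
    """Reverse-code an answer to a negatively worded statement."""
    return SCALE_MIN + SCALE_MAX - score


# Hypothetical items; "negative" marks negatively worded statements.
items = {
    "The system was easy to use.": {"negative": False, "answers": [4, 5, 4, 3, 5]},
    "I found the system confusing.": {"negative": True, "answers": [2, 1, 2, 3, 1]},
}


def item_medians(items):
    """Median answer per item, after reverse-coding negative items."""
    result = {}
    for text, item in items.items():
        answers = [reverse(a) if item["negative"] else a for a in item["answers"]]
        result[text] = median(answers)
    return result


print(item_medians(items))
```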

Rating with a semantic differential scale
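A semantic differential asks participants to place their impression between two opposite adjectives. The sketch assumes a 7-point scale coded from -3 to +3 and three hypothetical adjective pairs, and averages each pair into a profile per system, which can then be plotted or compared side by side.

```python
from statistics import mean

# Hypothetical bipolar pairs; -3 = left adjective, +3 = right adjective.
PAIRS = [("complicated", "simple"), ("boring", "exciting"), ("ugly", "attractive")]

# One list per participant, in PAIRS order (made-up ratings for two systems).
ratings = {
    "A": [[2, 1, 0], [3, 0, 1], [1, -1, 2]],
    "B": [[-1, 2, 1], [0, 3, 2], [-2, 1, 3]],
}


def profile(per_participant):
    """Mean rating per adjective pair across participants."""
    columns = zip(*per_participant)  # regroup answers by adjective pair
    return {f"{left}/{right}": mean(col) for (left, right), col in zip(PAIRS, columns)}


for system, data in ratings.items():
    print(system, profile(data))
```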

Standardized questionnaires

There are a number of standardized questionnaires. It is highly advisable to use them, or at least to study them to learn about good wording and answer formats:
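One widely used standardized questionnaire is the System Usability Scale (SUS): 10 statements answered on a 5-point scale, with odd-numbered items worded positively and even-numbered items negatively. The sketch below implements its standard scoring (odd items contribute score - 1, even items 5 - score, and the sum is multiplied by 2.5), giving a value between 0 and 100. This value is not a percentage.

```python
def sus_score(responses):
    """Score one participant's SUS answers (10 items, each 1-5, in questionnaire order)."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects 10 answers on a 1-5 scale")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positive statements, even items negative ones.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5


print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))
```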

12.4.2 Microsoft Product Reaction Cards

This is a method developed by Microsoft.
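In this method, participants pick the cards whose words they feel describe the product. A simple way to analyze the result is to count how often each card was chosen across participants. The selections below are hypothetical, and counting each card at most once per participant is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical selections per participant.
selections = {
    "P01": ["Easy to use", "Fast", "Clean"],
    "P02": ["Easy to use", "Confusing"],
    "P03": ["Fast", "Easy to use", "Professional"],
}


def card_frequencies(selections):
    """(card, count, share of participants), most frequently chosen first."""
    # dict.fromkeys removes duplicate picks while keeping the selection order.
    counts = Counter(card for cards in selections.values() for card in dict.fromkeys(cards))
    n = len(selections)
    return [(card, count, count / n) for card, count in counts.most_common()]


for card, count, share in card_frequencies(selections):
    print(f"{card}: {count} ({share:.0%})")
```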