Usability Testing for Remote Teams
In his famous article The Joel Test: 12 Steps to Better Code, Joel Spolsky describes the "Hallway Usability Test" as "where you grab the next person that passes by in the hallway and force them to try to use the code you just wrote." It sounds convenient, but many readers will immediately realize that this method is impossible in their environment. Together, we'll investigate how to perform internal and external tests as a remote team so that a lack of physical hallways does not impede usability testing.
Defining Usability Testing
As a working definition, the minimal usability test requires a user, an interface, a list of tasks, and an observer. The user interacts with the interface to perform the list of tasks while the observer notes the user's behavior, successes, and struggles. More formally, a laboratory usability test adds recording devices and possibly multiple observers to gather more data while providing users with a consistent environment to minimize external factors.
Teams perform usability tests to gain feedback from stakeholders on their application interface. When Spolsky writes about "us[ing] the code you just wrote," his wording reveals that usability tests are useful beyond clickthroughs of web apps. You might give a developer access to an API with documentation or have a customer talk to a help chatbot. Regardless of the application that you're testing and the testing environment, some guidelines for formal usability testing should help your process.
The user should be representative of your application's target audience. Beyond demographics, factors like the test subject's level of familiarity with the application, general technical proficiency, and visible or invisible disabilities should factor into your evaluation of the test conditions and results. While it is important to test your application with a diverse group of users, usability testing is expensive. Especially for agile teams and iterative projects, frequent tests with about 5 users (tested individually) should give you the biggest return on investment.
The observer should be minimally interactive for the duration of the test. Any cues from the observer can undermine the integrity of the test, as the user may behave differently than they would without the observer's input. Recording the user's keyboard, mouse, screen, face, and other inputs, or using multiple observers with different areas of focus, can help capture complete, accurate data.
The list of tasks should be constructed to emulate normal use of the system. With exceptions, testing edge cases, substantially varied inputs, system load, and other extreme aspects of the application is better left to quality assurance professionals. If your team uses agile or another story-based project management system, you may find your completed user stories a good starting point for constructing a path through the application for the test subject to follow.
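As a quick illustration, a completed user story can often be translated almost directly into test tasks. The sketch below is hypothetical; the story text and tasks are invented for this example:

```typescript
// A hypothetical sketch: turning a completed user story into usability test
// tasks. The story text and task wording are invented for illustration.
const userStory =
  "As a project owner, I can invite a teammate to my project by email";

const usabilityTasks: string[] = [
  "Open the project you created earlier",
  "Invite a teammate to the project",
  "Confirm that the teammate appears in the project's member list",
];

console.log(`Story under test: ${userStory}`);
usabilityTasks.forEach((task, i) => console.log(`${i + 1}. ${task}`));
```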
In-Person Usability Testing
Having established the fundamentals of laboratory usability testing, we will successively remove the tools available to the test designer and consider how to still gain valuable insight into an application's user experience. Returning to the "Hallway Usability Test," we relax several of the guidelines of laboratory usability testing. As Spolsky explains, you begin a hallway usability test by pulling someone in from the hallway at the office. Almost universally, the co-worker that you get to help with a usability test will be more familiar with the application than an average user would be, and in most domains they will be more technical. You'll be testing on whatever localhost development server you have running, with you standing over their shoulder noting down their actions and reactions. The list of tasks will most likely consist of a single item related to today's commit. These departures from laboratory-style usability testing aren't criticisms of informal testing; they're what make it valuable. Performing a few informal, internal tests will catch the most egregious issues, thus increasing the value of external tests. However, for a remote team, this valuable informal interaction must be digitized.
The Remote Hallway Usability Test
In their 2007 paper, Andreasen et al. found synchronous remote usability testing to be as effective as traditional laboratory testing. The paper described a formal method of usability testing that simulates the laboratory environment, but in which the user and observer are separated by an arbitrary distance and communicate using digital methods. However, the other factors of laboratory testing were left largely unchanged. The user was still in an observation room using provided equipment, and the observer was digitally present to guide the experiment. For remote teams, following the same general procedures, with video conferencing technology in place of co-location, should result in the same valuable outcomes as traditional laboratory-style testing. If the formal version of usability testing withstands the challenges of remote operation, it stands to reason that we can construct a remote analogue of the "Hallway Usability Test."
Two factors stay constant: your role as the observer and the list of tasks. However, identifying a user becomes more difficult. Part of the benefit of hallway usability testing is that if someone is walking around, you're probably not interrupting deep work with your testing request. Furthermore, a well-trafficked hallway will give you test subjects from outside your team. Every team has different rules of engagement for online communication, but I'll refer to GitLab's popular online handbook, which says "[internal chat application] messages should be considered asynchronous communication, and you should not expect an instantaneous response..." Thus, we cannot rely on individual messages to drum up test subjects. Instead, consider establishing a channel for usability test requests where a post can queue up a couple of willing testers who are between tasks or meetings (for bonus points, consider calling the channel "hallway").
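As a sketch of what an automated request to such a channel might look like, the snippet below posts to a shared "#hallway" channel using Slack's Web API. The channel name, token variable, build URL, and message wording are assumptions for illustration, not a prescribed workflow:

```typescript
// A minimal sketch: posting a usability test request to a shared "#hallway"
// channel with Slack's Web API. Channel name, token, and message wording are
// illustrative assumptions.
import { WebClient } from "@slack/web-api";

const slack = new WebClient(process.env.SLACK_BOT_TOKEN);

async function requestUsabilityTest(buildUrl: string, tasks: string[]): Promise<void> {
  // Describe the tasks without explaining how to complete them, just as you
  // would in an in-person hallway test.
  const taskList = tasks.map((t, i) => `${i + 1}. ${t}`).join("\n");
  await slack.chat.postMessage({
    channel: "#hallway",
    text:
      "Looking for one or two volunteers for a ~10-minute usability test.\n" +
      `Build: ${buildUrl}\nTasks:\n${taskList}\n` +
      "Reply here if you're between tasks and can hop on a quick call.",
  });
}

requestUsabilityTest("https://staging.example.com/feature-branch", [
  "Create a new project",
  "Invite a teammate to the project",
]).catch(console.error);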
A remote interaction should provide the same context as a "Hallway Usability Test." The chat message should identify or link to the relevant section of the application and describe one or two tasks (but not provide instructions on how to complete them). A video call with screen share lets you observe the user and their computer, though you do lose insight into their keystrokes and mouse movements. Later, we'll discuss technologies for recording these inputs.
Now that we have a user, some tasks, and an observer, the user needs to be able to interact with the application. If the user can run your branch of the application themselves, that's great, but people outside of your engineering team might not have a development environment configured. In this case, you could make a practice of performing usability tests after code has been reviewed and pushed to a staging server, or your team could maintain a usability testing server if the frequency of tests merited the effort and expense. Alternately, some screen sharing software allows you to grant the test subject keyboard and mouse control so that they can interface with your code as it runs in your development environment.
In addition to an established process for soliciting reviews from outside of the engineering team, a good code review process should include gut-feel usability testing. Your reviewer running the updated application and performing the user story provides minimum validation of the application's usability. However, including other stakeholders in larger-scope testing can generate broader insight into the application's user experience.
Asynchronous Remote Usability Testing
For external tests, you may want to decouple the observers and the users for reasons of convenience or methodology. On asynchronous remote testing, Andreasen et al. wrote, "The asynchronous methods are considerably more time-consuming for the test subjects and identify fewer usability problems, yet they may still be worthwhile." The methods used in the paper attempted to emulate laboratory testing, but with self-reporting in place of an observer. Outlined below are a few alternate methods of asynchronous usability testing, some of which relieve the user of the responsibility to monitor their own usage.
One long-standing form of usability testing is the humble survey. Surveys can be completed anywhere, anytime, and provide direct feedback from users to your most important questions. That said, surveys are easy to do wrong, often in subtle ways, to the extent that survey methodology is an entire branch of applied statistics with its own peer-reviewed journal. Usability surveys require careful construction to ensure that they generate actionable insights for developers and designers. When you write a usability survey, remember that the minimal usability test includes a list of tasks; similarly, a usability survey should focus on the user's desired actions and their ability to complete them using the interface.
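As a minimal sketch of what a task-focused survey record might look like, each item below pairs a concrete task with a completion question and a follow-up, rather than asking broadly whether the interface is "easy to use." The field names and example response are assumptions:

```typescript
// A minimal sketch of a task-focused survey item. Field names and the example
// response are assumptions; the point is that each question is anchored to a
// concrete task rather than a general impression of the interface.
interface SurveyItem {
  task: string;                       // the action the user was asked to perform
  completed: "yes" | "no" | "partially";
  difficulty: 1 | 2 | 3 | 4 | 5;      // 1 = very easy, 5 = very difficult
  obstacles: string;                  // free text: what, if anything, got in the way
}

const exampleResponse: SurveyItem = {
  task: "Invite a teammate to your project",
  completed: "partially",
  difficulty: 4,
  obstacles: "I couldn't find the invite button on the project settings page.",
};

console.log(exampleResponse);
```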
If you don't want to rely on users' self-reporting, tools for monitoring users' clicks, time on page, mouse movements, and other metrics often used for marketing also allow for a kind of asynchronous remote usability observation. For example, you might be able to learn that an average user hovers over three dropdown menus in the navbar before making a click, indicating that the interface needs simplification. One major advantage of these tools is that they can be easily deployed with minimal marginal cost per observation, allowing you to gather usability data from a large subset of your users. However, I was careful to call these "observation" and "monitoring" tools. Any statistician will tell you that correlation does not prove causation. Similarly, the data that you get from passive monitoring without accompanying user feedback from a formal process is limited in its applicability.
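As a rough sketch of what such passive observation can look like in the browser, the snippet below counts how many navbar dropdowns a user hovers over before clicking and reports the count. The ".navbar .dropdown" selector and the /analytics/navigation endpoint are assumptions for illustration:

```typescript
// A rough sketch of passive usability observation in the browser: count how
// many navbar dropdowns a user hovers over before committing to a click.
// The ".navbar .dropdown" selector and the /analytics endpoint are assumptions.
let hoverCount = 0;

document.querySelectorAll(".navbar .dropdown").forEach((menu) => {
  menu.addEventListener("mouseenter", () => {
    hoverCount += 1;
  });
  menu.addEventListener("click", () => {
    // Report how much the user "shopped around" before choosing, then reset.
    navigator.sendBeacon(
      "/analytics/navigation",
      JSON.stringify({ hoversBeforeClick: hoverCount, timestamp: Date.now() })
    );
    hoverCount = 0;
  });
});
```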
To address this limitation, we turn to A/B testing, another marketing tool with a strong analogue in usability testing. In an A/B test, you change a single element of the application for a subset "B" of your users and measure any appreciable changes in your key metrics as compared to the "A" group using an unaltered application. To use this strategy for usability testing, define an outcome and see if the change you make in the content or structure of the application allows a user to reach the outcome more quickly, in fewer clicks, or with some other quantifiable improvement.
A successful A/B test must assign users to the two groups randomly. If your "B" group has confounding characteristics (for example, all of the "B" users are from Canada while most of your users are American), then the integrity of the data can be compromised. However, when working with remote monitoring tools, you can gather data on large numbers of users at once, making it likely that both groups are representative samples of your user base.
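A common way to get that random split is to hash a stable user ID into a bucket: each user consistently sees the same variant, while the assignment is effectively random across the user base and uncorrelated with attributes like geography. The sketch below assumes a 50/50 split and an invented experiment name:

```typescript
// A minimal sketch of A/B group assignment by hashing a stable user ID.
// The experiment name and 50/50 split are illustrative assumptions.
import { createHash } from "crypto";

function assignVariant(userId: string, experiment: string): "A" | "B" {
  const digest = createHash("sha256")
    .update(`${experiment}:${userId}`)
    .digest();
  // The first byte of the hash is effectively uniform, so its parity gives an
  // even, deterministic split between the two groups.
  return digest[0] % 2 === 0 ? "A" : "B";
}

// Example: decide which navbar a user sees, then record a quantifiable
// outcome (e.g., clicks to reach checkout) tagged with their group.
const variant = assignVariant("user-1234", "simplified-navbar");
console.log(`user-1234 sees variant ${variant}`);
```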
Finally, regardless of what form of asynchronous remote usability testing or observation you choose, take care to implement the tests and associated policies in a manner that respects your users' privacy. Metrics like mouse movements and clicking patterns are surprisingly identifiable and can reveal personal data, especially when combined with other user data your application may store. These sorts of tests should only be conducted after obtaining informed consent from affected users.
Conclusion
Usability testing helps teams of all kinds refine their application's user experience. While laboratory-style testing remains the gold standard for rigorous discovery, frequent informal testing with small groups of users can discover issues early and integrate seamlessly with the development process, especially for agile projects. For remote teams, ubiquitous chat and screen-sharing technologies allow for online "Hallway Usability Tests" as a convenient form of informal testing. Developers of many types of applications can gain further usability insights by adapting marketing techniques like A/B testing into asynchronous remote usability tests. Regardless of the methods you choose, gathering feedback from your users and iterating on their needs will improve application quality over time.