
by Tom Tullis and Stan Fleischman
Published in Second Quarter, 2002 Issue

Back in 1997, we did some research on the readability of text in the Windows environment. Now we're turning our attention to readability issues on the Web.
Traditionally, the view has been that serif fonts, such as Times New Roman, are the better choice for printed pages while sans serif fonts, such as Verdana, are the better choice for online viewing. The argument has been that the serifs aid in letter recognition when the font is being shown at the high resolutions commonly used in print, but that they just add visual "noise" when the font is being shown on lower-resolution screens. Recent research at Wichita State University (1,2) has been somewhat inconclusive in determining whether there are any real differences in the readability of fonts online, at least for the commonly used fonts such as Times New Roman and Verdana. Consequently, we decided to conduct a study of those two fonts in particular, assessing their readability at different sizes when viewed on the Web.

The Fonts We Studied

For this study, we investigated three different sizes of Times New Roman and Verdana: Smallest, Medium, and Largest. Examples of all the conditions studied are shown in Figures 1 and 2.
Figure 1. Examples of the text passages (with typographical errors) in Verdana, at each of the three sizes studied.
Figure 2. Examples of the text passages (with typographical errors) in Times Roman, at each of the three sizes studied.

The sizes were manipulated using I.E. 5.5's "View/Text Size" menu, which enables users to change the font sizes. For Times, the HTML specified SIZE=3 for the Medium size (which is the browser's default if no size is specified). The Smallest and Largest sizes were then generated using the corresponding menu selections of the browser. (This technique was chosen because we wanted to test the common range of sizes that are available to the user via this menu.) For Verdana, we chose HTML SIZE=2 for the Medium font size, primarily because we wanted a size that yielded characters approximately the same actual size as the Medium font for Times. (Verdana is a naturally larger font than Times.) Similarly, the Smallest and Largest conditions for Verdana were also generated using the browser menu.

In all cases, the participants in the study were shown screen-shots of the text passages instead of using HTML to display the text. This was done to control for possible rendering differences due to a user's browser settings. Note that no scrolling (horizontal or vertical) was required in any of the conditions.

How We Conducted the Study

The study was conducted online and was open to all Fidelity employees. The incentive for participation was entry in a random drawing for a $50 gift check.

Participants were asked to proofread six different text passages. Each passage (adapted from the Encarta online encyclopedia) was 130 words long and about a different animal. Nine typographical errors were inserted into each passage, although we told participants there could be from 1 to 10 errors. The errors were always substitution errors, in which one letter was replaced by an incorrect letter. The same set of nine substitution errors was used in each passage (e.g., "q" for "g", "b" for "d", "s" for "a", "v" for "w").
These substitutions were chosen so that their detection might be difficult due to the similarity of the letter shapes. The errors were distributed throughout each passage, but never at the beginning or end of a word, nor in any proper noun. Different combinations of errors, passages, and fonts were used during the study to minimize any effects of a specific error being hard to detect simply because of less familiarity with the word. Participants were instructed to read each passage "as quickly and accurately as possible" and to click an on-screen button for each typographical error they found. Figure 3 shows a sample of the test screen. Total time to read the passage was automatically recorded, as was the number of times the user clicked to indicate an error. Users were given no feedback about their accuracy. After reading each passage, they were asked to rate the effectiveness of that font style on a scale of 1 to 5, with higher being more effective.
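As a rough illustration of the error-insertion rules described above, the following Python sketch replaces letters only in the interior of a word and skips capitalized words. It is not the procedure actually used in the study. The function names, the seed parameter, and the use of capitalization as a stand-in for "proper noun" are all assumptions made for the example, and only the four substitution pairs quoted in the text are included.

```python
import random
import re

# Original letter -> substituted letter, taken from the examples in the text
# ("q" for "g", "b" for "d", "s" for "a", "v" for "w").
SUBSTITUTIONS = {"g": "q", "d": "b", "a": "s", "w": "v"}


def eligible_positions(text, substitutions):
    """Return the indices of letters that may be replaced.

    A letter qualifies if it is in the substitution map, is neither the
    first nor the last letter of its word, and the word is not capitalized
    (a crude, assumed stand-in for "not a proper noun").
    """
    positions = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group()
        if word[0].isupper():
            continue
        for offset in range(1, len(word) - 1):
            if word[offset] in substitutions:
                positions.append(match.start() + offset)
    return positions


def insert_errors(text, substitutions, count, seed=0):
    """Replace `count` randomly chosen eligible letters.

    Returns the altered text and the sorted list of changed indices.
    """
    rng = random.Random(seed)  # seeded so the result is repeatable
    candidates = eligible_positions(text, substitutions)
    if len(candidates) < count:
        raise ValueError("not enough eligible letters in the passage")
    chosen = sorted(rng.sample(candidates, count))
    chars = list(text)
    for index in chosen:
        chars[index] = substitutions[chars[index]]
    return "".join(chars), chosen
```

Calling insert_errors(passage, SUBSTITUTIONS, 9) on a passage would mirror the nine errors per passage described above, provided the passage has enough eligible letters.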
Figure 3. Sample test screen used in the online study.

Results

1,108 employees participated in the study. One of the great advantages of online studies is that they allow for data collection from a very large number of participants. To our knowledge, this is the largest number of participants in any study of font legibility reported in the literature to date.

Speed and Accuracy

The accuracy with which the users read each passage was determined by the number of errors they reported. Since each passage actually contained nine typos, reporting nine typos resulted in an error rate of 0%. Reporting eight or ten typos resulted in an error rate of 11%, seven or eleven typos resulted in an error rate of 22%, and so on.

In analyzing the speed data (the time it took users to read each passage), we discarded any trials of less than 10 seconds (indicating they did not really read the passage) or longer than 300 seconds (indicating they got interrupted or fell asleep!).

To get an overall sense of how quickly and accurately participants read the passages, the error data and speed data were combined using a z-score transformation, which is a way of combining measures on different scales to give equal weight to each. This results in an overall performance measure as shown in Figure 4 for each combination of font and size.
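The scoring steps described above can be sketched in Python as follows. The error rate and the 10 to 300 second cutoff follow the text directly. The rest is assumed for illustration, since the article does not give the exact computation: the function names, the use of the population standard deviation, the simple average of the two z-scores, and the sign flip that makes higher scores mean better performance.

```python
from statistics import mean, pstdev

TYPOS_PER_PASSAGE = 9  # each passage contained nine typos


def error_rate(reported, actual=TYPOS_PER_PASSAGE):
    """Distance of the reported typo count from the true count, as a fraction."""
    return abs(reported - actual) / actual


def keep_trial(seconds, low=10, high=300):
    """Keep trials that took at least `low` and at most `high` seconds."""
    return low <= seconds <= high


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1.

    Assumes the values are not all identical (otherwise the standard
    deviation is zero and the division fails).
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def combined_performance(times, error_rates):
    """Equal-weighted combination of speed and accuracy.

    Both inputs are "lower is better", so the averaged z-score is negated
    to give a measure where higher means better performance (an assumption).
    """
    z_time = z_scores(times)
    z_error = z_scores(error_rates)
    return [-(t + e) / 2 for t, e in zip(z_time, z_error)]
```

For example, error_rate(8) and error_rate(10) both give 1/9, or about 11%, matching the figures quoted above. Standardizing each measure before averaging is what keeps reading time, measured in seconds, from swamping the error rate, which is a fraction.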
Figure 4. An equal-weighted combination of the speed and accuracy data. Performance improved with font size, although more so for Verdana. At the Smallest size, Times was significantly better than Verdana, but at the Largest size Verdana was significantly better than Times.

Clearly, performance got better as the font size increased, which is not surprising. However, this effect was significantly more pronounced for Verdana than it was for Times. Interestingly, at the Smallest size, Times performed significantly better than Verdana, while the opposite was true at the Largest size. There was no difference between the two fonts at the Medium size.

The finding that Times performed significantly better than Verdana at the Smallest size goes against the conventional wisdom often expressed in the usability and human factors literature, which states that sans serif fonts (such as Verdana) are generally more effective on-screen than serif fonts (such as Times), especially at the smaller sizes. The logic behind the statement is that the resolution of our monitors is generally too low to accurately represent the serifs, and that the attempt to render them simply adds visual noise to the text. (Likewise, the conventional wisdom has been that serif fonts perform better in print than sans serif fonts due to the much higher resolution of the printed page, which allows the serifs to be accurately rendered and thus aid in the recognition of the letters.)

We believe that the reason for the better performance by Times at the Smallest size does in fact lie in the serifs used in that font, and that perhaps our monitors are now good enough to render the serifs sufficiently, even at the smaller sizes, to allow them to aid in recognition of the letters. For example, consider one of the pairs of substituted letters used in the study: "q" vs. "g". Figure 5 shows greatly enlarged versions of those letters in Times and Verdana at the Smallest size.
It also shows those letters overlaid on top of each other to illustrate the differences between them. In the case of Verdana, there are only four pixels different between the two letters (the tail of the descender). In the case of Times, there are eighteen pixels different between the two letters, because of their vastly different shapes. The dominant theory of how people recognize letters is through a feature extraction process. In a simplified view, you can think of the pixels as corresponding to features of the letters. Consequently, there are simply more features to distinguish the "q" from the "g" in Times than there are in Verdana. Although this illustration is most striking with this particular pair of letters, it holds true to some degree for all of the substituted pairs used in the study, mostly because of the serifs in Times.
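The pixel comparison described above amounts to counting the pixels that two glyph bitmaps share and the pixels that appear in only one of them. The Python sketch below shows the idea on two small made-up bitmaps. These are hypothetical shapes invented for the example, not the actual Times or Verdana glyphs, and the function names are likewise assumptions.

```python
# Hypothetical 5x6 bitmaps ("#" = lit pixel). They differ only in the tail.
GLYPH_A = [
    ".###.",
    "#..#.",
    "#..#.",
    ".###.",
    "...#.",
    "...#.",
]
GLYPH_B = [
    ".###.",
    "#..#.",
    "#..#.",
    ".###.",
    "...#.",
    ".##..",
]


def parse_glyph(rows):
    """Convert rows of text into a set of (row, column) lit-pixel coordinates."""
    return {
        (r, c)
        for r, row in enumerate(rows)
        for c, ch in enumerate(row)
        if ch == "#"
    }


def compare_glyphs(a, b):
    """Return (shared, unique) pixel counts for two glyphs.

    Shared pixels are lit in both glyphs; unique pixels are lit in exactly
    one of them (the symmetric difference).
    """
    return len(a & b), len(a ^ b)


shared, unique = compare_glyphs(parse_glyph(GLYPH_A), parse_glyph(GLYPH_B))
print(shared, unique)  # prints "11 3"
```

In the terms of the feature-extraction argument, the larger the unique count relative to the shared count, the more a reader has to go on when telling the two letters apart.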
Figure 5. An illustration of why Times might have performed better than Verdana at the Smallest size. Enlarged views of the lower-case letters "q" and "g" are shown for each of the fonts. The two letters are also shown overlaid on top of each other, in which black represents pixels that are shared by both letters and gray represents the unique pixels. Times clearly has more unique pixels, thus making the letters more readily distinguishable.

Now the question is why this performance advantage of Times diminishes as the font size gets larger, to the point where there is no difference between the fonts at the Medium size, and then Verdana is significantly better at the Largest size. Part of the explanation appears to lie in the fact that as a serifed font gets larger, the serifs take up a progressively smaller proportion of the letter's total image, thus creating less of a difference between the features of the letters in the two fonts.

At the same time, another difference between Verdana and Times starts to emerge as the text gets larger. Times is rendered with significantly tighter kerning than Verdana, which simply means that the letters are closer together in Times. This is obvious from inspection of the Largest passages in Figures 1 and 2. Although the looser kerning with Verdana is true at all sizes, any advantage it offers at the Smallest size is apparently outweighed by the limited feature differences between the letters. Another way of thinking of this is that at the Largest size, basic legibility is no longer an issue with either font, so the feature differences between the fonts no longer play a role in affecting performance. Instead, a more global factor, the spacing between the letters, starts to play a more dominant role.

Subjective Ratings

In addition to capturing performance data, participants in the study were asked to rate how effective they thought each combination of font and size was after reading each passage.
These average ratings are shown in Figure 6.
Figure 6. Subjective ratings of the effectiveness of the various fonts and sizes on a 1 to 5 scale, where higher numbers are better. Verdana was rated significantly better at all sizes, although especially at the Smallest size. The Medium and Largest sizes were rated significantly better than the Smallest size.

The subjective ratings for the Medium and Largest fonts were significantly better than the ratings for the Smallest fonts. However, there was no difference between the ratings for the Medium and Largest fonts. At all sizes, Verdana was rated significantly better than Times, but this was especially true at the Smallest size.

These ratings are still another example of how users' subjective preference ratings do not always correspond to their performance data. This is particularly obvious at the Smallest size, where users significantly preferred Verdana over Times, but they performed significantly worse with Verdana.

Conclusions

Although the results of this study were a bit unexpected, the following implications for the presentation of text on the Web appear to be supported by the data:
References
1. Bernard, M., & Mills, M. (2000). So, what size and type of font should I use on my website? Usability News 2.2 [Online]. http://psychology.wichita.edu/surl/usabilitynews/2S/font.htm
2. Bernard, M., Mills, M., Peterson, M., & Storrer, K. (2001). A comparison of popular online fonts: Which is best and when? Usability News 3.2 [Online]. http://psychology.wichita.edu/surl/usabilitynews/3S/font.htm
Copyright 2002 FMR Corp. All Rights Reserved.