Evaluation
TraveLite
 

Abstract

TraveLite is a web-based publisher of customized travel guides. It allows travelers to sort through the information in its database, choose only what they need or want, and download that information to a PDA. Because travelers build guides around their own interests and needs, they can purchase a guide tailored to them rather than a static, bland product designed around a generalized notion of what a generic traveler in a region may need. One of the major design hurdles for the project, however, is how to support user queries over the vast amount of travel information that is available.

One of the primary tasks we will need to support in the prototype of TraveLite is building a guide online using a web-based interface. To build a guide, users will need some type of tool to sort through the large amounts of nominal and ordinal data available and filter out the elements of interest specific to their needs. The search/filter task can be daunting given a large database of content, and the likelihood of a failed query (zero hits) rises as more constraints are added.

Traditional form-based queries give the user no information about what the database can offer or how their query constraints limit the data. When user goals are relatively fluid, it helps to know how loosening constraints will affect the returned set. For instance, if spending $5 more per night on a hotel room means twice as many hotels to choose from, the user would like to know it. Techniques that provide immediate feedback may let users see how their constraints limit the data and help them make more informed decisions about what to collect for their customized guides.
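To make the hotel example concrete, here is a minimal sketch of previewing how loosening one constraint changes the size of the result set. The `Hotel` record, the sample prices, and the `count_hits` and `preview_loosening` helpers are hypothetical illustrations, not part of TraveLite.

```python
from dataclasses import dataclass

@dataclass
class Hotel:
    name: str
    price: int      # nightly rate in dollars (made-up values below)
    district: str

# Hypothetical sample data, for illustration only.
HOTELS = [
    Hotel("H1", 80, "Mission"),
    Hotel("H2", 84, "Mission"),
    Hotel("H3", 85, "SoMa"),
    Hotel("H4", 78, "Mission"),
    Hotel("H5", 120, "Mission"),
    Hotel("H6", 83, "Mission"),
]

def count_hits(hotels, max_price, district):
    """Number of hotels satisfying both constraints (a conjunctive query)."""
    return sum(1 for h in hotels if h.price <= max_price and h.district == district)

def preview_loosening(hotels, max_price, district, step=5):
    """Hit count now, and the hit count if the price cap were raised by `step` dollars."""
    return (count_hits(hotels, max_price, district),
            count_hits(hotels, max_price + step, district))

print(preview_loosening(HOTELS, 80, "Mission"))  # (2, 4)
```

Showing both numbers side by side is what lets a user see that a small concession doubles the choices, which a flat form cannot reveal until after submission.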

In this experiment, we compared an interface that uses visualization, IBM's Visual Attribute Explorer, to two forms-based interfaces, one flat and one that dynamically shows query results. The tasks involved choosing a set of restaurants. We wanted to find out how the interfaces compared in terms of task completion time, quality of query results, and user confidence. We also collected user satisfaction information and wanted to determine whether textual training had any impact on success with the visualization interface.

We expect that a visualization tool such as the Visual Attribute Explorer will enable users to interact with the database in an intuitive manner that facilitates exploring, searching and selecting sets of data based on attributes relevant to the individual's travel needs.

In order to determine the appropriateness of using the Visual Attribute Explorer, we conducted an experiment to evaluate the comparative usability of the Visual Attribute Explorer. We compared the tool to a traditional, form-based query interface, as well as a dynamic query interface. The task was choosing a set of restaurants to include in a guide.

What follows is a summary of the experiment we conducted. For more information, please refer to our write-ups for the courses IS247 (Information Visualization & Presentation) and IS271 (Quantitative Research Methods).

Background

As the amount of information we deal with on a daily basis increases, we need easier ways to manage and filter that information. Visualization tools represent data in ways that take advantage of our visual cognitive skills: we can recognize and understand shapes and colors much faster than we can process text. Furthermore, dynamic query and visualization tools allow the user to manipulate datasets through a graphical interface. These systems typically incorporate:

  • Rapid, incremental, and reversible operations with immediate visual feedback on each action
  • Smooth graphical feedback of results
  • Continual visual representation of the dataset
  • Physical interaction with the data, usually through sliders or buttons, that lets the user form and refine queries
  • Further details on demand
  • A layered approach to learning that supports both naïve and experienced users
  • Elimination of the zero-hits problem: if a query returns zero hits, the user simply steps back to the previous state
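The incremental, reversible operations and the step-back remedy for zero hits described above can be sketched as a stack of constraints whose hit count is recomputed after every change. This is a minimal illustration, assuming records are plain dictionaries and constraints are predicates; the `DynamicQuery` class and the automatic revert in its `add_or_revert` method are hypothetical design choices, not the API of any tool discussed here.

```python
class DynamicQuery:
    """Incremental, reversible filter over a list of records (illustrative sketch)."""

    def __init__(self, records):
        self.records = records
        self.history = []  # stack of (label, predicate) constraints, in the order applied

    def results(self):
        # A record is in the found set only if it satisfies every active constraint.
        return [r for r in self.records if all(pred(r) for _, pred in self.history)]

    def add(self, label, predicate):
        self.history.append((label, predicate))
        return len(self.results())  # immediate feedback: current hit count

    def undo(self):
        if self.history:
            self.history.pop()
        return len(self.results())

    def add_or_revert(self, label, predicate):
        """Apply a constraint, but step back automatically if it empties the found set."""
        hits = self.add(label, predicate)
        if hits == 0:
            return self.undo(), False
        return hits, True

records = [
    {"name": "A", "price": 1, "cuisine": "Thai"},
    {"name": "B", "price": 2, "cuisine": "Thai"},
    {"name": "C", "price": 3, "cuisine": "Italian"},
]
q = DynamicQuery(records)
print(q.add("Thai", lambda r: r["cuisine"] == "Thai"))                  # 2
print(q.add_or_revert("cheap", lambda r: r["price"] <= 1))              # (1, True)
print(q.add_or_revert("Italian", lambda r: r["cuisine"] == "Italian"))  # (1, False)
```

Keeping the constraints on a stack is what makes each step reversible; in a real interface the user, rather than the code, would usually decide when to step back.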

A main benefit of these systems is that they enable the user to reduce the found set to a manageable size (based on desired attributes/constraints) and then allow for deeper exploration. Furthermore, the sense of actual control over the data and query process bolsters user confidence in the results of the search.

Experimental Method
Question: Is visualization useful for the task of selecting content to include in a customized travel guide, specifically for searching over a database of restaurant information? We would like to know whether the Visual Attribute Explorer is a useful tool for allowing the user to rapidly filter through combinations of attributes to select a section of the data they wish to further explore or keep. We would like to determine the usefulness of a visualization over this type of data before we invest time in a final implementation for deployment over the web. For this reason, we are using an alpha version of the Visual Attribute Explorer. A final implementation would involve a significant redesign of this software, but the experiment helped us to determine which aspects of the visualization were helpful to users, and which aspects were confusing.

To evaluate the Visual Attribute Explorer, we compared it to two versions of form-based queries, one flat and one with dynamic feedback. Specifically, users performed three identical tasks over the same data set of restaurant information using three different user interfaces for filtering the data. Each task required the user to interact with three, six, or nine attributes. Database content was taken from the Lonely Planet and Zagat guides for San Francisco.

We used a 1 x 2 x 4 design, combining between- and within-subjects conditions. We tested each user on all three interfaces (within-subjects) and divided users into two groups based on the presence or absence of training (between-subjects).

Dependent Variables:

Time to completion for user tasks:

This allowed us to determine which interface let users complete their tasks in the least time.

In addition to task completion time, we recorded Exploration Time: the time users spent exploring the Attribute Explorer interface before beginning their first task. We wanted to let users "poke around" the interface until they felt comfortable with it. Recording this variable served two purposes. First, we wanted to know how long it took people to feel comfortable enough with the interface to begin their first task. Second, we wanted to know whether success (as measured by task completion time, recall, precision, and confidence) was correlated with time spent exploring the interface.
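The write-up does not state which statistic was used to relate exploration time to success. One plausible choice is the Pearson correlation coefficient, sketched below; the function name and the sample times are made up purely for illustration and are not data from the experiment.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(ss_x * ss_y)

# Made-up numbers (seconds), NOT data from the experiment.
exploration_time = [60, 120, 90, 200, 45, 150]
first_task_time = [300, 240, 260, 180, 330, 210]
r = pearson_r(exploration_time, first_task_time)
print(round(r, 2))
```

A negative r on real data would suggest that users who explored longer finished their first task faster; with only twelve testers, any such coefficient should be read cautiously.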

Quality of results:

We collected Recall and Precision as measures of result quality. We expected these rates to be high if users understood the interface: because each task was outlined explicitly, there was little chance of formulating the query incorrectly. If, however, a user did not understand how to specify a query with the interface, these measures should be noticeably affected.

Recall:
This measurement is commonly used to evaluate information retrieval systems. It shows how successful the system is at retrieving all the information that is relevant to a specific query. Since we can calculate what this measure should be ahead of time, it is easy to compare a tester's query Recall to ideal Recall. (Recall = Retrieved Relevant Listings/All Listings Relevant to the Query Task). Example: the ratio of matching restaurants found to all restaurants in the database that match the task's criteria.

Precision:
This measurement is commonly used to evaluate information retrieval systems. It is a measure of how successful the system is at retrieving only the information that is relevant to a specific query. Since we can calculate what this measure should be ahead of time, it will be easy to compare a given tester's query Precision to ideal Precision. (Precision = Relevant Retrieved Listings/All Listings Retrieved). Example: Percentage of restaurants found that meet all the criteria.
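Both measures reduce to simple set arithmetic over listing IDs. A minimal sketch, assuming each task's answer key and each tester's result set are available as collections of hypothetical restaurant IDs; treating a zero-hit result as precision 0 is a convention chosen here, since precision is otherwise undefined in that case.

```python
def recall(retrieved, relevant):
    """Relevant listings retrieved / all listings relevant to the query task."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        raise ValueError("task has no relevant listings")
    return len(retrieved & relevant) / len(relevant)

def precision(retrieved, relevant):
    """Relevant listings retrieved / all listings retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0  # zero-hit query: convention chosen for this sketch
    return len(retrieved & relevant) / len(retrieved)

relevant = {"r1", "r2", "r3", "r4"}   # hypothetical answer key for one task
retrieved = {"r1", "r2", "r5"}        # hypothetical tester result set
print(recall(retrieved, relevant))     # 0.5
print(precision(retrieved, relevant))  # 0.6666666666666666
```

A perfect score on a task means both values equal 1.0, that is, the tester retrieved exactly the answer key.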

Confidence

Following each task, we asked users to rate their level of confidence in the result set returned by the interface.

User satisfaction with the tool:

Using a post-test survey or interview regarding user satisfaction with the tool, we asked the following questions:

  • Did you have a sense of what there was to choose from?
  • Did you have a sense of how your query constraints would affect the queries?
  • Are the above things that matter to you for this type of task?

We hoped that a visualization tool such as the Visual Attribute Explorer would enable users to interact with the database in an intuitive manner that facilitates exploring, searching, and selecting sets of data based on attributes relevant to the individual's travel needs.

Tasks

We framed the experiment around four tasks. The first three were straightforward query tasks in which the user was given several attributes over which to search. Because we wanted to test the quality of results as the tasks grew more complex, we gave each user a three-, then six-, then nine-dimension task on each interface. To prevent a learning effect regarding the contents of the queries and results, we created three tasks for each complexity level and assigned a different set of three tasks to each interface for each user. In essence, each user performed all nine tasks, but we randomized the assignment so that the task-interface pairs were equally distributed. See the Test Key for the assignment of tasks and interfaces to testers.

Tasks at each complexity level were designed to be roughly equivalent in difficulty, so that, for example, all six-dimension tasks had two Boolean attributes and four multiple-choice attributes.
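The actual assignment of tasks to interfaces is given in the Test Key, which is not reproduced here. As an illustration of the balancing goal only, the sketch below uses a cyclic Latin-square rotation (a systematic alternative to randomization) to give each user all nine tasks while spreading task variants evenly across interfaces. The variant labels A, B, and C and the `assignment` and `pair_counts` helpers are hypothetical.

```python
from collections import Counter

INTERFACES = ["Attribute Explorer", "Static Form", "Dynamic Query"]
LEVELS = [3, 6, 9]           # attributes per task
VARIANTS = ["A", "B", "C"]   # hypothetical labels for the three tasks at each level

def assignment(user_index):
    """Map each interface to one 3-, 6- and 9-attribute task for this user.

    Cyclic Latin-square rotation: interface i gets variant (i + user_index) mod 3,
    so each user sees every variant exactly once across the three interfaces.
    """
    plan = {}
    for i, interface in enumerate(INTERFACES):
        variant = VARIANTS[(i + user_index) % len(VARIANTS)]
        plan[interface] = [f"{level}{variant}" for level in LEVELS]
    return plan

def pair_counts(n_users):
    """How often each (interface, task variant) pair occurs across n_users."""
    counts = Counter()
    for u in range(n_users):
        for interface, tasks in assignment(u).items():
            counts[(interface, tasks[0][-1])] += 1  # last character is the variant
    return counts

print(assignment(1)["Attribute Explorer"])   # ['3B', '6B', '9B']
print(set(pair_counts(12).values()))         # {4}
```

With twelve users (a multiple of three), each of the nine interface-variant pairs occurs exactly four times; with other group sizes the rotation is only approximately balanced.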

The fourth task was a free form query task where the user was asked to formulate a query, note the parameters of the query, and then record the changes made to the query as they began to use the tool.

Results

Quantitative Results

Testers

We tested the interfaces with thirteen testers and kept the results from twelve. The first testers were drawn from a pool of UC Berkeley students who had expressed interest in participating in interface testing; the second round were master's and PhD students from the School of Information Management and Systems. All testers were graduate students, ranging in age from 19 to 49. Half were men and half were women. Experience with visualization software varied: three users did not know what visualization software was, six had used visualization software at least once, and two had heard of visualization but had never used a tool. None of the users considered themselves experienced users of visualization tools.

TraveLite's intended audience is tech-savvy: comfortable researching and purchasing online, and interested in downloading software and content to a portable device. We do not, however, expect them to be familiar with visualization tools. If anything, then, our testers were more experienced and skilled than we expect our customers to be. If the interfaces we tested were confusing to our testers, we would expect them to be at least as confusing to our commercial audience.

Results

Time

Overall, the differences in task completion time between the Attribute Explorer and the two forms-based interfaces were significant: tasks took more than twice as long to complete with the Attribute Explorer.

As expected, mean task time within each interface generally increases with task complexity. We speculate that the improvement in task time between the Attribute Explorer three-dimensional and six-dimensional tasks is due primarily to the extreme variance in the Attribute Explorer three-dimensional task times, variance that lessens in subsequent tasks. Importantly, the difference between the forms-based interfaces and the Attribute Explorer becomes more significant as task complexity increases, indicating a widening gap in task time. Task complexity had a greater impact on completion time with the Attribute Explorer, where mean task time grew by a larger amount as complexity increased; task times on the forms-based interfaces were less sensitive to task complexity.

Confidence
Additional evidence that users were less effective using the Attribute Explorer is found in the subjective measure of confidence in the retrieved results for each query. Overall, users were less confident in the results retrieved using the Attribute Explorer interface than with either of the forms interfaces.


Task complexity also demonstrated a significant effect on measures of confidence in results. We see what may be a learning effect as users of the Attribute Explorer report greater confidence in results as they continue to use the interface while task complexity increases. It is interesting to note that the Dynamic Queries interface demonstrated practically no variance in confidence measures, indicating that all testers reported confidence in their results from the start.

Recall and Precision
Interface also appeared to affect both recall and precision, suggesting that the Attribute Explorer interface impeded users' ability to specify queries accurately. With the Attribute Explorer, both recall (the ratio of relevant retrieved items to all relevant items) and precision (the ratio of relevant retrieved items to all retrieved items) were 13% lower than with either of the forms-based interfaces. Although the difference in means between the interfaces is not statistically significant for recall or precision, the differences are still noteworthy. Because the tasks were so rigidly controlled (they were designed to test the specification of queries rather than their formulation), problems with recall and precision more likely reflect the user's understanding of the interface than their understanding of the queries. Perhaps more telling than the mean recall and precision is the raw number of errors committed on each interface: there were eight imperfect scores with the Attribute Explorer, compared with three on each of the forms interfaces. We think this signals a real problem with the Attribute Explorer's ability to help users form accurate queries that return useful results.

Effects of Training
Given that the Attribute Explorer is a less-familiar interface, we considered whether or not training would have an effect on users' performance or evaluation of the tool.

While training appears to have affected the variance of the three-dimensional task time on the Attribute Explorer, more stringent analysis showed that the differences in task times between the training and non-training groups are not significant. Interestingly, exploration time plus the time for the first three-dimensional task is approximately the same regardless of training, suggesting that a consistent amount of time with the interface may be required for initial comprehension. Likewise, the presence or absence of training had no effect on recall, precision, or confidence with the Attribute Explorer.

Post-interface evaluation
Upon completing three tasks with an interface and prior to moving on to the next one, users were asked to evaluate the interface overall by completing a short survey. Results from these surveys indicate that users preferred the Dynamic Queries interface overall, giving it the highest marks among the three interfaces on the measures of ease of understanding, usefulness, and expected results.

Attribute Explorer fared particularly poorly when compared to the other two interfaces on the measure of ease of understanding, supporting our contention that the Attribute Explorer interface was particularly confusing for users.


Notably, on the question of usefulness, users gave higher ratings to the interfaces that provided immediate feedback and more direct manipulation (Attribute Explorer and Dynamic Queries), suggesting that these qualities have real value to users despite the complications with the Attribute Explorer interface.

Qualitative Feedback

Post Interface and Made-up Task questionnaires

Attribute Explorer

On the Attribute Explorer, most users were very confused by the distinction between selecting and deselecting (to remove items from the found set, the user clicked on a criterion to deselect it). This was compounded by the fact that most users were also confused by the color coding employed in the system. All found results appeared in white, while deselected items were gray, becoming successively darker as they matched more of the deselected criteria. Because users were not initially comfortable with this functionality, they also spent more time checking their selections to ensure that their results were correct.
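The shading rule described above can be expressed as a count of matched deselected criteria. This is a minimal sketch based only on that description, not on the Attribute Explorer's actual code; the four-step gray palette, the helper names, and the sample criteria are assumptions.

```python
GRAYS = ["white", "light gray", "gray", "dark gray"]  # assumed palette

def shade_level(item, deselected):
    """Count how many deselected criteria the item matches (0 = in the found set)."""
    return sum(1 for criterion in deselected if criterion(item))

def color_for(item, deselected):
    """White for found items; one step darker per matched deselected criterion."""
    level = shade_level(item, deselected)
    return GRAYS[min(level, len(GRAYS) - 1)]  # clamp to the darkest available shade

# Hypothetical deselections: the user clicked "$$$$" and "Steakhouse" to exclude them.
deselected = [
    lambda r: r["price"] == 4,
    lambda r: r["cuisine"] == "Steakhouse",
]
print(color_for({"price": 1, "cuisine": "Thai"}, deselected))        # white
print(color_for({"price": 4, "cuisine": "Thai"}, deselected))        # light gray
print(color_for({"price": 4, "cuisine": "Steakhouse"}, deselected))  # gray
```

Written out this way, the source of the confusion is visible: the user acts on what they do not want, and the display encodes how many of those exclusions each item hits rather than how well it matches what they do want.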

In the post-interface satisfaction survey, most participants felt that they developed a sense of the entire data set, although most said they were initially confused. Some did not at first realize that the attributes appeared on more than one screen, and others did not initially understand which graphical elements were important to their tasks. One person responded that she felt it was easy to understand but "some details bothered me, such as the yellow line in the bar so it takes a while to know what is really important information." Furthermore, most participants said they could sense how changing their constraints affected the queries, but again this took some time to understand; one tester responded, "yes, once I figured out that you need to click on what you don't want (sorta counter-intuitive)."

With the made-up task on Attribute Explorer, again users were somewhat confused by the tool. In response to the effect the tool had on creating, expanding or limiting their search, one user wrote, "I saw I needed to be less restrictive, I knew that before selecting an attribute because I could see in advance there were no more white restaurants." However, most users did not feel that this visualization changed their query significantly. Most users indicated that they were satisfied with the results although only half were confident that they had found all the restaurants that matched their criteria. Basically, most users were simply confused by this interface and were not sure how to use it and/or how to understand the results.

Static Form

With the Static Form interface, most users could quickly perceive the available constraints; however, a few found that the long form and the amount of scrolling made it difficult to track all the possibilities. Most said they could somewhat understand how the constraints affected the query but would have preferred immediate or more easily accessible feedback rather than having to return to the previous page. In fact, one participant responded that "this kind of task should be simple and quick, lack of immediate feedback made me less certain about what I'm doing. And lack of adequate visual cues made the form harder to navigate."

In the questions following the made-up task, responses to whether the tool affected their decision to change the query were fairly divided. Half the participants said no; the other half did change their constraints to try to change their results but expressed a desire for more control while using the interface. One tester responded, "I almost wanted to expand the query just so that I would feel like I was doing something more," while another replied, "This interface made it much more difficult to see the new results."

Dynamic Query Form

Participants liked the Dynamic Query interface best of the three. When first interacting with the tool, most users were pleasantly surprised when they realized that changing constraints immediately changed the result set on the same page. All users could sense what they could choose from, although some felt the interface took some getting used to. One participant replied, "It took a little getting used to, but after trying a couple of things - especially combination check marks in a particular category and comparing answers with each answer set. I felt comfortable with the answer," while another said, "[I] especially like the way each choice would immediately affect the results in the frame." When asked whether they could sense how the query constraints affected the query, all users said yes, and one noted, "I was also looking at - and found helpful - the debugging notes." [Note: we have since incorporated this facility into our design as a form of feedback.] Likewise, all users felt that both of these qualities mattered with the Dynamic Query interface. In fact, after using this interface, two users realized the importance of how the constraints affected the query; one replied, "yes - actually I think in the real world, I'll put more emphasis on the constraint of things more while conducting a real search task."

The Dynamic Query interface also changed how users searched when they had an opportunity to create their own task. One user, in a response typical of most users, replied that "it helped me play around with different possibilities." Another user explained that, "It made me see clearly what the effect of my adding or subtracting choices. I can use the information to know what is the real meaning." All users expressed satisfaction with the results although not all were as confident about finding all the possible results.

General Feedback ~ Post Test Questionnaire

Users were only somewhat sure about how the interfaces worked. As mentioned above, the Attribute Explorer confused most users, and a few had difficulty with the other two interfaces as well. Most users felt that the search itself did not affect which interface they found most useful, although a few said that the Dynamic Query form helped them alter their queries, especially in the more complex searches. One user felt that "The more constraints, the simpler the interface should be so I can focus on the search and not the additional [questions of]: is the tool working, am I making mistakes?"

In all three interfaces, a few participants felt uncertain about the logic of the flow or the placement of the attributes, which made the tools difficult to use in a search. Some also felt that all constraints needed to fit on one page so that they could easily understand the range of possibilities and avoid excessive scrolling. Most participants did not like the Attribute Explorer. They felt that the interface had too many screens and that the distinction between the two colors was confusing. They also disliked selecting the attributes they wanted to omit, finding this action counter-intuitive. Some also disliked the Static Form interface because it did not give immediate feedback on how the restrictions affected the results.

In response to the question of whether the Attribute Explorer provided a sense of context or was just confusing, many said it was confusing: "too much information or too much visual," in the words of one person. One person replied that "Histograms were ok. They're not very useful for searching (as opposed to browsing) where they are much more useful, as they would be for qualifying a search) for just executing a given task, the dynamic query is better." Another person felt that "it got easier as I used it." Also, when asked whether any of the interfaces frustrated them, almost half named the Attribute Explorer, and some were also frustrated with the Static Form for the reasons discussed above.

Finally, one tester wrote that "the AE has real possibilities for comparison of attributes after results are retrieved," while another felt that the Attribute Explorer was useful "maybe for those who don't really know what they want and just want to wander around to see what is a good choice." Overall, 10 participants chose the Dynamic Query as their favorite interface, and one person each chose the Attribute Explorer and the Static Form.

Potential Implementation
In an attempt to incorporate the best of these interfaces, we propose a potential iteration of the favored Dynamic Queries interface that includes a pop-up window with an Attribute Explorer-like view of the data (see Mockup).

Advantages:

  • Allows individual users to opt-in/out of the Attribute Explorer-style view
  • Allows form-based selection of items, along with a dataset overview and query feedback/formulation help via an Attribute Explorer-style popup view
  • Labels in the AE-view give local feedback as to selected query constraints, appearing darker if constraining a query, in grey otherwise
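The data behind such an Attribute Explorer-style popup can be sketched as, for each attribute value, a count over the whole dataset alongside a count over the current found set, plus a label style that depends on whether the attribute is constraining the query. The function names, the text rendering, and the sample records below are hypothetical; a real view would draw graphical histograms rather than text bars.

```python
from collections import Counter

def attribute_histogram(records, found, attribute):
    """{value: (count in whole dataset, count in current found set)} for one attribute."""
    total = Counter(r[attribute] for r in records)
    hits = Counter(r[attribute] for r in found)
    return {value: (total[value], hits[value]) for value in sorted(total)}

def render_text(hist, label_width=10):
    """Text bars: '#' for items in the found set, '.' for the rest of the dataset."""
    lines = []
    for value, (total, hits) in hist.items():
        lines.append(f"{value:<{label_width}}" + "#" * hits + "." * (total - hits))
    return "\n".join(lines)

def label_style(attribute, constraints):
    """Darker label if the attribute currently constrains the query, grey otherwise."""
    return "dark" if attribute in constraints else "grey"

# Hypothetical records and a single active constraint (price level <= 2).
records = [
    {"name": "R1", "cuisine": "Thai", "price": 1},
    {"name": "R2", "cuisine": "Thai", "price": 3},
    {"name": "R3", "cuisine": "Italian", "price": 2},
    {"name": "R4", "cuisine": "Mexican", "price": 1},
]
constraints = {"price": lambda r: r["price"] <= 2}
found = [r for r in records if all(p(r) for p in constraints.values())]
print(render_text(attribute_histogram(records, found, "cuisine")))
print(label_style("price", constraints), label_style("cuisine", constraints))  # dark grey
```

Showing the found-set count against the full count for each value is what gives the user both data density and a sense of where their result set sits within the whole dataset.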
Conclusion

In conclusion, we determined that using the Attribute Explorer to explore our data set only confused the participants in our experiment. Of the three interfaces, the Dynamic Query form was rated the most popular. Users liked being able to see and understand how changes in their query constraints affected their results.

Users took significantly longer to complete comparable tasks on the Attribute Explorer, even after a period of "play time" before starting the tasks.

In our experiment, both precision and recall were lower with the Attribute Explorer than with the other two interfaces, although, as noted above, these differences were not statistically significant. We believe that confusion about the interface affected the quality of the query results, causing errors and hence lower values on these measures. Furthermore, users reported lower confidence in the results they retrieved with the Attribute Explorer than with the other two interfaces. These measures suggest that users could not concentrate on the task at hand (finding restaurants) because they were preoccupied with the interface itself.

The purpose of this experiment was to evaluate the usefulness of a visual tool in helping users find restaurants to include in a customized travel guide. We plan to deploy this application over the web, so the tool will need to be easy for even the most novice user to understand and use. This experiment was conducted with a group of relatively technically savvy individuals, many of whom had some prior experience with similar visualization tools. Given that the Attribute Explorer did not perform well with this population, it is simply not feasible to deploy it as the primary means of interacting with our system for the average Internet user.

Although this visualization tool made it difficult for users to explore the data and understand the results of their queries, we believe that the Attribute Explorer might be useful as a complementary view to the Dynamic Query form. By showing data density and how changing the constraints affects the results, a histogram view can help users formulate, modify, and evaluate their queries. The user would receive quick feedback on their actions and could also perceive where their result set falls in the context of the entire dataset.


© copyright 2001 TraveLite. All rights reserved.
email: travelite@sims.berkeley.edu
Last modified: 02-May-2001