How to Conduct a Heuristic Evaluation

Original: Jakob Nielsen, "How to Conduct a Heuristic Evaluation"
Simplified Chinese translation: 初心不忘, 「如何进行启发式评估」

While putting this document together, I have become increasingly convinced that translating "heuristic" here as 「啟發式」 does more harm than good. While discussing a project today, 阿亮 raised the same point with me. He has written up his view, which I quote below:

Heuristic usability evaluation is not an empirical usability evaluation; it relies on experts evaluating the interface against a set of proven rules of thumb. The translation should therefore reflect that meaning, rather than the common rendering 「啟發性評估」.

阿亮 leans toward 「捷徑」 or 「便捷式」, but I worry that these could still be misread as "the quicker, easier kind" of evaluation, drifting away from the original sense of "starting from established rules of thumb to drive inspection and discussion". We looked around and found that cognitive psychology renders the term as 「捷思」: the rules of thumb the brain has evolved for making sense of the world (for example, we automatically judge distant objects to be smaller, without conscious thought). These rules of thumb are not correct in every situation, but they reduce cognitive load. This rendering is quite good: although it is harder to parse at first sight, that very opacity makes it harder to misuse (it immediately reads as a technical term :P).

For now I will stick with 「啟發式」 while editing, and once the editing is done I will propose what I consider a more suitable rendering.


Heuristic evaluation (Nielsen and Molich, 1990; Nielsen 1994) is a usability engineering method for finding the usability problems in a user interface design, so that they can be attended to as part of an iterative design process. Heuristic evaluation involves having a small set of evaluators examine the interface and judge its compliance with recognized usability principles (the "heuristics").

In general, heuristic evaluation is difficult for a single individual to do, because one person will never be able to find all the usability problems in an interface. Luckily, experience from many different projects has shown that different people find different usability problems. It is therefore possible to improve the effectiveness of the method significantly by involving multiple evaluators. Figure 1 shows an example from a case study of heuristic evaluation in which 19 evaluators were used to find 16 usability problems in a voice response system (Nielsen 1992). Each black square in Figure 1 indicates the finding of one of the usability problems by one of the evaluators. The figure clearly shows that there is substantial nonoverlap between the sets of problems found by different evaluators. Some usability problems are so easy to find that almost every evaluator finds them, but there are also problems that are found by only a few evaluators. Furthermore, one cannot simply identify the best evaluator and rely solely on that person's findings: first, the same person will not necessarily be the best evaluator every time; second, some of the hardest-to-find usability problems are found by evaluators who do not otherwise find many problems (see the leftmost columns in Figure 1). It is therefore necessary to involve multiple evaluators in any heuristic evaluation (the optimal number is discussed below). My recommendation is to use three to five evaluators, since larger numbers yield little additional information.

heuristic_matrix.gif
Figure 1

Figure 1 shows which evaluators found which usability problems in a heuristic evaluation of a banking system. Each of the rows represents one of the 19 evaluators and each of the columns represents one of the 16 usability problems. A black square means the evaluator found that problem; a white square means the evaluator did not. The rows are sorted so that the evaluator who found the most problems is at the bottom and the one who found the fewest is at the top; the columns are sorted so that the easiest problems are at the right and the hardest are at the left.
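The aggregation effect visible in Figure 1 can be made concrete with a short sketch. The matrix below is randomly generated for illustration only (the actual Figure 1 data is not reproduced here); each evaluator is assumed to find each problem independently with probability 0.34, the mean single-evaluator find rate reported later in the article:

```python
import random

# Illustrative stand-in for the Figure 1 matrix: 19 evaluators x 16 problems.
# True = the evaluator found that problem. The real data is not reproduced
# here; each find is simulated independently with probability 0.34.
random.seed(1)
N_EVALUATORS, N_PROBLEMS, FIND_RATE = 19, 16, 0.34
matrix = [[random.random() < FIND_RATE for _ in range(N_PROBLEMS)]
          for _ in range(N_EVALUATORS)]

def distinct_problems_found(rows):
    """Count the distinct problems found by aggregating a set of evaluators."""
    return sum(any(row[p] for row in rows) for p in range(N_PROBLEMS))

# Aggregating evaluators uncovers problems that no single evaluator finds alone.
for k in (1, 3, 5, len(matrix)):
    print(k, distinct_problems_found(matrix[:k]))
```

Because different evaluators find different problems, the aggregate count grows quickly over the first few evaluators and then levels off, which is exactly the shape discussed under "Determining the Number of Evaluators" below.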

Heuristic evaluation is performed by having each evaluator inspect the interface alone; only after all evaluations have been completed are the evaluators allowed to communicate and have their findings aggregated. This procedure is important in order to ensure independent, unbiased evaluations from each evaluator. The results can be recorded either as written reports from each evaluator, or by having the evaluators verbalize their comments to an observer as they go through the interface. Written reports present a more formal record, but they require additional effort from the evaluators, and one person still has to read and aggregate everyone's reports. Using an observer adds to the overhead of each session but reduces the evaluators' workload. The aggregated results also become available sooner after the last session, since the observer only needs to understand and organize one set of notes written by the observer personally, not reports written by others. Furthermore, the observer can assist the evaluators when problems arise with the interface, for example when evaluating an unstable prototype, when an evaluator has limited domain expertise, or when parts of the interface need to be explained.

In a user test situation, the observer (normally called the experimenter) has the responsibility of interpreting the user's actions in order to infer how those actions relate to the usability issues in the interface design. This makes it possible to conduct user testing even if the users know nothing about user interface design. In contrast, in a heuristic evaluation the responsibility for analyzing the interface lies with the evaluator, so the observer only needs to record the evaluator's comments about the interface and does not need to interpret the evaluator's actions.


Edited up to this point.


Two further differences between heuristic evaluation sessions and traditional user testing are the willingness of the observer to answer questions from the evaluators during the session and the extent to which the evaluators can be provided with hints on using the interface. For traditional user testing, one normally wants to discover the mistakes users make when using the interface; the experimenters are therefore reluctant to provide more help than absolutely necessary. Also, users are requested to discover the answers to their questions by using the system rather than by having them answered by the experimenter. For the heuristic evaluation of a domain-specific application, it would be unreasonable to refuse to answer the evaluators' questions about the domain, especially if nondomain experts are serving as the evaluators. On the contrary, answering the evaluators' questions will enable them to better assess the usability of the user interface with respect to the characteristics of the domain. Similarly, when evaluators have problems using the interface, they can be given hints on how to proceed in order not to waste precious evaluation time struggling with the mechanics of the interface. It is important to note, however, that the evaluators should not be given help until they are clearly in trouble and have commented on the usability problem in question.

Typically, a heuristic evaluation session for an individual evaluator lasts one or two hours. Longer evaluation sessions might be necessary for larger or very complicated interfaces with a substantial number of dialogue elements, but it would be better to split up the evaluation into several smaller sessions, each concentrating on a part of the interface.

During the evaluation session, the evaluator goes through the interface several times, inspects the various dialogue elements, and compares them with a list of recognized usability principles (the heuristics). These heuristics are general rules that seem to describe common properties of usable interfaces. In addition to the checklist of general heuristics to be considered for all dialogue elements, the evaluator obviously is also allowed to consider any additional usability principles or results that come to mind that may be relevant for any specific dialogue element. Furthermore, it is possible to develop category-specific heuristics that apply to a specific class of products as a supplement to the general heuristics. One way of building a supplementary list of category-specific heuristics is to perform competitive analysis and user testing of existing products in the given category and try to abstract principles to explain the usability problems that are found (Dykstra 1993).

In principle, the evaluators decide on their own how they want to proceed with evaluating the interface. A general recommendation would be that they go through the interface at least twice, however. The first pass would be intended to get a feel for the flow of the interaction and the general scope of the system. The second pass then allows the evaluator to focus on specific interface elements while knowing how they fit into the larger whole.

Since the evaluators are not using the system as such (to perform a real task), it is possible to perform heuristic evaluation of user interfaces that exist on paper only and have not yet been implemented (Nielsen 1990). This makes heuristic evaluation suited for use early in the usability engineering lifecycle.

If the system is intended as a walk-up-and-use interface for the general population or if the evaluators are domain experts, it will be possible to let the evaluators use the system without further assistance. If the system is domain-dependent and the evaluators are fairly naive with respect to the domain of the system, it will be necessary to assist the evaluators to enable them to use the interface. One approach that has been applied successfully is to supply the evaluators with a typical usage scenario, listing the various steps a user would take to perform a sample set of realistic tasks. Such a scenario should be constructed on the basis of a task analysis of the actual users and their work in order to be as representative as possible of the eventual use of the system.

The output from using the heuristic evaluation method is a list of usability problems in the interface with references to those usability principles that were violated by the design in each case in the opinion of the evaluator. It is not sufficient for evaluators to simply say that they do not like something; they should explain why they do not like it with reference to the heuristics or to other usability results. The evaluators should try to be as specific as possible and should list each usability problem separately. For example, if there are three things wrong with a certain dialogue element, all three should be listed with reference to the various usability principles that explain why each particular aspect of the interface element is a usability problem. There are two main reasons to note each problem separately: First, there is a risk of repeating some problematic aspect of a dialogue element, even if it were to be completely replaced with a new design, unless one is aware of all its problems. Second, it may not be possible to fix all usability problems in an interface element or to replace it with a new design, but it could still be possible to fix some of the problems if they are all known.
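As a sketch of this reporting discipline, the record format below is hypothetical (the field names and the sample "save dialog" findings are mine, not the article's); it simply enforces one entry per problem, each tied to the heuristic it violates:

```python
from dataclasses import dataclass

@dataclass
class UsabilityProblem:
    dialogue_element: str  # where in the interface the problem occurs
    description: str       # what is wrong, stated as specifically as possible
    heuristic: str         # the recognized usability principle that is violated

# Three things wrong with one dialogue element are listed as three separate
# problems, each justified by a heuristic, rather than as one vague complaint.
findings = [
    UsabilityProblem("save dialog", "no feedback while the file is saving",
                     "Visibility of system status"),
    UsabilityProblem("save dialog", "button labels use internal jargon",
                     "Match between the system and the real world"),
    UsabilityProblem("save dialog", "the operation cannot be cancelled",
                     "User control and freedom"),
]
for f in findings:
    print(f"{f.dialogue_element}: {f.description} (violates: {f.heuristic})")
```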

Heuristic evaluation does not provide a systematic way to generate fixes to the usability problems or a way to assess the probable quality of any redesigns. However, because heuristic evaluation aims at explaining each observed usability problem with reference to established usability principles, it will often be fairly easy to generate a revised design according to the guidelines provided by the violated principle for good interactive systems. Also, many usability problems have fairly obvious fixes as soon as they have been identified.

For example, if the problem is that the user cannot copy information from one window to another, then the solution is obviously to include such a copy feature. Similarly, if the problem is the use of inconsistent typography in the form of upper/lower case formats and fonts, the solution is obviously to pick a single typographical format for the entire interface. Even for these simple examples, however, the designer has no information to help design the exact changes to the interface (e.g., how to enable the user to make the copies or on which of the two font formats to standardize).

One possibility for extending the heuristic evaluation method to provide some design advice is to conduct a debriefing session after the last evaluation session. The participants in the debriefing should include the evaluators, any observer used during the evaluation sessions, and representatives of the design team. The debriefing session would be conducted primarily in a brainstorming mode and would focus on discussions of possible redesigns to address the major usability problems and general problematic aspects of the design. A debriefing is also a good opportunity for discussing the positive aspects of the design, since heuristic evaluation does not otherwise address this important issue.

Heuristic evaluation is explicitly intended as a "discount usability engineering" method. Independent research (Jeffries et al. 1991) has indeed confirmed that heuristic evaluation is a very efficient usability engineering method. One of my case studies found a benefit-cost ratio for a heuristic evaluation project of 48: The cost of using the method was about $10,500 and the expected benefits were about $500,000 (Nielsen 1994). As a discount usability engineering method, heuristic evaluation is not guaranteed to provide "perfect" results or to find every last usability problem in an interface.

(There is a further passage here that the Simplified Chinese version did not translate; I will find time to deal with it later.)

Determining the Number of Evaluators

In principle, individual evaluators can perform a heuristic evaluation of a user interface on their own, but the experience from several projects indicates that fairly poor results are achieved when relying on single evaluators. Averaged over six of my projects, single evaluators found only 35 percent of the usability problems in the interfaces. However, since different evaluators tend to find different problems, it is possible to achieve substantially better performance by aggregating the evaluations from several evaluators. Figure 2 shows the proportion of usability problems found as more and more evaluators are added. The figure clearly shows that there is a nice payoff from using more than one evaluator. It would seem reasonable to recommend the use of about five evaluators, but certainly at least three. The exact number of evaluators to use would depend on a cost-benefit analysis. More evaluators should obviously be used in cases where usability is critical or when large payoffs can be expected due to extensive or mission-critical use of a system.
heur_eval_finding_curve.gif
Figure 2
Curve showing the proportion of usability problems in an interface found by heuristic evaluation using various numbers of evaluators. The curve represents the average of six case studies of heuristic evaluation.

Nielsen and Landauer (1993) present such a model based on the following prediction formula for the number of usability problems found in a heuristic evaluation:

ProblemsFound(i) = N(1 - (1 - l)^i)

where ProblemsFound(i) indicates the number of different usability problems found by aggregating reports from i independent evaluators, N indicates the total number of usability problems in the interface, and l indicates the proportion of all usability problems found by a single evaluator. In six case studies (Nielsen and Landauer 1993), the values of l ranged from 19 percent to 51 percent with a mean of 34 percent. The values of N ranged from 16 to 50 with a mean of 33. Using this formula results in curves very much like that shown in Figure 2, though the exact shape of the curve will vary with the values of the parameters N and l, which again will vary with the characteristics of the project.
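Plugging the mean parameter values into the prediction formula gives a quick numeric sketch (the parameters are the six-case-study means quoted above; any particular project's curve would differ):

```python
def problems_found(i, N=33, l=0.34):
    """Nielsen and Landauer (1993) prediction: distinct usability problems
    found by aggregating i independent evaluators, for an interface with
    N problems in total and a single-evaluator find rate of l."""
    return N * (1 - (1 - l) ** i)

# One evaluator finds about a third of the 33 problems; five evaluators
# together find the large majority, after which returns diminish.
for i in (1, 3, 5, 10):
    print(i, round(problems_found(i), 1))
```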

In order to determine the optimal number of evaluators, one needs a cost-benefit model of heuristic evaluation. The first element in such a model is an accounting for the cost of using the method, considering both fixed and variable costs. Fixed costs are those that need to be paid no matter how many evaluators are used; these include time to plan the evaluation, get the materials ready, and write up the report or otherwise communicate the results. Variable costs are those additional costs that accrue each time one additional evaluator is used; they include the loaded salary of that evaluator as well as the cost of analyzing the evaluator's report and the cost of any computer or other resources used during the evaluation session. Based on published values from several projects, the fixed cost of a heuristic evaluation is estimated to be between $3,700 and $4,800, and the variable cost of each evaluator is estimated to be between $410 and $900.

The actual fixed and variable costs will obviously vary from project to project and will depend on each company's cost structure and on the complexity of the interface being evaluated. For illustration, consider a sample project with fixed costs for heuristic evaluation of $4,000 and variable costs of $600 per evaluator. In this project, the cost of using heuristic evaluation with i evaluators is thus $(4,000 + 600i).

The benefits from heuristic evaluation are mainly due to the finding of usability problems, though some continuing education benefits may be realized to the extent that the evaluators increase their understanding of usability by comparing their own evaluation reports with those of other evaluators. For this sample project, assume that it is worth $15,000 to find each usability problem, using a value derived by Nielsen and Landauer (1993) from several published studies. For real projects, one would obviously need to estimate the value of finding usability problems based on the expected user population. For software to be used in-house, this value can be estimated based on the expected increase in user productivity; for software to be sold on the open market, it can be estimated based on the expected increase in sales due to higher user satisfaction or better review ratings. Note that real value only derives from those usability problems that are in fact fixed before the software ships. Since it is impossible to fix all usability problems, the value of each problem found is only some proportion of the value of a fixed problem.

heuristic_cost_benefit.gif
Figure 3
Curve showing how many times the benefits are greater than the costs for heuristic evaluation of a sample project using the assumptions discussed in the text. The optimal number of evaluators in this example is four, with benefits that are 62 times greater than the costs.

Figure 3 shows the varying ratio of the benefits to the costs for various numbers of evaluators in the sample project. The curve shows that the optimal number of evaluators in this example is four, confirming the general observation that heuristic evaluation seems to work best with three to five evaluators. In the example, a heuristic evaluation with four evaluators would cost $6,400 and would find usability problems worth $395,000.
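The benefit-cost curve in Figure 3 can be re-derived from the stated assumptions (fixed cost of $4,000, $600 per evaluator, $15,000 per problem found). The sketch below plugs in the mean case-study parameters N = 33 and l = 34%, so the dollar amounts only approximate the article's $395,000 figure, but the optimum lands at the same four evaluators:

```python
def problems_found(i, n_total=33, find_rate=0.34):
    # Nielsen and Landauer (1993) prediction formula, with the mean
    # parameter values from the six case studies.
    return n_total * (1 - (1 - find_rate) ** i)

FIXED_COST = 4000         # planning, materials, reporting
COST_PER_EVALUATOR = 600  # salary, report analysis, session resources
VALUE_PER_PROBLEM = 15000 # assumed value of finding one usability problem

def cost(i):
    return FIXED_COST + COST_PER_EVALUATOR * i

def benefit(i):
    return VALUE_PER_PROBLEM * problems_found(i)

ratios = {i: benefit(i) / cost(i) for i in range(1, 11)}
optimal = max(ratios, key=ratios.get)
print(optimal, cost(optimal))  # the ratio peaks at four evaluators, costing $6,400
```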

References:

  • Dykstra, D. J. (1993). A Comparison of Heuristic Evaluation and Usability Testing: The Efficacy of a Domain-Specific Heuristic Checklist. Ph.D. diss., Department of Industrial Engineering, Texas A&M University, College Station, TX.
  • Jeffries, R., Miller, J. R., Wharton, C., and Uyeda, K. M. (1991). User interface evaluation in the real world: A comparison of four techniques. Proceedings ACM CHI'91 Conference (New Orleans, LA, April 28-May 2), 119-124.
  • Molich, R., and Nielsen, J. (1990). Improving a human-computer dialogue. Communications of the ACM 33, 3 (March), 338-348.
  • Nielsen, J. (1990). Paper versus computer implementations as mockup scenarios for heuristic evaluation. Proc. IFIP INTERACT'90 Third Intl. Conf. Human-Computer Interaction (Cambridge, UK, August 27-31), 315-320.
  • Nielsen, J., and Landauer, T. K. (1993). A mathematical model of the finding of usability problems. Proceedings ACM/IFIP INTERCHI'93 Conference (Amsterdam, The Netherlands, April 24-29), 206-213.
  • Nielsen, J., and Molich, R. (1990). Heuristic evaluation of user interfaces. Proc. ACM CHI'90 Conference (Seattle, WA, April 1-5), 249-256.
  • Nielsen, J. (1992). Finding usability problems through heuristic evaluation. Proceedings ACM CHI'92 Conference (Monterey, CA, May 3-7), 373-380.
  • Nielsen, J. (1994). Heuristic evaluation. In Nielsen, J., and Mack, R. L. (Eds.), Usability Inspection Methods. John Wiley & Sons, New York, NY.
Creative Commons License
Except where otherwise noted, the text on this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Taiwan license; please see the notes before reuse.