trycycle/IIR-metrics.md

## IIR-metrics.md

      
            
          
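Evaluation Metrics for Information Retrieval Tasks

A wide variety of evaluation metrics are used in user studies of information retrieval tasks. Because the choice of metrics comes up every time an experiment is designed, the representative ones are collected here.
Ranking metrics

For these, 「情報アクセス評価方法論」 by Tetsuya Sakai is a reliable reference.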

Precision
Recall
F-measure
Mean average precision (MAP)
Mean reciprocal rank (MRR)
Normalized discounted cumulative gain (nDCG)
alpha-nDCG
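
As a rough illustration of how the set-based and rank-based metrics above differ, the following is a minimal Python sketch. The function names, the input format (a ranked list of document IDs plus a set or dict of judgments), and the choice of linear gain with a log2 discount for nDCG are assumptions made for this example; other nDCG variants (for example exponential gain) exist, and alpha-nDCG is not covered here.

```python
import math


def precision(ranked, relevant):
    """Fraction of retrieved documents that are relevant."""
    if not ranked:
        return 0.0
    return sum(1 for d in ranked if d in relevant) / len(ranked)


def recall(ranked, relevant):
    """Fraction of relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked if d in relevant) / len(relevant)


def f_measure(p, r):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def average_precision(ranked, relevant):
    """Mean of precision values at the ranks where relevant documents appear."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0


def ndcg(ranked, gains, k=None):
    """nDCG with linear gain and a log2(rank + 1) discount.

    gains maps document ID -> graded relevance (assumed format).
    """
    k = len(ranked) if k is None else k
    dcg = sum(gains.get(d, 0) / math.log2(i + 1)
              for i, d in enumerate(ranked[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_over_topics(values):
    """MAP and MRR are the per-topic AP and RR values averaged over topics."""
    return sum(values) / len(values)
```

Precision, recall, and F-measure ignore the order of results, whereas average precision, reciprocal rank, and nDCG reward placing relevant documents near the top.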

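User behavior metrics

To convey precisely which concept a study set out to capture, it is important to state both a nominal definition and an operational definition for every metric.
Time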


Session dwell time
Dwell time on the search engine results page (dwell time on SERP, SERP dwell time)
Total dwell time on web pages listed on the SERP (total dwell time on pages/documents, page dwell time)
Average dwell time per web page (avg. dwell time per page)
Time to first click (average time taken until first click/time to first click)
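
One possible operational definition of the time metrics above can be sketched as follows. The log format is hypothetical: a chronologically ordered list of `(timestamp_seconds, kind)` pairs, where `kind` is `"serp"` (the SERP is shown), `"page"` (a result was clicked and the page is shown), or `"end"` (the session finished). Each view is assumed to last until the next event.

```python
def time_metrics(events):
    """Compute time-based metrics from one session's event log.

    events: chronologically ordered (timestamp_seconds, kind) pairs
    (assumed format, not a standard one).
    """
    serp_time = 0.0
    page_times = []
    # Each view lasts from its own timestamp to the next event's timestamp.
    for (t, kind), (t_next, _) in zip(events, events[1:]):
        if kind == "serp":
            serp_time += t_next - t
        elif kind == "page":
            page_times.append(t_next - t)

    start = events[0][0]
    first_click = next((t for t, kind in events if kind == "page"), None)
    return {
        "session_dwell_time": events[-1][0] - start,
        "serp_dwell_time": serp_time,
        "total_page_dwell_time": sum(page_times),
        "avg_page_dwell_time": (sum(page_times) / len(page_times)
                                if page_times else 0.0),
        "time_to_first_click": (None if first_click is None
                                else first_click - start),
    }


log = [(0, "serp"), (12, "page"), (42, "serp"), (50, "page"), (110, "end")]
metrics = time_metrics(log)
```

Whether time spent in other tabs or idle time counts toward dwell time is a design decision that the operational definition should state explicitly.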

Page views


Number of web pages listed on the SERP that were viewed (#clicked results/total document click count)
Number of clicks on relevant documents (total relevant document click count)
Share of clicked web pages by domain (distribution of URLs over top-level domains)
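
The click counts and the domain distribution above could be computed along these lines. This is a sketch under assumptions: clicked URLs are absolute (with a scheme), relevance judgments are given as a set of URLs, and the top-level domain is naively taken as the last dot-separated label of the host name.

```python
from collections import Counter
from urllib.parse import urlparse


def click_metrics(clicked_urls, relevant_urls):
    """Click counts and top-level-domain shares for one session."""
    tlds = Counter()
    for url in clicked_urls:
        host = urlparse(url).hostname or ""
        # Naive TLD extraction: last label of the host name.
        tlds[host.rsplit(".", 1)[-1]] += 1

    n = len(clicked_urls)
    return {
        "click_count": n,
        "relevant_click_count": sum(1 for u in clicked_urls
                                    if u in relevant_urls),
        "tld_distribution": ({tld: c / n for tld, c in tlds.items()}
                             if n else {}),
    }


clicked = [
    "https://example.com/a",
    "https://example.org/b",
    "https://example.com/c",
    "https://example.jp/",
]
result = click_metrics(clicked, relevant_urls={"https://example.com/a",
                                               "https://example.jp/"})
```

Repeated clicks on the same URL are counted separately here; counting unique pages instead would be an equally defensible definition.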

Queries


Number of queries
Query length

Query length in number of characters
Query length in number of terms
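
The two notions of query length above can be sketched as follows. The function name and return keys are made up for this example. Terms are obtained by splitting on whitespace, which is an assumption: queries written in Japanese without spaces would need a morphological analyzer instead. Character counts here include the spaces between terms.

```python
def query_metrics(queries):
    """Number of queries and average query length for one session."""
    n = len(queries)
    chars = [len(q) for q in queries]          # includes whitespace
    terms = [len(q.split()) for q in queries]  # whitespace tokenization
    return {
        "number_of_queries": n,
        "avg_length_chars": sum(chars) / n if n else 0.0,
        "avg_length_terms": sum(terms) / n if n else 0.0,
    }


session_queries = ["iir metrics", "ndcg"]
summary = query_metrics(session_queries)
```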


Rank


Minimum click depth (highest clicked results position)
Maximum click depth (lowest/deepest clicked results position)
Average click depth (avg. clicked results position)
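
Because the "highest" position corresponds to the smallest rank number, the three click-depth metrics above are easy to mix up. A minimal sketch, assuming clicked positions are recorded as 1-based ranks on the SERP (the function name is hypothetical):

```python
def click_depth(positions):
    """Minimum, maximum, and average clicked rank (1 = top of the SERP)."""
    if not positions:
        return None  # no clicks in this session
    return {
        "min_click_depth": min(positions),   # highest clicked position
        "max_click_depth": max(positions),   # lowest/deepest clicked position
        "avg_click_depth": sum(positions) / len(positions),
    }


depth = click_depth([3, 1, 7, 1])
```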

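Other


Number of documents hovered over with the mouse cursor (total document hover count)
Number of relevant documents hovered over with the mouse cursor (total relevant document hover count)
Percentage of sessions in which no web page listed in the search results was viewed (sessions without click (%))
Number of documents saved (total documents saved)
Number of relevant documents saved (total relevant documents saved)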
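
Unlike the per-session counts, the percentage of sessions without a click is aggregated over sessions. A minimal sketch, assuming a hypothetical mapping from session ID to the list of URLs clicked in that session:

```python
def sessions_without_click_pct(sessions):
    """Percentage of sessions in which no result was clicked.

    sessions: dict mapping session ID -> list of clicked URLs (assumed format).
    """
    if not sessions:
        return 0.0
    no_click = sum(1 for clicks in sessions.values() if not clicks)
    return 100.0 * no_click / len(sessions)


pct = sessions_without_click_pct({
    "s1": ["https://example.com/a"],
    "s2": [],
    "s3": [],
    "s4": ["https://example.org/b", "https://example.com/c"],
})
```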

Questionnaire items

Task


Amount of knowledge gained through the search

How much were you able to learn about this search topic?
How much did you learn about this topic? (1: nothing - 5: a lot)


Interest in the search topic

How interested did you become in this search topic?
How interesting was this topic? (1: not at all - 5: very)


Difficulty of the search task

How difficult was this search task?
How difficult was this task to complete? (1: very easy - 5: very difficult)


Confidence in the task outcome (decisions made)

How confident are you in the decisions you made?
I was confident in my decisions (1: strongly disagree - 5: strongly agree)


Enjoyment of the task

Were you able to enjoy the task?
I enjoyed completing this task (1: strongly disagree - 5: strongly agree)


Satisfaction with task performance

Are you satisfied with how well you accomplished the task?
I was satisfied with my task performance (1: strongly disagree - 5: strongly agree)


Fatigue from the task

How tired did completing the task make you?
I felt tired when completing this task (1: strongly disagree - 5: strongly agree)


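System evaluation


Usefulness: the accuracy and completeness with which users achieve specified goals

Was this system useful for accomplishing the task?
The system was useful to complete tasks (1: strongly disagree - 5: strongly agree)


Efficiency: the resources expended in relation to the accuracy and completeness with which users achieve specified goals

Was this system useful for completing the task with minimal effort?


Annoyance (satisfaction)

Did using this system make you feel irritated?
The system was annoying (1: strongly disagree - 5: strongly agree)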


Ease of use

Was this system easy to use?
The system was easy to use (1: strongly disagree - 5: strongly agree)


Understandability

Was this system confusing?
The system was confusing (1: strongly disagree - 5: strongly agree)


Aesthetics

Did you find this system visually appealing?
The system was aesthetically appealing (1: strongly disagree - 5: strongly agree)


Boredom

Was using this system boring?
The system was boring (1: strongly disagree - 5: strongly agree)


Intention to reuse

Would you like to use this system again?
The system was engaging (1: strongly disagree - 5: strongly agree)


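Demographics


Participant's gender
Age
Major at university
Occupation
Experience using computers
Experience using the internet
Experience with information retrieval
Cognitive style

Wholist-analytic style: whether the user processes information as a whole or part by part
Verbal-imagery style: whether the user learns through words or spatially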


References


Diane Kelly, 「インタラクティブ情報検索システムの評価」, Maruzen Publishing, 2013.
Tetsuya Sakai, 「情報アクセス評価方法論」, Corona Publishing, 2015.
Ford, Nigel, David Miller, and Nicola Moss. 2005. "Web Search Strategies and Human Individual Differences: Cognitive and Demographic Factors, Internet Attitudes, and Approaches." Journal of the American Society for Information Science and Technology 56 (7): 741–56.

Example papers that use each metric


Foulds, Olivia, Leif Azzopardi, and Martin Halvey. 2021. "Investigating the Influence of Ads on User Search Performance, Behaviour, and Experience during Information Seeking." In Proceedings of the 2021 Conference on Human Information Interaction and Retrieval, 107–17. CHIIR ’21. New York, NY, USA: Association for Computing Machinery.
Wu, Zhijing, Mark Sanderson, B. Barla Cambazoglu, W. Bruce Croft, and Falk Scholer. 2020. "Providing Direct Answers in Search Results: A Study of User Behavior." In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 1635–44. CIKM ’20. New York, NY, USA: Association for Computing Machinery.
Odijk, Daan, Ryen W. White, Ahmed Hassan Awadallah, and Susan T. Dumais. 2015. "Struggling and Success in Web Search." In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, 1551–60. CIKM ’15. New York, NY, USA: Association for Computing Machinery.
Umemoto, Kazutoshi, Takehiro Yamamoto, and Katsumi Tanaka. 2016. "ScentBar: A Query Suggestion Interface Visualizing the Amount of Missed Relevant Information for Intrinsically Diverse Search." In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, 405–14. SIGIR ’16. New York, NY, USA: Association for Computing Machinery.