Sunday, April 11, 2010

IR Evaluation

How do you evaluate a search engine such as Google or Bing? There is a field called IR (Information Retrieval) evaluation that addresses this question.

A search engine can be evaluated at different levels: 1) performance, 2) result effectiveness, 3) user satisfaction. IR evaluation focuses on result effectiveness.

For a given search query, an IR system returns a ranked list of candidates, and it is this list that we need to evaluate. Its relevance can be measured by two factors:

1) Precision: the fraction of retrieved results that are relevant.
2) Recall: the fraction of relevant results that are retrieved.
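
To make the two definitions concrete, here is a minimal Python sketch. The function name `precision_recall` and the document ids are made up for illustration, and it assumes we already know which documents are relevant for the query.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids the system returned (hypothetical example data)
    relevant:  ids judged relevant for the query (assumed to be known)
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant results that were retrieved

    # Guard against empty sets to avoid division by zero.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


retrieved = ["d1", "d2", "d3", "d4"]  # what the engine returned
relevant = ["d1", "d3", "d5"]         # what it should have returned
p, r = precision_recall(retrieved, relevant)
print(p, r)  # precision = 2/4, recall = 2/3
```

Note that this sketch ignores the order of the ranked list; it only looks at which results were returned.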

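There is a trade-off between precision and recall: returning more results tends to raise recall but lower precision. Van Rijsbergen's F-measure combines the two into a single score for a query j, where b sets the weight of recall relative to precision (b = 1 weights them equally, b > 1 favours recall):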

F(j) = (1 + b^2) / ( b^2/recall(j) + 1/precision(j) )
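
As a rough illustration, the formula can be written directly in Python. The function name `f_measure` and the input values are assumptions for the example; returning 0 when either input is 0 is a common convention, not something the formula itself defines.

```python
def f_measure(precision, recall, b=1.0):
    """F-measure for a single query, following the formula above.

    b weights recall against precision (b = 1 treats them equally).
    """
    # The formula divides by precision and recall, so treat a zero
    # in either as F = 0 (an assumed convention for this sketch).
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return (1 + b ** 2) / (b ** 2 / recall + 1 / precision)


# Hypothetical query with precision 0.5 and recall 2/3.
print(f_measure(0.5, 2 / 3))         # b = 1: 2 / (1.5 + 2) = 4/7
print(f_measure(0.5, 2 / 3, b=2.0))  # b = 2: 5 / (6 + 2) = 0.625
```

With b = 2 the score moves closer to the recall value, since recall is weighted more heavily.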

Other measures include the reciprocal measure, error, and utility. The stability of a measure can also be an issue, since some measures change over time.

