The main idea is to compute and compare precision and recall for two different similarity algorithms according to two different measures of relevance. For more background, see Belew Chapter 4 and the Belew & Hatton RAVE paper in the reader.
The queries that you assigned relevance judgements to in Assignment 5 have been run through a program called match, which ranks the documents by similarity to the query using a cosine measure and tf.idf weighting (from section 3.4.4 of the Belew book, page 340). Short query weighting is used on short queries and long query weighting on long queries. One version of the algorithm normalizes by document length as part of the ranking; the other does not.
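To make the difference between the two versions concrete, here is a minimal sketch of tf.idf scoring with and without document length normalization. It is not the match program: the function names are hypothetical, the weighting is plain tf * log(N/df) rather than the short and long query weightings from the Belew book, and documents are assumed to be lists of already-tokenized terms. A full cosine measure would also divide by the query length, but that factor is the same for every document and so does not change the ranking.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Inverse document frequency, log(N / df), for every term in the collection."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Map each term to tf * idf; terms unseen in the collection are dropped."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def score(query_vec, doc_vec, normalize=True):
    """Dot product of the two vectors, optionally divided by the document length."""
    dot = sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())
    if not normalize:
        return dot
    doc_len = math.sqrt(sum(w * w for w in doc_vec.values()))
    return dot / doc_len if doc_len else 0.0


def rank(query_tokens, docs, normalize=True):
    """Return document indices sorted from most to least similar to the query."""
    idf = idf_weights(docs)
    q = tfidf_vector(query_tokens, idf)
    scores = [score(q, tfidf_vector(d, idf), normalize) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy collection (made up for illustration): document 0 is long, document 1 is short.
docs = [["a"] * 4 + ["b"] * 6, ["a", "c"], ["c", "d"]]
unnormalized = rank(["a"], docs, normalize=False)
normalized = rank(["a"], docs, normalize=True)
```

In this toy collection the long document wins without normalization because it repeats the query term, while the short document wins once scores are divided by document length.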
Relevance judgements for all the students have been pooled to create a single relevance assessment for each query, document pair. Two types of relevance have been assigned:
Conservative relevance is defined as follows:
abrandt 09/15/98 05:52:03 3 4386 (6 1) (1 0) (9 3)
with the format
username date time feedback
where "feedback" is of the form:
response# document# (query# rating) (query# rating) (query# rating)
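The line format above can be read with a few lines of code. The following is a minimal sketch, assuming every line follows the example exactly (whitespace-separated fields followed by parenthesized query/rating pairs); the names Judgement and parse_judgement are hypothetical and not part of the course software.

```python
import re
from typing import NamedTuple


class Judgement(NamedTuple):
    username: str
    date: str
    time: str
    response: int
    document: int
    ratings: dict  # query number -> rating


# Matches one "(query# rating)" pair.
PAIR = re.compile(r"\((\d+)\s+(\d+)\)")


def parse_judgement(line):
    """Parse one line such as 'abrandt 09/15/98 05:52:03 3 4386 (6 1) (1 0) (9 3)'."""
    # Everything before the first "(" is the fixed-width header part.
    head, _, tail = line.partition("(")
    username, date, time, response, document = head.split()
    ratings = {int(q): int(r) for q, r in PAIR.findall("(" + tail)}
    return Judgement(username, date, time, int(response), int(document), ratings)


example = parse_judgement("abrandt 09/15/98 05:52:03 3 4386 (6 1) (1 0) (9 3)")
```

Here example.document is 4386 and example.ratings maps query 6 to rating 1, query 1 to rating 0, and query 9 to rating 3.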
Most people produced relevance judgements for only three queries.
You'll also need to make use of some other sets of files.
The file
/groups/is202/handouts/queries
shows the original queries and their query ids.
In the directory
/groups/is202/handouts/reljudgements/
you'll see two types of files. The files rels.conservative.queryid
show all of the documents that were looked at for
queryid. They also show the pooled conservative relevance judgement
for the query, document pair. Similarly, the files
rels.generous.queryid show all of the documents and their judgements
according to the generous relevance assumption.
In all these files, 1 means relevant, 0 means nonrelevant.
In the directory
/groups/is202/handouts/rankings/
you'll see four kinds of files. These contain the top 30 documents
according to two different ranking algorithms:
The files in this directory correspond to the following information:
The documents (along with their document numbers) can be viewed, as
before, in the files:
/groups/is202/handouts/data/ait/ait*.t
Another way to see the documents that correspond to the results of the query is to type the query into the interface at http://corea.ucsd.edu:8000, which shows the ranking produced by the match algorithm and a hyperlink to the document itself. The text of the original queries can be found in the queries file.
Computing Precision and Recall
Choose one short query (from queries 1-8) and one long query (from
queries 9-11, where an abstract was used as a query)
that you produced relevance judgements for.
For both of these queries, do the following:
(1) State which query it is (number and the terms used if one of the short queries).
(2)
(3)
(4) For this question, choose recall levels that are evenly spread out, ending with the largest recall score possible given that rankings for only the top 30 documents are supplied. For example, it would be interesting to see precision at 20% recall, 40% recall, 60% recall, 80% recall, and 100% recall (but this might not be possible given the data).
(a) Compute the precision at five different recall points for the normalized matching algorithm using the pooled conservative judgements.
(5) Discuss how and why the plots in (4e) are similar to or different from one another.
(6) Discuss whether or not you think precision and recall are effective means to evaluate the ranking of this query.
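For the computation in step (4), the following minimal sketch may help you check your hand calculations. It assumes you have already read a ranking into a list of document numbers (best first) and the pooled judgements into a set of relevant document numbers; the function names and the toy data are hypothetical. Recall is the number of relevant documents retrieved so far divided by the total number judged relevant, and precision is the number of relevant documents retrieved so far divided by the rank position.

```python
def precision_recall_points(ranking, relevant):
    """Return (recall, precision) after each relevant document found in the ranking.

    ranking  -- document ids in ranked order (best first), e.g. the top 30
    relevant -- set of all document ids judged relevant for the query
    """
    points = []
    hits = 0
    for position, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / position))
    return points


def precision_at_recall(points, level):
    """Precision at the first rank where recall reaches `level`, or None if never reached."""
    for recall, precision in points:
        if recall >= level - 1e-9:  # small tolerance for floating-point levels
            return precision
    return None


# Toy data: six ranked documents, four relevant ones, one of which (99) was never retrieved.
ranking = [10, 20, 30, 40, 50, 60]
relevant = {10, 30, 60, 99}
points = precision_recall_points(ranking, relevant)
max_recall = points[-1][0] if points else 0.0
```

In the toy data the largest recall reachable is 0.75, because one relevant document never appears in the ranking; asking for precision at 100% recall therefore returns None. The same situation can arise with the top 30 rankings supplied here.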
Last modified Nov 10, 1998 MAH