Clarifications on the Claims to Passage Task

Based on our communications with participants, here are answers to some questions you may have about this task. If you have further questions, please let us know.

What is a 'passage'?

A 'passage' is any child element of the abstract, the description, or the claims. These are typically <p> elements, but may also be <claim> or <heading> elements.
We are aware that headings are not particularly informative, but we could not exclude them a priori when the portions of patent text indicated as relevant in the search reports covered them.

So a passage relevant to a given set of claims is one or more child elements of the abstract, description, or claims tags. When the whole abstract, description, or claims section is considered relevant, we chose to list all children of the corresponding XML element. Participants should do the same.
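
To make this concrete, here is a minimal sketch of how such passages could be enumerated from a patent document. The section element names (abstract, description, claims) follow the description above, and the loop keeps only their direct children (<p>, <claim>, <heading>, ...); the id attribute used to identify each passage is an assumption for illustration only, not a specification of the collection format.

    # Minimal sketch: list the candidate passages of one patent XML file.
    # Assumes sections named abstract/description/claims whose direct children
    # carry an "id" attribute; adjust names to the actual collection schema.
    import xml.etree.ElementTree as ET

    def extract_passages(xml_path):
        """Return (passage_id, tag, text) for every direct child of the sections."""
        root = ET.parse(xml_path).getroot()
        passages = []
        for section in ("abstract", "description", "claims"):
            for parent in root.iter(section):
                for child in parent:  # direct children only, e.g. <p>, <claim>, <heading>
                    pid = child.get("id", "")
                    text = "".join(child.itertext()).strip()
                    passages.append((pid, child.tag, text))
        return passages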

Relevance judgements

How did we obtain them?

Manually: they were extracted from the search reports.

The qrels are an as-close-as-possible representation of the portions of text indicated as relevant in the search reports. Look here for an example of a European Search Report.

Ranking relevance judgements

At this stage, we have only used X and Y level citations from the search reports. In previous CLEF-IP editions, such X/Y citations were considered equally (highly) relevant in the 'Prior Art'-like tasks (level 2 in the qrels).

There is no passage ranking in the training relevance judgements. We ask you, however, to rank your retrieved passages.

Evaluations

We are aware that the paragraphs marked as relevant, as a consequence of being indicated as such in the search reports, are not always spot on. Most likely, a system will return one child element highly ranked and the rest of its siblings much lower in the list. The evaluation metric will take this into account and will not be the simple MAP we are used to.

Looking closer at the search reports, we can interpret the data listed in them as follows: "for the set of claims C in the patent application PTop, the existing patent application PRel is relevant, with the following passages being very relevant: page 3, line 15 - page 4, line 26; page 6, lines 12-35."
Assuming that these lines correspond to paragraphs p1-p7 and p10-p12 in the PRel document, the relevance judgements will look like:

	  top-C Q0 PRel p1
	  top-C Q0 PRel p2
	  top-C Q0 PRel p3
	  top-C Q0 PRel p4
	  top-C Q0 PRel p5
	  top-C Q0 PRel p6
	  top-C Q0 PRel p7
	  top-C Q0 PRel p10
	  top-C Q0 PRel p11
	  top-C Q0 PRel p12
	  
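For convenience, here is a small sketch of reading qrels lines in the four-column form shown above into a per-topic structure; this is only an illustration of the format, not official parsing code.

    # Sketch: parse lines like "top-C Q0 PRel p1" into topic -> doc -> {passages}.
    from collections import defaultdict

    def read_qrels(path):
        rels = defaultdict(lambda: defaultdict(set))
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) != 4:
                    continue  # skip blank or malformed lines
                topic, _q0, doc, passage = parts
                rels[topic][doc].add(passage)
        return rels
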
While we are still discussing how exactly to score results, we think that a system which returned, for instance, paragraphs p2 and p10 as its top two should get an almost perfect score, indicating that it has brought the user to the two sections which the examiner considered useful.
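
Purely as an illustration of this intuition, and not as the official measure, one could group the relevant passages of a document into contiguous blocks (p1-p7 and p10-p12 in the example above) and give credit as soon as any passage of a block is retrieved:

    # Illustrative only, not the official metric: fraction of contiguous
    # relevant blocks that are hit by the top-k retrieved passages.
    def block_recall_at_k(ranked_passages, relevant_blocks, k=2):
        top = set(ranked_passages[:k])
        hit = sum(1 for block in relevant_blocks if top & set(block))
        return hit / len(relevant_blocks) if relevant_blocks else 0.0

    # With the example qrels above, returning p2 and p10 in the top two
    # reaches both relevant blocks and scores 1.0.
    blocks = [["p1", "p2", "p3", "p4", "p5", "p6", "p7"], ["p10", "p11", "p12"]]
    print(block_recall_at_k(["p2", "p10"], blocks, k=2))  # -> 1.0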