
Claims to passage task 2013


Topics in this task are sets of claims extracted from actual patent application documents. Participants are asked to return passages that are relevant to the topic claims. The passages must occur in the documents in the CLEF-IP collection. No other data is allowed to be used in preparing for this task.

These sets of claims were chosen based on existing search reports for the considered patent applications.

The topics (defined below) also contain a pointer to the original patent application file. The content of that XML file (other than the claims selected as topics) can be used as you like.

You can read further clarifications on this task here.


A topic in the 'Claims to Passage' task contains the following SGML fields:



  • 'tid' contains the identifier of the topic
  • 'tfile' contains the name of the XML file from which the topic claims are extracted
  • 'tfam-docs' contains the names of the XML files which belong to the patent family of the patent in 'tfile'. Attention: only those published before the 'tfile' document are included in this field.
  • 'tclaims' contains the XPaths, in the XML file, to the claims selected as topics. The XPaths are separated by spaces.

Example (taken from the set of training topics):

<tfam-docs>JP-2003224099-A.xml WO-2003065434-A1.xml</tfam-docs>
<tclaims>/patent-document/claims/claim[1] /patent-document/claims/claim[2] 
/patent-document/claims/claim[3] /patent-document/claims/claim[16] 
/patent-document/claims/claim[17] /patent-document/claims/claim[18] </tclaims>
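A minimal sketch of resolving a 'tclaims' field against a topic's patent XML. The XPath strings follow the example above; the inline document, the helper name `claim_texts`, and the plain-text extraction are all illustrative, not part of the task definition:

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for a topic's patent application file; real files
# in the CLEF-IP collection are much larger.
doc = ET.fromstring(
    "<patent-document>"
    "<claims>"
    "<claim><claim-text>A device comprising X.</claim-text></claim>"
    "<claim><claim-text>The device of claim 1, wherein Y.</claim-text></claim>"
    "<claim><claim-text>A method of Z.</claim-text></claim>"
    "</claims>"
    "</patent-document>"
)

# XPaths exactly as they appear in a 'tclaims' field, space-separated
tclaims = "/patent-document/claims/claim[1] /patent-document/claims/claim[3]"

def claim_texts(root, tclaims):
    texts = []
    for xpath in tclaims.split():
        # ElementTree does not accept absolute XPaths, so drop the
        # leading '/patent-document' and search relative to the root
        rel = "./" + xpath.split("/", 2)[2]
        for claim in root.findall(rel):
            texts.append(" ".join(claim.itertext()).strip())
    return texts

print(claim_texts(doc, tclaims))
# → ['A device comprising X.', 'A method of Z.']
```

A full-featured XPath engine such as lxml could evaluate the absolute paths directly; the rewrite above is only needed for the standard library's limited XPath support.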



The retrieval results should be returned in a text file with 6 columns, as described below (based on the TREC run format):

topic_id Q0 doc_id rel_psg_xpath psg_rank psg_score


  • topic_id is the identifier of a topic
  • Q0 is a value maintained for historical reasons
  • doc_id is the identifier of the patent document (i.e. file name WITHOUT extension) in which the relevant passages occur
  • rel_psg_xpath is the xpath identifying the relevant passage in the doc_id document
  • psg_rank is the rank of the passage in the overall list of relevant passages
  • psg_score is the score of the passage in the (complete) list of relevant passages

We allow only one xpath per line in the result files. If several passages are considered relevant for a topic, they must be placed on separate lines.
A result file may contain at most 100 distinct doc_ids per topic, counted when the xpaths are ignored.

Example (taken from the qrels; the psg_rank and psg_score values are therefore fictional):

tPSG-5 Q0 WO-2002015251-A1 /patent-document/claims/claim 5 1.34
tPSG-5 Q0 WO-2002015251-A1 /patent-document/description/p[22] 6 1.11
tPSG-5 Q0 WO-2002015251-A1 /patent-document/description/p[23] 7 0.87
tPSG-5 Q0 WO-2002015251-A1 /patent-document/description/p[34] 8 0.80
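A small sketch of validating a result file against the format above: six whitespace-separated columns, numeric rank and score, and a cap on distinct doc_ids. Reading the 100-doc_id limit as a per-topic constraint is my interpretation; check the official rules if in doubt. The function name `check_run` is illustrative:

```python
def check_run(lines, max_docs=100):
    """Validate result lines; return the set of doc_ids seen per topic."""
    docs_per_topic = {}
    for n, line in enumerate(lines, 1):
        cols = line.split()
        if len(cols) != 6:
            raise ValueError(f"line {n}: expected 6 columns, got {len(cols)}")
        topic_id, q0, doc_id, xpath, rank, score = cols
        int(rank)      # psg_rank must be an integer
        float(score)   # psg_score must be numeric
        docs_per_topic.setdefault(topic_id, set()).add(doc_id)
    for topic_id, docs in docs_per_topic.items():
        if len(docs) > max_docs:
            raise ValueError(f"{topic_id}: {len(docs)} doc_ids (max {max_docs})")
    return docs_per_topic

run = [
    "tPSG-5 Q0 WO-2002015251-A1 /patent-document/claims/claim 5 1.34",
    "tPSG-5 Q0 WO-2002015251-A1 /patent-document/description/p[22] 6 1.11",
]
print(check_run(run))
# → {'tPSG-5': {'WO-2002015251-A1'}}
```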


Run Submission

Each participant is allowed to submit up to 8 run files. Each run should be submitted compressed. As in previous years, the run files should be named using the following schema: participantID-runID-taskID.extension, where:

  • participantID identifies your institution/group
  • runID identifies the different runs you submit
  • taskID should be PSG
  • extension is tgz, gz, zip, or another extension used by compression programs
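A sketch of composing and sanity-checking a run file name against this schema. The identifiers 'myGroup' and 'run1' are placeholders, and the regex check is only an assumption about what counts as well-formed:

```python
import re

def run_filename(participant_id, run_id, ext="zip", task_id="PSG"):
    """Build 'participantID-runID-taskID.extension' and check its shape."""
    name = f"{participant_id}-{run_id}-{task_id}.{ext}"
    # Simple shape check: three dash-separated fields plus an extension
    if not re.fullmatch(r"[^-]+-[^-]+-PSG\.\w+", name):
        raise ValueError(f"{name!r} does not match participantID-runID-PSG.extension")
    return name

print(run_filename("myGroup", "run1"))
# → myGroup-run1-PSG.zip
```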

As seen above, the topics also contain a pointer to the original patent application file. Participants in the task are allowed to use the content of this file as they see fit, as well as the content of the files in the 'tfam-docs' field.


Evaluation Measures

There are two types of measurements we can compute on the submitted runs: at the document level and at the passage level.

At the Document Level

The main measures we will report are PRES at cut-offs 20 and 100. PRES rewards systems that return relevant documents earlier in the retrieval list.
To apply PRES to the submitted experiments, the runs are stripped of the passage information while the document ranking is kept. For instance, the example run

tPSG-16 Q0 WO-2000078185-A2 /patent-document/abstract[1]/p 1 2.53
tPSG-16 Q0 WO-2000078185-A2 /patent-document/abstract[2]/p 2 2.2
tPSG-16 Q0 WO-2000078185-A2 /patent-document/description/p[41] 3 1.89
tPSG-16 Q0 WO-2000078185-A2 /patent-document/description/p[42] 4 1.75
tPSG-16 Q0 WO-2000078185-A2 /patent-document/description/p[43] 5 1.5
tPSG-16 Q0 WO-2000078185-A2 /patent-document/description/p[44] 6 1.02
tPSG-16 Q0 WO-2000078185-A2 /patent-document/description/p[45] 7 0.9
tPSG-16 Q0 WO-2000078185-A2 /patent-document/description/p[46]  8 0.8
tPSG-16 Q0 WO-2000078185-A2 /patent-document/description/p[47]  9 0.7
tPSG-16 Q0 WO-1997007715-A1 /patent-document/abstract[1]/p 10 0.66
tPSG-16 Q0 WO-1997007715-A1 /patent-document/abstract[2]/p 11 0.60
tPSG-16 Q0 WO-1997007715-A1 /patent-document/description/p[43] 12 0.5
tPSG-16 Q0 WO-1997007715-A1 /patent-document/description/p[44] 13 0.42
tPSG-16 Q0 WO-1997007715-A1 /patent-document/description/p[45] 14 0.42
tPSG-16 Q0 WO-1997007715-A1 /patent-document/description/p[46] 15 0.42
will be processed into the following:
tPSG-16 Q0 WO-2000078185-A2 1 2.53
tPSG-16 Q0 WO-1997007715-A1 2 0.66
and given as input to the script computing the PRES score. (Note that the psg_score column - the last one - is ignored in the PRES computation.)
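The document-level preprocessing described above can be sketched as follows: keep each doc_id at the rank of its first passage, drop the xpath column, and renumber the ranks per topic. The function name is illustrative:

```python
def to_document_level(lines):
    """Collapse a passage-level run to one line per (topic, document)."""
    seen = set()
    next_rank = {}   # per-topic rank counter
    out = []
    for line in lines:
        topic_id, q0, doc_id, _xpath, _rank, score = line.split()
        if (topic_id, doc_id) in seen:
            continue  # keep only the first passage of each document
        seen.add((topic_id, doc_id))
        next_rank[topic_id] = next_rank.get(topic_id, 0) + 1
        out.append(f"{topic_id} {q0} {doc_id} {next_rank[topic_id]} {score}")
    return out

run = [
    "tPSG-16 Q0 WO-2000078185-A2 /patent-document/abstract[1]/p 1 2.53",
    "tPSG-16 Q0 WO-2000078185-A2 /patent-document/description/p[41] 2 1.89",
    "tPSG-16 Q0 WO-1997007715-A1 /patent-document/abstract[1]/p 3 0.66",
]
print(to_document_level(run))
# → ['tPSG-16 Q0 WO-2000078185-A2 1 2.53', 'tPSG-16 Q0 WO-1997007715-A1 2 0.66']
```

The score column is carried through unchanged here, even though PRES only uses the ranking.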

At the Passage Level

The main measure reported here will be MAgP (mean average generalized precision), an adaptation of the measure used in the INEX Ad-hoc track's Relevant in Context task (see this article).


Evaluation Results

The evaluations have been made available to the participants. We will post the results here soon as well.


Training data

As training data for 2013, we provide the training topics and the test topics used in 2012. To the 2012 topics we have added the 'tfam-docs' field, with pointers to the patent documents that are part of the topic document's patent family. Where available, these files are also provided.

Download here the training data.


Test data

The set of test topics contains 149 topics: 50 in English, 50 in German, and 49 in French. You can download it here.
The set of relevance assessments can be downloaded here.

Looking at the topics in the training set, you will surely have noticed that all XPaths are relative to A-level documents (i.e. application documents). The same holds for the topics in the test set. This is because search reports (almost) always refer to application documents as relevant citations.



For questions, suggestions, and anything else regarding this task, please contact Mihai Lupu (lupu at ifs.tuwien.ac.at) or Florina Piroi (piroi at ifs.tuwien.ac.at).
