CLEF-IP 2012 Download Area



The CLEF-IP 2012 corpus is an extract of the MAREC data collection. It is the same as the CLEF-IP 2011 corpus.


MAREC by IRF is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Permissions beyond the scope of this license may be available at

CLEF-IP 2012 Corpus

The required document type definitions (DTDs): dtds.7z (46 KB)

Note: If you were a CLEF-IP 2011 participant and have already downloaded the 2011 data, you do not need to download the files below. CLEF-IP 2011 and 2012 use the same corpus of patent documents.
clef-ip-2012-ep0.7z.001 (1 GB)
clef-ip-2012-ep0.7z.002 (1 GB)
clef-ip-2012-ep0.7z.003 (1 GB)
clef-ip-2012-ep0.7z.004 (1 GB)
clef-ip-2012-ep0.7z.005 (1 GB)
clef-ip-2012-ep0.7z.006 (1 GB)
clef-ip-2012-ep0.7z.007 (1 GB)
clef-ip-2012-ep0.7z.008 (85 MB)
clef-ip-2012-ep1.7z.001 (1 GB)
clef-ip-2012-ep1.7z.002 (1 GB)
clef-ip-2012-ep1.7z.003 (429 MB)
PART 3 - WO patents
clef-ip-2012-wo.7z.001 (1 GB)
clef-ip-2012-wo.7z.002 (1 GB)
clef-ip-2012-wo.7z.003 (1 GB)
clef-ip-2012-wo.7z.004 (540 MB)

Unzipping the data

You can download 7-Zip for your system from the project's website.

Windows users can unzip using the context menu (right-click the first archive file).

Linux users can use the 7za command from the p7zip package:

    7za x file.7z.001

Extraction continues automatically through the remaining parts of the archive, as long as all parts are in the same directory.
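
Before extracting, it can help to confirm that every part of a split archive has been downloaded, because 7-Zip cannot complete the extraction if a part is missing. The following is a minimal sketch rather than part of the official instructions: check_parts is a hypothetical helper, and the part count passed to it is taken from the file list above.

```shell
# Hypothetical helper (not part of the CLEF-IP distribution):
# succeed only if parts .001 through .NNN of a split archive exist
# in the current directory.
check_parts() {
    base=$1     # archive base name, e.g. clef-ip-2012-ep0.7z
    count=$2    # number of parts listed for that archive
    i=1
    while [ "$i" -le "$count" ]; do
        # Build the zero-padded part name, e.g. clef-ip-2012-ep0.7z.001
        part=$(printf '%s.%03d' "$base" "$i")
        if [ ! -f "$part" ]; then
            echo "missing: $part" >&2
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}
```

For example, for the ep0 archive, which is listed with eight parts:

    check_parts clef-ip-2012-ep0.7z 8 && 7za x clef-ip-2012-ep0.7z.001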

Hash files for the two archives
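
The hash files can be used to check that a download is not corrupted. The sketch below compares the digest of one file with an expected value copied from a hash file. It rests on assumptions not stated on this page: verify_digest is a hypothetical helper, and MD5 is assumed as the hash algorithm, so substitute sha1sum or sha256sum if the hash files use a different one.

```shell
# Hypothetical helper: compare a file's digest with an expected hex string.
# Assumption: the published hashes are MD5 in lowercase hex; replace
# md5sum with sha1sum or sha256sum if the hash files use another algorithm.
verify_digest() {
    file=$1
    expected=$2
    # md5sum prints "<digest>  <filename>"; keep only the digest
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}
```

Call it once per downloaded part, passing the file name and the digest listed for that part.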


Topics and Qrels

Claims to Passages Task

Training set: 18 DE, 21 EN, and 12 FR topics (qrels are in the archive)
Test set: 35 topics in each of EN, DE, and FR; qrels
Originally submitted runs (~30 MB); evaluation results
For details on the scripts used to evaluate the submissions and the measures computed, see this page.

Flow-chart Recognition Task

Training set: 50 images; qrels
Test set: 100 images; qrels

Chemical Structure Recognition Task

Segmentation training set (~80 MB): 30 images (qrels are in the archive)
Segmentation test set (~141 MB): 30 files; qrels
Structure recognition training set (~2 MB): 133 images (qrels are in the archive)
Structure recognition test set: 961 images; qrels


Disclaimer: This data was specifically assembled for the CLEF-IP track.