CLEF-IP 2011 was organized by the Information Retrieval Facility Vienna (IRF), the Vienna University of Technology, and max-recall GmbH Vienna. These pages are rarely updated.

CLEF-IP 2011 Download Area



The overview notes describing the CLEF-IP 2011 Lab are the following:

  • Florina Piroi, Mihai Lupu, Allan Hanbury, and Veronika Zenz. CLEF-IP 2011: Retrieval in the Intellectual Property Domain.

The following documents describe the tasks, the topic content, and the submission format:

  • Guidelines for the PAC and CLS tasks: CLEF-IP2011-PAC_CLS_guidelines.pdf
  • Submission format and guidelines for the PAC and CLS tasks: CLEF-IP2011-PAC_CLS_submission_guidelines.pdf
  • Guidelines for the IMG tasks: CLEF-IP2011-IMG_tasks_guidelines.pdf
  • Submission format and guidelines for the IMG-PAC and IMG-CLS tasks: CLEF-IP2011-IMG_tasks_submission_format.pdf




The CLEF-IP 2011 corpus is an extract of the MAREC data collection.

MAREC by IRF is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Permissions beyond the scope of this license may be available at

CLEF-IP 2011 Corpus

The required document type definitions (DTDs): dtds.7z (46 KB)

Note: If you are a registered PatOlympics 2011 participant and have already downloaded the data, you do not need to download the files below.
PART 1 - EP patents
clef-ip-2011-ep0.7z.001 1 GB
clef-ip-2011-ep0.7z.002 1 GB
clef-ip-2011-ep0.7z.003 1 GB
clef-ip-2011-ep0.7z.004 1 GB
clef-ip-2011-ep0.7z.005 1 GB
clef-ip-2011-ep0.7z.006 1 GB
clef-ip-2011-ep0.7z.007 1 GB
clef-ip-2011-ep0.7z.008 85 MB
PART 2 - EP patents
clef-ip-2011-ep1.7z.001 1 GB
clef-ip-2011-ep1.7z.002 1 GB
clef-ip-2011-ep1.7z.003 429 MB
PART 3 - WO patents
clef-ip-2011-wo.7z.001 1 GB
clef-ip-2011-wo.7z.002 1 GB
clef-ip-2011-wo.7z.003 1 GB
clef-ip-2011-wo.7z.004 540 MB

Unzipping the data

You can download 7-Zip for your system from the project's website.

Windows users can unzip using the context menu (right click on the first archive file).

Linux users can use the 7za command from the p7zip package:

    7za x file.7z.001

Extraction continues automatically through the remaining parts of the archive, so all parts of an archive must be in the same directory.
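
As a minimal sketch, the three split corpus archives listed above could be extracted in one pass with a small shell loop. It assumes that all parts have been downloaded into the current directory and that `7za` is on the PATH; the helper names `first_part` and `extract_all` are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Base names of the three split corpus archives listed above.
ARCHIVES="clef-ip-2011-ep0 clef-ip-2011-ep1 clef-ip-2011-wo"

# first_part (hypothetical helper): print the name of the first volume
# of a split archive, given its base name.
first_part() {
  printf '%s.7z.001\n' "$1"
}

# extract_all (hypothetical helper): extract every archive in turn and
# stop at the first problem. 7za picks up the .002, .003, ... volumes
# on its own once it is pointed at the .001 file.
extract_all() {
  for base in $ARCHIVES; do
    part=$(first_part "$base")
    if [ ! -f "$part" ]; then
      echo "missing $part" >&2
      return 1
    fi
    7za x "$part" || return 1
  done
}
```

After sourcing the file, calling `extract_all` in the download directory runs the extraction. Checking for the first volume up front gives a clearer message than a failed `7za` call when a download is missing.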

Hash files for the two archives


Data for the Image PAC Tasks

The required document type definitions (DTDs): dtds.7z (46 KB)

EP.A43B.before2002.tar.bz2 99 MB
EP.A61B.before2002.tar.bz2 1.5 GB
EP.H01L.before2002.tar.bz2 2.9 GB
WO.A43B.before2002.tar.bz2 2.4 MB
WO.A61B.before2002.tar.bz2 71 MB
WO.H01L.before2002.tar.bz2 101 MB
XML Files
EP.A43B.xml.tar.bz2 11 MB
EP.A61B.xml.tar.bz2 85 MB
EP.H01L.xml.tar.bz2 167 MB
WO.A43B.xml.tar.bz2 1.4 MB
WO.A61B.xml.tar.bz2 41 MB
WO.H01L.xml.tar.bz2 37 MB


Topics and Qrels

Prior Art Candidates Task topics: clef-ip-2011_PACTopics.7z (3973 topics, 30 MB), qrels
Classification Tasks topics (both tasks): clef-ip-2011_CLSTopics.7z (3000 CLS1 topics and 4934 CLS2 topics, 41 MB), qrels
Image Prior Art Candidates Task topics: clef-ip-2011_IMG_PACTopics.7z (211 topics, 171 MB), qrels
Image Classification Task topics (100 topics, 14 MB), qrels

Training Topics

Prior Art Candidates Task training topics: clef-ip-2011_PACTraining.7z (300 topics)
Image Classification Task training topics: clef-ip-2011-IMG-CLS-training.tar.gz (338 MB)

Image Classification Training sets by class

Class                 Class number  Abbreviation  Training images  File        Size
drawing               1             ad            5566             ad.tar.bz2  50 MB
chemical structures   2             cf            5958             cf.tar.bz2  10 MB
program listing       3             cp            5574             cp.tar.bz2  65 MB
gene sequence (DNA)   4             dn            5983             dn.tar.bz2  82 MB
flow chart            5             ff            311              ff.tar.bz2  3 MB
graph                 6             gr            1664             gr.tar.bz2  20 MB
math                  7             mf            5950             mf.tar.bz2  6 MB
table                 8             tb            5502             tb.tar.bz2  100 MB
character (symbol)    9             tx            1579             tx.tar.bz2  150 KB
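
When preparing labels for the image classification task, the abbreviation-to-class-number mapping in the table above can be captured in a short shell sketch. The helper names `class_number` and `unpack_classes` are hypothetical, and the directory layout inside each archive is not described on this page, so the sketch only unpacks the files and makes no assumption about where the images end up.

```shell
#!/bin/sh
# Abbreviations from the table above, in class-number order (1 to 9).
CLASS_ABBRS="ad cf cp dn ff gr mf tb tx"

# class_number (hypothetical helper): print the class number that the
# table assigns to a given abbreviation, or fail for an unknown one.
class_number() {
  n=0
  for abbr in $CLASS_ABBRS; do
    n=$((n + 1))
    if [ "$abbr" = "$1" ]; then
      echo "$n"
      return 0
    fi
  done
  echo "unknown class abbreviation: $1" >&2
  return 1
}

# unpack_classes (hypothetical helper): unpack each per-class archive
# that is present in the current directory. The -j flag tells tar to
# decompress with bzip2.
unpack_classes() {
  for abbr in $CLASS_ABBRS; do
    [ -f "$abbr.tar.bz2" ] || continue
    tar -xjf "$abbr.tar.bz2" || return 1
  done
}
```

For example, `class_number ff` prints `5`, the class number of the flow chart class.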

As an additional training set for the PAC task, you can use last year's (CLEF-IP 2010) data. Contact us for more information.


Disclaimer: This data was specifically assembled for the CLEF-IP track. Please note that in order to use this data you must have signed the CLEF-IP 2010 License Agreement. Contact us for more information.