CLEF-IP Download Area


All CLEF-IP collections are extracts of MAREC and are freely available to download under a Creative Commons license (see the paragraph below).

MAREC by IRF is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Permissions beyond the scope of this license may be available at

Follow the links below to go to the specific CLEF-IP collections:

CLEF-IP 2009

CLEF-IP 2010

CLEF-IP 2011

CLEF-IP 2012

CLEF-IP 2013

Documents in the CLEF-IP Corpus

Format and Content

The documents in the patent collection are stored as XML files. They originate from the European Patent Office (EPO) and have mixed content in English, German and French.
The files contain bibliographic data as well as descriptive text. The XML files are quite comprehensive and contain detailed information on inventors, assignees, priority dates, and so on. Of the many elements in the XML files, the following are a good starting point:

  • invention-title
  • classifications-ipcr
  • abstract
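
As a starting point for working with these three elements, the following minimal sketch reads them with Python's standard xml.etree.ElementTree module. Only the element names invention-title, classifications-ipcr and abstract come from the description above. The embedded sample document, the root and wrapper elements, the child element of classifications-ipcr, and the lang attribute are assumptions made for illustration, so check them against a real file from the collection before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sample. Real CLEF-IP files are far richer,
# and their exact nesting and attribute names may differ from this sketch.
SAMPLE = """<patent-document>
  <bibliographic-data>
    <invention-title lang="EN">Sample title</invention-title>
    <invention-title lang="DE">Beispieltitel</invention-title>
    <classifications-ipcr>
      <classification-ipcr>A61K 31/00</classification-ipcr>
    </classifications-ipcr>
  </bibliographic-data>
  <abstract lang="EN"><p>Sample abstract text.</p></abstract>
</patent-document>"""


def element_text(element):
    """Return all text inside an element, with whitespace normalized."""
    return " ".join("".join(element.itertext()).split())


def extract_fields(xml_text):
    """Collect titles, IPCR classifications and abstracts from one document."""
    root = ET.fromstring(xml_text)

    # Titles and abstracts are keyed by the (assumed) 'lang' attribute.
    titles = {
        el.get("lang", "unknown"): element_text(el)
        for el in root.iter("invention-title")
    }
    abstracts = {
        el.get("lang", "unknown"): element_text(el)
        for el in root.iter("abstract")
    }

    # Each child of classifications-ipcr is treated as one classification.
    ipcr = []
    for group in root.iter("classifications-ipcr"):
        for child in group:
            text = element_text(child)
            if text:
                ipcr.append(text)

    return {"titles": titles, "ipcr": ipcr, "abstracts": abstracts}


if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

To process a file from the collection instead of the sample string, the same function could be fed the file's contents, or ET.parse(path).getroot() could replace ET.fromstring. Searching with root.iter keeps the sketch independent of how deeply the elements are nested.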

Number of Documents

2009: 1.9 million patent documents, corresponding to approximately 1 million individual patents filed between 1985 and 2000.
2010: 2.6 million patent documents, corresponding to approximately 1.3 million individual patents published until 2001.
2011: All EPO documents with an application date earlier than 2002 (more than 2.5 million patent documents, corresponding to more than 1 million patents). In addition, for Euro-PCT applications, the corresponding patent documents published by WIPO were added (more than 400,000 documents).
2012, 2013: The corpus used in these years is the same as the one used in 2011.