To understand what happened during this track please refer to the following publications:
Other documents related to this track:
The CLEF-IP 2010 corpus is an extract of the MAREC data collection.
MAREC by IRF is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Permissions beyond the scope of this license may be available at mailto:firstname.lastname@example.org.
The necessary type definition documents: dtds.7z (46 K)
Note: Internet Explorer 6 and 7 report an error when saving or downloading files larger than 4 GB.
Use another browser instead, or a download manager that can resume the download after a network error
(e.g. FileZilla, wGetGUI).
In any case, do not open the file on the fly; right-click the link and choose 'Save link as ...'.
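For readers who prefer a script to a download manager, the resuming behaviour described above can be sketched in Python with HTTP Range requests. This is a minimal sketch, not the track's recommended tool: the helper names `resume_headers` and `download_with_resume` are hypothetical, the URL must be supplied by the reader, and the sketch assumes the server offers the file over HTTP and honours Range headers.

```python
import os
import urllib.error
import urllib.request


def resume_headers(local_path):
    """Return (headers, offset) so that only bytes not yet on disk are requested."""
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    headers = {"Range": f"bytes={offset}-"} if offset > 0 else {}
    return headers, offset


def download_with_resume(url, local_path, chunk_size=1 << 20):
    """Download url to local_path, appending to a partial file if one exists."""
    headers, offset = resume_headers(local_path)
    request = urllib.request.Request(url, headers=headers)
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError as err:
        if err.code == 416:
            # Requested range not satisfiable: the local file is already complete.
            return offset
        raise
    with response:
        # Status 206 means the server honoured the Range header. Any other
        # success status means it is sending the whole file, so start over.
        mode = "ab" if response.status == 206 and offset > 0 else "wb"
        with open(local_path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    return os.path.getsize(local_path)
```

Calling `download_with_resume(url, "CLEFIP-1995-1998.tgz")` again after an interrupted transfer continues from the bytes already saved, provided the server supports partial content.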
Alternative downloads for the 1995-1998 data
These files are provided for those who use FAT32 file systems, which cannot store a single file of 4 GB or more, or who have problems downloading CLEFIP-1995-1998.tgz. If you have already downloaded CLEFIP-1995-1998.tgz successfully, you do not need these files.
Here you have a field-by-field description, with content examples, of the XML files in the collection.
For each task of the CLEF-IP 09 track, we provide four topic sets of different sizes: XLarge, Large, Medium, and Small.
Here you can download the track guidelines for these tasks.
| Set | Topics | Archive | Size |
|---|---|---|---|
| XLarge | 10,000 | topics_CLEFIP09_Main_XL.tar.gz | ~152 MB |
| Large | 5,000 | topics_CLEFIP09_Main_L.tar.gz | ~76 MB |
| Medium | 1,000 | topics_CLEFIP09_Main_M.tar.gz | ~16 MB |
| Small | 500 | topics_CLEFIP09_Main_S.tar.gz | ~7.6 MB |
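Before unpacking one of the topic archives listed above, it can help to check what it contains. The following is a minimal sketch using Python's standard `tarfile` module; the function name `list_topic_files` is hypothetical, and nothing is assumed about the names or layout of the files inside the archive.

```python
import tarfile


def list_topic_files(archive_path):
    """Return the sorted names of the regular files in a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        # Skip directory entries and links; keep only regular files.
        return sorted(m.name for m in archive.getmembers() if m.isfile())
```

For example, `list_topic_files("topics_CLEFIP09_Main_S.tar.gz")` lists the contents of the Small set without extracting anything to disk.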
This version of the topics can be downloaded as a single tarball (~88 MB).
|relass_CLEFIP09_Main_ts.txt (relevance assessments)|
Guidelines for using the large training set of topics
|topics_CLEFIP09_Main_lts_NEW.tar.gz (~7.5 MB)|
|topics_CLEFIP09_EN_lts_NEW.tar.gz (~114 KB)|
|topics_CLEFIP09_DE_lts_NEW.tar.gz (~128 KB)|
|topics_CLEFIP09_FR_lts_NEW.tar.gz (~128 KB)|
The archives contain the following files:
Unlike the description in the 'Guidelines' above, the new topic format has a much shorter description field, which now contains only the name of the XML file with the data to be searched or indexed.
The relevance assessments for the language tasks are contained in the ones for the Main tasks.
Disclaimer: this data was specifically assembled for the CLEF-IP track. Please note that in order to use this data you must have signed the CLEF campaign End-User Agreement or the CLEF-IP 09 License Agreement.