The Europeana Sounds project has been working for the last three years to get the collections of sound archives around Europe online. All the material related to music can be found in its dedicated thematic collection: Europeana Music. So far over 250,000 pictures, texts and sound files can be found there.

For a user of Europeana Music, it is useful to be able to search for particular music genres (e.g. free jazz, Irish folk, baroque) to find what they are looking for in this vast amount of material. However, this information is not always available in the data. Currently, only about a fifth of the Europeana Music Collection has been labelled with a unified genre description. Even in those cases the genre classification is often very general, because it was applied at the collection level rather than chosen for the specific piece.

To improve the quality of the genre information, we are organising a genre detection challenge on the 1st of October in Vienna.

We have teamed up with the organisers of a large hackathon that is part of the Vienna Waves festival, an annual festival of cutting-edge music that combines club nights with lectures, keynote talks, discussions and room for experiments. Up to 100 participants are welcome in the amazing "Werkstaetten- und Kulturhaus" (WUK) to work on this challenge or one of the many others proposed there.

The Europeana API provides programmatic access to over 35,000 music recordings that are available through the Europeana Music Collection. With this challenge we are looking for methods that automatically process the Europeana Music Collection and apply suitable genre descriptions at the item level. All the openly licensed sound files from the Europeana database will be made available to the participants beforehand, so that they can start working right at the beginning of the day.
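To make "genre descriptions at the item level" concrete, here is a minimal sketch of one possible approach: a nearest-centroid classifier that learns an average feature vector per genre from the items that already carry a label, then assigns each unlabelled item the genre of the closest centroid. This is only an illustration, not the method expected from participants. The function names, the two-dimensional feature vectors and the genre labels are all made up for the example.

```python
from collections import defaultdict
from math import dist


def train_centroids(features, genres):
    """Average the feature vectors of each genre into one centroid."""
    grouped = defaultdict(list)
    for vector, genre in zip(features, genres):
        grouped[genre].append(vector)
    return {
        genre: [sum(column) / len(vectors) for column in zip(*vectors)]
        for genre, vectors in grouped.items()
    }


def predict_genre(centroids, vector):
    """Return the genre whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda genre: dist(centroids[genre], vector))


# Toy, made-up features; real ones would be extracted from the audio files.
labelled_features = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
labelled_genres = ["polka", "polka", "baroque", "baroque"]

centroids = train_centroids(labelled_features, labelled_genres)
print(predict_genre(centroids, [0.15, 0.7]))
```

A real solution would most likely use richer features and a stronger model, but the overall shape stays the same: learn from the labelled fifth of the collection, then predict for the rest.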

Prize

A prize of 500 euros (in vouchers) is available for the best solution of the day, but we don't want the work to end there. If the prototype developed during the hackday has potential, the Europeana Foundation will be able to work with you, on a paid basis, to develop it further into a working product and feed the generated data into the Europeana crowdsourcing API.

We think the next step would be to verify the results generated by the algorithm using the crowdsourcing tool that is also being developed as part of the Europeana Sounds project. So instead of asking ‘what genre do you think this is?’, we can ask the Europeana user ‘we think this is polka, do you agree?’. This makes it much easier for users to take part in the crowdsourcing activities and thereby improve the data.
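The verification idea can be sketched in a few lines. The function below turns a predicted genre into the yes/no question described above. Falling back to the open question when the algorithm is unsure is our own assumption for the sake of the example, as are the function name, the confidence score and the threshold value. None of this reflects the actual crowdsourcing tool.

```python
def verification_question(predicted_genre, confidence, threshold=0.6):
    """Build the question shown to a user for one automatically labelled item.

    `confidence` and `threshold` are hypothetical: they assume the genre
    detector reports a score between 0 and 1 alongside its prediction.
    """
    if confidence >= threshold:
        # Confident enough: ask the user only to confirm or reject the guess.
        return f"We think this is {predicted_genre}, do you agree?"
    # Too uncertain: fall back to the open question.
    return "What genre do you think this is?"


print(verification_question("polka", 0.85))
print(verification_question("polka", 0.30))
```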

Downloads

Metadata

metadata.csv (18MB)
genres.txt (0.1MB) - a list of genre labels from the Europeana Sounds genre taxonomy.

Audio Features

Raw audio features (directory structured; one file per track)

ssd.zip (42MB)
rp.zip (328MB)
tssd.zip (255MB)
rh.zip (19MB)
trh.zip (98MB)
mvd.zip (93MB)

mfcc_aggregated.zip (25MB)
chroma_aggregated.zip (18MB)
rmse_aggregated.zip (10MB)
spectral_bandwidth_aggregated.zip (11MB)
spectral_centroid_aggregated.zip (11MB)
spectral_contrast_aggregated.zip (18MB)
spectral_rolloff_aggregated.zip (11MB)
tonnetz_aggregated.zip (15MB)
zero_crossing_rate_aggregated.zip (11MB)

Cumulative audio features (one CSV file per feature for the entire collection)

ssd.csv.gz (30MB)
rp.csv.gz (256MB)
tssd.csv.gz (208MB)
rh.csv.gz (11MB)
trh.csv.gz (77MB)
mvd.csv.gz (72MB)

mfcc.csv.gz (15MB)
chroma.csv.gz (8MB)
rmse.csv.gz (1MB)
spectral_bandwidth.csv.gz (1MB)
spectral_centroid.csv.gz (1MB)
spectral_contrast.csv.gz (8MB)
spectral_rolloff.csv.gz (11MB)
tonnetz.csv.gz (6MB)
zero_crossing_rate.csv.gz (1MB)
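
As a starting point, a gzip-compressed CSV file can be read with the Python standard library alone. The sketch below assumes a layout that is not documented here: no header row, a track identifier in the first column and numeric feature values in the remaining columns. Please check the actual files and the example notebook before relying on it. The function name and the sample data are made up for illustration.

```python
import csv
import gzip
import io


def read_feature_table(source):
    """Read a gzip-compressed CSV into {track_id: [feature values]}.

    `source` may be a path or an open binary file object.
    Assumed layout (unverified): no header, id first, then numbers.
    """
    table = {}
    with gzip.open(source, mode="rt", newline="") as text:
        for row in csv.reader(text):
            if not row:
                continue  # skip blank lines
            track_id, *values = row
            table[track_id] = [float(value) for value in values]
    return table


# Small made-up sample standing in for one of the downloaded files.
sample = gzip.compress(b"track_a,0.1,0.2\ntrack_b,0.3,0.4\n")
features = read_feature_table(io.BytesIO(sample))
print(features["track_b"])
```

With a downloaded file, the call would look like `read_feature_table("ssd.csv.gz")`, provided the assumed layout holds.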

Example Code

Jupyter / IPython Notebook (0.1MB)
Jupyter / IPython Notebook (HTML converted) (0.3MB)

by Joris Pekel, Europeana and Alexander Schindler, Austrian Institute of Technology