Vienna University of Technology
Institute of Software Technology and Interactive Systems
Information & Software Engineering Group

Music Information Retrieval




Music Classification

Descriptors computed in the Audio Feature Extraction process form the basis for a range of retrieval tasks. With approaches from the realm of artificial intelligence, both supervised and unsupervised machine learning techniques enable the computer to learn about musical content. While unsupervised learning approaches are valuable for the automatic organization of music archives, supervised machine learning techniques are applied to automatic classification tasks.

From a number of examples the computer learns to classify music pieces into a set of previously defined classes. The taxonomy can be defined according to specific task requirements. In our projects we explore classification into musical genres (classical, jazz, hip hop, electronic, ...), identification of artists (performers), and recognition of moods.
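As a minimal illustration of this kind of supervised learning (not the actual system described on this page), the sketch below learns one prototype feature vector per genre from labeled examples and assigns new pieces to the nearest prototype. The feature values and genre labels are invented for the example.

```python
import math

def centroid(vectors):
    """Mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(examples):
    """examples: list of (feature_vector, genre) pairs -> one centroid per genre."""
    by_genre = {}
    for vec, genre in examples:
        by_genre.setdefault(genre, []).append(vec)
    return {g: centroid(vs) for g, vs in by_genre.items()}

def classify(model, vec):
    """Assign vec to the genre whose centroid is nearest (Euclidean distance)."""
    return min(model, key=lambda g: math.dist(model[g], vec))

# Invented 2-dimensional "audio features" for two genres:
training = [
    ([0.9, 0.1], "classical"), ([0.8, 0.2], "classical"),
    ([0.1, 0.9], "electronic"), ([0.2, 0.8], "electronic"),
]
model = train(training)
print(classify(model, [0.85, 0.15]))  # -> classical
```

A real system would replace the two-dimensional toy vectors with extracted audio descriptors and the nearest-centroid rule with a stronger learner such as a Support Vector Machine.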

The advantage of music classification through supervised machine learning is that, provided annotated ground-truth data exists, the result of the learning process can be measured directly in terms of accuracy, precision, and recall percentages.
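The three measures follow their standard definitions against annotated ground truth; the sketch below computes them for one class, with invented labels:

```python
def evaluate(true_labels, predicted_labels, target):
    """Per-class precision and recall for the class `target`."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def accuracy(true_labels, predicted_labels):
    """Fraction of correctly classified tracks."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)

# Invented ground truth and predictions:
truth = ["jazz", "jazz", "rock", "rock", "rock"]
pred  = ["jazz", "rock", "rock", "rock", "jazz"]
print(accuracy(truth, pred))          # 0.6
print(evaluate(truth, pred, "rock"))  # precision 2/3, recall 2/3
```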


Genre Classification ISMIR 2004

The ISMIR 2004 conference hosted the first comparison of state-of-the-art Music Information Retrieval algorithms using agreed evaluation metrics. Participants were able to submit algorithms in 5 different categories before the conference. Training data was provided to the participants in advance so that they could test and adapt their algorithms.

ISMIR2004 Audio Description Contest

The Genre Classification task was to assign 729 audio files to 6 genres (classical, electronic, jazz_blues, metal_punk, rock_pop, world). The systems could be trained on 729 different files beforehand.

In our submission we extracted Rhythm Patterns from the audio data and used Support Vector Machines for machine learning. We achieved 70.4 % accuracy (correctly classified tracks) and thus ranked 4th. (Acc._norm is the accuracy normalized by genre frequency.)

In an (unannounced) robustness test (i.e. performance on a 25-second excerpt from the middle of the audio files), three algorithms failed; the accuracy values of the remaining two were: Lidy and Rauber 63.4 %, Tzanetakis 57.5 %.

Our algorithm was also evaluated on artist identification, which shows the generalization from 6 classes (genres, 55.74 % normalized accuracy) to 30 classes (artists): 28 %. Ellis and Whitman achieved 34 %.
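The normalized accuracy used here is commonly defined as the per-class accuracy averaged over all classes, so that frequent genres do not dominate the score. A short sketch, with invented labels:

```python
def normalized_accuracy(true_labels, predicted_labels):
    """Mean of the per-class accuracies (class-frequency-normalized)."""
    classes = set(true_labels)
    per_class = []
    for c in classes:
        idx = [i for i, t in enumerate(true_labels) if t == c]
        correct = sum(1 for i in idx if predicted_labels[i] == c)
        per_class.append(correct / len(idx))
    return sum(per_class) / len(classes)

# Invented, imbalanced ground truth: 8 classical tracks, 2 metal_punk tracks.
truth = ["classical"] * 8 + ["metal_punk"] * 2
pred  = ["classical"] * 8 + ["classical", "metal_punk"]
# Plain accuracy: 9/10 = 0.9; normalized: (8/8 + 1/2) / 2 = 0.75
print(normalized_accuracy(truth, pred))
```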


Rhythm Classification ISMIR 2004

Another task in the ISMIR2004 Audio Description Contest was to recognize the rhythm of ballroom dance music correctly. A set of 488 training instances (30-second excerpts in RealAudio format, from ballroomdancers.com) was given to the participants. 210 test files had to be classified into the classes Samba, Slow Waltz, Viennese Waltz, Tango, Cha Cha, Rumba, Jive, and Quickstep.

Our algorithm classified 82 % of the test files correctly and won the Rhythm Classification task. No other team participated, but we can still compare the submission to other algorithms that were published at the same time and tested on the same audio data (however, using 10-fold cross-validation):

82.0 %            Lidy and Rauber. ISMIR2004 Audio Description Contest
67.6 %            F. Gouyon and S. Dixon. Dance music classification: A tempo-based approach. ISMIR 2004
78.9 % (90.1 %*)  F. Gouyon, S. Dixon, E. Pampalk, and G. Widmer. Evaluating rhythmic descriptors for musical genre classification. AES25, 2004
85.7 % (96.0 %*)  S. Dixon, F. Gouyon, and G. Widmer. Towards characterisation of music via rhythmic patterns. ISMIR 2004

(*These values were achieved using a-priori knowledge about tempo)
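The 10-fold cross-validation protocol used by the compared publications can be sketched as follows; the classifier inside the loop is a deliberate placeholder (it always predicts the most frequent training label), where any real rhythm classifier would slot in:

```python
import random

def k_fold_accuracy(data, k=10, seed=0):
    """data: list of (features, label) pairs -> mean held-out accuracy over k folds."""
    data = data[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        # Placeholder "model": predict the majority label of the training split.
        labels = [lab for _, lab in train]
        majority = max(set(labels), key=labels.count)
        correct = sum(1 for _, lab in test if lab == majority)
        scores.append(correct / len(test))
    return sum(scores) / k

# Invented data: if every excerpt carries the same label, the placeholder
# model is always right and the mean accuracy is 1.0.
data = [([i], "Samba") for i in range(20)]
print(k_fold_accuracy(data, k=10))  # 1.0
```

Each of the k folds serves as the test set exactly once, so every instance is classified once by a model that never saw it during training.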


Genre Classification MIREX 2005

The evaluation (or contest) of Music Information Retrieval algorithms during ISMIR 2005 was named MIREX 2005: Music Information Retrieval Evaluation eXchange. This year, participants could submit to 10 different tasks; eventually 70 algorithms were evaluated. We participated in the audio genre classification task as our main interest and in audio artist identification as a secondary interest. Runtime was limited to 24 hours, and our submission to the artist identification task ran out of time due to a scaling problem.

We submitted 3 algorithm variations to the audio genre classification task, which was about the recognition of 6 and 10 genres, respectively, on 2 music databases. No training data was given to the participants beforehand; only the list of genres was known. Detailed results, confusion matrices for the two databases, as well as descriptions of all algorithms can be obtained from:


We achieved 75.27 % overall accuracy with the combination of SSD and RH features, using Support Vector Machines as the classifier. The diagram below shows that a rather large number of submissions performed at quite a similar level.
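Combining feature sets here means concatenating the two descriptor vectors per track into a single input for the classifier; the dimensions and values below are invented, and the SSD and RH extraction itself is not shown:

```python
def combine_features(ssd, rh):
    """Concatenate a Statistical Spectrum Descriptor vector and a
    Rhythm Histogram vector into one feature vector per track."""
    return list(ssd) + list(rh)

ssd = [0.12, 0.56, 0.33]   # stand-in SSD vector (real SSDs are much longer)
rh  = [0.05, 0.40]         # stand-in RH vector
features = combine_features(ssd, rh)
print(len(features))  # 5
```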


Mood Classification

The EmoMusic project is about classifying music according to emotions (or the mood of the listener). The emotions used are fear, hostility, guilt, sadness, joviality, self-assurance, attentiveness, shyness, fatigue, serenity, and surprise. These are the emotion scales and affective states taken from the PANAS-X Manual for the Positive and Negative Affect Schedule - Expanded Form by Clark and Watson.

last edited 29.11.2005 by Thomas Lidy