Next: Clustering of Music Up: Self-Organizing Maps for Content-Based Previous: Self-Organizing Maps for Content-Based

Introduction

The wider availability of cheaper high-tech music recording equipment has resulted in a tremendous rise in the amount of music data available electronically. Apart from the much-criticized pirated copies of copyrighted material, many independent composers and smaller bands make their recordings publicly available for little or no fee via public domain music libraries such as AudioGalaxy.com. Whereas users commonly know the style and characteristics of well-known composers and bands, finding pieces of music to suit one's taste is rather difficult in this public domain setting. To help users find their way through the piles of publicly available music from lesser-known groups, music portals try to provide a manual classification of the titles they offer. This way of organizing and presenting music closely mirrors conventional stores, where CDs are frequently organized first by musical genre and then alphabetically within each genre. Yet, providing such a manual classification becomes increasingly difficult as the amount of music submitted every day grows. Furthermore, the resulting classification into any musical genre hierarchy is highly subjective. These effects are even worse when the classification is performed by several persons, such as the performing artists themselves.

To cope with this challenge, methods for automatically organizing music by genre are gaining importance. Because of the difficulty of analyzing the content of music itself, most approaches have resorted to text-based analysis of pieces of music, relying on title and author information, metadata descriptions, or song lyrics for automatic classification. These features form the core of the search facilities of the MPEG-7 standard currently under development [7]. Like manual classification, these approaches to finding and organizing music rely heavily on manually created descriptions. A different line of research is content-based music analysis, which tries to organize and locate pieces of music based on the similarity of melodies. The digital music library [4,1] extracts melody information from a hummed query and matches it against a database of musical tunes for which the actual scores are available. Similar approaches are reported in [6], using the scores provided by MIDI files to index and retrieve musical documents, and in [3], focusing on beat detection.

Yet, for the majority of music documents available today, such as the prominent MP3 files, no musical scores are provided. What we would thus like to have is a way to provide content-based organization and retrieval of musical documents based on the actual sound rather than on score transcripts. However, with the huge amounts of data used for describing sound information as well as the inherent noise in musical sound representation, conventional retrieval techniques are of only limited use. This makes it a challenging arena for neural networks, which are particularly suited for generalizing from noisy data and for extracting key features from large datasets.

In this paper we propose a content-based clustering of musical documents based on the actual sound. Rather than trying to extract precise scores, we use frequency spectra to describe the characteristics of a specific piece of music. We then use the Self-Organizing Map (SOM) [5], a popular unsupervised neural network, to automatically cluster pieces of music according to their similarity. After the unsupervised training process, similar pieces of music are found in neighboring areas on the two-dimensional map display. This allows a user to easily orient herself within an unknown music collection by finding, say, classical music in the upper left corner of the map, whereas disco-style music may be found in a different region. Selecting a cluster of music according to one's current preferences, rather than having to specify a list of songs based on textual descriptions, provides more intuitive and direct access to music libraries. These concepts have successfully been applied to text clustering [2,8].
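
To make the idea concrete, the following is a minimal sketch of clustering spectrum-based feature vectors with a Self-Organizing Map. It is an illustration under stated assumptions, not the system described in this paper: the actual feature extraction is the subject of Section 2 and is not reproduced here, synthetic tones stand in for MP3 recordings, and the function names (spectrum_features, train_som, best_matching_unit, demo), the 4x4 map size, and the learning-rate and neighborhood schedules are hypothetical choices made only for this example.

```python
import numpy as np

def spectrum_features(signal, n_bins=16):
    """Coarse spectrum (an assumed stand-in feature): average FFT
    magnitudes into n_bins bands and scale the result to unit length."""
    mags = np.abs(np.fft.rfft(signal))
    feats = np.array([band.mean() for band in np.array_split(mags, n_bins)])
    norm = np.linalg.norm(feats)
    return feats / norm if norm > 0 else feats

def best_matching_unit(weights, x):
    """Return (row, col) of the map unit whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    idx = np.unravel_index(np.argmin(dists), dists.shape)
    return tuple(int(i) for i in idx)

def train_som(data, rows=4, cols=4, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Online SOM training with a Gaussian neighborhood on a rectangular grid.
    The linear decay schedules are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # grid[r, c] holds the map coordinates (r, c) of each unit.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                # learning rate decays to 0
        sigma = sigma0 * (1.0 - frac) + 0.5    # neighborhood shrinks to 0.5
        x = data[rng.integers(len(data))]      # pick one random sample
        bmu = np.array(best_matching_unit(weights, x))
        d2 = ((grid - bmu) ** 2).sum(axis=-1)  # squared map distance to BMU
        h = np.exp(-d2 / (2.0 * sigma ** 2))   # neighborhood strength
        weights += lr * h[..., None] * (x - weights)
    return weights

def demo(seed=0):
    """Train on two groups of synthetic tones (low and high pitch) and
    report which map units each group lands on."""
    rng = np.random.default_rng(seed)
    t = np.arange(1024)

    def tone(freq_bin):
        # Sine at the given FFT bin plus a little white noise.
        return (np.sin(2 * np.pi * freq_bin * t / 1024)
                + 0.05 * rng.standard_normal(1024))

    low = [spectrum_features(tone(f)) for f in rng.uniform(8, 24, size=10)]
    high = [spectrum_features(tone(f)) for f in rng.uniform(390, 410, size=10)]
    weights = train_som(np.array(low + high))
    low_units = {best_matching_unit(weights, x) for x in low}
    high_units = {best_matching_unit(weights, x) for x in high}
    return weights, low_units, high_units

if __name__ == "__main__":
    _, low_units, high_units = demo()
    print("low-pitch units: ", sorted(low_units))
    print("high-pitch units:", sorted(high_units))
```

In this toy setting the two groups of tones should occupy separate units of the map, which is the behavior the map display relies on: sounds with similar spectra end up close together, and dissimilar ones end up apart. Real recordings would require a far richer feature representation than the 16 averaged bands assumed here.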

The remainder of this paper is structured as follows: Section 2 presents the architecture of our system, detailing feature extraction, vector creation and music clustering using the Self-Organizing Map. We then provide experimental results using a collection of MP3 files in Section 3 and finally some conclusions as well as an outlook on future work in Section 4.


Markus Fruehwirth
2001-05-15