Fig. 3. We represent three music tunes (genre labels: heavy metal, jazz, classical) by their spectral content in overlapping short time frames (w = 30 msec, with an overlap of 10 msec; see [18] for details). To make the visualization relatively independent of 'pitch', we use the so-called mel-cepstral representation (MFCC, K = 13 coefficients per frame). To reduce noise in the visualization we 'sparsified' the amplitudes, simply keeping the coefficients in the upper 5% magnitude percentile and discarding the rest. The total number of frames in the analysis was F = 105. Latent semantic analysis provides unsupervised subspaces of maximal variance for a given dimension. We show scatter plots of the data in the first 1-5 latent dimensions. The scatter plots below the diagonal have been 'zoomed' to reveal more detail of the ICA 'ray' structure. For interpretation, the data points are coded with signatures of the three genres involved: classical (∗), heavy metal (diamond), jazz (+). The ICA ray structure is striking; note, however, that the correspondence is not one-to-one (ray to genre) as it was in the small text databases. A component (ray) quantifies a characteristic musical 'theme' at the temporal level of a frame (30 msec), i.e., an entity similar to the 'phoneme' in speech.
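The preprocessing described in the caption (magnitude-percentile sparsification of the MFCC frames, followed by a latent semantic analysis, i.e., an SVD of the frame matrix) can be sketched as below. This is a minimal illustration with random stand-in data, not the actual MFCC extraction of [18]; the array shapes (`K = 13`, a placeholder frame count) and variable names are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the real MFCC matrix: K = 13 cepstral coefficients
# per frame, F frames (here a placeholder value, not the paper's F).
K, F = 13, 1000
X = rng.standard_normal((K, F))

# 'Sparsify' the amplitudes: keep only coefficients in the upper
# 5% magnitude percentile, zero out the rest.
threshold = np.percentile(np.abs(X), 95)
X_sparse = np.where(np.abs(X) >= threshold, X, 0.0)

# Latent semantic analysis: an SVD of the centered data yields the
# unsupervised subspace of maximal variance for a given dimension.
Xc = X_sparse - X_sparse.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Coordinates of each frame in the first 5 latent dimensions,
# as scatter-plotted (pairwise) in the figure.
Z = Vt[:5] * s[:5, None]   # shape (5, F)
```

An ICA rotation applied within this 5-dimensional subspace would then expose the 'ray' structure discussed in the caption.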