Author

Beth Logan

Other affiliations: Intel
Bio: Beth Logan is an academic researcher from Hewlett-Packard. The author has contributed to research on topics including search engine indexing and feature vectors, has an h-index of 26, and has co-authored 50 publications receiving 4,412 citations. Previous affiliations of Beth Logan include Intel.

Papers
Proceedings Article
01 Jan 2000
TL;DR: The results show that using the Mel scale to model music is at least not harmful for this speech/music discrimination problem, although further experimentation is needed to verify that it is the optimal scale in the general case; the basis-vector analysis indicates that the DCT is a valid decorrelating transform for both speech and music spectra.
Abstract: We examine in some detail Mel Frequency Cepstral Coefficients (MFCCs), the dominant features used for speech recognition, and investigate their applicability to modeling music. In particular, we examine two of the main assumptions of the process of forming MFCCs: the use of the Mel frequency scale to model the spectra, and the use of the Discrete Cosine Transform (DCT) to decorrelate the Mel-spectral vectors. We examine the first assumption in the context of speech/music discrimination. Our results show that the use of the Mel scale for modeling music is at least not harmful for this problem, although further experimentation is needed to verify that this is the optimal scale in the general case. We investigate the second assumption by examining the basis vectors of the theoretically optimal transform to decorrelate music and speech spectral vectors. Our results demonstrate that the use of the DCT to decorrelate vectors is appropriate for both speech and music spectra.

MFCCs for Music Analysis

Of all the human-generated sounds which influence our lives, speech and music are arguably the most prolific. Speech has received much focused attention, and decades of research in this community have led to usable systems and convergence of the features used for speech analysis. In the music community, however, although the field of synthesis is very mature, a dominant paradigm has yet to emerge to solve other problems such as music classification or transcription. Consequently, many representations for music have been proposed (e.g. (Martin1998), (Scheirer1997), (Blum1999)). In this paper, we examine some of the assumptions behind Mel Frequency Cepstral Coefficients (MFCCs), the dominant features used for speech recognition, and examine whether these assumptions are valid for modeling music. MFCCs have been used by other authors to model music and audio sounds (e.g. (Blum1999)); these works, however, use cepstral features merely because they have been so successful for speech recognition, without examining the assumptions made in great detail.

MFCCs (e.g. see (Rabiner1993)) are short-term spectral features. They are calculated as follows (the steps and assumptions made are explained in more detail in the full paper):

1. Divide the signal into frames.
2. For each frame, obtain the amplitude spectrum.
3. Take the logarithm.
4. Convert to a Mel (perceptually-based) spectrum.
5. Take the discrete cosine transform (DCT).

We seek to determine whether this process is suitable for creating features to model music. We examine only steps 4 and 5 since, as explained in the full paper, the other steps are less controversial. Step 4 calculates the log amplitude spectrum on the so-called Mel scale. This transformation emphasizes lower frequencies, which are perceptually more meaningful for speech. It is possible, however, that the Mel scale is not optimal for music, as there may be more information in, say, higher frequencies. Step 5 takes the DCT of the Mel spectra. For speech, this approximates principal components analysis (PCA), which decorrelates the components of the feature vectors. We investigate whether this transform is valid for music spectra.

Mel vs. Linear Spectral Modeling

To investigate the effect of using the Mel scale, we examine the performance of a simple speech/music discriminator. We use around 3 hours of labeled data from a broadcast news show, divided into 2 hours of training data and 40 minutes of testing data. We convert the data to 'Mel' and 'Linear' cepstral features and train mixture-of-Gaussian classifiers for each class. We then classify each segment in the test data using these models. This process is described in more detail in the full paper. We find that for this speech/music classification problem, the results are statistically significantly better if Mel-based rather than linear-based cepstral features are used. However, whether this is simply because the Mel scale models speech better, or because it also models music better, is not clear. At worst, we can conclude that using the Mel cepstrum to model music in this speech/music discrimination problem is not harmful. Further tests are needed to verify that the Mel cepstrum is appropriate for modeling music in the general case.

Using the DCT to Approximate Principal Components Analysis

We additionally investigate the effectiveness of using the DCT to decorrelate Mel spectral features. The mathematically correct way to decorrelate components is to use PCA (or, equivalently, the KL transform). This transform uses the eigenvectors of the covariance matrix of the data to be modeled as basis vectors. By investigating how closely these vectors approximate cosine functions, we can get a feel for how well the DCT approximates PCA. By inspecting the eigenvectors of the Mel log spectra for around 3 hours of speech and 4 hours of music, we see that the DCT is an appropriate transform for decorrelating music (and speech) log spectra.

Future Work

Future work should focus on a more thorough examination of the parameters used to generate MFCC features, such as the sampling rate of the signal, the frequency scaling (Mel or otherwise), and the number of bins to use when smoothing. Also worthy of investigation are the windowing size and frame rate.

Suggested Readings

Blum, T., Keislar, D., Wheaton, J. and Wold, E., 1999, Method and article of manufacture for content-based analysis, storage, retrieval, and segmentation of audio information, U.S. Patent 5,918,223.
Martin, K., 1998, Toward automatic sound source recognition: identifying musical instruments, Proceedings NATO Computational Hearing Advanced Study Institute.
Rabiner, L. and Juang, B., 1993, Fundamentals of Speech Recognition, Prentice-Hall.
Scheirer, E. and Slaney, M., 1997, Construction and evaluation of a robust multifeature speech/music discriminator, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing.
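The five-step MFCC recipe listed above translates almost line for line into NumPy/SciPy. The sketch below is illustrative rather than the paper's implementation: the frame length, hop, filter count, and coefficient count are assumed values, and it follows the conventional ordering in which the Mel filterbank is applied before the logarithm.

```python
# Minimal MFCC sketch (NumPy/SciPy).  All parameter choices are illustrative.
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centers spaced evenly on the Mel scale.
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def mfcc(signal, sr, frame_len=0.025, hop=0.010, n_filters=40, n_ceps=13):
    # 1. Divide the signal into overlapping, windowed frames
    #    (assumes the signal is at least one frame long).
    flen, fhop = int(sr * frame_len), int(sr * hop)
    n_frames = 1 + (len(signal) - flen) // fhop
    frames = np.stack([signal[i * fhop:i * fhop + flen] * np.hamming(flen)
                       for i in range(n_frames)])
    # 2. Amplitude spectrum of each frame.
    spec = np.abs(np.fft.rfft(frames, n=flen))
    # 3-4. Smooth onto the Mel scale, then take the logarithm.
    log_mel = np.log(spec @ mel_filterbank(n_filters, flen, sr).T + 1e-10)
    # 5. DCT to (approximately) decorrelate the log Mel spectrum.
    return dct(log_mel, type=2, norm='ortho', axis=1)[:, :n_ceps]
```

Calling mfcc(x, 16000) on a mono waveform would return one 13-dimensional cepstral vector per 10 ms hop; the eigenvector inspection described above could then be reproduced by comparing the principal components of the log-Mel spectra against the DCT basis.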

1,189 citations

Proceedings ArticleDOI
31 Oct 2001
TL;DR: A computer method and apparatus determines music similarity by generating a K-means cluster signature and a beat signature for each piece of music; the beat is then included in the distance measurement between pieces.
Abstract: The invention is a computer method and apparatus that determines music similarity by generating a K-means (instead of Gaussian) cluster signature and a beat signature for each piece of music. The beat of the music is included in the subsequent distance measurement.
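As a rough, hypothetical illustration of the idea (not the patented method itself), the sketch below builds a K-means signature over per-frame features and folds a scalar beat estimate into the distance between two pieces. The weighted closest-centroid distance is a simple stand-in for the patent's distance measure, and every parameter is an assumption.

```python
# Illustrative K-means signature + beat-aware distance (not the patent's
# exact measure).  frames: (n_frames, n_dims) per-frame features, e.g. MFCCs.
import numpy as np
from sklearn.cluster import KMeans

def signature(frames, tempo_bpm, k=16):
    km = KMeans(n_clusters=k, n_init=10).fit(frames)
    weights = np.bincount(km.labels_, minlength=k) / len(frames)
    return km.cluster_centers_, weights, tempo_bpm

def distance(sig_a, sig_b, beat_weight=0.1):
    (ca, wa, ta), (cb, wb, tb) = sig_a, sig_b
    # Pairwise centroid distances; match each cluster to its nearest
    # counterpart, weighted by cluster mass, symmetrized over both directions.
    d = np.linalg.norm(ca[:, None, :] - cb[None, :, :], axis=2)
    spectral = wa @ d.min(axis=1) + wb @ d.min(axis=0)
    return spectral + beat_weight * abs(ta - tb)   # fold in the beat term
```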

475 citations

Proceedings ArticleDOI
16 Sep 2007
TL;DR: The study characterizes a number of issues important for designing activity detection systems that may not have been as evident in prior work, where data was collected under more controlled conditions.
Abstract: We study activity recognition using 104 hours of annotated data collected from a person living in an instrumented home. The home contained over 900 sensor inputs, including wired reed switches, current and water flow inputs, object and person motion detectors, and RFID tags. Our aim was to compare different sensor modalities on data that approached "real world" conditions, where the subject and annotator were unaffiliated with the authors. We found that 10 infra-red motion detectors outperformed the other sensors on many of the activities studied, especially those that were typically performed in the same location. However, several activities, in particular "eating" and "reading" were difficult to detect, and we lacked data to study many fine-grained activities. We characterize a number of issues important for designing activity detection systems that may not have been as evident in prior work when data was collected under more controlled conditions.

399 citations

Proceedings Article
01 Jan 2003
TL;DR: In this article, the authors examine both acoustic and subjective approaches for calculating similarity between artists, comparing their performance on a common database of 400 popular artists. They evaluate acoustic techniques based on Mel-frequency cepstral coefficients and an intermediate "anchor space" of genre classification, and subjective techniques which use data from The All Music Guide, from a survey, from playlists and personal collections, and from web-text mining.
Abstract: Subjective similarity between musical pieces and artists is an elusive concept, but one that must be pursued in support of applications to provide automatic organization of large music collections. In this paper, we examine both acoustic and subjective approaches for calculating similarity between artists, comparing their performance on a common database of 400 popular artists. Specifically, we evaluate acoustic techniques based on Mel-frequency cepstral coefficients and an intermediate ‘anchor space’ of genre classification, and subjective techniques which use data from The All Music Guide, from a survey, from playlists and personal collections, and from web-text mining. (Keywords: music similarity, acoustic measures, evaluation, ground truth.)
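One simple way to score an acoustic measure against subjective ground truth of this kind is top-N ranking agreement: for each artist, compare the N nearest neighbors under the two measures. The sketch below is a hypothetical evaluation helper, not the paper's exact protocol.

```python
# Top-N agreement between two artist-similarity matrices (illustrative).
import numpy as np

def topn_agreement(acoustic_sim, subjective_sim, n=10):
    # Both inputs: (n_artists, n_artists) matrices, higher = more similar.
    agree = []
    for i in range(len(acoustic_sim)):
        a = [j for j in np.argsort(-acoustic_sim[i]) if j != i][:n]
        s = [j for j in np.argsort(-subjective_sim[i]) if j != i][:n]
        agree.append(len(set(a) & set(s)) / n)   # overlap of top-n lists
    return float(np.mean(agree))
```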

332 citations

Journal ArticleDOI
TL;DR: This paper evaluates acoustic techniques based on Mel-frequency cepstral coefficients and an intermediate ‘anchor space’ of genre classification, and subjective techniques which use data from The All Music Guide, from a survey, from playlists and personal collections, and from web-text mining.
Abstract: Subjective similarity between musical pieces and artists is an elusive concept, but one that must be pursued in support of applications to provide automatic organization of large music collections. In this paper, we examine both acoustic and subjective approaches for calculating similarity between artists, comparing their performance on a common database of 400 popular artists. Specifically, we evaluate acoustic techniques based on Mel-frequency cepstral coefficients and an intermediate ‘anchor space’ of genre classification, and subjective techniques which use data from The All Music Guide, from a survey, from playlists and personal collections, and from web-text mining. (Keywords: music similarity, acoustic measures, evaluation, ground truth.)

267 citations


Cited by
Journal ArticleDOI
TL;DR: The automatic classification of audio signals into a hierarchy of musical genres is explored, and three feature sets for representing timbral texture, rhythmic content, and pitch content are proposed.
Abstract: Musical genres are categorical labels created by humans to characterize pieces of music. A musical genre is characterized by the common characteristics shared by its members; these typically relate to the instrumentation, rhythmic structure, and harmonic content of the music. Genre hierarchies are commonly used to structure the large collections of music available on the Web. Currently, musical genre annotation is performed manually. Automatic musical genre classification can assist or replace the human user in this process and would be a valuable addition to music information retrieval systems. In addition, it provides a framework for developing and evaluating features for any type of content-based analysis of musical signals. In this paper, the automatic classification of audio signals into a hierarchy of musical genres is explored. More specifically, three feature sets for representing timbral texture, rhythmic content, and pitch content are proposed. The performance and relative importance of the proposed features are investigated by training statistical pattern recognition classifiers using real-world audio collections. Both whole-file and real-time frame-based classification schemes are described. Using the proposed feature sets, a classification accuracy of 61% for ten musical genres is achieved. This result is comparable to results reported for human musical genre classification.
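A minimal sketch of the whole-file scheme described above, under stated assumptions: each track is reduced to a single feature vector (the paper's timbral-texture, rhythmic, and pitch features would fill this role), each genre is modeled by a Gaussian mixture, and a track is assigned to the genre whose model gives it the highest likelihood.

```python
# One-Gaussian-mixture-per-genre classifier (illustrative sketch).
import numpy as np
from sklearn.mixture import GaussianMixture

def train_genre_models(features_by_genre, n_components=3):
    # features_by_genre: dict of genre name -> (n_tracks, n_dims) array.
    return {g: GaussianMixture(n_components=n_components).fit(x)
            for g, x in features_by_genre.items()}

def classify(models, x):
    # Assign to the genre whose model gives x the highest log-likelihood.
    scores = {g: m.score_samples(x[None, :])[0] for g, m in models.items()}
    return max(scores, key=scores.get)
```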

2,668 citations

Book ChapterDOI
Robert E. Schapire
01 Jan 2003
TL;DR: This chapter overviews some of the recent work on boosting, including analyses of AdaBoost's training error and generalization error; boosting's connection to game theory and linear programming; the relationship between boosting and logistic regression; extensions of AdaBoost for multiclass classification problems; methods of incorporating human knowledge into boosting; and experimental and applied work using boosting.
Abstract: Boosting is a general method for improving the accuracy of any given learning algorithm. Focusing primarily on the AdaBoost algorithm, this chapter overviews some of the recent work on boosting including analyses of AdaBoost’s training error and generalization error; boosting’s connection to game theory and linear programming; the relationship between boosting and logistic regression; extensions of AdaBoost for multiclass classification problems; methods of incorporating human knowledge into boosting; and experimental and applied work using boosting.
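Since the chapter centers on AdaBoost, a minimal sketch of its reweighting loop may help fix ideas; decision stumps and the round count are illustrative choices, and labels are assumed to be ±1.

```python
# Minimal AdaBoost with decision stumps (labels must be +1/-1 arrays).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, rounds=50):
    n = len(y)
    w = np.full(n, 1.0 / n)                 # start with uniform weights
    stumps, alphas = [], []
    for _ in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()            # weighted training error
        if err >= 0.5:                      # no better than chance: stop
            break
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-10))
        w *= np.exp(-alpha * y * pred)      # upweight the mistakes
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def predict(stumps, alphas, X):
    # Weighted majority vote of the stumps.
    votes = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(votes)
```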

1,979 citations

Journal ArticleDOI
TL;DR: This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high performance indexing, and evaluation techniques.
Abstract: Extending beyond the boundaries of science, art, and culture, content-based multimedia information retrieval provides new paradigms and methods for searching through the myriad variety of media all over the world. This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions, which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high-performance indexing, and evaluation techniques. Based on the current state of the art, we discuss the major challenges for the future.

1,652 citations

Journal ArticleDOI
01 Oct 1980

1,565 citations

Patent
11 Jan 2011
TL;DR: In this article, an intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.
Abstract: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionally powered by external services with which the system can interact.
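Purely as a hypothetical illustration of the dispatch pattern the abstract describes (parse an utterance into a task domain, then invoke an external service for that domain), consider the toy router below; every name in it is invented, and keyword matching stands in for the natural-language dialog component.

```python
# Toy intent router (all names hypothetical; keyword matching stands in
# for a real natural-language parser).
from typing import Callable, Dict

def route(utterance: str, services: Dict[str, Callable[[str], str]]) -> str:
    for domain, handler in services.items():
        if domain in utterance.lower():
            return handler(utterance)         # invoke the external service
    return "Sorry, I can't help with that yet."

services = {
    "weather": lambda u: "Forecast: sunny",    # stand-in for a weather API
    "restaurant": lambda u: "Booked a table",  # stand-in for a booking API
}
print(route("What's the weather today?", services))
```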

1,462 citations