Author

Richard J. Mammone

Other affiliations: Iowa State University
Bio: Richard J. Mammone is an academic researcher from Rutgers University. The author has contributed to research in topics: Speaker recognition & Artificial neural network. The author has an h-index of 35 and has co-authored 163 publications receiving 4,127 citations. Previous affiliations of Richard J. Mammone include Iowa State University.


Papers
Journal ArticleDOI
TL;DR: Linear predictive (LP) analysis, the first step of feature extraction, is discussed, and various robust cepstral features derived from LP coefficients are described, including the affine transform, a feature-transformation approach that models the mismatch in order to combat channel and noise distortion simultaneously.
Abstract: The future commercialization of speaker- and speech-recognition technology is impeded by the large degradation in system performance due to environmental differences between training and testing conditions. This is known as the "mismatched condition." Studies have shown [1] that most contemporary systems achieve good recognition performance if the conditions during training are similar to those during operation (matched conditions). Frequently, mismatched conditions are present, in which performance is dramatically degraded compared to the ideal matched conditions. A common example of this mismatch is when training is done on clean speech and testing is performed on noise- or channel-corrupted speech. Robust speech techniques [2] attempt to maintain the performance of a speech processing system under such diverse conditions of operation. This article presents an overview of current speaker-recognition systems and the problems encountered in operation, and it focuses on the front-end feature extraction process of robust speech techniques as a method of improvement. Linear predictive (LP) analysis, the first step of feature extraction, is discussed, and various robust cepstral features derived from LP coefficients are described. Also described is the affine transform, a feature-transformation approach that models the mismatch in order to combat both channel and noise distortion simultaneously.
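The LP-to-cepstrum pipeline the abstract outlines can be made concrete. Below is a minimal sketch of the two steps, assuming a pre-emphasized, windowed speech frame; the function names and the order-12 default are our illustration, not the article's code.

```python
import numpy as np

def lp_coefficients(frame, order=12):
    """Autocorrelation-method LP analysis via the Levinson-Durbin recursion."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]
        k = -acc / err                       # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k
    return -a[1:]    # alpha_k, so that s[n] ~ sum_k alpha_k * s[n-k]

def lp_cepstrum(alpha, n_ceps=None):
    """Cepstrum from LP coefficients via the standard recursion
    c_n = alpha_n + sum_{k=1}^{n-1} (k/n) c_k alpha_{n-k}."""
    p = len(alpha)
    n_ceps = n_ceps or p
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        c[n] = alpha[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if 1 <= n - k <= p:
                c[n] += (k / n) * c[k] * alpha[n - k - 1]
    return c[1:]
```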

344 citations

Journal ArticleDOI
TL;DR: The modified neural tree network (MNTN) is a hierarchical classifier that combines the properties of decision trees and feedforward neural networks that is found to perform better than full-search VQ classifiers for both of these applications.
Abstract: An evaluation of various classifiers for text-independent speaker recognition is presented. In addition, a new classifier is examined for this application. The new classifier is called the modified neural tree network (MNTN). The MNTN is a hierarchical classifier that combines the properties of decision trees and feedforward neural networks. The MNTN differs from the standard NTN in both the new learning rule used and the pruning criteria. The MNTN is evaluated for several speaker recognition experiments. These include closed- and open-set speaker identification and speaker verification. The database used is a subset of the TIMIT database consisting of 38 speakers from the same dialect region. The MNTN is compared with nearest neighbor classifiers, full-search, and tree-structured vector quantization (VQ) classifiers, multilayer perceptrons (MLPs), and decision trees. For closed-set speaker identification experiments, the full-search VQ classifier and MNTN demonstrate comparable performance. Both methods perform significantly better than the other classifiers for this task. The MNTN and full-search VQ classifiers are also compared for several speaker verification and open-set speaker-identification experiments. The MNTN is found to perform better than full-search VQ classifiers for both of these applications. In addition to matching or exceeding the performance of the VQ classifier for these applications, the MNTN also provides a logarithmic saving for retrieval.
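To make the "logarithmic saving for retrieval" concrete, here is a minimal sketch of how a neural tree network scores a feature vector: internal perceptron nodes route the input, and a leaf returns a confidence. The class layout and names are illustrative assumptions; the paper's learning rule and pruning criteria are not reproduced here.

```python
import numpy as np

class NTNNode:
    """One node of a (modified) neural tree network: internal nodes hold a
    perceptron that routes the input; leaves hold a speaker confidence."""
    def __init__(self, w=None, b=0.0, left=None, right=None, confidence=None):
        self.w, self.b = w, b
        self.left, self.right = left, right
        self.confidence = confidence       # set only at leaves

def ntn_score(root, x):
    """Route a feature vector to a leaf. The cost is O(depth), i.e. roughly
    logarithmic in the number of leaves, versus a full-search VQ codebook
    that compares the vector against every codeword."""
    node = root
    while node.confidence is None:
        node = node.left if float(node.w @ x + node.b) < 0.0 else node.right
    return node.confidence
```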

295 citations

Proceedings Article
01 Jan 2000
TL;DR: The authors describe the IBM Statistical Question Answering system for TREC-9 in detail, examine several examples and errors, and present results at the 250-byte and 50-byte levels for the overall system as well as for each subcomponent.
Abstract: Abraham Ittycheriah, Martin Franz, Wei-Jing Zhu, Adwait Ratnaparkhi, P.O. Box 218, Yorktown Heights, NY 10598, {abei,franzm,wjzhu,adwaitr}@watson.ibm.com; Richard J. Mammone, Dept. of Electrical Engineering, Rutgers University, Piscataway, NJ 08854, mammone@caip.rutgers.edu. We describe the IBM Statistical Question Answering for TREC-9 system in detail and look at several examples and errors. The system is an application of maximum entropy classification for question/answer type prediction and named entity marking. We describe our system for information retrieval, which in the first step did document retrieval from a local encyclopedia, in the second step performed an expansion of the query words, and finally did passage retrieval from the TREC collection. We will also discuss the answer selection algorithm, which determines the best sentence given both the question and the occurrence of a phrase belonging to the answer class desired by the question. Results at the 250 byte and 50 byte levels for the overall system, as well as results on each subcomponent, are presented. 1 System Description. Systems that perform question answering automatically by computer have been around for some time, as described by (Green et al., 1963). Only recently, though, have systems been developed to handle huge databases and a slightly richer set of questions. The types of questions that can be dealt with today are restricted to short-answer, fact-based questions. In TREC-8, a number of sites participated in the first question-answering evaluation (Voorhees and Tice, 1999), and the best systems identified four major subcomponents: question/answer type classification; query expansion/information retrieval; named entity marking; and answer selection. Our system architecture for this year was built around these four major components, as shown in Fig. 1. Here, the question is input and classified as asking for an answer whose category is one of the named entity classes to be described below. Additionally, the question is presented to the information retrieval (IR) engine for query expansion and document retrieval. This engine, given the query, looks at the database of documents and outputs the best documents or passages annotated with the named entities. The final stage is to select the exact answer, given the information about the answer class and the top-scoring passages. Minimizing various distance metrics applied over phrases or windows of text yields the best-scoring section containing a phrase of the answer class; this then represents the best-scoring answer.
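As a rough illustration of the answer-selection step, the sketch below scores each candidate phrase of the desired answer class by its distance to question-word matches in a passage and keeps the closest one. This is a deliberate simplification of the paper's distance metrics; all names are ours.

```python
def select_answer(question_words, passages):
    """question_words: lowercased set of content words from the question.
    passages: list of (tokens, candidate_spans), where candidate_spans are
    (start, end) token indices of phrases tagged with the answer class."""
    best, best_dist = None, float("inf")
    for tokens, candidate_spans in passages:
        hits = [i for i, t in enumerate(tokens) if t.lower() in question_words]
        if not hits:
            continue
        for start, end in candidate_spans:
            # Average token distance from the candidate phrase to the
            # question-word occurrences in this passage.
            dist = sum(min(abs(i - start), abs(i - end)) for i in hits) / len(hits)
            if dist < best_dist:
                best, best_dist = " ".join(tokens[start:end + 1]), dist
    return best
```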

224 citations

Proceedings ArticleDOI
07 Jun 1992
TL;DR: A novel method for training neural networks using an additional observing neural network called a meta-neural network (MNN) to direct the training of the basic neural network and the MNN is shown to help solve the problem of sensitivity to initial weight vectors.
Abstract: A novel method for training neural networks is introduced. The method uses an additional observing neural network called a meta-neural network (MNN) to direct the training of the basic neural network. The MNN provides the basic neural network with a step size and a direction vector which is optimal based on successful training strategies learned from problems solved previously. The combination of the MNN with the basic neural network is shown to improve learning rates for several problems when the MNN is trained on a similar problem. The MNN is shown to help solve the problem of sensitivity to initial weight vectors. In addition, computer simulations demonstrate the improvement in the learning rate of the enhanced neural network on a 4-bit parity problem, when it has been trained on a different nonlinear Boolean function.
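The update scheme the abstract describes, with the observing network supplying a step size and a direction vector, might look like the sketch below. The MNN's actual inputs and architecture are not given here; `naive_mnn` is a stand-in placeholder, not the trained meta-network.

```python
import numpy as np

def mnn_directed_step(weights, grad, mnn):
    """One update of the basic network: the meta-neural network maps the
    current training observation to a step size and a direction vector."""
    step_size, direction = mnn(grad)
    return weights - step_size * direction

def naive_mnn(grad):
    """Placeholder: fixed step along the normalized gradient. A trained MNN
    would instead predict both quantities from strategies learned on
    previously solved problems."""
    return 0.1, grad / (np.linalg.norm(grad) + 1e-12)
```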

212 citations

Patent
29 Jan 1998
TL;DR: In this article, a user speaks into a microphone (600) and the input speech is analyzed in an automatic speaker recognition system to extract parameters (25A-25D). Comparisons of multiple input patterns and recorded reference patterns are conducted (610, 620) to detect whether the input came from a recorded source (150) or not (20).
Abstract: A user speaks into a microphone (600) and the input speech is analyzed in an automatic speaker recognition system to extract parameters (25A-25D). Comparisons of multiple input patterns and recorded reference patterns are conducted (610, 620) to detect whether the input was from a recorded source (150) or not (20).
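The core idea, that a live speaker never repeats a phrase exactly while a recording does, can be sketched as a near-duplicate check against previously stored utterances. The feature choice, distance, and threshold below are illustrative assumptions, not the patent's claims.

```python
import numpy as np

def looks_recorded(new_feats, stored_feats, threshold=0.05):
    """Flag the input as a likely playback if it matches an earlier
    utterance almost exactly (natural speech varies between repetitions)."""
    for ref in stored_feats:
        if ref.shape == new_feats.shape:
            if np.mean((new_feats - ref) ** 2) < threshold:
                return True    # suspiciously exact repeat -> recorded source
    return False
```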

185 citations


Cited by
Proceedings Article
06 Aug 2017
TL;DR: An algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning is proposed.
Abstract: We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.
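A compact sketch of the inner/outer loop follows, using the first-order approximation (the full algorithm backpropagates through the inner step); the names and the use of plain numpy arrays are our simplifications.

```python
import numpy as np

def maml_step(theta, tasks, loss_grad, alpha=0.01, beta=0.001):
    """One meta-update. tasks: list of (support, query) data;
    loss_grad(theta, data) returns dLoss/dtheta. The inner step adapts to
    each task; the outer step moves the shared initialization so the
    adapted parameters generalize to the task's query data."""
    meta_grad = np.zeros_like(theta)
    for support, query in tasks:
        theta_prime = theta - alpha * loss_grad(theta, support)  # inner adaptation
        meta_grad += loss_grad(theta_prime, query)               # first-order outer grad
    return theta - beta * meta_grad / len(tasks)
```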

7,027 citations

Journal ArticleDOI
TL;DR: The multi-objective optimal design of a liquid rocket injector is presented to highlight the state of the art and to help guide future efforts.

2,152 citations

Journal ArticleDOI
TL;DR: The bias-variance decomposition of the error is provided in this paper, suggesting that the success of GASEN may lie in its ability to significantly reduce both the bias and the variance.
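For reference, this is the standard squared-error form of the decomposition the TL;DR invokes; the paper's ensemble-specific treatment differs in detail.

```latex
\mathbb{E}\big[(\hat{f}(x)-y)^2\big]
 = \underbrace{\big(\mathbb{E}[\hat{f}(x)]-f(x)\big)^2}_{\text{bias}^2}
 + \underbrace{\mathbb{E}\big[(\hat{f}(x)-\mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
 + \underbrace{\sigma^2}_{\text{irreducible noise}}
```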

1,898 citations

Journal ArticleDOI
R. Reed
TL;DR: The approach taken by the methods described here is to train a network that is larger than necessary and then remove the parts that are not needed.
Abstract: A rule of thumb for obtaining good generalization in systems trained by examples is that one should use the smallest system that will fit the data. Unfortunately, it usually is not obvious what size is best; a system that is too small will not be able to learn the data while one that is just big enough may learn very slowly and be very sensitive to initial conditions and learning parameters. This paper is a survey of neural network pruning algorithms. The approach taken by the methods described here is to train a network that is larger than necessary and then remove the parts that are not needed.
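One simple instance of the train-then-prune strategy the survey covers is magnitude pruning; the sketch below removes a fixed fraction of the smallest weights (the survey also reviews sensitivity- and penalty-based criteria). The names and the 50% default are illustrative.

```python
import numpy as np

def magnitude_prune(weights, fraction=0.5):
    """Zero out the smallest-magnitude weights after training; the returned
    mask can be used to keep them frozen during any retraining pass."""
    cutoff = np.quantile(np.abs(weights), fraction)  # prune below this magnitude
    mask = np.abs(weights) >= cutoff
    return weights * mask, mask
```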

1,705 citations

Journal ArticleDOI
01 Sep 1997
TL;DR: A tutorial on the design and development of automatic speaker-recognition systems is presented, along with a new automatic speaker-recognition system that performs with 98.9% correct identification.
Abstract: A tutorial on the design and development of automatic speaker-recognition systems is presented. Automatic speaker recognition is the use of a machine to recognize a person from a spoken phrase. These systems can operate in two modes: to identify a particular person or to verify a person's claimed identity. Speech processing and the basic components of automatic speaker-recognition systems are shown, and design tradeoffs are discussed. Then, a new automatic speaker-recognition system is given. This recognizer performs with 98.9% correct identification. Last, the performances of various systems are compared.
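The two operating modes the tutorial distinguishes reduce to two different decision rules, sketched below with illustrative names; real systems normalize scores before thresholding.

```python
def identify(scores):
    """Closed-set identification: choose the enrolled speaker whose model
    matches the utterance best. scores: {speaker_id: match_score}."""
    return max(scores, key=scores.get)

def verify(score, threshold):
    """Verification: accept the claimed identity only if the match score
    clears a threshold chosen to trade off false accepts vs. false rejects."""
    return score >= threshold
```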

1,686 citations