scispace - formally typeset
Search or ask a question
Author

Bhiksha Raj

Bio: Bhiksha Raj is an academic researcher from Carnegie Mellon University. The author has contributed to research in topics: Computer science & Speaker recognition. The author has an hindex of 51, co-authored 359 publications receiving 13064 citations. Previous affiliations of Bhiksha Raj include Disney Research & National Institute of Technology, Karnataka.


Papers
More filters
Proceedings ArticleDOI
26 Apr 2017
TL;DR: In this paper, the angular softmax (A-softmax) loss was proposed to learn angularly discriminative features for deep face recognition under open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal interclass distance under a suitably chosen metric space.
Abstract: This paper addresses deep face recognition (FR) problem under open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space. However, few existing algorithms can effectively achieve this criterion. To this end, we propose the angular softmax (A-Softmax) loss that enables convolutional neural networks (CNNs) to learn angularly discriminative features. Geometrically, A-Softmax loss can be viewed as imposing discriminative constraints on a hypersphere manifold, which intrinsically matches the prior that faces also lie on a manifold. Moreover, the size of angular margin can be quantitatively adjusted by a parameter m. We further derive specific m to approximate the ideal feature criterion. Extensive analysis and experiments on Labeled Face in the Wild (LFW), Youtube Faces (YTF) and MegaFace Challenge 1 show the superiority of A-Softmax loss in FR tasks.

2,272 citations

Posted Content
TL;DR: This paper proposes the angular softmax (A-Softmax) loss that enables convolutional neural networks (CNNs) to learn angularly discriminative features in deep face recognition (FR) problem under open-set protocol.
Abstract: This paper addresses deep face recognition (FR) problem under open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space. However, few existing algorithms can effectively achieve this criterion. To this end, we propose the angular softmax (A-Softmax) loss that enables convolutional neural networks (CNNs) to learn angularly discriminative features. Geometrically, A-Softmax loss can be viewed as imposing discriminative constraints on a hypersphere manifold, which intrinsically matches the prior that faces also lie on a manifold. Moreover, the size of angular margin can be quantitatively adjusted by a parameter $m$. We further derive specific $m$ to approximate the ideal feature criterion. Extensive analysis and experiments on Labeled Face in the Wild (LFW), Youtube Faces (YTF) and MegaFace Challenge show the superiority of A-Softmax loss in FR tasks. The code has also been made publicly available.

1,215 citations

01 Mar 2004
TL;DR: Sphinx-4 is a flexible, modular and pluggable framework to help foster new innovations in the core research of hidden Markov model (HMM) speech recognition systems and to provide researchers with a "researchready" system.
Abstract: Sphinx-4 is a flexible, modular and pluggable framework to help foster new innovations in the core research of hidden Markov model (HMM) speech recognition systems. The design of Sphinx-4 is based on patterns that have emerged from the design of past systems as well as new requirements based on areas that researchers currently want to explore. To exercise this framework, and to provide researchers with a "researchready" system, Sphinx-4 also includes several implementations of both simple and state-of-the-art techniques. The framework and the implementations are all freely available via open source.

483 citations

Proceedings ArticleDOI
07 May 1996
TL;DR: This work introduces the use of a vector Taylor series (VTS) expansion to characterize efficiently and accurately the effects on speech statistics of unknown additive noise and unknown linear filtering in a transmission channel.
Abstract: In this paper we introduce a new analytical approach to environment compensation for speech recognition. Previous attempts at solving analytically the problem of noisy speech recognition have either used an overly-simplified mathematical description of the effects of noise on the statistics of speech or they have relied on the availability of large environment-specific adaptation sets. Some of the previous methods required the use of adaptation data that consists of simultaneously-recorded or "stereo" recordings of clean and degraded speech. In this work we introduce the use of a vector Taylor series (VTS) expansion to characterize efficiently and accurately the effects on speech statistics of unknown additive noise and unknown linear filtering in a transmission channel. The VTS approach is computationally efficient. It can be applied either to the incoming speech feature vectors, or to the statistics representing these vectors. In the first case the speech is compensated and then recognized; in the second case HMM statistics are modified using the VTS formulation. Both approaches use only the actual speech segment being recognized to compute the parameters required for environmental compensation. We evaluate the performance of two implementations of VTS algorithms using the CMU SPHINX-II system on the 100-word alphanumeric CENSUS database and on the 1993 5000-word ARPA Wall Street Journal database. Artificial white Gaussian noise is added to both databases. The VTS approaches provide significant improvements in recognition accuracy compared to previous algorithms.

480 citations

16 Nov 2017
TL;DR: This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset.
Abstract: DCASE 2017 Challenge consists of four tasks: acoustic scene classification , detection of rare sound events, sound event detection in real-life audio, and large-scale weakly supervised sound event detection for smart cars. This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset. The baseline systems for all tasks rely on the same implementation using multilayer perceptron and log mel-energies, but differ in the structure of the output layer and the decision making process, as well as the evaluation of system output using task specific metrics.

369 citations


Cited by
More filters
Journal ArticleDOI

[...]

08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i —the square root of minus one, which seems an odd beast at that time—an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Journal ArticleDOI
01 Apr 1988-Nature
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit the submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood. Subsequently, very little is known especially in mud-dominated calciclastic submarine fan systems. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) that reveals a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) Calciturbidites, comprising mostly of highto low-density, wavy-laminated bioclast-rich facies; 2) low-density densite mudstones which are characterised by planar laminated and unlaminated muddominated facies; and 3) Calcidebrites which are muddy or hyper-concentrated debrisflow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones. These

9,929 citations

Proceedings ArticleDOI
07 Dec 2015
TL;DR: The learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks.
Abstract: We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset. Our findings are three-fold: 1) 3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets, 2) A homogeneous architecture with small 3x3x3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets, and 3) Our learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks. In addition, the features are compact: achieving 52.8% accuracy on UCF101 dataset with only 10 dimensions and also very efficient to compute due to the fast inference of ConvNets. Finally, they are conceptually very simple and easy to train and use.

7,091 citations

Proceedings Article
01 Jan 2011
TL;DR: The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Abstract: We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state automata (using the freely available OpenFst), together with detailed documentation and a comprehensive set of scripts for building complete recognition systems. Kaldi is written is C++, and the core library supports modeling of arbitrary phonetic-context sizes, acoustic modeling with subspace Gaussian mixture models (SGMM) as well as standard Gaussian mixture models, together with all commonly used linear and affine transforms. Kaldi is released under the Apache License v2.0, which is highly nonrestrictive, making it suitable for a wide community of users.

5,857 citations

Journal ArticleDOI
01 May 1981
TL;DR: This chapter discusses Detecting Influential Observations and Outliers, a method for assessing Collinearity, and its applications in medicine and science.
Abstract: 1. Introduction and Overview. 2. Detecting Influential Observations and Outliers. 3. Detecting and Assessing Collinearity. 4. Applications and Remedies. 5. Research Issues and Directions for Extensions. Bibliography. Author Index. Subject Index.

4,948 citations