Julien Epps

Other affiliations: Motorola, NICTA, University of Sydney, ...

Bio: Julien Epps is an academic researcher at the University of New South Wales. The author has contributed to research in topics including speaker recognition and speech processing. The author has an h-index of 39 and has co-authored 257 publications receiving 8,270 citations. Previous affiliations of Julien Epps include Motorola and NICTA.

Papers published on a yearly basis (chart, 1997–2023)

Papers

Journal Article

Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance

Nguyen Xuan Vinh¹, Julien Epps¹, James Bailey²

University of New South Wales¹, University of Melbourne²

01 Mar 2010-Journal of Machine Learning Research

TL;DR: An organized study of information theoretic measures for clustering comparison, including several existing popular measures in the literature, as well as some newly proposed ones, and advocates the normalized information distance (NID) as a general measure of choice.

Abstract: Information theoretic measures form a fundamental class of measures for comparing clusterings, and have recently received increasing interest. Nevertheless, a number of questions concerning their properties and inter-relationships remain unresolved. In this paper, we perform an organized study of information theoretic measures for clustering comparison, including several existing popular measures in the literature, as well as some newly proposed ones. We discuss and prove their important properties, such as the metric property and the normalization property. We then highlight to the clustering community the importance of correcting information theoretic measures for chance, especially when the data size is small compared to the number of clusters present therein. Of the available information theoretic based measures, we advocate the normalized information distance (NID) as a general measure of choice, for it possesses concurrently several important properties, such as being both a metric and a normalized measure, admitting an exact analytical adjusted-for-chance form, and using the nominal [0,1] range better than other normalized variants.

1,818 citations
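The NID advocated in this paper can be computed directly from two label vectors. The following is a minimal Python sketch, assuming each clustering is given as an equal-length list of hashable labels and that entropies are measured in nats; the function names (`entropy`, `mutual_information`, `nid`) are illustrative, not taken from the paper. It applies the definition NID(U, V) = 1 - I(U, V) / max(H(U), H(V)), with the mutual information built from the contingency table of the two labelings.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(U) in nats of a clustering given as a label list."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """Mutual information I(U, V) in nats, computed from the contingency table."""
    n = len(u)
    a, b = Counter(u), Counter(v)   # row sums a_i and column sums b_j
    joint = Counter(zip(u, v))      # nonzero contingency cells n_ij
    return sum((nij / n) * math.log(n * nij / (a[i] * b[j]))
               for (i, j), nij in joint.items())


def nid(u, v):
    """Normalized information distance: 1 - I(U, V) / max(H(U), H(V))."""
    denom = max(entropy(u), entropy(v))
    if denom == 0.0:
        # Assumption of this sketch: two single-cluster partitions are identical.
        return 0.0
    return 1.0 - mutual_information(u, v) / denom


# Usage: a relabeled copy is at distance 0; independent splits are at distance 1.
print(nid([0, 0, 1, 1], [1, 1, 0, 0]))  # approximately 0.0
print(nid([0, 0, 1, 1], [0, 1, 0, 1]))  # 1.0
```

Normalizing by the larger of the two entropies is what keeps the value inside [0, 1] while preserving the metric property the abstract mentions.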

Journal Article

The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing

Florian Eyben¹, Klaus R. Scherer², Björn Schuller, Johan Sundberg³, Elisabeth André⁴, Carlos Busso⁵, Laurence Devillers⁶, Julien Epps⁷, Petri Laukka⁸, Shrikanth S. Narayanan⁹, Khiet P. Truong¹⁰

Technische Universität München¹, University of Geneva², Royal Institute of Technology³, University of Augsburg⁴, University of Texas at Dallas⁵, Paris-Sorbonne University⁶, University of New South Wales⁷, Stockholm University⁸, University of Southern California⁹, University of Twente¹⁰

01 Apr 2016-IEEE Transactions on Affective Computing

TL;DR: A basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis, is proposed and intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters.

Abstract: Work on voice sciences over recent decades has led to a proliferation of acoustic parameters that are used quite selectively and are not always extracted in a similar fashion. With many independent teams working in different research areas, shared standards become an essential safeguard to ensure compliance with state-of-the-art methods allowing appropriate comparison of results across studies and potential integration and combination of extraction and recognition systems. In this paper we propose a basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis. In contrast to a large brute-force parameter set, we present a minimalistic set of voice parameters here. These were selected based on a) their potential to index affective physiological changes in voice production, b) their proven value in former studies as well as their automatic extractability, and c) their theoretical significance. The set is intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters. Our implementation is publicly available with the openSMILE toolkit. Comparative evaluations of the proposed feature set and large baseline feature sets of INTERSPEECH challenges show a high performance of the proposed set in relation to its size.

1,158 citations

Proceedings Article

Information theoretic measures for clusterings comparison: is a correction for chance necessary?

Nguyen Xuan Vinh¹, Julien Epps¹, James Bailey²

University of New South Wales¹, University of Melbourne²

14 Jun 2009

TL;DR: This paper derives the analytical formula for the expected mutual information value between a pair of clusterings, and proposes the adjusted version for several popular information theoretic based measures.

Abstract: Information theoretic based measures form a fundamental class of similarity measures for comparing clusterings, besides the classes of pair-counting based and set-matching based measures. In this paper, we discuss the necessity of correction for chance for information theoretic based measures for clusterings comparison. We observe that the baseline for such measures, i.e. the average value between random partitions of a data set, does not take on a constant value, and tends to have larger variation when the ratio between the number of data points and the number of clusters is small. A similar effect occurs in some non-information theoretic based measures, such as the well-known Rand Index. Assuming a hypergeometric model of randomness, we derive the analytical formula for the expected mutual information value between a pair of clusterings, and then propose the adjusted version for several popular information theoretic based measures. Some examples are given to demonstrate the need for and usefulness of the adjusted measures.

748 citations
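Under the hypergeometric model, the expected mutual information can be evaluated by summing over every feasible count n_ij of each contingency-table cell, with the row sums a_i and column sums b_j held fixed. The sketch below is one possible Python rendering, assuming label-list inputs; log-factorials via `math.lgamma` keep the hypergeometric probabilities numerically stable. The adjusted score assumes the max(H(U), H(V)) normalization so that it lines up with the NID definition; the paper proposes adjusted versions of several measures, and all function names here are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in nats of a clustering given as a label list."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """Mutual information in nats from the contingency table of u and v."""
    n = len(u)
    a, b = Counter(u), Counter(v)
    joint = Counter(zip(u, v))
    return sum((nij / n) * math.log(n * nij / (a[i] * b[j]))
               for (i, j), nij in joint.items())


def log_factorial(k):
    return math.lgamma(k + 1)


def expected_mutual_information(u, v):
    """E[I(U, V)] when each n_ij is hypergeometric given fixed a_i and b_j."""
    n = len(u)
    a = list(Counter(u).values())
    b = list(Counter(v).values())
    emi = 0.0
    for ai in a:
        for bj in b:
            # Feasible support of n_ij; the n_ij = 0 term contributes nothing.
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_prob = (log_factorial(ai) + log_factorial(bj)
                            + log_factorial(n - ai) + log_factorial(n - bj)
                            - log_factorial(n) - log_factorial(nij)
                            - log_factorial(ai - nij) - log_factorial(bj - nij)
                            - log_factorial(n - ai - bj + nij))
                emi += (nij / n) * math.log(n * nij / (ai * bj)) * math.exp(log_prob)
    return emi


def adjusted_mutual_information(u, v):
    """(I - E[I]) / (max(H(U), H(V)) - E[I]), the max-normalized adjusted MI."""
    mi = mutual_information(u, v)
    emi = expected_mutual_information(u, v)
    denom = max(entropy(u), entropy(v)) - emi
    if abs(denom) < 1e-12:
        # Degenerate case (e.g. single-cluster partitions): 1.0 by convention here.
        return 1.0
    return (mi - emi) / denom


print(adjusted_mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # approximately -0.5
```

In the usage line the two splits share no information (I = 0) while chance alone gives E[I] = ln(2)/3, so the adjusted score is negative: agreement is worse than the random baseline, which the unadjusted MI cannot express.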

Journal Article

A review of depression and suicide risk assessment using speech analysis

Nicholas Cummins¹, Stefan Scherer², Jarek Krajewski, Sebastian Schnieder, Julien Epps¹, Thomas F. Quatieri³

NICTA¹, University of Southern California², Massachusetts Institute of Technology³

01 Jul 2015-Speech Communication

TL;DR: How common paralinguistic speech characteristics are affected by depression and suicidality and the application of this information in classification and prediction systems is reviewed.

607 citations

Journal Article

Signal Processing in Sequence Analysis: Advances in Eukaryotic Gene Prediction

Mahmood Akhtar¹, Julien Epps², Eliathamby Ambikairajah²

National University of Science and Technology¹, University of New South Wales²

24 Jun 2008-IEEE Journal of Selected Topics in Signal Processing

TL;DR: A new technique for the recognition of acceptor splice sites is proposed, which combines signal processing-based gene and exon prediction methods with an existing data-driven statistical method, and reveals a consistent reduction in false positives at different levels of sensitivity.

...read moreread less

Abstract: Genomic sequence processing has been an active area of research for the past two decades and has increasingly attracted the attention of digital signal processing researchers in recent years. A challenging open problem in deoxyribonucleic acid (DNA) sequence analysis is maximizing the prediction accuracy of eukaryotic gene locations and thereby protein coding regions. In this paper, DNA symbolic-to-numeric representations are presented and compared with existing techniques in terms of relative accuracy for the gene and exon prediction problem. Novel signal processing-based gene and exon prediction methods are then evaluated together with existing approaches at a nucleotide level using the Burset/Guigo1996, HMR195, and GENSCAN standard genomic datasets. A new technique for the recognition of acceptor splice sites is then proposed, which combines signal processing-based gene and exon prediction methods with an existing data-driven statistical method. By comparison with the acceptor splice site detection method used in the gene-finding program GENSCAN, the proposed DSP-statistical hybrid technique reveals a consistent reduction in false positives at different levels of sensitivity, averaging a 43% reduction when evaluated on the GENSCAN test set.

168 citations
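To illustrate how a symbolic-to-numeric DNA representation feeds a DSP exon predictor, the sketch below uses the classic Voss binary-indicator mapping together with the period-3 spectral measure, a widely used baseline in DSP-based gene prediction. It is an assumption-laden illustration, not the representations or the DSP-statistical acceptor-splice-site hybrid proposed in the paper; the function names and the window and step parameters are hypothetical.

```python
import cmath


def voss_indicators(seq):
    """Map a DNA string to four binary indicator sequences (Voss mapping)."""
    seq = seq.upper()
    return {base: [1 if s == base else 0 for s in seq] for base in "ACGT"}


def period3_power(seq):
    """Sum over A, C, G, T of |X_b|^2 at frequency 2*pi/3.

    For a length N divisible by 3 this is DFT bin k = N/3, whose twiddle
    factor exp(-2*pi*j*k*m/N) reduces to exp(-2*pi*j*m/3).
    """
    total = 0.0
    for x in voss_indicators(seq).values():
        X = sum(x_m * cmath.exp(-2j * cmath.pi * m / 3) for m, x_m in enumerate(x))
        total += abs(X) ** 2
    return total


def period3_profile(seq, window, step=1):
    """Slide a window along seq; peaks in the profile hint at coding regions."""
    return [period3_power(seq[start:start + window])
            for start in range(0, len(seq) - window + 1, step)]


# Usage: a codon-periodic toy sequence versus a sequence with no period-3 structure.
print(period3_power("ATG" * 10))  # approximately 300.0
print(period3_power("A" * 30))    # approximately 0.0
```

Protein-coding regions tend to show a period-3 bias in base usage, so a sliding-window profile of this measure rises over exons; a statistical splice-site model can then be combined with such DSP features.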

Cited by

Pattern Recognition and Machine Learning

Christopher M. Bishop¹

Microsoft¹

01 Jan 2006

TL;DR: This book covers probability distributions and linear models for regression and classification, along with a discussion of combining models in the context of machine learning.

Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Book

Machine Learning : A Probabilistic Perspective

Kevin P. Murphy

24 Aug 2012

TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

Abstract: Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package--PMTK (probabilistic modeling toolkit)--that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

8,059 citations

Journal Article

A general species delimitation method with applications to phylogenetic placements

Jiajie Zhang¹, Paschalia Kapli¹, Pavlos Pavlidis¹, Alexandros Stamatakis¹

Foundation for Research & Technology – Hellas¹

15 Nov 2013-Bioinformatics

TL;DR: The Poisson tree processes (PTP) model is introduced to infer putative species boundaries on a given phylogenetic input tree and yields more accurate results than de novo species delimitation methods.

Abstract: Motivation: Sequence-based methods to delimit species are central to DNA taxonomy, microbial community surveys and DNA metabarcoding studies. Current approaches either rely on simple sequence similarity thresholds (OTU-picking) or on complex and compute-intensive evolutionary models. The OTU-picking methods scale well on large datasets, but the results are highly sensitive to the similarity threshold. Coalescent-based species delimitation approaches often rely on Bayesian statistics and Markov Chain Monte Carlo sampling, and can therefore only be applied to small datasets. Results: We introduce the Poisson tree processes (PTP) model to infer putative species boundaries on a given phylogenetic input tree. We also integrate PTP with our evolutionary placement algorithm (EPA-PTP) to count the number of species in phylogenetic placements. We compare our approaches with popular OTU-picking methods and the General Mixed Yule Coalescent (GMYC) model. For de novo species delimitation, the stand-alone PTP model generally outperforms GMYC as well as OTU-picking methods when evolutionary distances between species are small. PTP requires neither an ultrametric input tree nor a sequence similarity threshold as input. In the open reference species delimitation approach, EPA-PTP yields more accurate results than de novo species delimitation methods. Finally, EPA-PTP scales on large datasets because it relies on the parallel implementations of the EPA and RAxML, thereby allowing species to be delimited in high-throughput sequencing data. Availability and implementation: The code is freely available at www.

1,868 citations