Author

Vincent Brault

Bio: Vincent Brault is an academic researcher from Département de Mathématiques. The author has contributed to research on the topics Matrix (mathematics) and Biclustering. The author has an h-index of 10 and has co-authored 36 publications receiving 277 citations. Previous affiliations of Vincent Brault include University of Paris-Sud and Université Paris-Saclay.

Papers
Journal Article
TL;DR: Estimation procedures and model selection criteria derived for binary data are generalised and an exact expression of the integrated completed likelihood criterion requiring no asymptotic approximation is derived.
Abstract: This paper deals with estimation and model selection in the Latent Block Model (LBM) for categorical data. First, after providing sufficient conditions ensuring the identifiability of this model, we generalise estimation procedures and model selection criteria derived for binary data. Secondly, we develop Bayesian inference through Gibbs sampling with a well-calibrated non-informative prior distribution in order to obtain the MAP estimator: this is proved to avoid the traps encountered by the LBM with the maximum likelihood methodology. Then model selection criteria are presented. In particular, an exact expression of the integrated completed likelihood criterion requiring no asymptotic approximation is derived. Finally, numerical experiments on both simulated and real data sets highlight the appeal of the proposed estimation and model selection procedures.

109 citations

Journal Article
TL;DR: In this paper, the authors used in situ captured data (eye-tracking data acquired with a mobile device) to study the influences of landscape composition on the landscape perceptions and valuations of city dwellers.

40 citations

Journal Article
TL;DR: An analysis and applications of sample pooling for the epidemiologic monitoring of COVID-19 with fewer tests, together with a group-testing method for measuring the prevalence in a population that takes into account the increased number of false negatives associated with pooling.
Abstract: We propose an analysis and applications of sample pooling to the epidemiologic monitoring of COVID-19. We first introduce a model of the RT-qPCR process used to test for the presence of virus in a sample and construct a statistical model for the viral load in a typical infected individual, inspired by large-scale clinical datasets. We present an application of group testing for the prevention of epidemic outbreaks in closed connected communities. We then propose a method for measuring the prevalence in a population, taking into account the increased number of false negatives associated with the group testing method.

31 citations
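
The prevalence measurement mentioned in the abstract above can be illustrated with a textbook group-testing estimator. This is only a minimal sketch and not the authors' method: the function name `estimate_prevalence` is hypothetical, the test is assumed to have perfect specificity, and the extra false negatives caused by pooling are summarised by a single constant `sensitivity`, whereas the paper models the RT-qPCR process and viral load explicitly.

```python
def estimate_prevalence(positive_pools: int, total_pools: int,
                        pool_size: int, sensitivity: float = 1.0) -> float:
    """Estimate individual-level prevalence from pooled test results.

    Assumption (not from the paper): a pool containing at least one
    infected sample tests positive with probability `sensitivity`,
    and a pool with no infected sample never tests positive.
    """
    if total_pools <= 0 or pool_size <= 0:
        raise ValueError("total_pools and pool_size must be positive")
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    if not 0 <= positive_pools <= total_pools:
        raise ValueError("positive_pools must be between 0 and total_pools")

    # Observed fraction of positive pools.
    observed = positive_pools / total_pools
    # Correct for missed positive pools, capping the result at 1.
    truly_positive = min(observed / sensitivity, 1.0)
    # A pool is negative only if all pool_size samples are negative:
    # P(pool positive) = 1 - (1 - p) ** pool_size, solved for p.
    return 1.0 - (1.0 - truly_positive) ** (1.0 / pool_size)


if __name__ == "__main__":
    # 12 positive pools out of 100, pools of 10 samples, 90% pool sensitivity.
    print(estimate_prevalence(12, 100, 10, sensitivity=0.9))
```

Under these assumptions, lowering `sensitivity` raises the estimate, which is the direction of the correction the abstract describes.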

27 Aug 2012
TL;DR: A deterministic approach using a variational principle is compared with a stochastic approach using an MCMC algorithm in the context of binary data, and the results are used to build and compute ICL and BIC criteria for model selection.
Abstract: The latent block model is a mixture model that can be used for the simultaneous clustering of the rows and columns of an observed numerical matrix, known as co-clustering. Unfortunately, for this mixture model neither the likelihood nor the EM algorithm is numerically tractable, because the row and column labels are dependent in their joint distribution conditional on the observations. Several approaches can be considered to compute approximate solutions, for the maximum likelihood estimator as well as for the likelihood itself. The comparison of a deterministic approach using a variational principle with a stochastic approach using an MCMC algorithm is first discussed and applied in the context of binary data. These results are then used to build and compute ICL and BIC criteria for model selection. Numerical experiments show the value of this approach for model selection and data reduction.

24 citations

Journal Article
TL;DR: A new criterion based on the Adjusted Rand Index, called the Co-clustering Adjusted Rand Index (CARI), is developed, and improvements are suggested to existing criteria such as the classification error, which counts the proportion of misclassified cells, and the Extended Normalized Mutual Information criterion.
Abstract: We consider the simultaneous clustering of the rows and columns of a matrix and, more particularly, the ability to measure the agreement between two co-clustering partitions. The new criterion we develop is based on the Adjusted Rand Index and is called the Co-clustering Adjusted Rand Index (CARI). We also suggest improvements to existing criteria such as the Classification Error, which counts the proportion of misclassified cells, and the Extended Normalized Mutual Information criterion, which generalizes the mutual-information criterion used for classic classifications. We study these criteria with regard to some desired properties deriving from the co-clustering context. Experiments on simulated and real observed data are proposed to compare the behavior of these criteria.

19 citations
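
For readers unfamiliar with the Adjusted Rand Index on which CARI is based, the classical index for two ordinary partitions can be sketched as follows. This is the standard ARI only, not the CARI criterion from the paper, and the function name `adjusted_rand_index` is an illustrative choice.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Classical Adjusted Rand Index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label sequences must have the same length")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree

    # Contingency table counts and its row and column margins.
    cells = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)

    # Number of item pairs placed together in both, in A, and in B.
    together_both = sum(comb(c, 2) for c in cells.values())
    together_a = sum(comb(c, 2) for c in rows.values())
    together_b = sum(comb(c, 2) for c in cols.values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Both partitions are all singletons or both are a single cluster,
        # so they are identical.
        return 1.0
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    # Same grouping with swapped label names gives perfect agreement.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The index equals 1 for identical partitions regardless of label names and is close to 0 on average for unrelated partitions, which is the chance correction that motivates building a co-clustering criterion on it.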


Cited by
01 Jan 2016

407 citations

Book
01 Jul 2019
TL;DR: This book frames cluster analysis and classification in terms of statistical models, thus yielding principled estimation, testing and prediction methods, and sound answers to central questions such as: How many clusters are there? Which method should I use? How should I handle outliers?
Abstract: Cluster analysis finds groups in data automatically. Most methods have been heuristic and leave open such central questions as: how many clusters are there? Which method should I use? How should I handle outliers? Classification assigns new observations to groups given previously classified observations, and also has open questions about parameter tuning, robustness and uncertainty assessment. This book frames cluster analysis and classification in terms of statistical models, thus yielding principled estimation, testing and prediction methods, and sound answers to the central questions. It builds the basic ideas in an accessible but rigorous way, with extensive data examples and R code; describes modern approaches to high-dimensional data and networks; and explains such recent advances as Bayesian regularization, non-Gaussian model-based clustering, cluster merging, variable selection, semi-supervised and robust classification, clustering of functional data, text and images, and co-clustering. Written for advanced undergraduates in data science, as well as researchers and practitioners, it assumes basic knowledge of multivariate calculus, linear algebra, probability and statistics.

134 citations

Journal Article
TL;DR: Estimation procedures and model selection criteria derived for binary data are generalised and an exact expression of the integrated completed likelihood criterion requiring no asymptotic approximation is derived.
Abstract: This paper deals with estimation and model selection in the Latent Block Model (LBM) for categorical data. First, after providing sufficient conditions ensuring the identifiability of this model, we generalise estimation procedures and model selection criteria derived for binary data. Secondly, we develop Bayesian inference through Gibbs sampling with a well-calibrated non-informative prior distribution in order to obtain the MAP estimator: this is proved to avoid the traps encountered by the LBM with the maximum likelihood methodology. Then model selection criteria are presented. In particular, an exact expression of the integrated completed likelihood criterion requiring no asymptotic approximation is derived. Finally, numerical experiments on both simulated and real data sets highlight the appeal of the proposed estimation and model selection procedures.

109 citations

Journal Article
TL;DR: This work presents a method that enables us to find the solution path for all choices of penalty values across a continuous range and permits an evaluation of the various segmentations to identify a suitable penalty choice.
Abstract: In the multiple changepoint setting, various search methods have been proposed, which involve optimizing either a constrained or penalized cost function over possible numbers and locations of changepoints using dynamic programming. Recent work in the penalized optimization setting has focused on developing an exact pruning-based approach that, under certain conditions, is linear in the number of data points. Such an approach naturally requires the specification of a penalty to avoid under/over-fitting. Work has been undertaken to identify the appropriate penalty choice for data-generating processes with known distributional form, but in many applications the model assumed for the data is not correct and these penalty choices are not always appropriate. To this end, we present a method that enables us to find the solution path for all choices of penalty values across a continuous range. This permits an evaluation of the various segmentations to identify a suitable penalty choice. The computational ...

90 citations
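
The penalized cost optimization by dynamic programming that the abstract above refers to can be sketched with plain optimal partitioning. This is a simplified illustration, not the paper's algorithm: it assumes a squared-error cost for changes in mean, uses no pruning (so it is quadratic in the number of data points), and simply reruns the search for each penalty rather than computing the solution path over a continuous range. The function name `optimal_partitioning` is hypothetical.

```python
from itertools import accumulate


def optimal_partitioning(data, penalty):
    """Return changepoint indices minimising segment cost + penalty per change.

    A changepoint index k means that a new segment starts at data[k].
    """
    n = len(data)
    # Prefix sums allow the cost of any segment to be computed in O(1).
    sums = [0.0] + list(accumulate(data))
    squares = [0.0] + list(accumulate(x * x for x in data))

    def cost(start, end):
        # Sum of squared deviations from the mean on data[start:end].
        seg_sum = sums[end] - sums[start]
        return (squares[end] - squares[start]) - seg_sum * seg_sum / (end - start)

    best = [0.0] * (n + 1)   # best[t]: optimal penalised cost of data[:t]
    best[0] = -penalty       # so that the first segment carries no penalty
    last = [0] * (n + 1)     # last[t]: start of the final segment of data[:t]
    for end in range(1, n + 1):
        best[end], last[end] = min(
            (best[start] + cost(start, end) + penalty, start)
            for start in range(end)
        )

    # Walk back through the stored segment starts to recover the changepoints.
    changepoints = []
    end = n
    while end > 0:
        start = last[end]
        if start > 0:
            changepoints.append(start)
        end = start
    return sorted(changepoints)


if __name__ == "__main__":
    signal = [0.0] * 10 + [5.0] * 10
    for penalty in (1.0, 1000.0):
        print(penalty, optimal_partitioning(signal, penalty))
```

Running the search at a small and a large penalty shows why the penalty choice matters: the segmentation changes with the penalty, which is the motivation for examining the whole range of penalty values.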

Journal Article
TL;DR: In this paper, the authors proposed the first comprehensive treatment of high-dimensional time series factor models with multiple change-points in their second-order structure, using wavelets to estimate the number and locations of changepoints consistently as well as identifying whether they originate in the common or idiosyncratic components.

84 citations