Author

François Husson

Bio: François Husson is an academic researcher from Agrocampus Ouest. The author has contributed to research on missing data and principal component analysis. The author has an h-index of 26 and has co-authored 85 publications receiving 7,559 citations. Previous affiliations of François Husson include École nationale supérieure agronomique de Rennes and Centre national de la recherche scientifique.


Papers
Journal Article
TL;DR: FactoMineR is an R package dedicated to multivariate data analysis that can take into account different types of variables (quantitative or categorical), different kinds of structure on the data, and supplementary information (supplementary individuals and variables).
Abstract: In this article, we present FactoMineR, an R package dedicated to multivariate data analysis. The main features of this package are the possibility to take into account different types of variables (quantitative or categorical), different types of structure on the data (a partition on the variables, a hierarchy on the variables, a partition on the individuals) and, finally, supplementary information (supplementary individuals and variables). Moreover, the dimensions resulting from the different exploratory data analyses can be automatically described by quantitative and/or categorical variables. Numerous graphics are also available with various options. Finally, a graphical user interface is implemented within the Rcmdr environment in order to provide a user-friendly package.

6,472 citations
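
To make the idea of supplementary individuals concrete, here is a minimal Python/NumPy sketch of a standardized PCA in which supplementary rows are projected onto axes computed from the active rows only, so they do not influence the axes. It is not FactoMineR's implementation or API (FactoMineR is an R package); the function name `pca_with_supplementary` is hypothetical, and scaling by the population standard deviation (divisor n) is an assumption.

```python
import numpy as np


def pca_with_supplementary(X_active, X_sup, n_components=2):
    """Standardized PCA on the active rows; supplementary rows are projected afterwards.

    Hypothetical helper for illustration, not FactoMineR's API.
    """
    X_active = np.asarray(X_active, dtype=float)
    X_sup = np.asarray(X_sup, dtype=float)

    # Centre and scale with statistics of the active individuals only
    # (population standard deviation, i.e. divisor n: an assumption).
    mean = X_active.mean(axis=0)
    std = X_active.std(axis=0)
    Z = (X_active - mean) / std

    # SVD of the standardized matrix: Z = U diag(d) Vt
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)
    axes = Vt[:n_components].T  # p x k matrix of unit-norm principal directions

    # Coordinates of the active individuals on the principal axes
    active_coords = Z @ axes
    # Supplementary individuals: same centring/scaling, same axes
    sup_coords = ((X_sup - mean) / std) @ axes

    # Eigenvalues of the correlation matrix (variance carried by each axis)
    eigenvalues = d**2 / Z.shape[0]
    return active_coords, sup_coords, eigenvalues
```

A supplementary row that happens to equal an active row lands exactly on that row's coordinates, which is a quick sanity check that the projection reuses the active means, scales and axes.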

Journal Article
TL;DR: The missMDA package performs principal component analysis on incomplete data sets, aiming to obtain scores, loadings and graphical representations despite missing values; it can also be used to perform single imputation to complete data involving continuous, categorical and mixed variables.
Abstract: We present the R package missMDA, which performs principal component methods on incomplete data sets, aiming to obtain scores, loadings and graphical representations despite missing values. Package methods include principal component analysis for continuous variables, multiple correspondence analysis for categorical variables, factorial analysis of mixed data for both continuous and categorical variables, and multiple factor analysis for multi-table data. Furthermore, missMDA can be used to perform single imputation to complete data involving continuous, categorical and mixed variables. A multiple imputation method is also available. In the principal component analysis framework, variability across different imputations is represented by confidence areas around the row and column positions on the graphical outputs. This allows assessment of the credibility of results obtained from incomplete data sets.

758 citations
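
The iterative principle behind PCA-based single imputation can be sketched as follows: fill the missing cells with column means, fit a rank-k PCA, replace the missing cells with the fitted values, and repeat until the imputed values stabilize. This is a minimal NumPy sketch under stated assumptions (variables left unscaled, no regularization, hypothetical function name `iterative_pca_impute`); it is not the missMDA API, which implements a regularized variant of this scheme.

```python
import numpy as np


def iterative_pca_impute(X, n_components=2, max_iter=1000, tol=1e-9):
    """Impute np.nan cells by alternating a rank-k PCA fit and re-imputation.

    Hypothetical helper: plain (unregularized) iterative PCA, unscaled data.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from the column means of the observed values
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    k = n_components

    for _ in range(max_iter):
        mean = filled.mean(axis=0)
        U, d, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        # Rank-k reconstruction from the first k principal components
        fitted = (U[:, :k] * d[:k]) @ Vt[:k] + mean
        # Observed cells are kept; only missing cells take the fitted values
        new = np.where(missing, fitted, X)
        converged = np.sum((new - filled) ** 2) < tol
        filled = new
        if converged:
            break
    return filled
```

For example, if the columns are exact multiples of one another (a rank-1 structure after centring), a single missing cell is recovered with `n_components=1`, while observed cells are never modified.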

Book
15 Nov 2010
Abstract:
Principal Component Analysis (PCA): Data - Notation - Examples; Objectives; Studying Individuals; Studying Variables; Relationships between the Two Representations NI and NK; Interpreting the Data; Implementation with FactoMineR; Additional Results; Example: The Decathlon Dataset; Example: The Temperature Dataset; Example of Genomic Data: The Chicken Dataset.
Correspondence Analysis (CA): Data - Notation - Examples; Objectives and the Independence Model; Fitting the Clouds; Interpreting the Data; Supplementary Elements (= Illustrative); Implementation with FactoMineR; CA and Textual Data Processing; Example: The Olympic Games Dataset; Example: The White Wines Dataset; Example: The Causes of Mortality Dataset.
Multiple Correspondence Analysis (MCA): Data - Notation - Examples; Objectives; Defining Distances between Individuals and Distances between Categories; CA on the Indicator Matrix; Interpreting the Data; Implementation with FactoMineR; Addendum; Example: The Survey on the Perception of Genetically Modified Organisms; Example: The Sorting Task Dataset.
Clustering: Data - Issues; Formalising the Notion of Similarity; Constructing an Indexed Hierarchy; Ward's Method; Direct Search for Partitions: K-means Algorithm; Partitioning and Hierarchical Clustering; Clustering and Principal Component Methods; Example: The Temperature Dataset; Example: The Tea Dataset; Dividing Quantitative Variables into Classes.
Appendix: Percentage of Inertia Explained by the First Component or by the First Plane; R Software.
Bibliography of Software Packages; Bibliography; Index.

454 citations

Journal Article
TL;DR: The current approximations (the normal approximation, a log-transformation and the Pearson type III approximation) are discussed, and a new one is described: an Edgeworth expansion.

210 citations
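
For readers unfamiliar with the Edgeworth expansion mentioned in the TL;DR above, the sketch below shows its usual second-order form for a cumulative distribution function: the normal approximation of the standardized statistic is corrected by its skewness and excess kurtosis through Hermite polynomials. The entry does not say which statistic is approximated, so its mean, variance, skewness and excess kurtosis are inputs assumed to be known; the function name `edgeworth_cdf` is hypothetical, and this is not presented as the paper's exact procedure.

```python
import math


def edgeworth_cdf(x, mean, var, skew, ex_kurt):
    """Approximate P(T <= x) with a second-order Edgeworth expansion.

    Hypothetical helper: the moments of T are assumed known.
    """
    z = (x - mean) / math.sqrt(var)                      # standardized value
    phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)  # standard normal density
    Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))         # standard normal CDF
    # Probabilists' Hermite polynomials He2, He3 and He5
    he2 = z**2 - 1
    he3 = z**3 - 3 * z
    he5 = z**5 - 10 * z**3 + 15 * z
    # Skewness and kurtosis corrections to the normal approximation
    correction = skew / 6 * he2 + ex_kurt / 24 * he3 + skew**2 / 72 * he5
    return Phi - phi * correction
```

With zero skewness and zero excess kurtosis the expansion reduces to the plain normal approximation; an upper-tail p-value for an observed value x would be `1 - edgeworth_cdf(x, ...)`.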

14 Dec 2012
TL;DR: A regularized iterative PCA algorithm to provide point estimates of the principal axes and components and to overcome the major issue of overfitting is described and implemented in the R package missMDA.
Abstract: This paper is a written version of the talk Julie Josse delivered at the 44th Journées de Statistique (Brussels, 2012), when she was awarded the Marie-Jeanne Laurent-Duhamel prize by the French Statistical Society for her Ph.D. dissertation. It proposes an overview of some results from Julie Josse and François Husson's papers, as well as new challenges in the field of handling missing values in exploratory multivariate data analysis methods, especially in principal component analysis (PCA). First, we describe a regularized iterative PCA algorithm that provides point estimates of the principal axes and components and overcomes the major issue of overfitting. Then, we give insight into the variance of the parameters using a nonparametric multiple imputation procedure. Finally, we discuss the problem of choosing the number of dimensions and detail cross-validation approximation criteria. The proposed methodology is implemented in the R package missMDA.

210 citations
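
The regularized iterative PCA described in this abstract can be sketched by adding a shrinkage step to the plain iterative imputation loop: before reconstructing the data, each retained singular value d_s is multiplied by (1 - sigma2 / lambda_s), which amounts to replacing sqrt(lambda_s) by (lambda_s - sigma2) / sqrt(lambda_s). In this minimal NumPy sketch the noise variance sigma2 is simply the mean of the discarded eigenvalues, a simplifying assumption; the estimator used in the paper and in missMDA may differ, and `regularized_iterative_pca` is a hypothetical name.

```python
import numpy as np


def regularized_iterative_pca(X, n_components=2, max_iter=1000, tol=1e-9):
    """Impute np.nan cells with a simplified regularized iterative PCA.

    Hypothetical helper: noise variance = mean of the discarded eigenvalues.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    missing = np.isnan(X)
    # Start from the column means of the observed values
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    k = n_components

    for _ in range(max_iter):
        mean = filled.mean(axis=0)
        U, d, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        lam = d**2 / n  # eigenvalues of the covariance matrix
        # Simplified noise estimate from the dimensions that are not kept
        sigma2 = lam[k:].mean() if lam.size > k else 0.0
        # Shrink the retained singular values (weak dimensions shrink most)
        shrink = np.clip(1 - sigma2 / np.maximum(lam[:k], 1e-12), 0.0, None)
        fitted = (U[:, :k] * (d[:k] * shrink)) @ Vt[:k] + mean
        # Observed cells are kept; only missing cells are updated
        new = np.where(missing, fitted, X)
        converged = np.sum((new - filled) ** 2) < tol
        filled = new
        if converged:
            break
    return filled
```

Because dimensions with eigenvalues close to the noise level are shrunk the most, the reconstruction relies less on components that mostly fit noise, which is the overfitting issue the abstract refers to.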


Cited by
Journal Article
TL;DR: A listing for D. B. Rubin's book Multiple Imputation for Nonresponse in Surveys (Wiley, 1987).
Abstract: 25. Multiple Imputation for Nonresponse in Surveys. By D. B. Rubin. ISBN 0 471 08705 X. Wiley, Chichester, 1987. 258 pp. £30.25.

3,216 citations

Journal Article

3,152 citations

Journal Article
TL;DR: ClustVis is a web tool that aims to provide an intuitive user interface for Principal Component Analysis (PCA) and heatmap plots; it is freely available at http://biit.cs.ut.ee/clustvis/.
Abstract: Principal Component Analysis (PCA) is a widely used method for reducing the dimensionality of high-dimensional data, often followed by visualizing two of the components on a scatterplot. Although widely used, the method lacks an easy-to-use web interface that scientists with little programming experience could use to plot their own data. The same applies to creating heatmaps: it is possible to add conditional formatting to Excel cells to show colored heatmaps, but more advanced features such as clustering and experimental annotations require more sophisticated analysis tools. We present a web tool called ClustVis that aims to have an intuitive user interface. Users can upload data from a simple delimited text file that can be created in a spreadsheet program. It is possible to modify data processing methods and the final appearance of the PCA and heatmap plots using drop-down menus, text boxes, sliders, etc. Appropriate defaults are given to reduce the time needed by the user to specify input parameters. As output, users can download the PCA plot and heatmap in their preferred file format. This web server is freely available at http://biit.cs.ut.ee/clustvis/.

2,293 citations

Journal Article
TL;DR: The microbiome of ileal Crohn's disease was notable for increases in virulence and secretion pathways; the study provides the first insights into the community-wide microbial processes and pathways that underpin IBD pathogenesis.
Abstract: Background: The inflammatory bowel diseases (IBD) Crohn’s disease and ulcerative colitis result from alterations in intestinal microbes and the immune system. However, the precise dysfunctions of microbial metabolism in the gastrointestinal microbiome during IBD remain unclear. We analyzed the microbiota of intestinal biopsies and stool samples from 231 IBD and healthy subjects by 16S gene pyrosequencing and followed up a subset using shotgun metagenomics. Gene and pathway composition were assessed, based on 16S data from phylogenetically-related reference genomes, and associated using sparse multivariate linear modeling with medications, environmental factors, and IBD status. Results: Firmicutes and Enterobacteriaceae abundances were associated with disease status as expected, but also with treatment and subject characteristics. Microbial function, though, was more consistently perturbed than composition, with 12% of analyzed pathways changed compared with 2% of genera. We identified major shifts in oxidative stress pathways, as well as decreased carbohydrate metabolism and amino acid biosynthesis in favor of nutrient transport and uptake. The microbiome of ileal Crohn’s disease was notable for increases in virulence and secretion pathways.

2,189 citations

Journal Article
TL;DR: An expert elicitation survey estimates yield losses for the five major food crops worldwide, suggesting that the highest losses are associated with food-deficit regions with fast-growing populations and frequently with emerging or re-emerging pests and diseases.
Abstract: Crop pathogens and pests reduce the yield and quality of agricultural production. They cause substantial economic losses and reduce food security at household, national and global levels. Quantitative, standardized information on crop losses is difficult to compile and compare across crops, agroecosystems and regions. Here, we report on an expert-based assessment of crop health, and provide numerical estimates of yield losses on an individual pathogen and pest basis for five major crops globally and in food security hotspots. Our results document losses associated with 137 pathogens and pests associated with wheat, rice, maize, potato and soybean worldwide. Our yield loss (range) estimates at a global level and per hotspot for wheat (21.5% (10.1–28.1%)), rice (30.0% (24.6–40.9%)), maize (22.5% (19.5–41.1%)), potato (17.2% (8.1–21.0%)) and soybean (21.4% (11.0–32.4%)) suggest that the highest losses are associated with food-deficit regions with fast-growing populations, and frequently with emerging or re-emerging pests and diseases. Our assessment highlights differences in impacts among crop pathogens and pests and among food security hotspots. This analysis contributes critical information to prioritize crop health management to improve the sustainability of agroecosystems in delivering services to societies.

1,376 citations