Author

Olivier Grisel

Bio: Olivier Grisel is an academic researcher from Université Paris-Saclay. The author has contributed to research on the topics Python (programming language) and Medicine. The author has an h-index of 11 and has co-authored 19 publications that have received 63,822 citations. Previous affiliations of Olivier Grisel include the French Institute for Research in Computer Science and Automation.

Papers
Posted Content
TL;DR: This work proposes a structured stochastic regularization based on feature grouping; it acts as a structured regularizer for high-dimensional correlated data at no additional computational cost and has a denoising effect.
Abstract: The use of complex models --with many parameters-- is challenging with high-dimensional small-sample problems: indeed, they face rapid overfitting. Such situations are common when data collection is expensive, as in neuroscience, biology, or geology. Dedicated regularization can be crafted to tame overfit, typically via structured penalties. But rich penalties require mathematical expertise and entail large computational costs. Stochastic regularizers such as dropout are easier to implement: they prevent overfitting by random perturbations. Used inside a stochastic optimizer, they come with little additional cost. We propose a structured stochastic regularization that relies on feature grouping. Using a fast clustering algorithm, we define a family of groups of features that capture feature covariations. We then randomly select these groups inside a stochastic gradient descent loop. This procedure acts as a structured regularizer for high-dimensional correlated data without additional computational cost and it has a denoising effect. We demonstrate the performance of our approach for logistic regression both on a sample-limited face image dataset with varying additive noise and on a typical high-dimensional learning problem, brain image classification.
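The procedure described in this abstract (pick a random feature grouping at each stochastic gradient step) can be illustrated with a minimal sketch. It is not the authors' implementation: the clustering step is replaced here by random contiguous blocks of features, the grouping is applied by replacing each feature with the mean of its group, and all function names are hypothetical.

```python
import math
import random


def make_groupings(n_features, n_groups, n_groupings, rng):
    """Stand-in for the clustering step: random contiguous feature blocks.

    Returns a list of groupings; each grouping is a list of index lists
    that together partition range(n_features).
    """
    groupings = []
    for _ in range(n_groupings):
        cuts = sorted(rng.sample(range(1, n_features), n_groups - 1))
        bounds = [0] + cuts + [n_features]
        groupings.append([list(range(a, b)) for a, b in zip(bounds, bounds[1:])])
    return groupings


def smooth_by_groups(x, groups):
    """Replace every feature by the mean of the group it belongs to."""
    out = [0.0] * len(x)
    for g in groups:
        mean = sum(x[j] for j in g) / len(g)
        for j in g:
            out[j] = mean
    return out


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_sgd(X, y, groupings, lr=0.1, epochs=30, seed=0):
    """Logistic regression trained by SGD with a random grouping per step."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            groups = rng.choice(groupings)        # random grouping for this step
            x_s = smooth_by_groups(X[i], groups)  # group-averaged input
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x_s)) + b)
            err = p - y[i]                        # gradient of the log loss wrt the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, x_s)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Predict on the raw, ungrouped features."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0
```

Because the grouping is only a cheap averaging step inside the loop, each update costs about the same as a plain SGD step, which is consistent with the "without additional computational cost" claim; the averaging over correlated features is also where a denoising effect would come from.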

4 citations

Journal Article
TL;DR: This article examined changes in the proportion of mental health-associated hospitalizations among adolescents in the US and France during the first year of the COVID-19 pandemic vs before the pandemic.
Abstract: This cohort study examines changes in the proportion of mental health–associated hospitalizations among adolescents in the US and France during the first year of the COVID-19 pandemic vs before the pandemic.

4 citations

Journal Article
TL;DR: A multicentre observational study of the impact of biologics on the rate of hospitalisations, intensive care unit (ICU) admissions and deaths related to coronavirus disease 2019 in patients with inflammatory bowel disease (IBD); it suggests no significant excess of these features in patients with COVID-19 who were receiving biologics.
Abstract: Editors, We note the encouraging results reported by Taxonera and colleagues regarding Coronavirus disease 2019 (COVID-19) in patients with inflammatory bowel disease (IBD).1 Patients receiving biologics are at higher risk of developing serious infectious adverse events.2,3 Therefore, we conducted a multicentre observational study to determine the impact of biologics on the rate of hospitalisations, intensive care unit (ICU) admissions and deaths related to COVID-19. We used the French Assistance Publique-Hôpitaux de Paris electronic health record data for patients from 39 hospitals in the Ile-de-France area.4 A total of 23 357 patients had confirmed COVID-19 (positive RT-PCR or suggestive lesions seen on chest CT) from 1 February to 22 April 2020.5,6 “Observed cases” were biologic long-term users with confirmed COVID-19. To calculate the “expected number” of hospitalised patients (transferred to ICU or dead), we identified 7808 adult patients in the same data source who received a biologic agent from 1 February 2019 to 1 February 2020. We computed the standardised ratio (SR) of observed to expected COVID-19-related hospitalisations. The “expected number” of hospitalisations was estimated by applying the COVID-19 hospitalisation rate in the Ile-de-France area to the 7808 patients receiving biologics (by 20-year age groups).7 The same methodology was used to estimate the expected number of ICU transfers and deaths. A total of 48 patients receiving a biologic agent had confirmed COVID-19; 26 (54%) were women, and the median age was 38 (IQR 27.25-57). Most patients received an anti-TNFα (n = 39, 81%), 3 (6%) an IL-12/23 inhibitor, 2 (4%) an IL-17 inhibitor and 4 (9%) an α4β7 integrin antibody. The main underlying diseases were IBD (n = 24, 50%), inflammatory rheumatism (n = 12, 25%) and psoriasis (n = 3, 6%). A total of 19 (40%) patients were hospitalised, 4 (8%) were transferred to an ICU and one died.
As compared with the Ile-de-France population, patients receiving biologics were at higher risk of hospitalisation related to COVID-19 (SR 2.19, 95% CI 1.32-3.42, P < 0.001). Excess hospitalisations were found in the 20-40 year age group (SR 5.59, 95% CI 2.41-11.01, P = 0.01). As compared with the Ile-de-France population, patients receiving biologics were at higher risk of ICU transfers related to COVID-19 (SR 6.04, 95% CI 1.62-15.45, P < 0.001) but overall mortality related to COVID-19 was not modified in patients receiving biologics (SR 1.15, 95% CI 0.02-6.42, P = 0.9). To better understand the significantly increased risk of hospitalisation with COVID-19 in the 20-40 age group, we performed a qualitative post hoc analysis of electronic health records. The main condition causing hospitalisation was a flare of the underlying disorder, especially in the 20-40 age group (Table 1). Our evaluation of standardised hospitalisations, ICU transfers and mortality ratios suggested, after post hoc analysis, no significant excess of these features in patients with COVID-19 who were receiving biologics. These results are consistent with earlier reports1,8 and reinforce the message that biologics can usually be safely continued.
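The standardised ratio used in this abstract is the observed count divided by the count expected if reference-population rates applied to the cohort. The sketch below shows that arithmetic under stated assumptions: the abstract does not report the expected counts or how the confidence intervals were computed, so the expected count is backed out from the reported SR, and exact Poisson limits are used as one common choice of interval. All function names are hypothetical.

```python
import math


def expected_count(strata):
    """Indirect standardisation: sum of (reference rate x cohort size) per stratum."""
    return sum(rate * n for rate, n in strata)


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def exact_poisson_limits(observed, alpha=0.05):
    """Exact two-sided limits for a Poisson mean, found by bisection."""
    if observed == 0:
        lower = 0.0
    else:
        lo, hi = 0.0, float(observed)
        for _ in range(100):
            mid = (lo + hi) / 2
            # P(X >= observed) increases with the mean.
            if 1.0 - poisson_cdf(observed - 1, mid) < alpha / 2:
                lo = mid
            else:
                hi = mid
        lower = (lo + hi) / 2
    lo = float(observed)
    hi = observed + 10 * math.sqrt(observed + 1) + 10
    for _ in range(100):
        mid = (lo + hi) / 2
        # P(X <= observed) decreases with the mean.
        if poisson_cdf(observed, mid) > alpha / 2:
            lo = mid
        else:
            hi = mid
    upper = (lo + hi) / 2
    return lower, upper


def standardised_ratio(observed, expected, alpha=0.05):
    """Return (SR, lower limit, upper limit)."""
    low, high = exact_poisson_limits(observed, alpha)
    return observed / expected, low / expected, high / expected


# Figures from the abstract: 19 observed hospitalisations, SR 2.19.
# Assumption: the expected count is not reported, so it is taken as 19 / 2.19.
observed = 19
expected = observed / 2.19
sr, ci_low, ci_high = standardised_ratio(observed, expected)
print(f"SR = {sr:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

Under these assumptions the interval for hospitalisations lands close to the reported 1.32-3.42. With only 4 ICU transfers and 1 death, the same calculation gives very wide intervals, which matches the wide ranges reported for those outcomes.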

4 citations

Journal Article
TL;DR: In this article, the authors evaluated the transportability of a mortality prediction algorithm across a multi-national network of healthcare systems, training a Cox regression model with nine measured laboratory test values, standard demographics at admission, and comorbidity burden pre-admission.
Abstract: Given the growing number of prediction algorithms developed to predict COVID-19 mortality, we evaluated the transportability of a mortality prediction algorithm using a multi-national network of healthcare systems. We predicted COVID-19 mortality using baseline commonly measured laboratory values and standard demographic and clinical covariates across healthcare systems, countries, and continents. Specifically, we trained a Cox regression model with nine measured laboratory test values, standard demographics at admission, and comorbidity burden pre-admission. These models were compared at site, country, and continent level. Of the 39,969 hospitalized patients with COVID-19 (68.6% male), 5717 (14.3%) died. In the Cox model, age, albumin, AST, creatine, CRP, and white blood cell count are most predictive of mortality. The baseline covariates are more predictive of mortality during the early days of COVID-19 hospitalization. Models trained at healthcare systems with larger cohort size largely retain good transportability performance when porting to different sites. The combination of routine laboratory test values at admission along with basic demographic features can predict mortality in patients hospitalized with COVID-19. Importantly, this potentially deployable model differs from prior work by demonstrating not only consistent performance but also reliable transportability across healthcare systems in the US and Europe, highlighting the generalizability of this model and the overall approach.

4 citations

Journal Article
TL;DR: The risk of hospitalisation, ICU admission or death was assessed using the Standardised Morbidity/Mortality Ratio (SMR) methodology, computing the ratio of the observed number to the expected number.
Abstract: Introduction: Severe forms of SARS-CoV-2 infection are linked to a strong inflammatory response. Some biologics are being evaluated in therapeutic trials, with this cytokine storm as the rationale. Conversely, one may ask about the risk of SARS-CoV-2 infection in patients on long-term biologics. The main objective of our study was to determine the impact of taking a biologic on the rates of hospitalisation, intensive care unit (ICU) admission and death in patients on biologics with SARS-CoV-2 infection. Materials and methods: Multicentre retrospective cohort study based on the medico-administrative data of the Entrepôt de Données de Santé (EDS) of the AP-HP. All patients receiving a biologic (anti-TNF, anti-IL-12/23, anti-IL-17 or anti-integrin) were included. The events of interest were a hospitalisation, an ICU stay or a death in the context of SARS-CoV-2 infection (confirmed by RT-PCR or chest CT) between 1 February and 22 April 2020. The risk of hospitalisation/ICU/death was assessed using the Standardised Morbidity/Mortality Ratio (SMR) methodology, computing the ratio of the observed number to the expected number. The expected number of hospitalisations/ICU stays/deaths was calculated by applying the SARS-CoV-2 hospitalisation/ICU/death rate of the Ile-de-France population (by 20-year age band and sex) to the number of people on biologics in the corresponding subclass of the same data source (AP-HP EDS data). Results: A total of 7,808 patients (median age 45 years, 51% women) were included, of whom 48 had a diagnosis of SARS-CoV-2 infection, 19 (40%) were hospitalised, 4 (8%) were admitted to ICU and 1 died.
Compared with the Ile-de-France population, hospitalisation and ICU rates were significantly higher in patients on biologics, with SR = 2.19, 95% CI 1.32-3.42, p […] Discussion: Based on a large database, our study confirmed that the rates of hospitalisation, ICU admission and mortality were not increased in patients on biologics. Our results agree with the various studies published in the literature (Haberman et al., Favalli et al., or Sanchez-Piedra et al.). These results argue for continuing biologics during the SARS-CoV-2 epidemic to avoid relapse of the underlying inflammatory diseases.

1 citation


Cited by
Proceedings Article
13 Aug 2016
TL;DR: XGBoost is a scalable end-to-end tree boosting system that introduces a sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning; it is widely used to achieve state-of-the-art results on many machine learning challenges.
Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
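One way to picture the sparsity-aware idea mentioned in this abstract is a split search that learns a default direction for missing values. The sketch below is an illustration only, not XGBoost's code: it assumes the usual second-order gain with an L2 term `lam` (the abstract does not state the formula), omits any complexity penalty, and uses a hypothetical function name.

```python
def best_split_with_default(values, grads, hess, lam=1.0):
    """Find the best threshold on one feature and where missing values should go.

    values: feature values, with None marking a missing entry.
    grads, hess: per-example first and second derivatives of the loss.
    Returns (gain, threshold, default_direction); threshold is None if no
    split improves on the unsplit node.
    """
    def score(g, h):
        return g * g / (h + lam)

    G, H = sum(grads), sum(hess)
    present = sorted((v, g, h) for v, g, h in zip(values, grads, hess) if v is not None)
    # Gradient statistics of the examples whose value is missing.
    G_miss = G - sum(g for _, g, _ in present)
    H_miss = H - sum(h for _, _, h in present)

    best = (0.0, None, None)
    GL = HL = 0.0
    for i in range(len(present) - 1):
        v, g, h = present[i]
        GL += g
        HL += h
        nxt = present[i + 1][0]
        if nxt == v:
            continue  # no threshold fits between equal values
        threshold = (v + nxt) / 2
        # Candidate 1: missing values follow the right branch.
        gain_right = 0.5 * (score(GL, HL) + score(G - GL, H - HL) - score(G, H))
        # Candidate 2: missing values follow the left branch.
        gain_left = 0.5 * (
            score(GL + G_miss, HL + H_miss)
            + score(G - GL - G_miss, H - HL - H_miss)
            - score(G, H)
        )
        for gain, direction in ((gain_right, "right"), (gain_left, "left")):
            if gain > best[0]:
                best = (gain, threshold, direction)
    return best
```

Only the non-missing entries are sorted and scanned, so the work per feature grows with the number of present values rather than with the full number of rows, which is the point of handling sparsity explicitly.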

14,872 citations

Proceedings Article
TL;DR: This paper proposes a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning and provides insights on cache access patterns, data compression and sharding to build a scalable tree boosting system called XGBoost.
Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

13,333 citations

Journal Article
TL;DR: SciPy is an open source scientific computing library for the Python programming language, with functionality spanning clustering, Fourier transforms, integration, interpolation, file I/O, linear algebra, image processing, orthogonal distance regression, minimization algorithms, signal processing, sparse matrix handling, computational geometry, and statistics.
Abstract: SciPy is an open source scientific computing library for the Python programming language. SciPy 1.0 was released in late 2017, about 16 years after the original version 0.1 release. SciPy has become a de facto standard for leveraging scientific algorithms in the Python programming language, with more than 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories, and millions of downloads per year. This includes usage of SciPy in almost half of all machine learning projects on GitHub, and usage by high profile projects including LIGO gravitational wave analysis and creation of the first-ever image of a black hole (M87). The library includes functionality spanning clustering, Fourier transforms, integration, interpolation, file I/O, linear algebra, image processing, orthogonal distance regression, minimization algorithms, signal processing, sparse matrix handling, computational geometry, and statistics. In this work, we provide an overview of the capabilities and development practices of the SciPy library and highlight some recent technical developments.

12,774 citations

Proceedings Article
13 Aug 2016
TL;DR: The authors propose LIME, a technique that explains the predictions of any classifier by learning an interpretable model locally around the prediction, and a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framed as a submodular optimization problem.
Abstract: Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
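The core idea of "learning an interpretable model locally around the prediction" can be sketched as a proximity-weighted linear fit to a black-box function. This is a simplified illustration, not the LIME package or the paper's full method: it perturbs raw numeric features with Gaussian noise, skips the interpretable binary representation and feature selection, and every name and default value below is an assumption.

```python
import math
import random


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def explain_locally(predict, instance, n_samples=500, scale=0.5,
                    kernel_width=0.75, seed=0):
    """Fit a weighted linear surrogate to `predict` around `instance`.

    Returns (intercept, coefficients); the coefficients are the local
    feature attributions.
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [v + rng.gauss(0, scale) for v in instance]   # perturbed sample
        dist2 = sum((a - c) ** 2 for a, c in zip(z, instance))
        weights.append(math.exp(-dist2 / kernel_width ** 2))  # proximity kernel
        rows.append([1.0] + z)                             # leading 1 for the intercept
        targets.append(predict(z))                         # query the black box
    k = len(instance) + 1
    # Weighted normal equations: (X^T W X) beta = X^T W y.
    A = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(k)]
         for i in range(k)]
    b = [sum(w * r[i] * t for w, r, t in zip(weights, rows, targets))
         for i in range(k)]
    beta = solve_linear_system(A, b)
    return beta[0], beta[1:]


def black_box(z):
    # Hypothetical opaque classifier score used only for the demonstration.
    return 1.0 / (1.0 + math.exp(-(3.0 * z[0] - 2.0 * z[1])))


intercept, coefs = explain_locally(black_box, [0.0, 0.0])
print(coefs)  # local weights: positive for feature 0, negative for feature 1
```

The surrogate is only meant to be faithful near the chosen instance; `kernel_width` controls how quickly samples far from it stop influencing the fit.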

11,104 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: This work covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations