Author

Olivier Grisel

Bio: Olivier Grisel is an academic researcher from Université Paris-Saclay. The author has contributed to research in topics: Python (programming language) & Medicine. The author has an h-index of 11 and has co-authored 19 publications receiving 63,822 citations. Previous affiliations of Olivier Grisel include the French Institute for Research in Computer Science and Automation.

Papers
Posted Content · DOI
21 Apr 2018 · bioRxiv
TL;DR: In systematic data simulations and common medical datasets, the authors explore how statistical inference and pattern recognition can agree and diverge, applying the linear model both to identify significant contributing variables and to find the most predictive variable sets.
Abstract: In the 20th century many advances in biological knowledge and evidence-based medicine were supported by p-values and accompanying methods. In the early 21st century, ambitions towards precision medicine put a premium on detailed predictions for single individuals. The shift causes tension between traditional methods used to infer statistically significant group differences and burgeoning machine-learning tools suited to forecast an individual's future. This comparison applies the linear model for identifying significant contributing variables and for finding the most predictive variable sets. In systematic data simulations and common medical datasets, we explored how statistical inference and pattern recognition can agree and diverge. Across analysis scenarios, even small predictive performances typically coincided with finding underlying significant statistical relationships. However, even statistically strong findings with very low p-values shed little light on their value for achieving accurate prediction in the same dataset. More complete understanding of different ways to define "important" associations is a prerequisite for reproducible research findings that can serve to personalize clinical care.
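The tension the abstract describes can be illustrated with a small linear-model comparison. The sketch below uses simulated data (not the paper's datasets, and not its exact analysis): a weak but real effect produces a highly significant p-value while explaining only a small fraction of out-of-sample variance.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
# weak but real effect: the slope is nonzero, but noise dominates
y = 0.2 * x + rng.normal(size=n)

# classical inference: test whether the slope differs from zero
slope, intercept, rvalue, p_value, stderr = stats.linregress(x, y)

# pattern recognition: out-of-sample R^2 via cross-validation
r2_cv = cross_val_score(LinearRegression(), x.reshape(-1, 1), y,
                        cv=5, scoring="r2").mean()

print(f"p = {p_value:.1e}, cross-validated R^2 = {r2_cv:.3f}")
```

The p-value is far below conventional thresholds, yet the cross-validated R² stays small: statistical significance and predictive value answer different questions.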

16 citations

Journal Article · DOI
TL;DR: In this paper, the authors performed an observational multicenter retrospective cohort study to examine the association between psychiatric disorders and mortality among patients hospitalized for laboratory-confirmed COVID-19 at 36 Greater Paris University hospitals.
Abstract: Prior research suggests that psychiatric disorders could be linked to increased mortality among patients with COVID-19. However, whether all or specific psychiatric disorders are intrinsic risk factors of death in COVID-19, or whether these associations reflect the greater prevalence of medical risk factors in people with psychiatric disorders, has yet to be evaluated. We performed an observational multicenter retrospective cohort study to examine the association between psychiatric disorders and mortality among patients hospitalized for laboratory-confirmed COVID-19 at 36 Greater Paris University hospitals. Of 15,168 adult patients, 857 (5.7%) had an ICD-10 diagnosis of psychiatric disorder. Over a mean follow-up of 14.6 days (SD=17.9), death occurred in 326/857 (38.0%) patients with a diagnosis of psychiatric disorder versus 1,276/14,311 (8.9%) in patients without such a diagnosis (OR=6.27; 95%CI=5.40-7.28; p<0.01). When adjusting for age, sex, hospital, current smoking status, and medications according to compassionate use or as part of a clinical trial, this association remained significant (AOR=3.27; 95%CI=2.78-3.85; p<0.01). However, additional adjustments for obesity and number of medical conditions resulted in a non-significant association (AOR=1.02; 95%CI=0.84-1.23; p=0.86). Exploratory analyses following the same adjustments suggest that a diagnosis of mood disorders was significantly associated with reduced mortality, which might be explained by the use of antidepressants. These findings suggest that the increased risk of COVID-19-related mortality in individuals with psychiatric disorders hospitalized for COVID-19 might be explained by the greater number of medical conditions and the higher prevalence of obesity in this population, but not by the underlying psychiatric disease.
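The adjustment pattern in this abstract (a large crude odds ratio that collapses toward 1 after controlling for comorbidity burden) can be mimicked on simulated data. The sketch below is purely illustrative: the variable names, effect sizes, and sample size are assumptions, not the study's data or model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 20000
# confounder: number of medical conditions
comorbidities = rng.poisson(1.0, size=n)
# exposure is more common among patients with more comorbidities
p_exposed = 1 / (1 + np.exp(-(-2.5 + 0.8 * comorbidities)))
exposed = rng.binomial(1, p_exposed)
# mortality depends on comorbidities only, not on the exposure itself
p_death = 1 / (1 + np.exp(-(-3.0 + 0.9 * comorbidities)))
death = rng.binomial(1, p_death)

def odds_ratio(X, y):
    # large C makes the fit close to unpenalized logistic regression
    model = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
    return np.exp(model.coef_[0][0])  # OR = exp(exposure coefficient)

crude = odds_ratio(exposed.reshape(-1, 1), death)
adjusted = odds_ratio(np.column_stack([exposed, comorbidities]), death)
print(f"crude OR = {crude:.2f}, adjusted OR = {adjusted:.2f}")
```

The crude odds ratio is well above 1 because of confounding, while the adjusted one sits near 1, mirroring the qualitative shape of the study's AOR results.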

12 citations

Posted Content · DOI
13 Aug 2018 · bioRxiv
TL;DR: The authors develop a bottom-up machine-learning strategy and provide a proof of principle in a multi-site clinical dataset, showing that meta-analytic cognitive priors can distinguish schizophrenia patients from controls using brain morphology and intrinsic functional connectivity.
Abstract: Schizophrenia is a devastating brain disorder that disturbs sensory perception, motor action, and abstract thought. Its clinical phenotype implies dysfunction of various mental domains, which has motivated a series of theories regarding the underlying pathophysiology. Aiming at a predictive benchmark of a catalogue of cognitive functions, we developed a bottom-up machine-learning strategy and provide a proof of principle in a multi-site clinical dataset (n=324). Existing neuroscientific knowledge on diverse cognitive domains was first condensed into neuro-topographical maps. We then examined how the ensuing meta-analytic cognitive priors can distinguish patients and controls using brain morphology and intrinsic functional connectivity. Some affected cognitive domains supported well-studied directions of research on auditory evaluation and social cognition. However, rarely suspected cognitive domains also emerged as disease-relevant, including self-oriented processing of bodily sensations in gustation and pain. Such algorithmic charting of the cognitive landscape can be used to make targeted recommendations for future mental health research.
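The pipeline described here (condense prior knowledge into topographical maps, project each subject's brain data onto them, then classify) can be sketched on synthetic data. Everything below is a toy stand-in: the map count, voxel count, and effect size are assumptions, not the paper's meta-analytic priors.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_subjects, n_voxels, n_domains = 324, 1000, 10

# stand-ins for meta-analytic cognitive priors: one map per domain
prior_maps = rng.normal(size=(n_domains, n_voxels))

# simulated brain data: patients differ along two of the domain maps
labels = rng.binomial(1, 0.5, size=n_subjects)
brain = rng.normal(size=(n_subjects, n_voxels))
brain += np.outer(labels, 0.05 * (prior_maps[0] + prior_maps[3]))

# project voxel data onto the domain maps, then classify
domain_scores = brain @ prior_maps.T / n_voxels
acc = cross_val_score(LogisticRegression(max_iter=1000),
                      domain_scores, labels, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f}")
```

Projecting onto domain maps reduces thousands of voxels to a handful of interpretable scores, so the classifier's weights can be read per cognitive domain — the spirit of "algorithmic charting of the cognitive landscape".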

8 citations

Journal Article · DOI
TL;DR: In this paper, the authors performed a retrospective multi-centre observational cohort study comprising 12,891 hospitalized patients aged 18 years or older with a diagnosis of SARS-CoV-2 infection confirmed by polymerase chain reaction between 1 January 2020 and 10 September 2020, and with at least one serum creatinine value recorded 1-365 days prior to admission.

8 citations

Book Chapter · DOI
14 Sep 2014
TL;DR: In this article, the authors estimate the amount of variance that is fit by a random effects subspace learned on other images, and show that a principal component regression estimator outperforms other regression models and that it fits a significant proportion (10% to 25%) of the between-subject variability.
Abstract: Inter-subject variability is a major hurdle for neuroimaging group-level inference, as it creates complex image patterns that are not captured by standard analysis models and jeopardizes the sensitivity of statistical procedures. A solution to this problem is to model random subject effects by using the redundant information conveyed by multiple imaging contrasts. In this paper, we introduce a novel analysis framework, where we estimate the amount of variance that is fit by a random effects subspace learned on other images; we show that a principal component regression estimator outperforms other regression models and that it fits a significant proportion (10% to 25%) of the between-subject variability. This proves for the first time that the accumulation of contrasts in each individual can provide the basis for more sensitive neuroimaging group analyses.
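The core estimator — learn a subspace on other subjects' images, then measure how much of a held-out image's variance that subspace fits — can be sketched as follows. The dimensions, noise level, and resulting variance fraction are illustrative assumptions, not the paper's (which reports 10% to 25%).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_subjects, n_voxels, n_latent = 40, 500, 5

# shared random-effects subspace plus subject-specific noise
basis = rng.normal(size=(n_latent, n_voxels))
loadings = rng.normal(size=(n_subjects, n_latent))
images = loadings @ basis + rng.normal(scale=2.0, size=(n_subjects, n_voxels))

# learn the subspace on all subjects but the last one
pca = PCA(n_components=n_latent).fit(images[:-1])
components = pca.components_          # shape (n_latent, n_voxels)

# regress the held-out image on the learned components, voxel-wise
target = images[-1]
reg = LinearRegression().fit(components.T, target)
explained = reg.score(components.T, target)  # proportion of variance fit
print(f"variance explained by the learned subspace: {explained:.2f}")
```

Because the subspace is estimated on other subjects, the explained variance is an honest measure of shared between-subject structure rather than overfitting to the held-out image.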

7 citations


Cited by
Proceedings Article · DOI
13 Aug 2016
TL;DR: XGBoost is a scalable end-to-end tree boosting system that introduces a sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, and is widely used to achieve state-of-the-art results on many machine learning challenges.
Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

14,872 citations

Proceedings Article · DOI
TL;DR: This paper proposes a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning and provides insights on cache access patterns, data compression and sharding to build a scalable tree boosting system called XGBoost.
Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

13,333 citations

Journal Article · DOI
TL;DR: SciPy is an open-source scientific computing library for the Python programming language, with functionality spanning clustering, Fourier transforms, integration, interpolation, file I/O, linear algebra, image processing, orthogonal distance regression, minimization algorithms, signal processing, sparse matrix handling, computational geometry, and statistics.
Abstract: SciPy is an open source scientific computing library for the Python programming language. SciPy 1.0 was released in late 2017, about 16 years after the original version 0.1 release. SciPy has become a de facto standard for leveraging scientific algorithms in the Python programming language, with more than 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories, and millions of downloads per year. This includes usage of SciPy in almost half of all machine learning projects on GitHub, and usage by high profile projects including LIGO gravitational wave analysis and creation of the first-ever image of a black hole (M87). The library includes functionality spanning clustering, Fourier transforms, integration, interpolation, file I/O, linear algebra, image processing, orthogonal distance regression, minimization algorithms, signal processing, sparse matrix handling, computational geometry, and statistics. In this work, we provide an overview of the capabilities and development practices of the SciPy library and highlight some recent technical developments.
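Two of the capability areas the abstract lists, integration and minimization, look like this in practice (a minimal example using SciPy's documented `integrate.quad` and `optimize.minimize` APIs):

```python
import numpy as np
from scipy import integrate, optimize

# numerical integration: the integral of sin(x) from 0 to pi equals 2
area, abserr = integrate.quad(np.sin, 0, np.pi)

# minimization: the Rosenbrock test function has its minimum at (1, 1)
result = optimize.minimize(optimize.rosen, x0=[0.0, 0.0])

print(f"integral = {area:.6f}, minimizer = {result.x}")
```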

12,774 citations

Proceedings Article · DOI
13 Aug 2016
TL;DR: In this paper, the authors propose LIME, a technique that explains the predictions of any classifier by learning an interpretable model locally around the prediction, together with a method for explaining models by presenting representative individual predictions and their explanations in a non-redundant way, framed as a submodular optimization problem.
Abstract: Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
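LIME's core idea — perturb around one instance, query the black box, and fit a proximity-weighted linear surrogate — can be implemented in a few lines. This is a minimal from-scratch sketch of that idea, not the `lime` package's API; the kernel width, perturbation scale, and data are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + 0.1 * X[:, 1] > 0).astype(int)  # feature 0 dominates

black_box = RandomForestClassifier(random_state=0).fit(X, y)

def local_explanation(x, predict_proba, n_samples=2000, width=1.0):
    """Fit a weighted linear surrogate around one instance (LIME's core idea)."""
    perturbed = x + rng.normal(scale=0.5, size=(n_samples, x.size))
    preds = predict_proba(perturbed)[:, 1]
    # proximity kernel: perturbations close to x get more weight
    dists = np.linalg.norm(perturbed - x, axis=1)
    weights = np.exp(-(dists ** 2) / width ** 2)
    surrogate = Ridge(alpha=1.0).fit(perturbed, preds, sample_weight=weights)
    return surrogate.coef_  # local feature weights

coefs = local_explanation(np.zeros(4), black_box.predict_proba)
print("local feature weights:", np.round(coefs, 3))
```

Near the decision boundary the surrogate assigns by far the largest weight to feature 0, correctly recovering what drives the black box locally even though the forest itself is opaque.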

11,104 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: This textbook covers probability distributions, linear models for regression and classification, neural networks, kernel methods, sparse kernel machines, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models in the context of machine learning.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations