scispace - formally typeset
Author

Bertrand Thirion

Bio: Bertrand Thirion is an academic researcher at Université Paris-Saclay. The author has contributed to research on topics including cluster analysis and cognition, has an h-index of 51, and has co-authored 311 publications receiving 73,839 citations. Previous affiliations of Bertrand Thirion include the French Institute for Research in Computer Science and Automation and the French Institute of Health and Medical Research.


Papers
01 Jan 2012
TL;DR: It is concluded that adding matter information consistently improves the quantitative analysis of BOLD responses in some areas of the brain, particularly those where accurate inter-subject registration remains challenging.
Abstract: In this paper we investigate the use of classical fMRI Random Effects (RFX) group statistics when analysing a very large cohort, and the possible improvement brought by anatomical information. Using 1326 subjects from the IMAGEN study, we first give a global picture of how the group-effect t-value for a simple face-watching contrast evolves as cohort size increases. We obtain a wide "activated" pattern, far from limited to the brain areas one would reasonably expect, which illustrates the difference between statistical significance and practical significance. This motivates us to inject tissue-probability information into the group estimation: we model the BOLD contrast using a matter-weighted mixture of Gaussians and compare it to the common single-Gaussian model. In both cases, the model parameters are estimated per voxel on one subgroup, and the likelihood of each model is computed on a second, separate subgroup to reflect the model's generalization capacity. Various group sizes are tested, and significance is assessed using a 10-fold cross-validation scheme. We conclude that adding matter information consistently improves the quantitative analysis of BOLD responses in some areas of the brain, particularly those where accurate inter-subject registration remains challenging.
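The single-Gaussian versus matter-weighted-mixture comparison described above can be sketched for one voxel as follows. This is a minimal illustration under stated assumptions, not the authors' code. Each subject is assumed to come with a known gray-matter probability `w` at the voxel, which is a hypothetical input. The "matter-weighted mixture" is taken to be a two-component Gaussian mixture whose per-subject mixing weights are fixed to `w`, with means and variances fitted by EM. Both models are then scored by their log-likelihood on a held-out subgroup, and the data are simulated.

```python
import numpy as np

# Per-voxel sketch. ASSUMPTION: w[i] is a known gray-matter probability for subject i
# at this voxel (hypothetical input); this is not the paper's exact estimation code.

def gauss_logpdf(x, mu, var):
    """Log-density of N(mu, var) evaluated at x (broadcasts)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def fit_single(x):
    """Classical RFX-style model: one Gaussian over subjects (ML estimates)."""
    return x.mean(), x.var()

def fit_weighted_mixture(x, w, n_iter=200):
    """EM for x[i] ~ (1 - w[i]) N(mu0, var0) + w[i] N(mu1, var1), with w fixed."""
    prior = np.stack([1 - w, w], axis=1)              # (n, 2) known mixing weights
    mu = np.array([x.mean() - x.std(), x.mean() + x.std()])
    var = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each subject
        log_r = np.log(prior + 1e-12) + gauss_logpdf(x[:, None], mu, var)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update means and variances only (mixing weights are known)
        nk = r.sum(axis=0) + 1e-12
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return mu, var

def mixture_loglik(x, w, mu, var):
    """Held-out log-likelihood of the weighted mixture (log-sum-exp for stability)."""
    prior = np.stack([1 - w, w], axis=1)
    comp = np.log(prior + 1e-12) + gauss_logpdf(x[:, None], mu, var)
    m = comp.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(comp - m).sum(axis=1))).sum())

# Simulated voxel: fit on one subgroup, score on a separate one.
rng = np.random.default_rng(0)
w = rng.uniform(size=400)
in_gm = rng.uniform(size=400) < w
x = np.where(in_gm, rng.normal(2.0, 1.0, 400), rng.normal(0.0, 0.5, 400))
train, test = slice(0, 200), slice(200, 400)
mu_s, var_s = fit_single(x[train])
mu_m, var_m = fit_weighted_mixture(x[train], w[train])
print("single :", gauss_logpdf(x[test], mu_s, var_s).sum())
print("mixture:", mixture_loglik(x[test], w[test], mu_m, var_m))
```

On real data this would be repeated per voxel and across cross-validation folds. The model with the higher held-out log-likelihood is the one that generalizes better.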
TL;DR: In this paper, the authors provide a thorough evaluation of estimators for direct and indirect effects in the context of causal mediation analysis for binary, continuous and multi-dimensional mediators, and propose and assess the relevance of several extensions inspired by double or debiased machine learning.
Abstract: Mediation analysis breaks down the causal effect of a treatment on an outcome into an indirect effect, acting through a third group of variables called mediators, and a direct effect, operating through other mechanisms. We provide a thorough evaluation of estimators for direct and indirect effects in the context of causal mediation analysis for binary, continuous and multi-dimensional mediators. We consider standard parametric implementations of classical estimators, and propose and assess the relevance of several extensions inspired by double or debiased machine learning, in particular non-parametric models, regularization, probability calibration and cross-fitting. Our results show that most methods obtain reasonable estimates under model misspecification, but some methods, including multiply-robust methods, are very sensitive to (near-)violations of the overlap assumption. This trend is even more pronounced in multi-dimensional settings. We also describe settings where the use of more complex non-parametric models for estimation is relevant. To illustrate the considered methods on real data, we examine the causal path from higher education graduation to middle-age general intelligence in the UK Biobank, which includes several potential binary, continuous and multi-dimensional mediators. This analysis shows that this effect is partially mediated by having a physical occupation and by brain characteristics measured through MRI, but not by brain age, a popular MRI-derived phenotype.
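For reference, the classical linear baseline that such estimators are usually compared against can be sketched with ordinary least squares (product of coefficients). The sketch assumes linear mediator and outcome models, no treatment-mediator interaction and no unmeasured confounding. It does not implement the multiply-robust, cross-fitted or calibrated estimators the abstract evaluates, and the function names and simulated coefficients are illustrative.

```python
import numpy as np

def ols(y, *covariates):
    """Least-squares coefficients of y on an intercept plus the given covariates."""
    X = np.column_stack([np.ones(len(y))] + [np.asarray(c, dtype=float) for c in covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def linear_mediation(t, m, y):
    """Product-of-coefficients decomposition. ASSUMES linear models, no
    treatment-mediator interaction and no unmeasured confounding."""
    total = ols(y, t)[1]            # total effect: y ~ t
    a = ols(m, t)[1]                # mediator model: m ~ t
    coef = ols(y, t, m)             # outcome model: y ~ t + m
    direct, b = coef[1], coef[2]
    return {"total": total, "direct": direct, "indirect": a * b}

rng = np.random.default_rng(0)
n = 5000
t = rng.integers(0, 2, n)                         # binary treatment
m = 0.5 * t + rng.normal(size=n)                  # continuous mediator (a = 0.5)
y = 1.0 * t + 2.0 * m + rng.normal(size=n)        # outcome (direct = 1.0, b = 2.0)
effects = linear_mediation(t, m, y)
print(effects)
```

With these simulated coefficients, the direct and indirect estimates should both come out near 1.0. With OLS, the total effect equals direct plus indirect exactly. That identity no longer holds for the non-linear and machine-learning estimators discussed in the abstract.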
Posted Content
TL;DR: Conditional Independent Components Analysis (Conditional ICA) is a fast functional Magnetic Resonance Imaging (fMRI) data augmentation technique that leverages abundant resting-state data to create images by sampling from an ICA decomposition.
Abstract: Advances in computational cognitive neuroimaging research are related to the availability of large amounts of labeled brain imaging data, but such data are scarce and expensive to generate. While powerful data generation mechanisms, such as Generative Adversarial Networks (GANs), have been designed in the last decade for computer vision, such improvements have not yet carried over to brain imaging. A likely reason is that GAN training is ill-suited to the noisy, high-dimensional and small-sample data available in functional neuroimaging. In this paper, we introduce Conditional Independent Components Analysis (Conditional ICA): a fast functional Magnetic Resonance Imaging (fMRI) data augmentation technique that leverages abundant resting-state data to create images by sampling from an ICA decomposition. We then propose a mechanism to condition the generator on classes observed with few samples. We first show that the generative mechanism is successful at synthesizing data indistinguishable from observations, and that it yields gains in classification accuracy in brain decoding problems. In particular, it outperforms GANs while being much easier to optimize and interpret. Lastly, Conditional ICA enhances classification accuracy in eight datasets without further parameter tuning.
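The core idea of generating images by sampling from an ICA decomposition of resting-state data can be sketched with scikit-learn's `FastICA`. This is an illustration under assumptions, not the published method. Unconditional samples draw each independent component from its empirical marginal. `sample_class_images` is a hypothetical conditioning step that fits a diagonal Gaussian to the component scores of one class's few labeled images.

```python
import numpy as np
from sklearn.decomposition import FastICA

def fit_ica(rest_data, n_components, seed=0):
    """Fit ICA on (n_images, n_voxels) resting-state data."""
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    sources = ica.fit_transform(rest_data)           # (n_images, n_components)
    return ica, sources

def sample_images(ica, sources, n_new, seed=0):
    """Unconditional sampling: draw each component independently from its
    empirical marginal (relies on the ICA independence assumption)."""
    rng = np.random.default_rng(seed)
    k = sources.shape[1]
    rows = rng.integers(0, sources.shape[0], size=(n_new, k))
    new_sources = sources[rows, np.arange(k)]        # column j drawn from component j
    return ica.inverse_transform(new_sources)        # back to voxel space

def sample_class_images(ica, class_data, n_new, seed=0):
    """HYPOTHETICAL conditioning: project a class's few labeled images onto the
    resting-state components, fit a diagonal Gaussian there, and sample."""
    rng = np.random.default_rng(seed)
    s = ica.transform(class_data)
    mean, std = s.mean(axis=0), s.std(axis=0)
    new_sources = rng.normal(mean, std, size=(n_new, s.shape[1]))
    return ica.inverse_transform(new_sources)
```

A call such as `sample_class_images(ica, class_data, n_new=100)` would return synthetic images in the original voxel space. These could be added to a decoder's training set. Fitting the components on plentiful resting-state data and only low-dimensional class statistics on the scarce labeled data is what keeps this kind of augmentation cheap.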
Proceedings Article
29 May 2022
TL;DR: This work proposes CRT-logit, an algorithm that combines a variable-distillation step and a decorrelation step that takes into account the geometry of the $\ell_1$-penalized logistic regression problem, to improve the Conditional Randomization Test.
Abstract: Identifying the relevant variables for a classification model with correct confidence levels is a central but difficult task in high dimensions. Despite the core role of sparse logistic regression in statistics and machine learning, it still lacks a good solution for accurate inference in the regime where the number of features $p$ is as large as or larger than the number of samples $n$. Here, we tackle this problem by improving the Conditional Randomization Test (CRT). The original CRT algorithm shows promise as a way to output p-values while making few assumptions on the distribution of the test statistics. As it comes with a prohibitive computational cost even in mildly high-dimensional problems, faster solutions based on distillation have been proposed. Yet they rely on unrealistic hypotheses and result in low-power solutions. To improve on this, we propose CRT-logit, an algorithm that combines a variable-distillation step and a decorrelation step that takes into account the geometry of the $\ell_1$-penalized logistic regression problem. We provide a theoretical analysis of this procedure, and demonstrate its effectiveness on simulations, along with experiments on large-scale brain-imaging and genomics datasets.
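The general shape of a distilled conditional randomization test can be sketched as follows. This is not CRT-logit. To stay self-contained, it uses ordinary least squares for both distillation steps in place of the $\ell_1$-penalized logistic regression the abstract describes. It also models $X_j$ given the other features as linear-Gaussian, which is an assumption. The p-value compares the observed statistic with statistics recomputed on resampled copies of $X_j$.

```python
import numpy as np

def _residualize(target, Z):
    """Residual of target after least-squares regression on an intercept and Z."""
    Z1 = np.column_stack([np.ones(Z.shape[0]), Z])
    coef, *_ = np.linalg.lstsq(Z1, target, rcond=None)
    return target - Z1 @ coef

def crt_pvalue(X, y, j, n_resamples=500, seed=0):
    """Distilled CRT for feature j. ASSUMES a linear-Gaussian model of X_j | X_{-j};
    OLS is used here instead of the l1-penalized logistic regression of CRT-logit."""
    rng = np.random.default_rng(seed)
    others = np.delete(X, j, axis=1)
    y_res = _residualize(y, others)        # distill the information about y in X_{-j}
    x_res = _residualize(X[:, j], others)
    x_fit = X[:, j] - x_res                # estimated E[X_j | X_{-j}]
    sigma = x_res.std()                    # estimated conditional noise level

    def stat(x_j):
        return abs(np.dot(y_res, x_j - x_fit))

    t_obs = stat(X[:, j])
    # Resample X_j from its estimated conditional law, keeping X_{-j} and y fixed.
    null = np.array([stat(x_fit + sigma * rng.standard_normal(len(x_fit)))
                     for _ in range(n_resamples)])
    return (1 + np.sum(null >= t_obs)) / (n_resamples + 1)
```

With simulated data where only the first feature drives `y`, `crt_pvalue(X, y, 0)` should return a p-value at or near its minimum of 1/(n_resamples + 1). Irrelevant features should yield roughly uniform p-values. Distillation is what makes this affordable: the regression of `y` on the other features is fitted once, not once per resample.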
Posted Content
05 Jul 2023-bioRxiv
TL;DR: In this article, the authors present a denoising benchmark for functional magnetic resonance imaging (fMRI) connectivity analyses based on the fMRIPrep software. The benchmark is implemented in a fully reproducible framework, where the provided research objects enable readers to reproduce or modify core computations.
Abstract: Reducing contributions from non-neuronal sources is a crucial step in functional magnetic resonance imaging (fMRI) connectivity analyses. Many viable strategies for denoising fMRI are used in the literature, and practitioners rely on denoising benchmarks for guidance in selecting an appropriate strategy for their study. However, fMRI denoising is an ever-evolving field, and benchmarks can quickly become obsolete as techniques or implementations change. In this work, we present a denoising benchmark featuring a range of denoising strategies, datasets and evaluation metrics for connectivity analyses, based on the popular fMRIPrep software. The benchmark is implemented in a fully reproducible framework, where the provided research objects enable readers to reproduce or modify core computations, as well as the figures of the article, using the Jupyter Book project and the Neurolibre reproducible preprint server (https://neurolibre.org/). We demonstrate how such a reproducible benchmark can be used for continuous evaluation of research software by comparing two versions of the fMRIPrep software package. The majority of benchmark results were consistent with prior literature. Scrubbing, a technique that excludes time points with excessive motion, combined with global signal regression, is generally effective at noise removal. Scrubbing, however, disrupts the continuous sampling of brain images and is incompatible with some statistical analyses, e.g., autoregressive modeling. In such cases, a simple strategy using motion parameters, average activity in select brain compartments, and global signal regression should be preferred. Importantly, we found that certain denoising strategies behave inconsistently across datasets and/or versions of fMRIPrep, or behave differently than in previously published benchmarks.
This work will hopefully provide useful guidelines for the fMRIPrep user community and highlight the importance of continuous evaluation of research methods. Our reproducible benchmark infrastructure will facilitate such continuous evaluation in the future, and may also be applied broadly to different tools or even research fields.
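The scrubbing-plus-global-signal-regression strategy mentioned above can be sketched in NumPy. It is an illustration under assumptions, not fMRIPrep's or the benchmark's implementation. The motion array is assumed to hold three translations (mm) followed by three rotations (radians). Framewise displacement is computed in the style of Power et al. (2012) with a 50 mm head radius. Frames above a hypothetical 0.5 mm threshold are dropped, and an intercept, the six motion parameters and a global signal are regressed out of the remaining frames.

```python
import numpy as np

def framewise_displacement(motion, radius=50.0):
    """FD per frame. ASSUMED layout: motion is (T, 6), three translations in mm
    followed by three rotations in radians."""
    diff = np.abs(np.diff(motion, axis=0))
    diff[:, 3:] *= radius            # rotation -> arc length on a sphere of `radius` mm
    return np.concatenate([[0.0], diff.sum(axis=1)])

def scrub_and_regress(timeseries, motion, fd_threshold=0.5):
    """Drop high-motion frames, then regress out an intercept, the motion parameters
    and a global signal (here the mean over regions, a stand-in for a brain-mask mean)."""
    keep = framewise_displacement(motion) <= fd_threshold
    ts = timeseries[keep]
    global_signal = ts.mean(axis=1)
    confounds = np.column_stack([np.ones(len(ts)), motion[keep], global_signal])
    beta, *_ = np.linalg.lstsq(confounds, ts, rcond=None)
    return ts - confounds @ beta, keep
```

The cleaned series no longer has evenly spaced time points once frames are censored. This is why the abstract recommends the non-scrubbing strategy (motion parameters, compartment averages and global signal) for analyses such as autoregressive modeling.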

Cited by
Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

47,974 citations

Posted Content
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from this http URL.

28,898 citations

28 Jul 2005
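TL;DR: Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1) interacts with one or more receptors on infected erythrocytes, dendritic cells and the placenta, and plays a key role in adhesion and immune evasion.
Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells and the placenta, and plays a key role in adhesion and immune evasion. In each haploid genome, the var gene family encodes about 60 members, and activating transcription of different var gene variants provides the molecular basis for antigenic variation.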

18,940 citations

Proceedings Article
13 Aug 2016
TL;DR: The authors present XGBoost, a tree boosting system that uses a sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, achieving state-of-the-art results on many machine learning challenges.
Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

14,872 citations
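The sparsity-aware split finding mentioned in the XGBoost abstract can be sketched for a single feature. The sketch uses the second-order split gain from the XGBoost paper, $\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda}+\frac{G_R^2}{H_R+\lambda}-\frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right]-\gamma$, where $G$ and $H$ are sums of gradients and Hessians. Rows with a missing value (NaN) are sent as a block to whichever default direction gives the larger gain. This sketch enumerates every threshold exactly and does not implement the weighted quantile sketch. The function names are illustrative and are not XGBoost's API.

```python
import numpy as np

def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    """Second-order gain of splitting a node into left/right children."""
    return 0.5 * (GL**2 / (HL + lam) + GR**2 / (HR + lam)
                  - (GL + GR)**2 / (HL + HR + lam)) - gamma

def best_split_sparse(x, g, h, lam=1.0, gamma=0.0):
    """Exact greedy split on one feature; NaN rows follow a learned default direction.
    Returns (gain, threshold, missing_goes_left)."""
    missing = np.isnan(x)
    G, H = g.sum(), h.sum()
    order = np.argsort(x[~missing])
    xs, gs, hs = x[~missing][order], g[~missing][order], h[~missing][order]
    best = (-np.inf, None, None)
    for default_left in (False, True):
        # Missing rows start on the default side; present rows are scanned in order.
        GL = g[missing].sum() if default_left else 0.0
        HL = h[missing].sum() if default_left else 0.0
        for k in range(len(xs) - 1):
            GL += gs[k]
            HL += hs[k]
            if xs[k] == xs[k + 1]:
                continue                  # no threshold between equal values
            gain = split_gain(GL, HL, G - GL, H - HL, lam, gamma)
            if gain > best[0]:
                best = (gain, (xs[k] + xs[k + 1]) / 2, default_left)
    return best
```

Only the non-missing entries are sorted and scanned, so the cost grows with the number of present values rather than the number of rows. That is the point of the sparsity-aware design for sparse inputs.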

Proceedings Article
TL;DR: This paper proposes a novel sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, and provides insights on cache access patterns, data compression and sharding to build a scalable tree boosting system called XGBoost.
Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

13,333 citations