Author
Bertrand Thirion
Other affiliations: French Institute for Research in Computer Science and Automation, French Institute of Health and Medical Research, École Polytechnique, ...
Bio: Bertrand Thirion is an academic researcher from Université Paris-Saclay. He has contributed to research in topics including cluster analysis and cognition, has an h-index of 51, and has co-authored 311 publications receiving 73,839 citations. Previous affiliations of Bertrand Thirion include the French Institute for Research in Computer Science and Automation and the French Institute of Health and Medical Research.
Topics: Cluster analysis, Cognition, Population, Supervised learning, Inference
Papers published on a yearly basis
Papers
Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
33,540 citations
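As a brief, hedged illustration of the uniform fit/predict API the abstract above emphasizes, the following minimal sketch uses a built-in toy dataset and a logistic-regression classifier; these particular choices are illustrative and are not drawn from the paper.

# Minimal scikit-learn sketch; the dataset and estimator are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load a small built-in dataset and split it into train and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Every scikit-learn estimator exposes the same fit/predict interface.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))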
Posted Content
TL;DR: Scikit-learn, as presented in this preprint, is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from this http URL.
28,898 citations
TL;DR: The paper illustrates how scikit-learn, a Python machine learning library, can be used to perform key analysis steps on functional neuroimaging data, providing a versatile tool to study the brain.
Abstract: Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g. multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g. resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.
897 citations
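To make the decoding setting described above concrete, here is a hedged sketch in which a synthetic matrix X stands in for masked fMRI features (trials x voxels) and y for behavioral labels; the linear-SVM choice is illustrative, not the paper's specific pipeline.

# Illustrative decoding sketch: X stands in for masked fMRI voxel values
# (trials x voxels) and y for condition labels; both are synthetic here.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
n_trials, n_voxels = 100, 2000
X = rng.randn(n_trials, n_voxels)    # stand-in for activation images
y = rng.randint(0, 2, n_trials)      # stand-in for behavioral labels

# Cross-validated decoding accuracy with a linear classifier.
scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5)
print("mean decoding accuracy:", scores.mean())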
TL;DR: Using bootstrap-based reproducibility analyses in a large cohort, the study shows that inter-subject variability plays a prominent role in the relatively low sensitivity and reliability of group fMRI studies.
Abstract: The aim of group fMRI studies is to relate contrasts of tasks or stimuli to regional increases in brain activity. These studies typically involve 10 to 16 subjects. The statistical significance of the average regional activity is assessed using the subject-to-subject variability of the effect (random-effects analyses). Because of the relatively small number of subjects included, the sensitivity and reliability of these analyses are questionable and hard to investigate. In this work, we use a very large number of subjects (more than 80) to investigate this issue. We take advantage of this large cohort to study the statistical properties of inter-subject activity and focus on the notion of reproducibility by bootstrapping. We ask simple but important methodological questions: Is there, from the point of view of reliability, an optimal statistical threshold for activity maps? How many subjects should be included in group studies? Which inference method should be preferred? Our results suggest that i) optimal thresholds can indeed be found, and are rather lower than the usual thresholds corrected for multiple comparisons; ii) 20 or more subjects should be included in functional neuroimaging studies to achieve sufficient reliability; iii) non-parametric significance assessment should be preferred to parametric methods; iv) cluster-level thresholding is more reliable than voxel-based thresholding; and v) mixed-effects tests are much more reliable than random-effects tests. Moreover, our study shows that inter-subject variability plays a prominent role in the relatively low sensitivity and reliability of group studies.
500 citations
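The bootstrap-based reproducibility analysis sketched in the abstract can be illustrated on toy data: resample subjects with replacement, threshold a group-level test on each resample, and count how often each voxel is detected. The per-subject effect matrix, the threshold, and the number of resamples below are illustrative assumptions, not values from the study.

# Toy bootstrap-reproducibility sketch: 'effects' stands in for per-subject
# contrast values (subjects x voxels); all numbers here are illustrative.
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
n_subjects, n_voxels = 80, 1000
effects = rng.randn(n_subjects, n_voxels) + 0.3   # weak simulated activation

def detection_map(sample, t_thresh=3.0):
    # One-sample t-test across subjects, thresholded into a binary map.
    t, _ = stats.ttest_1samp(sample, 0.0, axis=0)
    return t > t_thresh

# Resample subjects with replacement and count how often each voxel is detected.
n_boot = 100
hits = np.zeros(n_voxels)
for _ in range(n_boot):
    idx = rng.randint(0, n_subjects, n_subjects)
    hits += detection_map(effects[idx])
print("mean detection frequency per voxel:", (hits / n_boot).mean())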
University of Warwick, McGill University, Montreal Neurological Institute and Hospital, Forschungszentrum Jülich, University of Düsseldorf, Concordia University, Otto-von-Guericke University Magdeburg, Cognition and Brain Sciences Unit, Nathan Kline Institute for Psychiatric Research, MIND Institute, Stanford University, University of California, Berkeley, French Institute for Research in Computer Science and Automation, Washington University in St. Louis, Erasmus University Medical Center, National University of Singapore
TL;DR: Insights from developing a set of recommendations on behalf of the Organization for Human Brain Mapping are described, and barriers that impede these practices are identified, including how the discipline must change to fully exploit the potential of the world's neuroimaging data.
Abstract: Given concerns about the reproducibility of scientific findings, neuroimaging must define best practices for data analysis, results reporting, and algorithm and data sharing to promote transparency, reliability and collaboration. We describe insights from developing a set of recommendations on behalf of the Organization for Human Brain Mapping and identify barriers that impede these practices, including how the discipline must change to fully exploit the potential of the world's neuroimaging data.
416 citations
Cited by
28 Jul 2005
TL;DR: Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, playing a key role in adhesion and immune evasion.
Abstract: Antigenic variation enables many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected red blood cells, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family, with about 60 members encoded per haploid genome, provides the molecular basis for antigenic variation through switched transcription of different var gene variants.
18,940 citations
13 Aug 2016
TL;DR: XGBoost is a scalable end-to-end tree boosting system that introduces a sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, achieving state-of-the-art results on many machine learning challenges.
Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
10,428 citations
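As a hedged usage sketch of the XGBoost library described above, the snippet below fits a gradient-boosted tree classifier on a built-in scikit-learn dataset; the dataset and hyperparameters are illustrative and are not taken from the paper's benchmarks.

# Minimal XGBoost usage sketch; dataset and hyperparameters are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gradient-boosted trees; "hist" selects the approximate, histogram-based splits.
model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1,
                      tree_method="hist")
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))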
01 Jan 2006
TL;DR: A chapter-by-chapter treatment covering probability distributions, linear models for regression and classification, neural networks, kernel methods, sparse kernel machines, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Contents: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.
10,141 citations