Author

Claire Duvallet

Bio: Claire Duvallet is an academic researcher from the Massachusetts Institute of Technology whose research covers topics including Population and Wastewater. The author has an h-index of 13 and has co-authored 45 publications receiving 5,678 citations.

Topics: Population, Wastewater, Microbiome, Medicine, Gastrointestinal Microbiome, ...

Papers published on a yearly basis (chart, 2016–2023)

Papers

Journal Article

Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2

Evan Bolyen¹, Jai Ram Rideout¹, Matthew R. Dillon¹, Nicholas A. Bokulich¹, Christian C. Abnet², Gabriel A. Al-Ghalith³, Harriet Alexander⁴,⁵, Eric J. Alm⁶, Manimozhiyan Arumugam⁷, Francesco Asnicar⁸, Yang Bai⁹, Jordan E. Bisanz¹⁰, Kyle Bittinger¹¹, Asker Daniel Brejnrod⁷, Colin J. Brislawn¹², C. Titus Brown⁵, Benjamin J. Callahan¹³, Andrés Mauricio Caraballo-Rodríguez¹⁴, John Chase¹, Emily K. Cope¹, Ricardo Silva¹⁴, Christian Diener¹⁵, Pieter C. Dorrestein¹⁴, Gavin M. Douglas¹⁶, Daniel M. Durall¹⁷, Claire Duvallet⁶, Christian F. Edwardson, Madeleine Ernst¹⁴,¹⁸, Mehrbod Estaki¹⁷, Jennifer Fouquier¹⁹, Julia M. Gauglitz¹⁴, Sean M. Gibbons¹⁵,²⁰, Deanna L. Gibson¹⁷, Antonio Gonzalez¹⁴, Kestrel Gorlick¹, Jiarong Guo²¹, Benjamin Hillmann³, Susan Holmes²², Hannes Holste¹⁴, Curtis Huttenhower²³,²⁴, Gavin A. Huttley²⁵, Stefan Janssen²⁶, Alan K. Jarmusch¹⁴, Lingjing Jiang¹⁴, Benjamin D. Kaehler²⁵,²⁷, Kyo Bin Kang¹⁴,²⁸, Christopher R. Keefe¹, Paul Keim¹, Scott T. Kelley²⁹, Dan Knights³, Irina Koester¹⁴, Tomasz Kosciolek¹⁴, Jorden Kreps¹, Morgan G. I. Langille¹⁶, Joslynn S. Lee³⁰, Ruth E. Ley³¹,³², Yong-Xin Liu, Erikka Loftfield², Catherine A. Lozupone¹⁹, Massoud Maher¹⁴, Clarisse Marotz¹⁴, Bryan D. Martin²⁰, Daniel McDonald¹⁴, Lauren J. McIver²³,²⁴, Alexey V. Melnik¹⁴, Jessica L. Metcalf³³, Sydney C. Morgan¹⁷, Jamie Morton¹⁴, Ahmad Turan Naimey¹, Jose A. Navas-Molina¹⁴,³⁴, Louis-Félix Nothias¹⁴, Stephanie B. Orchanian, Talima Pearson¹, Samuel L. Peoples²⁰,³⁵, Daniel Petras¹⁴, Mary L. Preuss³⁶, Elmar Pruesse¹⁹, Lasse Buur Rasmussen⁷, Adam R. Rivers³⁷, Michael S. Robeson³⁸, Patrick Rosenthal³⁶, Nicola Segata⁸, Michael Shaffer¹⁹, Arron Shiffer¹, Rashmi Sinha², Se Jin Song¹⁴, John R. Spear³⁹, Austin D. Swafford, Luke R. Thompson⁴⁰,⁴¹, Pedro J. Torres²⁹, Pauline Trinh²⁰, Anupriya Tripathi¹⁴, Peter J. Turnbaugh¹⁰, Sabah Ul-Hasan⁴², Justin J. J. van der Hooft⁴³, Fernando Vargas, Yoshiki Vázquez-Baeza¹⁴, Emily Vogtmann², Max von Hippel⁴⁴, William A. Walters³², Yunhu Wan², Mingxun Wang¹⁴, Jonathan Warren⁴⁵, Kyle C. Weber³⁷,⁴⁶, Charles H. D. Williamson¹, Amy D. Willis²⁰, Zhenjiang Zech Xu¹⁴, Jesse R. Zaneveld²⁰, Yilong Zhang⁴⁷, Qiyun Zhu¹⁴, Rob Knight¹⁴, J. Gregory Caporaso¹

Northern Arizona University¹, National Institutes of Health², University of Minnesota³, Woods Hole Oceanographic Institution⁴, University of California, Davis⁵, Massachusetts Institute of Technology⁶, University of Copenhagen⁷, University of Trento⁸, Chinese Academy of Sciences⁹, University of California, San Francisco¹⁰, University of Pennsylvania¹¹, Pacific Northwest National Laboratory¹², North Carolina State University¹³, University of California, San Diego¹⁴, Institute for Systems Biology¹⁵, Dalhousie University¹⁶, University of British Columbia¹⁷, Statens Serum Institut¹⁸, Anschutz Medical Campus¹⁹, University of Washington²⁰, Michigan State University²¹, Stanford University²², Harvard University²³, Broad Institute²⁴, Australian National University²⁵, University of Düsseldorf²⁶, University of New South Wales²⁷, Sookmyung Women's University²⁸, San Diego State University²⁹, Howard Hughes Medical Institute³⁰, Cornell University³¹, Max Planck Society³², Colorado State University³³, Google³⁴, Syracuse University³⁵, Webster University³⁶, United States Department of Agriculture³⁷, University of Arkansas for Medical Sciences³⁸, Colorado School of Mines³⁹, University of Southern Mississippi⁴⁰, National Oceanic and Atmospheric Administration⁴¹, University of California, Merced⁴², Wageningen University and Research Centre⁴³, University of Arizona⁴⁴, Environment Agency⁴⁵, University of Florida⁴⁶, Merck & Co.⁴⁷

01 Aug 2019-Nature Biotechnology

TL;DR: QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and 1565057 to R.K.; partial support was also provided by grants including NIH U54CA143925 and U54MD012388.

Abstract: QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and 1565057 to R.K. Partial support was also provided by the following: grants NIH U54CA143925 (J.G.C. and T.P.) and U54MD012388 (J.G.C. and T.P.); grants from the Alfred P. Sloan Foundation (J.G.C. and R.K.); ERC-STG project MetaPG (N.S.); the Strategic Priority Research Program of the Chinese Academy of Sciences QYZDB-SSW-SMC021 (Y.B.); the Australian National Health and Medical Research Council APP1085372 (G.A.H., J.G.C., Von Bing Yap and R.K.); the Natural Sciences and Engineering Research Council (NSERC) to D.L.G.; and the State of Arizona Technology and Research Initiative Fund (TRIF), administered by the Arizona Board of Regents, through Northern Arizona University. All NCI coauthors were supported by the Intramural Research Program of the National Cancer Institute. S.M.G. and C. Diener were supported by the Washington Research Foundation Distinguished Investigator Award.

8,821 citations

Posted Content

QIIME 2: Reproducible, interactive, scalable, and extensible microbiome data science

Evan Bolyen¹, Jai Ram Rideout¹, Matthew R. Dillon¹, Nicholas A. Bokulich¹, Christian C. Abnet, Gabriel A. Al-Ghalith², Harriet Alexander³,⁴, Eric J. Alm⁵, Manimozhiyan Arumugam⁶, Francesco Asnicar⁷, Yang Bai⁸, Jordan E. Bisanz⁹, Kyle Bittinger¹⁰, Asker Daniel Brejnrod⁶, Colin J. Brislawn¹¹, C. Titus Brown⁴, Benjamin J. Callahan¹², Andrés Mauricio Caraballo-Rodríguez¹³, John Chase¹, Emily K. Cope¹, Ricardo Silva¹³, Pieter C. Dorrestein¹³, Gavin M. Douglas¹⁴, Daniel M. Durall¹⁵, Claire Duvallet⁵, Christian F. Edwardson¹⁶, Madeleine Ernst¹³, Mehrbod Estaki¹⁵, Jennifer Fouquier¹⁷, Julia M. Gauglitz¹³, Deanna L. Gibson¹⁵, Antonio Gonzalez¹⁸, Kestrel Gorlick¹, Jiarong Guo¹⁹, Benjamin Hillmann², Susan Holmes²⁰, Hannes Holste¹⁸, Curtis Huttenhower²¹,²², Gavin A. Huttley²³, Stefan Janssen²⁴, Alan K. Jarmusch¹³, Lingjing Jiang¹⁸, Benjamin D. Kaehler²³, Kyo Bin Kang¹³,²⁵, Christopher R. Keefe¹, Paul Keim¹, Scott T. Kelley²⁶, Dan Knights², Irina Koester¹³,¹⁸, Tomasz Kosciolek¹⁸, Jorden Kreps¹, Morgan G. I. Langille¹⁴, Joslynn S. Lee²⁷, Ruth E. Ley²⁸,²⁹, Yong-Xin Liu⁸, Erikka Loftfield, Catherine A. Lozupone¹⁷, Massoud Maher¹⁸, Clarisse Marotz¹⁸, Bryan D. Martin³⁰, Daniel McDonald¹⁸, Lauren J. McIver²¹,²², Alexey V. Melnik¹³, Jessica L. Metcalf³¹, Sydney C. Morgan¹⁵, Jamie Morton¹⁸, Ahmad Turan Naimey¹, Jose A. Navas-Molina¹⁸,³², Louis-Félix Nothias¹³, Stephanie B. Orchanian¹⁸, Talima Pearson¹, Samuel L. Peoples³⁰,³³, Daniel Petras¹³, Mary L. Preuss³⁴, Elmar Pruesse¹⁷, Lasse Buur Rasmussen⁶, Adam R. Rivers³⁵, Michael S. Robeson II³⁶, Patrick Rosenthal³⁴, Nicola Segata⁷, Michael Shaffer¹⁷, Arron Shiffer¹, Rashmi Sinha, Se Jin Song¹⁸, John R. Spear³⁷, Austin D. Swafford¹⁸, Luke R. Thompson³⁸,³⁹, Pedro J. Torres²⁶, Pauline Trinh³⁰, Anupriya Tripathi¹³,¹⁸, Peter J. Turnbaugh⁹, Sabah Ul-Hasan⁴⁰, Justin J. J. van der Hooft⁴¹, Fernando Vargas¹⁸, Yoshiki Vázquez-Baeza¹⁸, Emily Vogtmann, Max von Hippel⁴², William A. Walters²⁸, Yunhu Wan, Mingxun Wang¹³, Jonathan Warren⁴³, Kyle C. Weber³⁵,⁴⁴, Chase H. D. Williamson¹, Amy D. Willis³⁰, Zhenjiang Zech Xu¹⁸, Jesse R. Zaneveld³⁰, Yilong Zhang⁴⁵, Rob Knight¹⁸, J. Gregory Caporaso¹

Northern Arizona University¹, University of Minnesota², Woods Hole Oceanographic Institution³, University of California, Davis⁴, Massachusetts Institute of Technology⁵, University of Copenhagen⁶, University of Trento⁷, Chinese Academy of Sciences⁸, University of California, San Francisco⁹, Children's Hospital of Philadelphia¹⁰, Pacific Northwest National Laboratory¹¹, North Carolina State University¹², University of Montana¹³, Dalhousie University¹⁴, University of British Columbia¹⁵, Shedd Aquarium¹⁶, University of Colorado Denver¹⁷, University of California, San Diego¹⁸, Michigan State University¹⁹, Stanford University²⁰, Broad Institute²¹, Harvard University²², Australian National University²³, University of Düsseldorf²⁴, Sookmyung Women's University²⁵, San Diego State University²⁶, Howard Hughes Medical Institute²⁷, Max Planck Society²⁸, Cornell University²⁹, University of Washington³⁰, Colorado State University³¹, Google³², Syracuse University³³, Webster University³⁴, United States Department of Agriculture³⁵, University of Arkansas for Medical Sciences³⁶, Colorado School of Mines³⁷, University of Southern Mississippi³⁸, Atlantic Oceanographic and Meteorological Laboratory³⁹, University of California, Merced⁴⁰, Wageningen University and Research Centre⁴¹, University of Arizona⁴², Environment Agency⁴³, University of Florida⁴⁴, Merck & Co.⁴⁵

24 Oct 2018-PeerJ

TL;DR: QIIME 2 provides new features that will drive the next generation of microbiome research, including interactive spatial and temporal analysis and visualization tools, support for metabolomics and shotgun metagenomics analysis, and automated data provenance tracking to ensure reproducible, transparent microbiome data science.

Abstract: We present QIIME 2, an open-source microbiome data science platform accessible to users spanning the microbiome research ecosystem, from scientists and engineers to clinicians and policy makers. QIIME 2 provides new features that will drive the next generation of microbiome research. These include interactive spatial and temporal analysis and visualization tools, support for metabolomics and shotgun metagenomics analysis, and automated data provenance tracking to ensure reproducible, transparent microbiome data science.

875 citations

Journal Article

Meta-analysis of gut microbiome studies identifies disease-specific and shared responses.

Claire Duvallet¹, Sean M. Gibbons¹,², Thomas Gurry¹,², Rafael A. Irizarry³, Eric J. Alm¹,²

Massachusetts Institute of Technology¹, Broad Institute², Harvard University³

05 Dec 2017-Nature Communications

TL;DR: Introduces the MicrobiomeHD database, which includes 28 published case–control gut microbiome studies spanning ten diseases, and performs a cross-disease meta-analysis of these studies using standardized methods, finding consistent patterns that characterize disease-associated microbiome changes.

Abstract: Hundreds of clinical studies have demonstrated associations between the human microbiome and disease, yet fundamental questions remain on how we can generalize this knowledge. Results from individual studies can be inconsistent, and comparing published data is further complicated by a lack of standard processing and analysis methods. Here we introduce the MicrobiomeHD database, which includes 28 published case–control gut microbiome studies spanning ten diseases. We perform a cross-disease meta-analysis of these studies using standardized methods. We find consistent patterns characterizing disease-associated microbiome changes. Some diseases are associated with over 50 genera, while most show only 10–15 genus-level changes. Some diseases are marked by the presence of potentially pathogenic microbes, whereas others are characterized by a depletion of health-associated bacteria. Furthermore, we show that about half of genera associated with individual studies are bacteria that respond to more than one disease. Thus, many associations found in case–control studies are likely not disease-specific but rather part of a non-specific, shared response to health and disease. Reported associations between the human microbiome and disease are often inconsistent. Here, Duvallet et al. perform a meta-analysis of 28 gut microbiome studies spanning ten diseases, and find associations that are likely not disease-specific but potentially part of a shared response to disease.
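
The distinction between disease-specific and shared responses can be illustrated with a small sketch. It assumes each disease has already been reduced to one set of significantly associated genera; the disease and genus names below are hypothetical placeholders, not MicrobiomeHD results, and collapsing the paper's per-study associations to one set per disease is a simplification for illustration. A genus is labeled "shared" if it is associated with more than one disease.

```python
from collections import Counter

# Hypothetical toy input (NOT MicrobiomeHD data): disease -> genera found
# significantly associated with that disease in case-control studies.
associations = {
    "disease_A": {"genus_1", "genus_2", "genus_3"},
    "disease_B": {"genus_1", "genus_4"},
    "disease_C": {"genus_2", "genus_5", "genus_6"},
}


def classify_genera(assoc: dict[str, set[str]]) -> dict[str, str]:
    """Label each genus 'shared' if it responds to >1 disease, else 'specific'."""
    counts = Counter(genus for genera in assoc.values() for genus in genera)
    return {g: ("shared" if n > 1 else "specific") for g, n in counts.items()}


def shared_fraction(assoc: dict[str, set[str]]) -> float:
    """Fraction of all associated genera that respond to more than one disease."""
    labels = classify_genera(assoc)
    return sum(label == "shared" for label in labels.values()) / len(labels)


if __name__ == "__main__":
    for genus, label in sorted(classify_genera(associations).items()):
        print(f"{genus}: {label}")
    print(f"shared fraction: {shared_fraction(associations):.2f}")
```

A genus that appears under several diseases is exactly the kind of association the abstract argues is likely part of a non-specific response rather than a marker of one condition.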

641 citations

Journal Article

SARS-CoV-2 Titers in Wastewater Are Higher than Expected from Clinically Confirmed Cases

Fuqing Wu¹, Jianbo Zhang¹, Amy Xiao¹, Xiaoqiong Gu², Wei Lin Lee², Federica Armas², Kathryn M. Kauffman³, William P. Hanage⁴, Mariana Matus, Newsha Ghaeli, Noriko Endo, Claire Duvallet, Mathilde Poyet¹, Katya Moniz¹, Alex D. Washburne, Timothy B. Erickson⁴,⁵, Peter R. Chai¹,⁵, Janelle R. Thompson⁶, Eric J. Alm¹

Massachusetts Institute of Technology¹, National University of Singapore², State University of New York System³, Harvard University⁴, Brigham and Women's Hospital⁵, Nanyang Technological University⁶

25 Aug 2020

TL;DR: The authors develop a laboratory protocol to quantify viral titers in raw sewage via qPCR analysis and validate the results with sequencing analysis. The results suggest that the number of positive cases estimated from wastewater viral titers is orders of magnitude greater than the number of confirmed clinical cases, which may significantly affect efforts to understand the case fatality rate and progression of disease.

Abstract: Wastewater surveillance represents a complementary approach to clinical surveillance to measure the presence and prevalence of emerging infectious diseases like the novel coronavirus SARS-CoV-2. This innovative data source can improve the precision of epidemiological modeling to understand the penetrance of SARS-CoV-2 in specific vulnerable communities. Here, we tested wastewater collected at a major urban treatment facility in Massachusetts and detected SARS-CoV-2 RNA from the N gene at significant titers (57 to 303 copies per ml of sewage) in the period from 18 to 25 March 2020 using RT-qPCR. We validated detection of SARS-CoV-2 by Sanger sequencing the PCR product from the S gene. Viral titers observed were significantly higher than expected based on clinically confirmed cases in Massachusetts as of 25 March. Our approach is scalable and may be useful in modeling the SARS-CoV-2 pandemic and future outbreaks.

IMPORTANCE: Wastewater-based surveillance is a promising approach for proactive outbreak monitoring. SARS-CoV-2 is shed in stool early in the clinical course and infects a large asymptomatic population, making it an ideal target for wastewater-based monitoring. In this study, we develop a laboratory protocol to quantify viral titers in raw sewage via qPCR analysis and validate results with sequencing analysis. Our results suggest that the number of positive cases estimated from wastewater viral titers is orders of magnitude greater than the number of confirmed clinical cases and therefore may significantly impact efforts to understand the case fatality rate and progression of disease. These data may help inform decisions surrounding the advancement or scale-back of social distancing and quarantine efforts based on dynamic wastewater catchment-level estimations of prevalence.
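
Comparing wastewater titers with clinical case counts rests on a back-of-the-envelope mass balance: the viral RNA arriving at the plant per day, divided by the RNA one infected person sheds per day. The sketch below shows only that generic arithmetic and is not necessarily the authors' model. Only the 57 to 303 copies per ml titer range comes from the abstract; the daily influent flow and per-person shedding rate are hypothetical placeholders.

```python
ML_PER_LITER = 1_000


def estimate_infected(
    titer_copies_per_ml: float,
    daily_flow_liters: float,
    shedding_copies_per_person_day: float,
) -> float:
    """Back-of-the-envelope count of shedding individuals in a sewershed.

    Mass balance: (copies/ml * ml of influent per day) divided by the
    copies one infected person sheds per day. Ignores RNA decay in the
    sewer and losses during sample concentration.
    """
    copies_per_day = titer_copies_per_ml * daily_flow_liters * ML_PER_LITER
    return copies_per_day / shedding_copies_per_person_day


# Hypothetical placeholders, NOT values from the paper.
FLOW_LITERS_PER_DAY = 1e9
SHEDDING_COPIES_PER_PERSON_DAY = 1e9

for titer in (57, 303):  # copies per ml: the range reported in the abstract
    n = estimate_infected(titer, FLOW_LITERS_PER_DAY, SHEDDING_COPIES_PER_PERSON_DAY)
    print(f"{titer} copies/ml -> about {n:,.0f} shedding individuals")
```

Because the estimate is linear in every input, an order-of-magnitude error in the assumed per-person shedding rate shifts the case estimate by the same order of magnitude, which is one reason such estimates carry wide uncertainty.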

612 citations

Posted Content

SARS-CoV-2 titers in wastewater are higher than expected from clinically confirmed cases

Fuqing Wu¹, Amy Xiao¹, Jianbo Zhang¹, Xiaoqiong Gu², Wei Lin Lee², Kathryn M. Kauffman³, William P. Hanage⁴, Mariana Matus, Newsha Ghaeli, Noriko Endo, Claire Duvallet, Katya Moniz¹, Timothy B. Erickson⁵, Peter R. Chai⁵, Janelle R. Thompson⁶, Eric J. Alm¹

07 Apr 2020-medRxiv

TL;DR: Wastewater surveillance at a major urban treatment facility in Massachusetts found the presence of SARS-CoV-2 at high titers in the period from March 18 - 25 using RT-qPCR, and the identity of the PCR product was confirmed by direct DNA sequencing.

Abstract: Wastewater surveillance may represent a complementary approach to measure the presence and even prevalence of infectious diseases when the capacity for clinical testing is limited. Moreover, aggregate, population-wide data can help inform modeling efforts. We tested wastewater collected at a major urban treatment facility in Massachusetts and found the presence of SARS-CoV-2 at high titers in the period from March 18 - 25 using RT-qPCR. We then confirmed the identity of the PCR product by direct DNA sequencing. Viral titers observed were significantly higher than expected based on clinically confirmed cases in Massachusetts as of March 25. The reason for the discrepancy is not yet clear, however, and until further experiments are complete, these data do not necessarily indicate that clinical estimates are incorrect. Our approach is scalable and may be useful in modeling the SARS-CoV-2 pandemic and future outbreaks.

358 citations

Cited by

Journal Article

IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era.

Bui Quang Minh¹, Heiko A. Schmidt², Olga Chernomor², Dominik Schrempf²,³, Michael D. Woodhams⁴, Arndt von Haeseler²,⁵, Robert Lanfear¹

Australian National University¹, Medical University of Vienna², Eötvös Loránd University³, University of Tasmania⁴, University of Vienna⁵

01 May 2020-Molecular Biology and Evolution

TL;DR: Some notable features of IQ-TREE version 2 are described and the key advantages over other software are highlighted.

Abstract: IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.

4,337 citations

Journal Article

Methods in Enzymology.

Thomas E. Creighton

01 Feb 1968-Yale Journal of Biology and Medicine

TL;DR: Since this volume is keyed to high-resolution electron microscopy, a sophisticated form of structural analysis but really morphology in a modern guise, the physical and mechanical background of the instrument and its ancillary tools is simply and well presented.

Abstract: I read this book the same weekend that the Packers took on the Rams, and the experience of the latter event, obviously, colored my judgment. Although I abhor anything that smacks of being a handbook (like, "How to Earn a Merit Badge in Neurosurgery") because too many volumes in biomedical science already evince a boyscout-like approach, I must confess that parts of this volume are fast, scholarly, and significant, with certain reservations. I like parts of this well-illustrated book because Dr. Sjöstrand, without so stating, develops certain subjects on technique in relation to the acquisition of judgment and sophistication. And this is important! So, given that the author (like all of us) is somewhat deficient in some areas, and biased in others, the book is still valuable if the uninitiated reader swallows it in a general fashion, realizing full well that what will be required from the reader is a modulation to fit his vision, proprioception, adaptation and response, and the kind of problem he is undertaking. A major deficiency of this book is revealed by comparison of its use of physics and of chemistry to provide understanding and background for the application of high resolution electron microscopy to problems in biology. Since the volume is keyed to high resolution electron microscopy, which is a sophisticated form of structural analysis, but really morphology in a modern guise, the physical and mechanical background of the instrument and its ancillary tools are simply and well presented. The potential use of chemical or cytochemical information as it relates to biological fine structure, however, is quite deficient. I wonder when even sophisticated morphologists will consider fixation a reaction and not a technique; only then will the fundamentals become self-evident and predictable and this sine qua non will become less mystical.
Staining reactions (the most inadequate chapter) ought to be something more than a technique to selectively enhance contrast of morphological elements; it ought to give the structural addresses of some of the chemical residents of cell components. Is it pertinent that auto-radiography gets singled out for more complete coverage than other significant aspects of cytochemistry by a high resolution microscopist, when it has a built-in minimal error of 1,000 Å in standard practice? I don't mean to blind-side (in strict football terminology) Dr. Sjöstrand's efforts for what is "routinely used in our laboratory"; what is done is usually well done. It's just that …

3,197 citations

Journal Article

FastTree: Computing Large Minimum-Evolution Trees with Profiles instead of a Distance Matrix

Morgan N. Price, Paramvir S. Dehal, Adam P. Arkin

18 Jun 2009-Lawrence Berkeley National Laboratory

TL;DR: FastTree uses sequence profiles of internal nodes in the tree to implement neighbor-joining, uses heuristics to quickly identify candidate joins, and then uses nearest-neighbor interchanges to reduce the length of the tree.

Abstract: Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and an alphabet of a characters, a distance matrix requires O(N²) space and O(N² L) time, but FastTree requires just O(N L a + N√N) memory and O(N√N log(N) L a) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
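
Replacing the N × N distance matrix with per-site profiles works because of a simple identity: for ungapped sequences, the average uncorrected (p-)distance over every cross-group pair equals a distance computed directly from the two groups' character-frequency profiles. The sketch below checks that identity on toy sequences. It is a simplified illustration, not FastTree's code, which additionally handles gaps, distance corrections, and the neighbor-joining bookkeeping.

```python
from collections import Counter
from itertools import product

Profile = list[dict[str, float]]


def make_profile(seqs: list[str]) -> Profile:
    """Per-site character frequencies for aligned, ungapped sequences."""
    n = len(seqs)
    return [
        {ch: c / n for ch, c in Counter(s[i] for s in seqs).items()}
        for i in range(len(seqs[0]))
    ]


def profile_distance(p: Profile, q: Profile) -> float:
    """Expected fraction of mismatched sites between random members of p and q."""
    total = 0.0
    for site_p, site_q in zip(p, q):
        same = sum(f * site_q.get(ch, 0.0) for ch, f in site_p.items())
        total += 1.0 - same  # probability the two characters differ at this site
    return total / len(p)


def p_distance(a: str, b: str) -> float:
    """Uncorrected distance: fraction of sites that differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise(group_a: list[str], group_b: list[str]) -> float:
    """Brute-force average over all cross-group pairs (what a matrix would store)."""
    pairs = list(product(group_a, group_b))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


group_a = ["ACGTAC", "ACGTTC", "ACCTAC"]
group_b = ["TCGTAG", "ACGAAG"]

print(mean_pairwise(group_a, group_b))
print(profile_distance(make_profile(group_a), make_profile(group_b)))
```

Each profile holds about L × a numbers per node, which corresponds to the N L a term in the memory bound above, instead of N² stored pairwise distances.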

2,436 citations

Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling

Orly Alter¹, Patrick O. Brown, David Botstein

Stanford University¹

01 Mar 2001

TL;DR: Using singular value decomposition in transforming genome-wide expression data from genes x arrays space to reduced diagonalized "eigengenes" x "eigenarrays" space gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype.

Abstract: We describe the use of singular value decomposition in transforming genome-wide expression data from genes × arrays space to reduced diagonalized “eigengenes” × “eigenarrays” space, where the eigengenes (or eigenarrays) are unique orthonormal superpositions of the genes (or arrays). Normalizing the data by filtering out the eigengenes (and eigenarrays) that are inferred to represent noise or experimental artifacts enables meaningful comparison of the expression of different genes across different arrays in different experiments. Sorting the data according to the eigengenes and eigenarrays gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype, respectively. After normalization and sorting, the significant eigengenes and eigenarrays can be associated with observed genome-wide effects of regulators, or with measured samples, in which these regulators are overactive or underactive, respectively.
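
The decomposition and filtering steps above can be sketched with NumPy's `numpy.linalg.svd`. With a genes × arrays matrix, the rows of Vᵀ are the eigengenes (patterns across arrays) and the columns of U are the eigenarrays (patterns across genes). The matrix here is random toy data, and which components count as "noise" is a judgment the analyst must make; dropping the weakest component is an arbitrary choice for this demo.

```python
import numpy as np


def eigen_decompose(expr: np.ndarray):
    """SVD of a genes x arrays matrix.

    Returns eigenarrays (columns of U, one entry per gene), singular values
    in descending order, and eigengenes (rows of Vt, one entry per array).
    """
    u, s, vt = np.linalg.svd(expr, full_matrices=False)
    return u, s, vt


def eigenexpression_fractions(s: np.ndarray) -> np.ndarray:
    """Fraction of overall expression captured by each eigengene: s_k^2 / sum(s^2)."""
    power = s**2
    return power / power.sum()


def filter_components(u, s, vt, drop: list[int]) -> np.ndarray:
    """Reconstruct the data with the listed eigengene/eigenarray pairs zeroed out."""
    s_kept = s.copy()
    s_kept[drop] = 0.0
    return u @ np.diag(s_kept) @ vt


rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 6))  # toy data: 50 genes x 6 arrays

u, s, vt = eigen_decompose(expr)
fractions = eigenexpression_fractions(s)
print("eigenexpression fractions:", np.round(fractions, 3))

# Treat the weakest component as "noise" (an arbitrary choice for this demo).
filtered = filter_components(u, s, vt, drop=[len(s) - 1])
print("rank after filtering:", np.linalg.matrix_rank(filtered))
```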

1,815 citations

Machine learning with Python

Pedro Ferreira, Christopher L. Simons

25 Apr 2017

TL;DR: This presentation is a case study taken from the travel and holiday industry and describes the effectiveness of various techniques as well as the performance of Python-based libraries such as Python Data Analysis Library (Pandas), and Scikit-learn (built on NumPy, SciPy and matplotlib).

Abstract: This presentation is a case study taken from the travel and holiday industry. Paxport/Multicom, based in the UK and Sweden, have recently adopted a recommendation system for holiday accommodation bookings. Machine learning techniques such as Collaborative Filtering have been applied using Python (3.5.1), with Jupyter (4.0.6) as the main framework. Data scale and sparsity present significant challenges in the case study, so the effectiveness of various techniques is described, as well as the performance of Python-based libraries such as the Python Data Analysis Library (Pandas) and Scikit-learn (built on NumPy, SciPy and matplotlib). The presentation is suitable for programmers of all levels.
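
Collaborative filtering can be illustrated with a minimal item-based variant in plain Python: items are compared by the cosine similarity of their rating vectors, and a user's unknown rating is predicted as a similarity-weighted average of that user's ratings on other items. The ratings and hotel names below are hypothetical, and the presentation does not say which collaborative-filtering variant was used; at the data scale it describes, sparse-matrix tooling would replace these dictionary loops.

```python
from math import sqrt
from typing import Optional

# Hypothetical ratings (user -> {item: rating}); unrated items are simply absent.
ratings: dict[str, dict[str, float]] = {
    "u1": {"hotel_a": 5, "hotel_b": 3, "hotel_c": 4},
    "u2": {"hotel_a": 4, "hotel_b": 2, "hotel_c": 5},
    "u3": {"hotel_a": 1, "hotel_b": 5},
    "u4": {"hotel_b": 4, "hotel_c": 2},
}


def item_vector(item: str) -> dict[str, float]:
    """Ratings given to one item, keyed by user."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity computed over the users who rated both items."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(a[u] ** 2 for u in common))
    norm_b = sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def predict(user: str, item: str) -> Optional[float]:
    """Similarity-weighted average of the user's ratings on other items."""
    target = item_vector(item)
    num = den = 0.0
    for other, rating in ratings[user].items():
        if other == item:
            continue
        sim = cosine(target, item_vector(other))
        num += sim * rating
        den += abs(sim)
    return num / den if den else None


print(predict("u3", "hotel_c"))
```

Sparsity shows up directly here: the fewer users two items share, the less evidence the similarity is based on, which is one reason the case study singles out data sparsity as a challenge.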

1,338 citations
