Author
Marc R. J. Carlson
Other affiliations: University of California, Seattle Children's Research Institute, University of California, Los Angeles ...
Bio: Marc R. J. Carlson is an academic researcher from Fred Hutchinson Cancer Research Center. The author has contributed to research in topics: Bioconductor & Regeneration (biology). The author has an h-index of 19 and has co-authored 20 publications receiving 7,458 citations. Previous affiliations of Marc R. J. Carlson include University of California & Seattle Children's Research Institute.
Papers
••
TL;DR: This work describes Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions, including those for sequence analysis, differential expression analysis and visualization.
Abstract: We describe Bioconductor infrastructure for representing and computing on annotated genomic ranges and integrating genomic data with the statistical computing features of R and its extensions. At the core of the infrastructure are three packages: IRanges, GenomicRanges, and GenomicFeatures. These packages provide scalable data structures for representing annotated ranges on the genome, with special support for transcript structures, read alignments and coverage vectors. Computational facilities include efficient algorithms for overlap and nearest neighbor detection, coverage calculation and other range operations. This infrastructure directly supports more than 80 other Bioconductor packages, including those for sequence analysis, differential expression analysis and visualization.
3,005 citations
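The range operations named in the abstract above (overlap detection and coverage calculation) can be illustrated with a minimal Python sketch. This is not the IRanges or GenomicRanges API; the function names `find_overlaps` and `coverage`, and the choice of 1-based closed intervals on a single sequence, are assumptions made for illustration only.

```python
from bisect import bisect_right


def find_overlaps(queries, subjects):
    """Return (query_index, subject_index) pairs for overlapping closed intervals.

    Hypothetical helper: each interval is a (start, end) tuple, 1-based, inclusive.
    """
    # Sort subject indices by start so a binary search can discard
    # every subject that begins after the query ends.
    order = sorted(range(len(subjects)), key=lambda i: subjects[i][0])
    starts = [subjects[i][0] for i in order]
    hits = []
    for qi, (q_start, q_end) in enumerate(queries):
        upper = bisect_right(starts, q_end)
        for si in order[:upper]:
            # The subject starts at or before q_end; it overlaps
            # only if it also ends at or after q_start.
            if subjects[si][1] >= q_start:
                hits.append((qi, si))
    return hits


def coverage(ranges, length):
    """Return per-position depth for positions 1..length (difference-array method)."""
    diff = [0] * (length + 2)
    for start, end in ranges:
        diff[start] += 1      # depth rises where a range begins
        diff[end + 1] -= 1    # and falls just past where it ends
    depths, depth = [], 0
    for pos in range(1, length + 1):
        depth += diff[pos]
        depths.append(depth)
    return depths


overlap_pairs = find_overlaps([(5, 10)], [(1, 4), (8, 12), (10, 20), (11, 15)])
depth_vector = coverage([(1, 3), (2, 5)], 6)
```

The binary search only prunes subjects by start position, so this sketch is still linear per query in the worst case; an interval tree or similar index would be needed for the scalability the abstract describes.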
••
Harvard University1, Genentech2, Fred Hutchinson Cancer Research Center3, State University of Campinas4, University of Maryland, College Park5, National Institutes of Health6, University of Cambridge7, University of California, Riverside8, Novartis9, Johns Hopkins University10, University of Washington11, Walter and Eliza Hall Institute of Medical Research12, City University of New York13
TL;DR: An overview of Bioconductor, an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology, which comprises 934 interoperable packages contributed by a large, diverse community of scientists.
Abstract: Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.
2,818 citations
••
TL;DR: The weighted gene coexpression network analysis provides a blueprint for leveraging genomic data to identify key control networks and molecular targets for glioblastoma, and the principle eluted from this work can be applied to other cancers.
Abstract: Glioblastoma is the most common primary malignant brain tumor of adults and one of the most lethal of all cancers. Patients with this disease have a median survival of 15 months from the time of diagnosis despite surgery, radiation, and chemotherapy. New treatment approaches are needed. Recent works suggest that glioblastoma patients may benefit from molecularly targeted therapies. Here, we address the compelling need for identification of new molecular targets. Leveraging global gene expression data from two independent sets of clinical tumor samples (n = 55 and n = 65), we identify a gene coexpression module in glioblastoma that is also present in breast cancer and significantly overlaps with the "metasignature" for undifferentiated cancer. Studies in an isogenic model system demonstrate that this module is downstream of the mutant epidermal growth factor receptor, EGFRvIII, and that it can be inhibited by the epidermal growth factor receptor tyrosine kinase inhibitor Erlotinib. We identify ASPM (abnormal spindle-like microcephaly associated) as a key gene within this module and demonstrate its overexpression in glioblastoma relative to normal brain (or body tissues). Finally, we show that ASPM inhibition by siRNA-mediated knockdown inhibits tumor cell proliferation and neural stem cell proliferation, supporting ASPM as a potential molecular target in glioblastoma. Our weighted gene coexpression network analysis provides a blueprint for leveraging genomic data to identify key control networks and molecular targets for glioblastoma, and the principle eluted from our work can be applied to other cancers.
595 citations
••
TL;DR: The results indicate that Treg cell effector function but not lineage commitment requires the expression of functional Foxp3 protein.
Abstract: Although the development of regulatory T cells (T(reg) cells) in the thymus is defined by expression of the lineage marker Foxp3, the precise function of Foxp3 in T(reg) cell lineage commitment is unknown. Here we examined T(reg) cell development and function in mice with a Foxp3 allele that directs expression of a nonfunctional fusion protein of Foxp3 and enhanced green fluorescent protein (Foxp3DeltaEGFP). Thymocyte development in Foxp3DeltaEGFP male mice and Foxp3DeltaEGFP/+ female mice recapitulated that of wild-type mice. Although mature EGFP(+) CD4(+) T cells from Foxp3DeltaEGFP mice lacked suppressor function, they maintained the characteristic T(reg) cell 'genetic signature' and failed to develop from EGFP(-) CD4(+) T cells when transferred into lymphopenic hosts, indicative of their common ontogeny with T(reg) cells. Our results indicate that T(reg) cell effector function but not lineage commitment requires the expression of functional Foxp3 protein.
456 citations
••
Saarland University1, University of Tübingen2, University of Kiel3, Life Technologies4, Stanford University5, University of Pavia6, Centre national de la recherche scientifique7, University of Tartu8, Sorenson Molecular Genealogy Foundation9, University of La Laguna10, Heidelberg University11, Ontario Institute for Cancer Research12, Fred Hutchinson Cancer Research Center13, Wrocław Medical University14
TL;DR: The complete genome sequence of the Iceman is reported and 100% concordance between the previously reported mitochondrial genome sequence and the consensus sequence generated from the genomic data is shown.
Abstract: The Tyrolean Iceman, a 5,300-year-old Copper age individual, was discovered in 1991 on the Tisenjoch Pass in the Italian part of the Ötztal Alps. Here we report the complete genome sequence of the Iceman and show 100% concordance between the previously reported mitochondrial genome sequence and the consensus sequence generated from our genomic data. We present indications for recent common ancestry between the Iceman and present-day inhabitants of the Tyrrhenian Sea, that the Iceman probably had brown eyes, belonged to blood group O and was lactose intolerant. His genetic predisposition shows an increased risk for coronary heart disease and may have contributed to the development of previously reported vascular calcifications. Sequences corresponding to ~60% of the genome of Borrelia burgdorferi are indicative of the earliest human case of infection with the pathogen for Lyme borreliosis.
413 citations
Cited by
••
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html.
47,038 citations
••
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.
17,014 citations
••
TL;DR: This work presents HTSeq, a Python library to facilitate the rapid development of custom scripts for high-throughput sequencing data analysis, and presents htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract: Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. Contact: sanders@fs.tum.de
15,744 citations
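The idea of counting the overlap of reads with genes, as described in the abstract above, can be sketched in plain Python. This does not use the HTSeq library and is not claimed to reproduce htseq-count's counting modes; the function name `count_reads`, the single-chromosome unstranded input, and the simplified rule (a read counts only when it overlaps exactly one gene) are assumptions for illustration.

```python
from collections import Counter


def count_reads(reads, genes):
    """Tally reads per gene under a simplified, assumed counting rule.

    reads: list of (start, end) closed intervals on one chromosome.
    genes: dict mapping a gene name to its (start, end) closed interval.
    """
    counts = Counter({name: 0 for name in genes})
    for r_start, r_end in reads:
        # Two closed intervals overlap when each starts before the other ends.
        hits = [
            name
            for name, (g_start, g_end) in genes.items()
            if g_start <= r_end and r_start <= g_end
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["no_feature"] += 1   # read falls outside every gene
        else:
            counts["ambiguous"] += 1    # read touches more than one gene
    return counts


gene_models = {"geneA": (100, 200), "geneB": (180, 300)}
aligned_reads = [(110, 150), (190, 195), (400, 450), (250, 280)]
read_counts = count_reads(aligned_reads, gene_models)
```

Keeping ambiguous and unassigned reads in separate tallies, rather than dropping them silently, makes it easier to see how much data the counting rule discards before differential expression analysis.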
••
TL;DR: The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis that includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software.
Abstract: Correlation networks are increasingly being used in bioinformatics applications. For example, weighted gene co-expression network analysis is a systems biology method for describing the correlation patterns among genes across microarray samples. Weighted correlation network analysis (WGCNA) can be used for finding clusters (modules) of highly correlated genes, for summarizing such clusters using the module eigengene or an intramodular hub gene, for relating modules to one another and to external sample traits (using eigengene network methodology), and for calculating module membership measures. Correlation networks facilitate network based gene screening methods that can be used to identify candidate biomarkers or therapeutic targets. These methods have been successfully applied in various biological contexts, e.g. cancer, mouse genetics, yeast genetics, and analysis of brain imaging data. While parts of the correlation network methodology have been described in separate publications, there is a need to provide a user-friendly, comprehensive, and consistent software implementation and an accompanying tutorial. The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis. The package includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software. Along with the R package we also present R software tutorials. While the methods development was motivated by gene expression data, the underlying data mining approach can be applied to a variety of different settings. The WGCNA package provides R functions for weighted correlation network analysis, e.g. co-expression network analysis of gene expression data. The R package along with its source code and additional material are freely available at http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA.
14,243 citations
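The network construction step mentioned in the abstract above can be sketched as follows, assuming the unsigned soft-threshold form in which the adjacency between two genes is the absolute Pearson correlation raised to a power beta. The abstract itself does not state this formula, so the form and the default `beta=6` are assumptions here. The sketch is plain Python, not the WGCNA R package, and the names `soft_threshold_adjacency` and `connectivity` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def soft_threshold_adjacency(profiles, beta=6):
    """Build a weighted adjacency matrix a_ij = |cor(x_i, x_j)| ** beta.

    profiles: one expression profile (list of floats) per gene.
    beta: soft-thresholding power (assumed value; chosen per data set in practice).
    """
    n = len(profiles)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weight = abs(pearson(profiles[i], profiles[j])) ** beta
            adjacency[i][j] = adjacency[j][i] = weight  # symmetric, zero diagonal
    return adjacency


def connectivity(adjacency):
    """Sum of edge weights per gene, one simple topological property."""
    return [sum(row) for row in adjacency]


expression = [[1, 2, 3, 4], [1, 3, 2, 4], [4, 3, 2, 1]]
adjacency = soft_threshold_adjacency(expression)
node_connectivity = connectivity(adjacency)
```

Raising the correlation to a power keeps every pair connected while pushing weak correlations toward zero, which is what makes the network weighted rather than thresholded to 0 or 1.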
••
Northern Arizona University1, National Institutes of Health2, University of Minnesota3, University of California, Davis4, Woods Hole Oceanographic Institution5, Massachusetts Institute of Technology6, University of Copenhagen7, University of Trento8, Chinese Academy of Sciences9, University of California, San Francisco10, University of Pennsylvania11, Pacific Northwest National Laboratory12, North Carolina State University13, University of California, San Diego14, Institute for Systems Biology15, Dalhousie University16, University of British Columbia17, Statens Serum Institut18, Anschutz Medical Campus19, University of Washington20, Michigan State University21, Stanford University22, Harvard University23, Broad Institute24, Australian National University25, University of Düsseldorf26, University of New South Wales27, Sookmyung Women's University28, San Diego State University29, Howard Hughes Medical Institute30, Max Planck Society31, Cornell University32, Colorado State University33, Google34, Syracuse University35, Webster University36, United States Department of Agriculture37, University of Arkansas for Medical Sciences38, Colorado School of Mines39, University of Southern Mississippi40, National Oceanic and Atmospheric Administration41, University of California, Merced42, Wageningen University and Research Centre43, University of Arizona44, Environment Agency45, University of Florida46, Merck & Co.47
TL;DR: QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and 1565057 to R.K.; partial support was also provided by grants NIH U54CA143925 and U54MD012388.
Abstract: QIIME 2 development was primarily funded by NSF Awards 1565100 to J.G.C. and 1565057 to R.K. Partial support was also provided by the following: grants NIH U54CA143925 (J.G.C. and T.P.) and U54MD012388 (J.G.C. and T.P.); grants from the Alfred P. Sloan Foundation (J.G.C. and R.K.); ERCSTG project MetaPG (N.S.); the Strategic Priority Research Program of the Chinese Academy of Sciences QYZDB-SSW-SMC021 (Y.B.); the Australian National Health and Medical Research Council APP1085372 (G.A.H., J.G.C., Von Bing Yap and R.K.); the Natural Sciences and Engineering Research Council (NSERC) to D.L.G.; and the State of Arizona Technology and Research Initiative Fund (TRIF), administered by the Arizona Board of Regents, through Northern Arizona University. All NCI coauthors were supported by the Intramural Research Program of the National Cancer Institute. S.M.G. and C. Diener were supported by the Washington Research Foundation Distinguished Investigator Award.
8,821 citations