Showing papers in "Nature Methods in 2012"


Journal ArticleDOI
TL;DR: The origins, challenges and solutions of NIH Image and ImageJ software are discussed, and how their history can serve to advise and inform other software projects.
Abstract: For the past 25 years NIH Image and ImageJ software have been pioneers as open tools for the analysis of scientific images. We discuss the origins, challenges and solutions of these two programs, and how their history can serve to advise and inform other software projects.

44,587 citations


Journal ArticleDOI
TL;DR: Fiji is a distribution of the popular open-source software ImageJ focused on biological-image analysis that facilitates the transformation of new algorithms into ImageJ plugins that can be shared with end users through an integrated update system.
Abstract: Fiji is a distribution of the popular open-source software ImageJ focused on biological-image analysis. Fiji uses modern software engineering practices to combine powerful software libraries with a broad range of scripting languages to enable rapid prototyping of image-processing algorithms. Fiji facilitates the transformation of new algorithms into ImageJ plugins that can be shared with end users through an integrated update system. We propose Fiji as a platform for productive collaboration between computer science and biology research communities.

43,540 citations


Journal ArticleDOI
TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

37,898 citations


Journal ArticleDOI
TL;DR: jModelTest 2: more models, new heuristics and parallel computing Diego Darriba, Guillermo L. Taboada, Ramón Doallo and David Posada.
Abstract: jModelTest 2: more models, new heuristics and parallel computing. Diego Darriba, Guillermo L. Taboada, Ramón Doallo and David Posada. Supplementary Table 1: New features in jModelTest 2. Supplementary Table 2: Model selection accuracy. Supplementary Table 3: Mean square errors for model averaged estimates. Supplementary Note 1: Hill-climbing hierarchical clustering algorithm. Supplementary Note 2: Heuristic filtering. Supplementary Note 3: Simulations from prior distributions. Supplementary Note 4: Speed-up benchmark on real and simulated datasets.

13,100 citations


Journal ArticleDOI
TL;DR: ChromHMM is an automated computational system for learning chromatin states, characterizing their biological functions and correlations with large-scale functional datasets, and visualizing the resulting genome-wide maps of chromatin state annotations.
Abstract: Chromatin state annotation using combinations of chromatin modification patterns has emerged as a powerful approach for discovering regulatory regions and their cell type specific activity patterns, and for interpreting disease-association studies [1-5]. However, the computational challenge of learning chromatin state models from large numbers of chromatin modification datasets in multiple cell types still requires extensive bioinformatics expertise, making it inaccessible to the wider scientific community. To address this challenge, we have developed ChromHMM, an automated computational system for learning chromatin states, characterizing their biological functions and correlations with large-scale functional datasets, and visualizing the resulting genome-wide maps of chromatin state annotations.

2,134 citations


Journal ArticleDOI
TL;DR: An open-source, general-purpose tool that represents both query and database sequences by profile hidden Markov models (HMMs): 'HMM-HMM–based lightning-fast iterative sequence search' (HHblits; http://toolkit.genzentrum.lmu.de/hhblits/).
Abstract: Sequence-based protein function and structure prediction depends crucially on sequence-search sensitivity and accuracy of the resulting sequence alignments. We present an open-source, general-purpose tool that represents both query and database sequences by profile hidden Markov models (HMMs): 'HMM-HMM-based lightning-fast iterative sequence search' (HHblits; http://toolkit.genzentrum.lmu.de/hhblits/). Compared to the sequence-search tool PSI-BLAST, HHblits is faster owing to its discretized-profile prefilter, has 50-100% higher sensitivity and generates more accurate alignments.

1,865 citations


Journal ArticleDOI
TL;DR: A method for estimating haplotypes, using genotype data from unrelated samples or small nuclear families, that leads to improved accuracy and speed compared to several widely used methods is presented.
Abstract: An efficient haplotype-estimation algorithm that features linear complexity allows the rapid and accurate phasing of diploid genomes from trios, duos and unrelated samples.

1,710 citations


Journal ArticleDOI
TL;DR: This work presents an approach that uses clade-specific marker genes to unambiguously assign reads to microbial clades more accurately and >50× faster than current approaches, and validated the metagenomic phylogenetic analysis tool, MetaPhlAn, on terabases of short reads.
Abstract: Metagenomic shotgun sequencing data can identify microbes populating a microbial community and their proportions, but existing taxonomic profiling methods are inefficient for increasingly large data sets. We present an approach that uses clade-specific marker genes to unambiguously assign reads to microbial clades more accurately and >50× faster than current approaches. We validated our metagenomic phylogenetic analysis tool, MetaPhlAn, on terabases of short reads and provide the largest metagenomic profiling to date of the human gut. It can be accessed at http://huttenhower.sph.harvard.edu/metaphlan/.

1,566 citations


Journal ArticleDOI
TL;DR: In this paper, the authors performed a comprehensive blind assessment of over 30 network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae and in silico microarray data.
Abstract: Reconstructing gene regulatory networks from high-throughput data is a long-standing challenge. Through the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we performed a comprehensive blind assessment of over 30 network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae and in silico microarray data. We characterize the performance, data requirements and inherent biases of different inference approaches, and we provide guidelines for algorithm application and development. We observed that no single inference method performs optimally across all data sets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse data sets. We thereby constructed high-confidence networks for E. coli and S. aureus, each comprising ~1,700 transcriptional interactions at a precision of ~50%. We experimentally tested 53 previously unobserved regulatory interactions in E. coli, of which 23 (43%) were supported. Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks.

1,424 citations
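
The integration of predictions from multiple inference methods described in the entry above can be illustrated with a minimal sketch. The abstract does not state the integration rule, so this example assumes the simplest scheme, averaging the rank that each method assigns to a candidate edge. The function name, the handling of unreported edges and the toy scores are hypothetical.

```python
from collections import defaultdict


def average_rank_consensus(method_scores):
    """Combine edge scores from several methods by averaging their ranks.

    method_scores: list of dicts mapping (regulator, target) -> confidence.
    Returns all edges sorted from most to least confident by mean rank.
    Assumption: an edge a method does not report gets that method's
    worst rank plus one.
    """
    all_edges = set()
    for scores in method_scores:
        all_edges.update(scores)

    rank_sums = defaultdict(float)
    for scores in method_scores:
        # Rank 1 is the highest score; ties are broken by edge name.
        ordered = sorted(scores, key=lambda e: (-scores[e], e))
        ranks = {edge: i + 1 for i, edge in enumerate(ordered)}
        missing_rank = len(ordered) + 1
        for edge in all_edges:
            rank_sums[edge] += ranks.get(edge, missing_rank)

    n = len(method_scores)
    return sorted(all_edges, key=lambda e: (rank_sums[e] / n, e))


# Toy scores from three hypothetical inference methods.
methods = [
    {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "c"): 0.1},
    {("a", "b"): 0.8, ("b", "c"): 0.7},
    {("a", "c"): 0.9, ("a", "b"): 0.6, ("b", "c"): 0.2},
]
consensus = average_rank_consensus(methods)
```

Rank averaging only needs each method's ordering, so methods with incomparable score scales can be combined without calibration.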


Journal ArticleDOI
TL;DR: Icy is a collaborative bioimage informatics platform that combines a community website for contributing and sharing tools and material, and software with a high-end visual programming framework for seamless development of sophisticated imaging workflows.
Abstract: Icy is a collaborative platform for biological image analysis that extends reproducible research principles by facilitating and stimulating the contribution and sharing of algorithm-based tools and protocols between researchers. Current research in biology uses ever more complex computational and imaging tools. Here we describe Icy, a collaborative bioimage informatics platform that combines a community website for contributing and sharing tools and material, and software with a high-end visual programming framework for seamless development of sophisticated imaging workflows. Icy extends the reproducible research principles by encouraging and facilitating the reusability, modularity, standardization and management of algorithms and protocols. Icy is free, open-source and available at http://icy.bioimageanalysis.org/.

1,261 citations


Journal ArticleDOI
TL;DR: A travel guide to the world of plugins, covering the 152 publicly available plugins for Cytoscape 2.5–2.8 and ongoing efforts to distribute, organize and maintain the quality of the collection.
Abstract: Cytoscape is open-source software for integration, visualization and analysis of biological networks. It can be extended through Cytoscape plugins, enabling a broad community of scientists to contribute useful features. This growth has occurred organically through the independent efforts of diverse authors, yielding a powerful but heterogeneous set of tools. We present a travel guide to the world of plugins, covering the 152 publicly available plugins for Cytoscape 2.5-2.8. We also describe ongoing efforts to distribute, organize and maintain the quality of the collection.

Journal ArticleDOI
TL;DR: In this article, the authors present a pipeline that integrates a strategy for mapping of sequencing reads and a data-driven method for iterative correction of biases, yielding genome-wide maps of relative contact probabilities.
Abstract: Extracting biologically meaningful information from chromosomal interactions obtained with genome-wide chromosome conformation capture (3C) analyses requires elimination of systematic biases. We present a pipeline that integrates a strategy for mapping of sequencing reads and a data-driven method for iterative correction of biases, yielding genome-wide maps of relative contact probabilities. We validate ICE (Iterative Correction and Eigenvector decomposition) on published Hi-C data, and demonstrate that eigenvector decomposition of the obtained maps provides insights into local chromatin states, global patterns of chromosomal interactions, and the conserved organization of human and mouse chromosomes.
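
The iterative correction of biases mentioned in the abstract above can be sketched as a matrix-balancing loop. This is one common formulation, not necessarily the paper's exact procedure: each pass divides every entry by the product of its row and column coverage, normalized by the mean coverage, until all rows have equal sums. The function name, the fixed iteration count and the toy matrix are hypothetical, and the sketch assumes a symmetric matrix with nonzero row sums and an excluded (zero) diagonal.

```python
def iterative_correction(matrix, n_iter=100):
    """Balance a symmetric contact matrix so that all rows have equal sums.

    Returns (corrected, bias) such that
    corrected[i][j] == matrix[i][j] / (bias[i] * bias[j]).
    Assumptions: symmetric input, every row sum is nonzero, and the
    diagonal does not dominate (here it is zero).
    """
    n = len(matrix)
    corrected = [row[:] for row in matrix]
    bias = [1.0] * n
    for _ in range(n_iter):
        row_sums = [sum(row) for row in corrected]
        mean_sum = sum(row_sums) / n
        factors = [s / mean_sum for s in row_sums]
        for i in range(n):
            for j in range(n):
                corrected[i][j] /= factors[i] * factors[j]
        # Accumulate the total bias removed from each bin.
        bias = [b * f for b, f in zip(bias, factors)]
    return corrected, bias


# Toy data: uniform true contacts distorted by per-bin biases.
true_bias = [1.0, 2.0, 0.5, 4.0]
observed = [
    [0.0 if i == j else true_bias[i] * true_bias[j] for j in range(4)]
    for i in range(4)
]
corrected, bias = iterative_correction(observed)
```

In this toy case the recovered biases are proportional to the ones used to distort the matrix, which is what makes the corrected values interpretable as relative contact probabilities.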

Journal ArticleDOI
TL;DR: How SRM is applied in proteomics is described, recent advances are reviewed, present selected applications and a perspective on the future of this powerful technology is provided.
Abstract: Selected reaction monitoring (SRM) is a targeted mass spectrometry technique that is emerging in the field of proteomics as a complement to untargeted shotgun methods. SRM is particularly useful when predetermined sets of proteins, such as those constituting cellular networks or sets of candidate biomarkers, need to be measured across multiple samples in a consistent, reproducible and quantitatively precise manner. Here we describe how SRM is applied in proteomics, review recent advances, present selected applications and provide a perspective on the future of this powerful technology.

Journal ArticleDOI
TL;DR: Analysis of simulated data with realistic signal-to-noise ratios indicates that the accuracy of the orientation determination is not affected by the exclusion of high-frequency terms, nor by the use of a model that is reconstructed from only half of the particles, as expected.
Abstract: In the field of single-particle analysis of electron cryo-microscopy (cryo-EM) data, a growing concern that some resolution claims might not be substantiated by the data has been one of the instigators of community-wide efforts to develop new validation tools [1]. A known issue with commonly used cryo-EM structure determination procedures is their liability to overfit the data. Most procedures counter overfitting by low-pass filtering, but the effective frequencies for these filters are often based on suboptimal Fourier Shell Correlation (FSC) [2] procedures. In the suboptimal procedure, FSC curves are calculated between reconstructions from two halves of the data, while a single model is used to determine the relative orientations of all particles. It is well known that bias towards noise in the single model may inflate the resulting resolution estimates. To illustrate this, we applied the suboptimal procedure to a simulated cryo-EM data set of 20,212 GroEL particles. Whereas the reported resolution was 4.6 Å, the true resolution of the map was only 7.8 Å. Also, the presence of expected density features in the map does not necessarily provide sufficient evidence for a resolution claim: we could make convincing-looking figures of apparent side-chain density that in reality corresponded to overfitted noise (Supplementary Figure 1). Consequently, overfitting may remain undetected and interpretation of cryo-EM maps may be subject to errors. The dangers of overfitting have been recognized, and refinement procedures with resolution-dependent weighting schemes to reduce overfitting have been proposed [3,4]. However, two known solutions to prevent it are not in common use. By refining two models independently (one for each half of the data), so-called gold-standard [1] FSC curves may be calculated that are free from spurious correlations.
Alternatively, the data used for the orientation determination may be limited to a user-specified frequency, so that model bias beyond that frequency may be avoided. However, the argument that withholding part of the data from the refinement would substantially deteriorate the orientations and thereby the quality of the structure has prevented the widespread use of either of these solutions. In what follows, we prove this thesis to be false. Analysis of simulated data with realistic signal-to-noise ratios (SNRs) indicates that the accuracy of the orientation determination is not affected by the exclusion of high-frequency terms, nor by the use of a model that is reconstructed from only half of the particles (Supplementary Figure 2). These simulations illustrate that only the low-medium frequency terms in the individual particles contain sufficiently high SNRs to contribute significantly to the orientation determination, which is in good agreement with experimental evidence that cryo-EM particles may be aligned accurately using only low-frequency data [5]. Because in most cryo-EM studies the low-medium frequencies of reconstructions from half of the particles are not expected to be significantly worse than those of reconstructions from all particles, we hypothesize that overfitting may be prevented without a notable loss of resolution using either frequency-limited refinement or refinement based on gold-standard FSCs. Since the former involves a decision by the user, i.e. choosing the frequency at which to limit the refinement, we favour gold-standard FSCs and implemented a procedure to independently refine two models as a script on top of the conventional projection matching protocol in the XMIPP package [6] (Supplementary Figure 3 & Supplementary Software).
We tested our hypothesis using three cryo-EM data sets: 5,053 GroEL particles that are distributed by the National Center for Macromolecular Imaging; an in-house collected data set of 50,330 β-galactosidase particles (Supplementary Methods); and 5,403 hepatitis B capsid particles from a previously published study [7]. High-resolution crystal structures are available for all three data sets, and these were used to assess the “true” resolution obtained using refinements based on either gold-standard or conventional FSC procedures (Figure 1). For all three cases, the conventional procedure reported apparently better FSC curves than the gold-standard procedure, but in no case did the gold-standard procedure actually result in a lower resolution map compared to the crystal structure. On the contrary, for the β-galactosidase data the gold-standard procedure yielded a structure that correlated up to higher frequencies with the crystal structure than the conventional procedure, which suffered from severe overfitting and gave rise to strong artefacts in the map. We also note that, in the absence of overfitting, the frequency at which the gold-standard FSC drops below 0.143 is a good indicator of the true resolution of the map (Supplementary Table 1), which is as expected from theory [8]. Finally, in the limit of very small data sets, division of the data into two halves might affect resolution. However, calculations with subsets of the GroEL particles suggest that this only becomes an issue for data sets that are much smaller than those typically used in cryo-EM reconstructions (Supplementary Figure 4).
Figure 1: The prevention of overfitting.
The principal conclusion is therefore that overfitting of noise using suboptimal FSCs causes worse orientations and leads to a worse structure. In contrast, the use of gold-standard FSCs provides a realistic estimate of the true signal, which ultimately leads to a better map.
The procedures proposed here are straightforward to implement in existing programs, and their application will eradicate the hazards of overfitting from cryo-EM structure determination procedures.
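
The 0.143 criterion described above can be made concrete with a short sketch that reads a resolution estimate off an FSC curve. It assumes the curve has already been computed between two independently refined half-maps; the function name, the linear interpolation between shells and the example values are hypothetical and are not taken from the paper.

```python
def frequency_at_threshold(frequencies, fsc, threshold=0.143):
    """Return the spatial frequency where the FSC first drops below threshold.

    frequencies: increasing spatial frequencies (for example in 1/Å),
                 one per Fourier shell.
    fsc: FSC value for each shell.
    Linear interpolation between the two bracketing shells is an
    illustrative assumption. Returns None if the curve never crosses.
    """
    if fsc and fsc[0] < threshold:
        return frequencies[0]
    for k in range(1, len(fsc)):
        if fsc[k - 1] >= threshold > fsc[k]:
            t = (fsc[k - 1] - threshold) / (fsc[k - 1] - fsc[k])
            return frequencies[k - 1] + t * (frequencies[k] - frequencies[k - 1])
    return None


# Made-up FSC curve sampled at four shells.
freqs = [0.05, 0.10, 0.15, 0.20]
curve = [0.99, 0.80, 0.243, 0.043]
cutoff = frequency_at_threshold(freqs, curve)
resolution = 1.0 / cutoff  # in Å when frequencies are in 1/Å
```

The estimate is only meaningful for gold-standard curves; applied to an FSC from the single-model procedure, the same threshold would inherit the inflation the text warns about.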

Journal ArticleDOI
TL;DR: ClusterONE-derived complexes for several yeast data sets showed better correspondence with reference complexes in the Munich Information Center for Protein Sequence catalog and complexes derived from the Saccharomyces Genome Database than the results of seven popular methods.
Abstract: We introduce clustering with overlapping neighborhood expansion (ClusterONE), a method for detecting potentially overlapping protein complexes from protein-protein interaction data. ClusterONE-derived complexes for several yeast data sets showed better correspondence with reference complexes in the Munich Information Center for Protein Sequence (MIPS) catalog and complexes derived from the Saccharomyces Genome Database (SGD) than the results of seven popular methods. The results also showed a high extent of functional homogeneity.

Journal ArticleDOI
TL;DR: Unique molecular identifiers (UMIs), which make each molecule in a population distinct, are applied to genome-scale human karyotyping and mRNA sequencing in Drosophila melanogaster to improve accuracy of almost any next-generation sequencing method.
Abstract: Unique molecular identifiers (UMIs) associate distinct sequences with every DNA or RNA molecule and can be counted after amplification to quantify molecules in the original sample. Using UMIs, the authors obtain a digital karyotype of an individual with Down's syndrome and quantify mRNA in Drosophila melanogaster cells.
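
Counting UMIs after amplification, as the abstract above describes, amounts to counting distinct identifiers rather than reads. A minimal sketch, assuming reads have already been assigned to genes and that UMIs contain no sequencing errors; the function name and the toy reads are hypothetical.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per gene.

    reads: iterable of (gene, umi) pairs, one per sequenced read.
    Reads sharing both gene and UMI are treated as amplification copies
    of a single original molecule.
    Assumption: UMIs are error-free; no merging of near-identical UMIs.
    """
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}


# Four reads, but only three original molecules.
reads = [
    ("geneA", "ACGT"),
    ("geneA", "ACGT"),  # amplification duplicate of the read above
    ("geneA", "TTGA"),
    ("geneB", "ACGT"),
]
molecule_counts = count_molecules(reads)
```

Because duplicates collapse onto the same identifier, the count does not depend on how unevenly individual molecules were amplified.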

Journal ArticleDOI
TL;DR: This work systematically compared microbial opsins under matched experimental conditions to extract essential principles and identify key parameters for the conduct, design and interpretation of experiments involving optogenetic techniques.
Abstract: Diverse optogenetic tools have allowed versatile control over neural activity. Many depolarizing and hyperpolarizing tools have now been developed in multiple laboratories and tested across different preparations, presenting opportunities but also making it difficult to draw direct comparisons. This challenge has been compounded by the dependence of performance on parameters such as vector, promoter, expression time, illumination, cell type and many other variables. As a result, it has become increasingly complicated for end users to select the optimal reagents for their experimental needs. For a rapidly growing field, critical figures of merit should be formalized both to establish a framework for further development and so that end users can readily understand how these standardized parameters translate into performance. Here we systematically compared microbial opsins under matched experimental conditions to extract essential principles and identify key parameters for the conduct, design and interpretation of experiments involving optogenetic techniques.

Journal ArticleDOI
TL;DR: Replacement of CFP and YFP with these two proteins in reporters of kinase activity, small GTPase activity and transmembrane voltage significantly improves photostability, FRET dynamic range and emission ratio changes and enhances detection of transient biochemical events.
Abstract: A variety of genetically encoded reporters use changes in fluorescence (or Förster) resonance energy transfer (FRET) to report on biochemical processes in living cells. The standard genetically encoded FRET pair consists of CFPs and YFPs, but many CFP-YFP reporters suffer from low FRET dynamic range, phototoxicity from the CFP excitation light and complex photokinetic events such as reversible photobleaching and photoconversion. We engineered two fluorescent proteins, Clover and mRuby2, which are the brightest green and red fluorescent proteins to date and have the highest Förster radius of any ratiometric FRET pair yet described. Replacement of CFP and YFP with these two proteins in reporters of kinase activity, small GTPase activity and transmembrane voltage significantly improves photostability, FRET dynamic range and emission ratio changes. These improvements enhance detection of transient biochemical events such as neuronal action-potential firing and RhoA activation in growth cones.

Journal ArticleDOI
TL;DR: A consolidated view of the complexity and challenges of designing studies for measurement of energy metabolism in mouse models is presented, including a practical guide to the assessment of energy expenditure, energy intake and body composition and statistical analysis thereof.
Abstract: We present a consolidated view of the complexity and challenges of designing studies for measurement of energy metabolism in mouse models, including a practical guide to the assessment of energy expenditure, energy intake and body composition and statistical analysis thereof. We hope this guide will facilitate comparisons across studies and minimize spurious interpretations of data. We recommend that division of energy expenditure data by either body weight or lean body weight and that presentation of group effects as histograms should be replaced by plotting individual data and analyzing both group and body-composition effects using analysis of covariance (ANCOVA).
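
The ANCOVA recommendation above can be illustrated with a small sketch that adjusts group means of energy expenditure for a body-composition covariate instead of dividing by body weight. It fits a single pooled within-group slope, which is the classical ANCOVA assumption of parallel regression lines, and it omits the significance test. The function name, group labels and numbers are hypothetical.

```python
def ancova_adjusted_means(groups):
    """Adjust group means of a response for one covariate.

    groups: dict mapping group name -> list of (covariate, response)
            pairs, e.g. (lean mass, energy expenditure) per animal.
    Returns (slope, adjusted_means), where each adjusted mean is the
    group mean evaluated at the grand mean of the covariate.
    Assumption: all groups share one regression slope.
    """
    all_x = [x for pairs in groups.values() for x, _ in pairs]
    grand_x = sum(all_x) / len(all_x)

    sxx = sxy = 0.0
    means = {}
    for name, pairs in groups.items():
        mx = sum(x for x, _ in pairs) / len(pairs)
        my = sum(y for _, y in pairs) / len(pairs)
        means[name] = (mx, my)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)

    slope = sxy / sxx  # pooled within-group slope
    adjusted = {
        name: my - slope * (mx - grand_x) for name, (mx, my) in means.items()
    }
    return slope, adjusted


# Made-up data: the treated group is heavier, which inflates its raw mean.
data = {
    "control": [(20.0, 41.0), (22.0, 45.0), (24.0, 49.0)],
    "treated": [(26.0, 56.0), (28.0, 60.0), (30.0, 64.0)],
}
slope, adjusted = ancova_adjusted_means(data)
```

In this toy data the raw group means differ by 15 units, but only 3 remain after adjusting for the covariate, which is the kind of spurious effect the guide aims to avoid.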

Journal ArticleDOI
TL;DR: An automated method is described that achieves high-throughput fluorescence imaging of mouse brains by integrating two-photon microscopy and tissue sectioning, which opens the door to routine systematic studies of neuroanatomy in mouse models of human brain disorders.
Abstract: Here we describe an automated method, named serial two-photon (STP) tomography, that achieves high-throughput fluorescence imaging of mouse brains by integrating two-photon microscopy and tissue sectioning. STP tomography generates high-resolution datasets that are free of distortions and can be readily warped in three dimensions, for example, for comparing multiple anatomical tracings. This method opens the door to routine systematic studies of neuroanatomy in mouse models of human brain disorders.

Journal ArticleDOI
TL;DR: The Genome Multitool (GEM) mapper can leverage string matching by filtration to search the alignment space more efficiently, simultaneously delivering precision and speed.
Abstract: Because of ever-increasing throughput requirements of sequencing data, most existing short-read aligners have been designed to focus on speed at the expense of accuracy. The Genome Multitool (GEM) mapper can leverage string matching by filtration to search the alignment space more efficiently, simultaneously delivering precision (performing fully tunable exhaustive searches that return all existing matches, including gapped ones) and speed (being several times faster than comparable state-of-the-art tools).

Journal ArticleDOI
TL;DR: 2b-RAD, a streamlined restriction site–associated DNA (RAD) genotyping method based on sequencing the uniform fragments produced by type IIB restriction endonucleases, is described.
Abstract: Genotyping based on restriction site–associated (RAD) sequencing around type IIB enzyme recognition sites is reported. The streamlined reduced-representation approach features even and tunable genome coverage and enables large-scale genotyping studies by maximizing the amount of genotypic information that can be obtained from individuals for a given amount of sequencing. We describe 2b-RAD, a streamlined restriction site–associated DNA (RAD) genotyping method based on sequencing the uniform fragments produced by type IIB restriction endonucleases. Well-studied accessions of Arabidopsis thaliana were genotyped to validate the method's accuracy and to demonstrate fine-tuning of marker density as needed. The simplicity of the 2b-RAD protocol makes it particularly suitable for high-throughput genotyping as required for linkage mapping and profiling genetic variation in natural populations.

Journal ArticleDOI
Jonas Ries, Charlotte Kaplan, Evgenia Platonova, Hadi Eghlidi, Helge Ewers
TL;DR: This work developed a method to use any GFP-tagged construct in single-molecule super-resolution microscopy by targeting GFP with small, high-affinity antibodies coupled to organic dyes and achieved nanometer spatial resolution and minimal linkage error when analyzing microtubules, living neurons and yeast cells.
Abstract: We developed a method to use any GFP-tagged construct in single-molecule super-resolution microscopy. By targeting GFP with small, high-affinity antibodies coupled to organic dyes, we achieved nanometer spatial resolution and minimal linkage error when analyzing microtubules, living neurons and yeast cells. We show that in combination with libraries encoding GFP-tagged proteins, virtually any known protein can immediately be used in super-resolution microscopy and that simplified labeling schemes allow high-throughput super-resolution imaging.

Journal ArticleDOI
TL;DR: pLink is software for data analysis of cross-linked proteins coupled with mass-spectrometry analysis; it is compatible with multiple homo- or hetero-bifunctional cross-linkers.
Abstract: pLink, software for data analysis of cross-linked proteins coupled with mass spectrometry, estimates false discovery rate and enables analysis of protein complexes without extensive purification. We have developed pLink, software for data analysis of cross-linked proteins coupled with mass-spectrometry analysis. pLink reliably estimates false discovery rate in cross-link identification and is compatible with multiple homo- or hetero-bifunctional cross-linkers. We validated the program with proteins of known structures, and we further tested it on protein complexes, crude immunoprecipitates and whole-cell lysates. We show that it is a robust tool for protein-structure and protein-protein–interaction studies.

Journal ArticleDOI
TL;DR: This work developed one-photon and multiphoton SiMView implementations and recorded cellular dynamics in entire Drosophila melanogaster embryos with 30-s temporal resolution throughout development and performed high-resolution long-term imaging of the developing nervous system and followed neuroblast cell lineages in vivo.
Abstract: Simultaneous multiview light-sheet microscopy using two illumination and two detection arms with one- or two-photon illumination is coupled to a fast data acquisition framework and analysis pipeline for quantitative imaging and tracking of individual cells and the developing nervous system throughout a living fly embryo. A related paper by Krzic et al. is also in this issue.

Journal ArticleDOI
TL;DR: Each computational step that biologists encounter when dealing with digital images, the inherent challenges and the overall status of available software for bioimage informatics are reviewed, focusing on open-source options.
Abstract: Representative members of the bioimage informatics community review the computational steps and some of the primary software tools available to biologists who are acquiring and analyzing microscopy-based digital image data, with a focus on open-source options. Few technologies are more widespread in modern biological laboratories than imaging. Recent advances in optical technologies and instrumentation are providing hitherto unimagined capabilities. Almost all these advances have required the development of software to enable the acquisition, management, analysis and visualization of the imaging data. We review each computational step that biologists encounter when dealing with digital images, the inherent challenges and the overall status of available software for bioimage informatics, focusing on open-source options.

Journal ArticleDOI
TL;DR: A sparse-signal recovery technique using compressed sensing to analyze images with highly overlapping fluorescent spots that allows an activated fluorophore density an order of magnitude higher than what conventional single-molecule fitting methods can handle.
Abstract: In super-resolution microscopy methods based on single-molecule switching, the rate of accumulating single-molecule activation events often limits the time resolution. Here we developed a sparse-signal recovery technique using compressed sensing to analyze images with highly overlapping fluorescent spots. This method allows an activated fluorophore density an order of magnitude higher than what conventional single-molecule fitting methods can handle. Using this method, we demonstrated imaging microtubule dynamics in living cells with a time resolution of 3 s.

Journal ArticleDOI
TL;DR: The International Molecular Exchange consortium is an international collaboration between major public interaction data providers to share literature-curation efforts and make a nonredundant set of protein interactions available in a single search interface on a common website.
Abstract: The International Molecular Exchange (IMEx) consortium is an international collaboration between major public interaction data providers to share literature-curation efforts and make a nonredundant set of protein interactions available in a single search interface on a common website (http://www.imexconsortium.org/). Common curation rules have been developed, and a central registry is used to manage the selection of articles to enter into the dataset. We discuss the advantages of such a service to the user, our quality-control measures and our data-distribution practices.

Journal ArticleDOI
TL;DR: Unique features of lens-free computational imaging tools are discussed and some of their emerging results for wide-field on-chip microscopy, such as the achievement of a numerical aperture of ∼0.8–0.9 across a field of view (FOV) of more than 20 mm², which corresponds to an image with more than 1.5 gigapixels.
Abstract: In this perspective, the authors present the basic features of lens-free computational imaging tools and report performance comparisons with conventional microscopy methods. They also discuss the challenges that these computational on-chip microscopes face for their wide-scale biomedical application.

Journal ArticleDOI
TL;DR: This work describes canonical ways to measure an algorithm’s performance so that algorithms can be compared against each other fairly, and provides an optional framework to do so conveniently within CellProfiler.
Abstract: as a resource for testing and validating automated image-analysis algorithms. The BBBC is particularly useful for high-throughput experiments and for providing biological ground truth for evaluating image-analysis algorithms. If an algorithm is sufficiently robust across samples to handle high-throughput experiments, low-throughput applications also benefit because tolerance to variability in sample preparation and imaging makes the algorithm more likely to generalize to new image sets. Each image set in the BBBC is accompanied by a brief description of its motivating biological application and a set of ground-truth data against which algorithms can be evaluated. The ground truth sets can consist of cell or nucleus counts, foreground and background pixels, outlines of individual objects, or biological labels based on treatment conditions or orthogonal assays (such as a dose-response curve or positive- and negative-control images). We describe canonical ways to measure an algorithm’s performance so that algorithms can be compared against each other fairly, and we provide an optional framework to do so conveniently within CellProfiler. For each image set, we list any published results of which we are aware. The BBBC is freely available from http://www.broadinstitute.org/bbbc/. The collection currently contains 18 image sets, including images of cells (Homo sapiens and Drosophila melanogaster) as well as of whole organisms (Caenorhabditis elegans) assayed in high throughput. We are continuing to extend the collection during the course of our research, and we encourage the submission of additional image sets, ground truth and published results of algorithms.
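
Evaluating a segmentation against foreground and background ground-truth pixels, one of the ground-truth types listed above, can be sketched as follows. The abstract does not name the metrics it treats as canonical, so the pixel-wise precision, recall and F1 score used here are an assumption, and the function name and toy masks are hypothetical.

```python
def pixel_f_score(predicted, truth):
    """Precision, recall and F1 for the foreground pixels of two binary masks.

    predicted, truth: equally sized 2D lists of 0/1 values (1 = foreground).
    Returns (precision, recall, f1); each is 0.0 when undefined.
    """
    tp = fp = fn = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1  # foreground in both masks
            elif p:
                fp += 1  # predicted foreground, truly background
            elif t:
                fn += 1  # missed foreground pixel
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy 2x3 masks: two pixels agree, one is spurious, one is missed.
truth_mask = [[1, 1, 0], [0, 1, 0]]
predicted_mask = [[1, 0, 0], [0, 1, 1]]
precision, recall, f1 = pixel_f_score(predicted_mask, truth_mask)
```

Reporting precision and recall alongside the combined score shows whether an algorithm errs by over-segmenting or by missing foreground.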