scispace - formally typeset

Showing papers in "Nature Methods" in 2020


Journal Article
TL;DR: SciPy is an open-source scientific computing library for the Python programming language that has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year.
Abstract: SciPy is an open-source scientific computing library for the Python programming language. Since its initial release in 2001, SciPy has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year. In this work, we provide an overview of the capabilities and development practices of SciPy 1.0 and highlight some recent technical developments.

6,244 citations
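As a brief illustration of what "leveraging scientific algorithms in Python" looks like in practice, the sketch below calls two long-standing SciPy routines: adaptive numerical integration and scalar minimization. It is not taken from the paper; the integrand and objective function are arbitrary examples chosen here, and it assumes NumPy and SciPy are installed.

```python
import numpy as np
from scipy import integrate, optimize

# Numerically integrate sin(x) over [0, pi]; the exact value is 2.
# quad returns the estimate and an estimate of its absolute error.
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# Minimize a simple quadratic; the exact minimizer is x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

print(f"integral = {area:.6f} (error estimate {abs_err:.1e})")
print(f"minimizer = {result.x:.6f}")
```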


Journal Article
TL;DR: The long-read assembler wtdbg2 is 2–17 times as fast as published tools while achieving comparable contiguity and accuracy, with the speed advantage most pronounced for large genomes.
Abstract: Existing long-read assemblers require thousands of central processing unit hours to assemble a human genome and are being outpaced by sequencing technologies in terms of both throughput and cost. We developed a long-read assembler wtdbg2 (https://github.com/ruanjue/wtdbg2) that is 2–17 times as fast as published tools while achieving comparable contiguity and accuracy. It paves the way for population-scale long-read assembly in future. Wtdbg2 assembles genomes with comparable contiguity and accuracy to existing tools using long-read sequencing data, and is several times faster, especially for large genomes.

783 citations


Journal Article
TL;DR: NicheNet is a method that predicts ligand–target links between interacting cells by combining their expression data with prior knowledge on signaling and gene regulatory networks, and can thereby infer active ligands and their gene regulatory effects on interacting cells.
Abstract: Computational methods that model how gene expression of a cell is influenced by interacting cells are lacking. We present NicheNet (https://github.com/saeyslab/nichenetr), a method that predicts ligand-target links between interacting cells by combining their expression data with prior knowledge on signaling and gene regulatory networks. We applied NicheNet to tumor and immune cell microenvironment data and demonstrate that NicheNet can infer active ligands and their gene regulatory effects on interacting cells.

681 citations
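NicheNet's core idea, ranking candidate ligands by how well prior-knowledge regulatory potential explains the genes that respond in receiver cells, can be illustrated with a toy example. In this sketch, which is a simplification under stated assumptions rather than the nichenetr implementation, each ligand is scored by the Pearson correlation between its regulatory-potential scores and a 0/1 indicator of membership in the responding gene set. The ligand names, genes and scores are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_ligands(potential, genes, responding):
    """Rank ligands by the correlation of their regulatory potential with
    the 0/1 membership of each gene in the responding (target) gene set."""
    indicator = [1.0 if g in responding else 0.0 for g in genes]
    scores = {lig: pearson([row[g] for g in genes], indicator)
              for lig, row in potential.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical prior-knowledge regulatory potentials (ligand -> gene -> score).
genes = ["G1", "G2", "G3", "G4", "G5"]
potential = {
    "LIG_A": {"G1": 0.9, "G2": 0.8, "G3": 0.1, "G4": 0.0, "G5": 0.1},
    "LIG_B": {"G1": 0.1, "G2": 0.2, "G3": 0.9, "G4": 0.8, "G5": 0.7},
}
responding = {"G1", "G2"}  # genes observed to change in the receiver cells

ranking = rank_ligands(potential, genes, responding)
print(ranking)
```

Here LIG_A ranks first because its predicted targets coincide with the responding genes, while LIG_B's predicted targets are the non-responding ones.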


Journal Article
TL;DR: Non-uniform refinement is an algorithm based on cross-validation optimization that automatically regularizes 3D density maps during refinement to account for spatial variability, yielding dramatically improved resolution and 3D map quality in many cases.
Abstract: Cryogenic electron microscopy (cryo-EM) is widely used to study biological macromolecules that comprise regions with disorder, flexibility or partial occupancy. For example, membrane proteins are often kept in solution with detergent micelles and lipid nanodiscs that are locally disordered. Such spatial variability negatively impacts computational three-dimensional (3D) reconstruction with existing iterative refinement algorithms that assume rigidity. We introduce non-uniform refinement, an algorithm based on cross-validation optimization, which automatically regularizes 3D density maps during refinement to account for spatial variability. Unlike common shift-invariant regularizers, non-uniform refinement systematically removes noise from disordered regions, while retaining signal useful for aligning particle images, yielding dramatically improved resolution and 3D map quality in many cases. We obtain high-resolution reconstructions for multiple membrane proteins as small as 100 kDa, demonstrating increased effectiveness of cryo-EM for this class of targets critical in structural biology and drug discovery. Non-uniform refinement is implemented in the cryoSPARC software package. Membrane proteins exhibit spatial variation in rigidity and disorder, which poses a challenge for traditional cryo-EM reconstruction algorithms. Non-uniform refinement accounts for this spatial variability, yielding improved 3D reconstruction quality even for small membrane proteins.

620 citations
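The key idea, choosing the amount of regularization separately for each region by cross-validation instead of applying one global filter, can be illustrated on a 1D toy signal. In the sketch below, two noisy "half-maps" of the same signal stand in for independent half-datasets, and within each window the filter width whose smoothed half A best predicts half B is selected. The moving-average filter, window size, candidate widths and synthetic signal are all assumptions for illustration; cryoSPARC's actual algorithm operates on 3D maps and is not reproduced here.

```python
import math
import random

def smooth(signal, width):
    """Moving-average filter over a window of `width` samples
    (the window is truncated at the signal edges)."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def local_cv_widths(half_a, half_b, window, widths):
    """For each window, pick the filter width whose smoothed half A best
    predicts the independent half B (sum of squared errors)."""
    smoothed = {w: smooth(half_a, w) for w in widths}
    chosen = []
    for start in range(0, len(half_a), window):
        idx = range(start, min(start + window, len(half_a)))
        errors = {w: sum((smoothed[w][i] - half_b[i]) ** 2 for i in idx)
                  for w in widths}
        chosen.append(min(errors, key=errors.get))
    return chosen

random.seed(0)
n = 400
# First half: ordered, high-frequency signal with low noise.
# Second half: "disordered" region with no signal and high noise.
truth = [math.sin(i / 3.0) if i < n // 2 else 0.0 for i in range(n)]
noise_sd = [0.3 if i < n // 2 else 1.0 for i in range(n)]
half_a = [t + random.gauss(0.0, s) for t, s in zip(truth, noise_sd)]
half_b = [t + random.gauss(0.0, s) for t, s in zip(truth, noise_sd)]

chosen = local_cv_widths(half_a, half_b, window=200, widths=[1, 31])
print(chosen)
```

The ordered region keeps the unsmoothed data (width 1), while the disordered region is heavily smoothed (width 31), which mirrors the behaviour described for non-uniform refinement: noise is removed where there is no signal, and detail is retained where there is.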


Journal Article
TL;DR: An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Abstract: An amendment to this paper has been published and can be accessed via a link at the top of the paper.

617 citations


Journal Article
TL;DR: DIA-NN improves the identification and quantification performance in conventional DIA proteomic applications, and is particularly beneficial for high-throughput applications, as it is fast and enables deep and confident proteome coverage when used in combination with fast chromatographic methods.
Abstract: We present an easy-to-use integrated software suite, DIA-NN, that exploits deep neural networks and new quantification and signal correction strategies for the processing of data-independent acquisition (DIA) proteomics experiments. DIA-NN improves the identification and quantification performance in conventional DIA proteomic applications, and is particularly beneficial for high-throughput applications, as it is fast and enables deep and confident proteome coverage when used in combination with fast chromatographic methods.

584 citations


Journal Article
Louis-Félix Nothias, Daniel Petras, Robin Schmid, Kai Dührkop, Johannes Rainer, Abinesh Sarvepalli, Ivan Protsyuk, Madeleine Ernst, Hiroshi Tsugawa, Markus Fleischauer, Fabian Aicheler, Alexander A. Aksenov, Oliver Alka, Pierre-Marie Allard, Aiko Barsch, Xavier Cachet, Andrés Mauricio Caraballo-Rodríguez, Ricardo Silva, Tam Dang, Neha Garg, Julia M. Gauglitz, Alexey Gurevich, Giorgis Isaac, Alan K. Jarmusch, Zdeněk Kameník, Kyo Bin Kang, Nikolas Kessler, Irina Koester, Ansgar Korf, Audrey Le Gouellec, Marcus Ludwig, Christian Martin H, Laura-Isobel McCall, Jonathan McSayles, Sven W. Meyer, Hosein Mohimani, Mustafa Morsy, Oriane Moyne, Steffen Neumann, Heiko Neuweger, Ngoc Hung Nguyen, Mélissa Nothias-Esposito, Julien Paolini, Vanessa V. Phelan, Tomáš Pluskal, Robert A. Quinn, Simon Rogers, Bindesh Shrestha, Anupriya Tripathi, Justin J. J. van der Hooft, Fernando Vargas, Kelly C. Weldon, Michael Witting, Heejung Yang, Zheng Zhang, Florian Zubeil, Oliver Kohlbacher, Sebastian Böcker, Theodore Alexandrov, Nuno Bandeira, Mingxun Wang, Pieter C. Dorrestein
TL;DR: Feature-based molecular networking (FBMN) as discussed by the authors is an analysis method in the Global Natural Products Social Molecular Networking (GNPS) infrastructure that builds on chromatographic feature detection and alignment tools.
Abstract: Molecular networking has become a key method to visualize and annotate the chemical space in non-targeted mass spectrometry data. We present feature-based molecular networking (FBMN) as an analysis method in the Global Natural Products Social Molecular Networking (GNPS) infrastructure that builds on chromatographic feature detection and alignment tools. FBMN enables quantitative analysis and resolution of isomers, including from ion mobility spectrometry.

497 citations
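Molecular networking connects MS/MS spectra that look alike, so that structurally related molecules cluster together. A minimal sketch of that idea follows, assuming spectra binned to integer m/z and a plain cosine score with a hypothetical edge threshold of 0.7. GNPS itself uses a modified cosine that also accounts for precursor mass differences, which is not reproduced here, and the feature names and peaks are invented.

```python
from itertools import combinations
from math import sqrt

def binned(peaks, bin_width=1.0):
    """Sum peak intensities into m/z bins of `bin_width`."""
    vec = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        vec[key] = vec.get(key, 0.0) + intensity
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def molecular_network(spectra, threshold=0.7):
    """Return edges (id1, id2, score) between spectra scoring >= threshold."""
    vecs = {sid: binned(peaks) for sid, peaks in spectra.items()}
    edges = []
    for s1, s2 in combinations(sorted(vecs), 2):
        score = cosine(vecs[s1], vecs[s2])
        if score >= threshold:
            edges.append((s1, s2, round(score, 3)))
    return edges

# Hypothetical MS/MS spectra of chromatographic features: (m/z, intensity).
spectra = {
    "feature_1": [(91.05, 100.0), (119.05, 40.0), (147.04, 10.0)],
    "feature_2": [(91.06, 90.0), (119.04, 50.0), (165.05, 5.0)],
    "feature_3": [(58.07, 100.0), (72.08, 60.0)],
}
edges = molecular_network(spectra)
print(edges)
```

Only the two features sharing fragment ions are linked; feature_3 shares no bins with the others and stays unconnected.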


Journal Article
Julia Koehler Leman, Brian D. Weitzner, Steven M. Lewis, Jared Adolf-Bryfogle, Nawsad Alam, Rebecca F. Alford, Melanie L. Aprahamian, David Baker, Kyle A. Barlow, Patrick Barth, Benjamin Basanta, Brian J. Bender, Kristin Blacklock, Jaume Bonet, Scott E. Boyken, Phil Bradley, Christopher Bystroff, Patrick Conway, Seth Cooper, Bruno E. Correia, Brian Coventry, Rhiju Das, René M. de Jong, Frank DiMaio, Lorna Dsilva, Roland L. Dunbrack, Alex Ford, Brandon Frenz, Darwin Y. Fu, Caleb Geniesse, Lukasz Goldschmidt, Ragul Gowthaman, Jeffrey J. Gray, Dominik Gront, Sharon L. Guffy, Scott Horowitz, Po-Ssu Huang, Thomas Huber, Timothy M. Jacobs, Jeliazko R. Jeliazkov, David K. Johnson, Kalli Kappel, John Karanicolas, Hamed Khakzad, Karen R. Khar, Sagar D. Khare, Firas Khatib, Alisa Khramushin, Indigo Chris King, Robert Kleffner, Brian Koepnick, Tanja Kortemme, Georg Kuenze, Brian Kuhlman, Daisuke Kuroda, Jason W. Labonte, Jason K. Lai, Gideon Lapidoth, Andrew Leaver-Fay, Steffen Lindert, Thomas W. Linsky, Nir London, Joseph H. Lubin, Sergey Lyskov, Jack Maguire, Lars Malmström, Enrique Marcos, Orly Marcu, Nicholas A. Marze, Jens Meiler, Rocco Moretti, Vikram Khipple Mulligan, Santrupti Nerli, Christoffer Norn, Shane O’Conchúir, Noah Ollikainen, Sergey Ovchinnikov, Michael S. Pacella, Xingjie Pan, Hahnbeom Park, Ryan E. Pavlovicz, Manasi A. Pethe, Brian G. Pierce, Kala Bharath Pilla, Barak Raveh, P. Douglas Renfrew, Shourya S. Roy Burman, Aliza B. Rubenstein, Marion F. Sauer, Andreas Scheck, William R. Schief, Ora Schueler-Furman, Yuval Sedan, Alexander M. Sevy, Nikolaos G. Sgourakis, Lei Shi, Justin B. Siegel, Daniel-Adriano Silva, Shannon Smith, Yifan Song, Amelie Stein, Maria Szegedy, Frank D. Teets, Summer B. Thyme, Ray Yu-Ruei Wang, Andrew M. Watkins, Lior Zimmerman, Richard Bonneau
TL;DR: This Perspective reviews tools developed over the past five years in the Rosetta software, including over 80 methods, and discusses improvements to the score function, user interfaces and usability.
Abstract: The Rosetta software for macromolecular modeling, docking and design is extensively used in laboratories worldwide. During two decades of development by a community of laboratories at more than 60 institutions, Rosetta has been continuously refactored and extended. Its advantages are its performance and interoperability between broad modeling capabilities. Here we review tools developed in the last 5 years, including over 80 methods. We discuss improvements to the score function, user interfaces and usability. Rosetta is available at http://www.rosettacommons.org.

430 citations


Journal Article
TL;DR: MaSIF (molecular surface interaction fingerprinting) is a conceptual framework based on a geometric deep learning method that captures surface fingerprints important for specific biomolecular interactions; the authors anticipate that it will lead to improvements in the understanding of protein function and design.
Abstract: Predicting interactions between proteins and other biomolecules solely based on structure remains a challenge in biology. A high-level representation of protein structure, the molecular surface, displays patterns of chemical and geometric features that fingerprint a protein's modes of interactions with other biomolecules. We hypothesize that proteins participating in similar interactions may share common fingerprints, independent of their evolutionary history. Fingerprints may be difficult to grasp by visual analysis but could be learned from large-scale datasets. We present MaSIF (molecular surface interaction fingerprinting), a conceptual framework based on a geometric deep learning method to capture fingerprints that are important for specific biomolecular interactions. We showcase MaSIF with three prediction challenges: protein pocket-ligand prediction, protein-protein interaction site prediction and ultrafast scanning of protein surfaces for prediction of protein-protein complexes. We anticipate that our conceptual framework will lead to improvements in our understanding of protein function and design.

389 citations


Journal Article
TL;DR: This Perspective highlights open-source software for single-cell analysis released as part of the Bioconductor project, providing an overview for users and developers.
Abstract: Recent technological advancements have enabled the profiling of a large number of genome-wide features in individual cells. However, single-cell data present unique challenges that require the development of specialized methods and software infrastructure to successfully derive biological insights. The Bioconductor project has rapidly grown to meet these demands, hosting community-developed open-source software distributed as R packages. Featuring state-of-the-art computational methods, standardized data infrastructure and interactive data visualization tools, we present an overview and online book (https://osca.bioconductor.org) of single-cell methods for prospective users.

332 citations


Journal Article
TL;DR: A systematic evaluation of state-of-the-art algorithms for inferring gene regulatory networks from single-cell transcriptional data finds heterogeneous performance and suggests recommendations to users.
Abstract: We present a systematic evaluation of state-of-the-art algorithms for inferring gene regulatory networks from single-cell transcriptional data. As the ground truth for assessing accuracy, we use synthetic networks with predictable trajectories, literature-curated Boolean models and diverse transcriptional regulatory networks. We develop a strategy to simulate single-cell transcriptional data from synthetic and Boolean networks that avoids pitfalls of previously used methods. Furthermore, we collect networks from multiple experimental single-cell RNA-seq datasets. We develop an evaluation framework called BEELINE. We find that the area under the precision-recall curve and early precision of the algorithms are moderate. The methods are better in recovering interactions in synthetic networks than Boolean models. The algorithms with the best early precision values for Boolean models also perform well on experimental datasets. Techniques that do not require pseudotime-ordered cells are generally more accurate. Based on these results, we present recommendations to end users. BEELINE will aid the development of gene regulatory network inference algorithms.
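The two headline metrics can be made concrete with a short sketch. It assumes a ranked list of predicted edges and a ground-truth edge set, computes early precision as the precision among the top-k predictions with k equal to the number of true edges, and estimates the area under the precision-recall curve by average precision. The edges are hypothetical, and BEELINE's own implementation (including its handling of ties, edge direction and self-loops) may differ.

```python
def early_precision(ranked_edges, true_edges):
    """Precision among the top-k predictions, with k = number of true edges."""
    k = len(true_edges)
    top_k = ranked_edges[:k]
    return sum(1 for e in top_k if e in true_edges) / k

def average_precision(ranked_edges, true_edges):
    """Mean of precision@i over the ranks i at which a true edge appears,
    divided over all true edges (a step-wise estimate of PR-curve area)."""
    hits, precisions = 0, []
    for i, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(true_edges) if true_edges else 0.0

# Hypothetical ground-truth network and predicted ranking (regulator, target).
truth = {("A", "B"), ("B", "C"), ("C", "D")}
predicted = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A"), ("C", "D")]

ep = early_precision(predicted, truth)
ap = average_precision(predicted, truth)
print(ep, ap)
```

With three true edges, early precision looks only at the top three predictions (two of which are correct), whereas average precision also rewards the true edge recovered at rank 5.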

Journal Article
TL;DR: metaFlye addresses important long-read metagenomic assembly challenges, such as uneven bacterial composition and intra-species heterogeneity; benchmarks on simulated and mock bacterial communities show that it consistently produces assemblies with better completeness and contiguity than state-of-the-art long-read assemblers.
Abstract: Long-read sequencing technologies have substantially improved the assemblies of many isolate bacterial genomes as compared to fragmented short-read assemblies. However, assembling complex metagenomic datasets remains difficult even for state-of-the-art long-read assemblers. Here we present metaFlye, which addresses important long-read metagenomic assembly challenges, such as uneven bacterial composition and intra-species heterogeneity. First, we benchmarked metaFlye using simulated and mock bacterial communities and show that it consistently produces assemblies with better completeness and contiguity than state-of-the-art long-read assemblers. Second, we performed long-read sequencing of the sheep microbiome and applied metaFlye to reconstruct 63 complete or nearly complete bacterial genomes within single contigs. Finally, we show that long-read assembly of human microbiomes enables the discovery of full-length biosynthetic gene clusters that encode biomedically important natural products.

Journal Article
TL;DR: It is shown that by localizing individual switchable fluorophores with a probing donut-shaped excitation beam, MINFLUX nanoscopy can provide resolutions in the range of 1 to 3 nm for structures in fixed and living cells.
Abstract: The ultimate goal of biological super-resolution fluorescence microscopy is to provide three-dimensional resolution at the size scale of a fluorescent marker. Here we show that by localizing individual switchable fluorophores with a probing donut-shaped excitation beam, MINFLUX nanoscopy can provide resolutions in the range of 1 to 3 nm for structures in fixed and living cells. This progress has been facilitated by approaching each fluorophore iteratively with the probing-donut minimum, making the resolution essentially uniform and isotropic over scalable fields of view. MINFLUX imaging of nuclear pore complexes of a mammalian cell shows that this true nanometer-scale resolution is obtained in three dimensions and in two color channels. Relying on fewer detected photons than standard camera-based localization, MINFLUX nanoscopy is poised to open a new chapter in the imaging of protein complexes and distributions in fixed and living cells. Advances in MINFLUX nanoscopy enable multicolor imaging over large fields of view, bringing true nanometer-scale fluorescence imaging to labeled structures in fixed and living cells.

Journal Article
TL;DR: This work uses the correlation between molecular weight and ion mobility in a trapped ion mobility device to devise a scan mode (diaPASEF) that samples up to 100% of the peptide precursor ion current in m/z and mobility windows, and extends targeted data extraction to the ion mobility dimension to increase the specificity of precursor identification.
Abstract: Data-independent acquisition modes isolate and concurrently fragment populations of different precursors by cycling through segments of a predefined precursor m/z range. Although these selection windows collectively cover the entire m/z range, overall, only a few per cent of all incoming ions are isolated for mass analysis. Here, we make use of the correlation of molecular weight and ion mobility in a trapped ion mobility device (timsTOF Pro) to devise a scan mode that samples up to 100% of the peptide precursor ion current in m/z and mobility windows. We extend an established targeted data extraction workflow by inclusion of the ion mobility dimension for both signal extraction and scoring and thereby increase the specificity for precursor identification. Data acquired from whole proteome digests and mixed organism samples demonstrate deep proteome coverage and a high degree of reproducibility as well as quantitative accuracy, even from 10 ng sample amounts. diaPASEF makes use of the correlation between the ion mobility and the m/z of peptides to trap and release precursor ions in a TIMS-TOF mass spectrometer for an almost complete sampling of the precursor ion beam with data-independent acquisition.

Journal Article
TL;DR: The design principles and advances in fluorescence nanothermometry are introduced, application achievements are highlighted, scenarios that may lead to biased sensing are discussed, and the challenges ahead are analyzed in terms of both fundamental issues and practical implementations.
Abstract: Fluorescent nanothermometers can probe changes in local temperature in living cells and in vivo and reveal fundamental insights into biological properties. This field has attracted global efforts in developing both temperature-responsive materials and detection procedures to achieve sub-degree temperature resolution in biosystems. Recent generations of nanothermometers show superior performance to earlier ones and also offer multifunctionality, enabling state-of-the-art functional imaging with improved spatial, temporal and temperature resolutions for monitoring the metabolism of intracellular organelles and internal organs. Although progress in this field has been rapid, it has not been without controversy, as recent studies have shown possible biased sensing during fluorescence-based detection. Here, we introduce the design principles and advances in fluorescence nanothermometry, highlight application achievements, discuss scenarios that may lead to biased sensing, analyze the challenges ahead in terms of both fundamental issues and practical implementations, and point to new directions for improving this interdisciplinary field.
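Claims of sub-degree temperature resolution usually rest on two standard figures of merit from luminescence thermometry that this abstract does not spell out: the relative thermal sensitivity S_r = (1/Δ)|dΔ/dT| of a thermometric parameter Δ (often an intensity ratio), and the temperature uncertainty δT = (1/S_r)(δΔ/Δ). The sketch below estimates both from a hypothetical two-point calibration, assuming Δ varies approximately linearly between the calibration temperatures; all numbers are invented for illustration.

```python
def relative_sensitivity(delta_1, delta_2, t_1, t_2):
    """S_r = (1/Δ)|dΔ/dT| in %/K, using a finite difference between two
    calibration points and Δ at the midpoint (linear-variation assumption)."""
    slope = (delta_2 - delta_1) / (t_2 - t_1)
    delta_mid = (delta_1 + delta_2) / 2.0
    return 100.0 * abs(slope) / delta_mid

def temperature_uncertainty(s_r_percent, rel_delta_uncertainty):
    """δT = (1/S_r)(δΔ/Δ) in K, with S_r given in %/K."""
    return rel_delta_uncertainty / (s_r_percent / 100.0)

# Hypothetical calibration: intensity ratio 1.00 at 25 C and 1.10 at 35 C.
s_r = relative_sensitivity(1.00, 1.10, 25.0, 35.0)
# Assume the ratio can be measured with 0.5% relative uncertainty.
d_t = temperature_uncertainty(s_r, 0.005)
print(f"S_r = {s_r:.3f} %/K, dT = {d_t:.3f} K")
```

In this toy case a sensitivity of roughly 0.95 %/K and a 0.5% ratio uncertainty give about 0.53 K, i.e. sub-degree resolution; any bias in Δ itself, of the kind the review discusses, would not be captured by δT.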

Journal Article
TL;DR: A set of isobaric labeling reagents called TMTpro enables deep quantitative comparisons of proteome measurements across 16 samples; in a single 18-h experiment, it identified and dose-stratified staurosporine binding to 228 cellular kinases.
Abstract: Isobaric labeling empowers proteome-wide expression measurements simultaneously across multiple samples. Here an expanded set of 16 isobaric reagents based on an isobutyl-proline immonium ion reporter structure (TMTpro) is presented. These reagents have similar characteristics to existing tandem mass tag reagents but with increased fragmentation efficiency and signal. In a proteome-scale example dataset, we compared eight common cell lines with and without Torin1 treatment with three replicates, quantifying more than 8,800 proteins (mean of 7.5 peptides per protein) per replicate with an analysis time of only 1.1 h per proteome. Finally, we modified the thermal stability assay to examine proteome-wide melting shifts after treatment with DMSO, 1 or 20 µM staurosporine with five replicates. This assay identified and dose-stratified staurosporine binding to 228 cellular kinases in just one 18-h experiment. TMTpro reagents allow complex experimental designs, all with essentially no missing values across the 16 samples and no loss in quantitative integrity. A set of isobaric labeling reagents called TMTpro enables deep quantitative comparisons of proteome measurements across 16 samples.
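In a thermal stability (thermal profiling) readout, a protein's soluble fraction is tracked across a temperature series, and ligand binding typically appears as a shift in melting temperature (T_m). The sketch below is a generic illustration, not the paper's analysis: it assumes soluble fractions already derived from reporter-ion intensities and normalized to the lowest temperature, estimates T_m as the temperature where the curve crosses 0.5 by linear interpolation (a simplification of the sigmoid fits commonly used), and reports the shift between hypothetical treated and control curves.

```python
def melting_temperature(temps, fractions):
    """Temperature at which the soluble fraction first drops to 0.5,
    by linear interpolation between neighbouring measurements."""
    points = list(zip(temps, fractions))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= 0.5 >= f1 and f0 != f1:
            return t0 + (f0 - 0.5) * (t1 - t0) / (f0 - f1)
    return None  # curve never crosses 0.5 in the measured range

# Hypothetical temperature series (C) and normalized soluble fractions.
temps = [37, 41, 45, 49, 53, 57, 61]
control = [1.00, 0.95, 0.80, 0.45, 0.20, 0.08, 0.02]  # e.g. vehicle (DMSO)
treated = [1.00, 0.98, 0.92, 0.75, 0.40, 0.15, 0.05]  # e.g. ligand-bound kinase

tm_control = melting_temperature(temps, control)
tm_treated = melting_temperature(temps, treated)
print(f"Tm control {tm_control:.2f}, treated {tm_treated:.2f}, "
      f"shift {tm_treated - tm_control:+.2f} C")
```

A positive shift (here about +3.4 C) is the kind of stabilization signal that, in a dose series, allows binding to be stratified by concentration.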

Journal Article
TL;DR: Development of single-cell multimodal omics tools is another major step toward understanding the inner workings of biological systems.
Abstract: Advances in single-cell genomics technologies have enabled investigation of the gene regulation programs of multicellular organisms at unprecedented resolution and scale. Development of single-cell multimodal omics tools is another major step toward understanding the inner workings of biological systems.

Journal Article
TL;DR: SPARK is a statistical method for identifying genes with spatial expression patterns; in analyses of four published spatially resolved transcriptomic datasets, it was up to ten times more powerful than existing methods and revealed biological findings that existing approaches could not.
Abstract: Identifying genes that display spatial expression patterns in spatially resolved transcriptomic studies is an important first step toward characterizing the spatial transcriptomic landscape of complex tissues. Here we present a statistical method, SPARK, for identifying spatial expression patterns of genes in data generated from various spatially resolved transcriptomic techniques. SPARK directly models spatial count data through generalized linear spatial models. It relies on recently developed statistical formulas for hypothesis testing, providing effective control of type I errors and yielding high statistical power. With a computationally efficient algorithm, which is based on penalized quasi-likelihood, SPARK is also scalable to datasets with tens of thousands of genes measured on tens of thousands of samples. Analyzing four published spatially resolved transcriptomic datasets using SPARK, we show it can be up to ten times more powerful than existing methods and disclose biological discoveries that otherwise cannot be revealed by existing approaches. A statistical method called SPARK for analyzing spatially resolved transcriptomic data can efficiently identify spatially expressed genes with effective control of type I errors and high statistical power.
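The abstract refers to "recently developed statistical formulas for hypothesis testing" without naming them. Assuming, as we understand it, that SPARK tests each gene under several candidate spatial kernels and must merge the per-kernel p-values, one such recently developed formula is the Cauchy combination test: T = Σ w_i tan((0.5 − p_i)π) with weights summing to one, and combined p = 1/2 − arctan(T)/π. It remains well calibrated for small p-values even when the individual tests are correlated. The sketch below uses equal weights and hypothetical p-values; it is an illustration of the combination rule, not SPARK's code.

```python
import math

def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination test.

    T = sum_i w_i * tan((0.5 - p_i) * pi); combined p = 0.5 - arctan(T) / pi.
    Weights default to equal and should sum to one.
    """
    if weights is None:
        weights = [1.0 / len(p_values)] * len(p_values)
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    return 0.5 - math.atan(t) / math.pi

# Hypothetical per-kernel p-values for one gene.
p_combined = cauchy_combination([0.01, 0.20, 0.50])
# Sanity check: a single test returns its own p-value.
p_single = cauchy_combination([0.03])
print(p_combined, p_single)
```

Because tan((0.5 − p)π) grows very quickly as p approaches 0, the combined value is dominated by the strongest kernel (here about 0.029) rather than diluted by the uninformative ones.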

Journal Article
TL;DR: A maximum-likelihood density-modification procedure for maps from single-particle electron cryogenic microscopy (cryo-EM), applied to 104 datasets, improved map-model correlation and increased the visibility of details in many of the maps.
Abstract: A density-modification procedure for improving maps from single-particle electron cryogenic microscopy (cryo-EM) is presented. The theoretical basis of the method is identical to that of maximum-likelihood density modification, previously used to improve maps from macromolecular X-ray crystallography. Key differences from applications in crystallography are that the errors in Fourier coefficients are largely in the phases in crystallography but in both phases and amplitudes in cryo-EM, and that half-maps with independent errors are available in cryo-EM. These differences lead to a distinct approach for combination of information from starting maps with information obtained in the density-modification process. The density-modification procedure was applied to a set of 104 datasets and improved map-model correlation and increased the visibility of details in many of the maps. The procedure requires two unmasked half-maps and a sequence file or other source of information on the volume of the macromolecule that has been imaged.

Journal Article
TL;DR: The GRABDA sensors resolve evoked DA release in mouse brain slices, detect evoked compartmental DA release from a single neuron in live flies, and report optogenetically elicited nigrostriatal DA release as well as mesoaccumbens dopaminergic activity during sexual behavior in freely behaving mice.
Abstract: Dopamine (DA) plays a critical role in the brain, and the ability to directly measure dopaminergic activity is essential for understanding its physiological functions. We therefore developed red fluorescent G-protein-coupled receptor-activation-based DA (GRABDA) sensors and optimized versions of green fluorescent GRABDA sensors. In response to extracellular DA, both the red and green GRABDA sensors exhibit a large increase in fluorescence, with subcellular resolution, subsecond kinetics and nanomolar-to-submicromolar affinity. Moreover, the GRABDA sensors resolve evoked DA release in mouse brain slices, detect evoked compartmental DA release from a single neuron in live flies and report optogenetically elicited nigrostriatal DA release as well as mesoaccumbens dopaminergic activity during sexual behavior in freely behaving mice. Coexpressing red GRABDA with either green GRABDA or the calcium indicator GCaMP6s allows tracking of dopaminergic signaling and neuronal activity in distinct circuits in vivo.
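Sensor responses such as the "large increase in fluorescence" described here are conventionally reported as ΔF/F0, the change in fluorescence relative to a pre-stimulus baseline F0. A minimal sketch, assuming F0 is the mean of a chosen number of pre-stimulus frames; the trace and the number of baseline frames are hypothetical.

```python
def delta_f_over_f0(trace, n_baseline):
    """Return ΔF/F0 for each frame, with F0 = mean of the first n_baseline frames."""
    if n_baseline <= 0 or n_baseline > len(trace):
        raise ValueError("n_baseline must be between 1 and len(trace)")
    f0 = sum(trace[:n_baseline]) / n_baseline
    return [(f - f0) / f0 for f in trace]

# Hypothetical fluorescence trace (arbitrary units): four baseline frames,
# then a transient rise after evoked dopamine release.
trace = [100.0, 102.0, 98.0, 100.0, 180.0, 250.0, 220.0, 150.0]
dff = delta_f_over_f0(trace, n_baseline=4)
print([round(x, 2) for x in dff])
print("peak dF/F0:", max(dff))
```

A peak of 1.5 corresponds to a 150% increase over baseline. In real recordings, baseline choice (fixed window versus running percentile) and photobleaching correction matter, and are not handled here.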

Journal Article
TL;DR: The Vessel Segmentation & Analysis Pipeline (VesSAP) is a deep learning-based framework to quantify and analyze brain vasculature; it uses a convolutional neural network with a transfer learning approach for segmentation and achieves human-level accuracy.
Abstract: Tissue clearing methods enable the imaging of biological specimens without sectioning. However, reliable and scalable analysis of large imaging datasets in three dimensions remains a challenge. Here we developed a deep learning-based framework to quantify and analyze brain vasculature, named Vessel Segmentation & Analysis Pipeline (VesSAP). Our pipeline uses a convolutional neural network (CNN) with a transfer learning approach for segmentation and achieves human-level accuracy. By using VesSAP, we analyzed the vascular features of whole C57BL/6J, CD1 and BALB/c mouse brains at the micrometer scale after registering them to the Allen mouse brain atlas. We report evidence of secondary intracranial collateral vascularization in CD1 mice and find reduced vascularization of the brainstem in comparison to the cerebrum. Thus, VesSAP enables unbiased and scalable quantifications of the angioarchitecture of cleared mouse brains and yields biological insights into the vascular function of the brain.
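"Human-level accuracy" for a segmentation network is typically judged with overlap metrics between the predicted mask and expert annotations. As an illustration, the sketch below computes the Dice coefficient (equivalent to the voxel-wise F1 score) for binary masks flattened to lists. The masks are hypothetical, and the specific metrics used to evaluate VesSAP are not stated in this abstract.

```python
def dice(pred, truth):
    """Dice coefficient 2|P∩T| / (|P| + |T|) for binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 2.0 * intersection / total if total else 1.0

# Hypothetical flattened masks: 1 = vessel voxel, 0 = background.
network_mask = [0, 1, 1, 1, 0, 0, 1, 0]
annotator_mask = [0, 1, 1, 0, 0, 0, 1, 1]
score = dice(network_mask, annotator_mask)
print(score)
```

Comparing the network-versus-annotator score with the score between two human annotators is one common way to frame a "human-level" claim.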

Journal Article
TL;DR: The Philosopher toolkit integrates high-performance algorithms and existing tools and is a dependency-free, fast and comprehensive proteomics pipeline, able to rapidly process even the most complex proteomics datasets with efficient resource management.
Abstract: To the Editor: Here we introduce Philosopher (https://philosopher.nesvilab.org), a free, open-source, versatile and robust data analysis toolkit designed to bring easy access to a powerful and comprehensive set of computational tools for shotgun proteomics data analysis. Computational analysis is a central component of any modern experiment, and mass-spectrometry-based proteomics is no exception. As technologies continue to rapidly advance with respect to throughput and sensitivity, bioinformatics tools must keep pace with large-scale experiments. While existing proteomics tools such as the Trans-Proteomic Pipeline (TPP)¹, MaxQuant² and PeptideShaker³ are capable of performing high-quality analyses, all require installation and depend on specific operating systems, libraries and other software. Managing these tools can be a daunting task, even for research groups with substantial bioinformatics expertise. This is particularly true when experiments demand high-performance configurations such as GNU/Linux clusters or cloud computing. To address this challenge, we initially built and deployed Docker containers with different applications for proteomics, which in part inspired the creation of the BioContainers resource for different bioinformatics fields⁴. Though this method was efficient for packing and sharing resources, we found that chaining different applications with custom implementation of established algorithms in a transparent and dependency-free way was still a challenge for containerization. The Philosopher toolkit integrates high-performance algorithms and existing tools (Fig. 1) and is a dependency-free, fast and comprehensive proteomics pipeline, able to rapidly process even the most complex proteomics datasets with efficient resource management. Philosopher includes the database search engine Comet and can use the high-performance search engine MSFragger⁵ as a separately downloaded tool.
For downstream processing of peptide–spectrum matches (PSMs), Philosopher includes key components of TPP. In addition, it implements best practices for false discovery rate (FDR) filtering and data summarization that are not readily available within the TPP, such as picked FDR, two-dimensional or sequential (at PSM and protein levels) filters, and additional options for dealing with peptides whose sequence is present in multiple proteins (for example, the razor peptide approach). As quantification is frequently the goal of modern proteomics experiments, Philosopher includes algorithms for both label-free quantification and isobaric label-based quantification (TMT or iTRAQ). Precursor spectral intensities are retrieved following a method described previously⁶. Protein-level quantification is estimated using the sum of the three most intense supporting ions. Alternatively, Philosopher can use TMT-Integrator (http://tmt-integrator.nesvilab.org/) as an external tool or output files can be used with downstream quantification and statistical tools such as MSstats⁷. The rich reports generated by Philosopher are also compatible with other software such as PDV for visualization of peptide assignments to tandem mass spectra⁸ and CRAPome and REPRINT (https://reprint-apms.org/) for interactome scoring and network
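The picked FDR strategy named above can be sketched on top of ordinary target-decoy filtering. In the hypothetical example below, each protein has a target score and a decoy score; picking keeps only the better of each pair before decoys are counted, FDR at each score threshold is estimated as decoys/targets, and q-values are the running minimum of that estimate from the bottom of the list. The data structure, the "rev_" decoy prefix and the scores are assumptions for illustration, not Philosopher's internal representation.

```python
def picked(pairs):
    """Picked target-decoy: for each protein keep only the higher-scoring of
    its target and decoy entries. `pairs` maps protein -> (target, decoy)."""
    kept = []
    for protein, (t_score, d_score) in pairs.items():
        if t_score >= d_score:
            kept.append((t_score, False, protein))
        else:
            kept.append((d_score, True, "rev_" + protein))
    return kept

def q_values(entries):
    """Estimate FDR = decoys / targets down a score-sorted list, then convert
    to q-values (minimum FDR at this rank or any lower-scoring rank)."""
    ranked = sorted(entries, key=lambda e: e[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy, _ in ranked:
        decoys += is_decoy
        targets += not is_decoy
        fdrs.append(decoys / max(targets, 1))
    q, running = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        q[i] = running
    return [(name, is_decoy, qv) for (_, is_decoy, name), qv in zip(ranked, q)]

# Hypothetical protein-level scores: protein -> (target score, decoy score).
pairs = {"P1": (9.1, 1.2), "P2": (7.4, 2.0), "P3": (3.0, 5.5),
         "P4": (6.8, 0.9), "P5": (2.1, 1.8)}
results = q_values(picked(pairs))
for name, is_decoy, q in results:
    print(f"{name:8s} decoy={is_decoy!s:5s} q={q:.2f}")
```

Because P3's decoy outscores its target, only the decoy survives picking; filtering at q ≤ 0.01 would then keep P1, P2 and P4.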

Journal Article
TL;DR: DeepSTORM3D uses deep learning for accurate localization of point emitters in densely labeled samples in three dimensions for volumetric localization microscopy with high temporal resolution, as well as for optimal point-spread function design.
Abstract: An outstanding challenge in single-molecule localization microscopy is the accurate and precise localization of individual point emitters in three dimensions in densely labeled samples. One established approach for three-dimensional single-molecule localization is point-spread-function (PSF) engineering, in which the PSF is engineered to vary distinctively with emitter depth using additional optical elements. However, images of dense emitters, which are desirable for improving temporal resolution, pose a challenge for algorithmic localization of engineered PSFs, due to lateral overlap of the emitter PSFs. Here we train a neural network to localize multiple emitters with densely overlapping Tetrapod PSFs over a large axial range. We then use the network to design the optimal PSF for the multi-emitter case. We demonstrate our approach experimentally with super-resolution reconstructions of mitochondria and volumetric imaging of fluorescently labeled telomeres in cells. Our approach, DeepSTORM3D, enables the study of biological processes in whole cells at timescales that are rarely explored in localization microscopy.

Journal ArticleDOI
TL;DR: Souporcell is a method that clusters cells by the genetic variants detected within scRNA-seq reads; it achieves high accuracy on genotype clustering, doublet detection and ambient RNA estimation across a range of challenging scenarios.
Abstract: Methods to deconvolve single-cell RNA-sequencing (scRNA-seq) data are necessary for samples containing a mixture of genotypes, whether they are natural or experimentally combined. Multiplexing across donors is a popular experimental design that can avoid batch effects, reduce costs and improve doublet detection. By using variants detected in scRNA-seq reads, it is possible to assign cells to their donor of origin and identify cross-genotype doublets that may have highly similar transcriptional profiles, precluding detection by transcriptional profile. More subtle cross-genotype variant contamination can be used to estimate the amount of ambient RNA. Ambient RNA is caused by cell lysis before droplet partitioning and is an important confounder of scRNA-seq analysis. Here we develop souporcell, a method to cluster cells using the genetic variants detected within the scRNA-seq reads. We show that it achieves high accuracy on genotype clustering, doublet detection and ambient RNA estimation, as demonstrated across a range of challenging scenarios.
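The idea that cross-genotype variant contamination reveals ambient RNA can be made concrete with a simple mixing model, assumed here for illustration: at each variant, a cell's observed alt-allele fraction is roughly (1 - ρ) times its own donor's genotype plus ρ times the pooled ambient allele frequency. The sketch below is a weighted least-squares estimate of ρ for one cell, not souporcell's actual model; the function and argument names are hypothetical.

```python
import numpy as np


def estimate_ambient_fraction(alt, depth, g_own, g_pool):
    """Least-squares estimate of the ambient RNA fraction rho for one cell.

    alt, depth : alt-allele read counts and total read counts at each variant
    g_own      : expected alt-allele fraction of the cell's own donor (0, 0.5 or 1)
    g_pool     : expected alt-allele fraction of the pooled ambient RNA at each variant
    Assumed model: alt / depth ~ (1 - rho) * g_own + rho * g_pool
    """
    alt, depth, g_own, g_pool = (np.asarray(a, dtype=float) for a in (alt, depth, g_own, g_pool))
    diff = g_pool - g_own  # only sites where donor and pool differ carry information
    den = np.sum(depth * diff**2)
    if den == 0:
        raise ValueError("no informative variants: donor and pool genotypes agree everywhere")
    num = np.sum((alt - depth * g_own) * diff)
    return float(np.clip(num / den, 0.0, 1.0))
```

In a two-donor pool mixed in equal proportions, a site where the cell's donor is homozygous reference and the other donor homozygous alternative has `g_pool = 0.5`; seeing 1 alt read in 10 at such sites implies ρ of about 0.2.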

Journal ArticleDOI
TL;DR: The map Q-score quantifies the resolvability of individual atoms in cryo-EM maps and can be used at the atom, residue or macromolecule scale; Q-score analysis of multiple cryo-EM maps of the same proteins derived from different laboratories confirms the reproducibility of structural features from side chains down to water and ion atoms.
Abstract: Cryogenic electron microscopy (cryo-EM) maps are now at the point where resolvability of individual atoms can be achieved. However, resolvability is not necessarily uniform throughout the map. We introduce a quantitative parameter to characterize the resolvability of individual atoms in cryo-EM maps, the map Q-score. Q-scores can be calculated for atoms in proteins, nucleic acids, water, ligands and other solvent atoms, using models fitted to or derived from cryo-EM maps. Q-scores can also be averaged to represent larger features such as entire residues and nucleotides. Averaged over entire models, Q-scores correlate very well with the estimated resolution of cryo-EM maps for both protein and RNA. Assuming the models they are calculated from are well fitted to the map, Q-scores can be used as a measure of resolvability in cryo-EM maps at various scales, from entire macromolecules down to individual atoms. Q-score analysis of multiple cryo-EM maps of the same proteins derived from different laboratories confirms the reproducibility of structural features from side chains down to water and ion atoms.
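A minimal sketch of a per-atom Q-score, as we understand the published idea: sample map values on concentric shells around the atom and correlate them with a reference Gaussian profile. The reference width of 0.6 Å, shells out to 2 Å in 0.1 Å steps and 8 points per shell are assumptions here, the map is passed as a hypothetical callable rather than a voxel grid, and the published step that discards sample points lying closer to neighboring atoms is omitted.

```python
import numpy as np

SIGMA_REF = 0.6   # reference Gaussian width in Å (assumed)
R_MAX = 2.0       # outermost shell radius in Å (assumed)
R_STEP = 0.1      # shell spacing in Å (assumed)
N_PER_SHELL = 8   # sample points per shell (assumed)


def fibonacci_sphere(n):
    """Return n roughly uniform unit vectors on a golden-angle spiral."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1 - 2 * i / n)
    theta = np.pi * (1 + 5**0.5) * i
    return np.stack([np.cos(theta) * np.sin(phi), np.sin(theta) * np.sin(phi), np.cos(phi)], axis=1)


def q_score(density, atom_xyz):
    """Correlate map values around one atom with a reference Gaussian profile.

    density : callable mapping an (N, 3) array of points in Å to N map values (hypothetical interface)
    """
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    dirs = fibonacci_sphere(N_PER_SHELL)
    radii, points = [0.0], [atom_xyz[None, :]]
    for r in np.arange(R_STEP, R_MAX + 1e-9, R_STEP):
        points.append(atom_xyz + r * dirs)
        radii.extend([r] * N_PER_SHELL)
    u = density(np.vstack(points))
    v = np.exp(-np.asarray(radii) ** 2 / (2 * SIGMA_REF**2))
    # Pearson correlation is unaffected by the offset and scale of the reference Gaussian.
    return float(np.corrcoef(u, v)[0, 1])
```

A map that is a sharp 0.6 Å Gaussian at the atom scores close to 1, while a blurred density around the same atom scores lower, matching the intuition that Q-scores track local resolvability.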

Journal ArticleDOI
TL;DR: This work introduces probabilistic cell typing by in situ sequencing (pciSeq), an approach that leverages previous scRNA-seq classification and multiplexed in situ RNA detection to accurately map cell types in the mouse hippocampus and isocortex.
Abstract: Understanding the function of a tissue requires knowing the spatial organization of its constituent cell types. In the cerebral cortex, single-cell RNA sequencing (scRNA-seq) has revealed the genome-wide expression patterns that define its many, closely related neuronal types, but cannot reveal their spatial arrangement. Here we introduce probabilistic cell typing by in situ sequencing (pciSeq), an approach that leverages previous scRNA-seq classification to identify cell types using multiplexed in situ RNA detection. We applied this method by mapping the inhibitory neurons of mouse hippocampal area CA1, for which ground truth is available from extensive previous work identifying their laminar organization. Our method identified these neuronal classes in a spatial arrangement matching ground truth, and further identified multiple classes of isocortical pyramidal cell in a pattern matching their known organization. This method will allow the spatial organization of closely related cell types to be identified across the brain and other tissues.
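The core of probabilistic cell typing, scoring a cell's in situ gene counts against expression profiles learned from scRNA-seq, can be sketched with a plain Poisson likelihood. This is a simplification under stated assumptions and not pciSeq's model: the detection `efficiency` factor, the uniform default prior and the function name are hypothetical, and the real method also assigns individual RNA spots to cells probabilistically.

```python
import numpy as np


def classify_cell(counts, type_means, efficiency=0.1, prior=None):
    """Posterior probability over cell types for one cell's in situ gene counts (Poisson sketch).

    counts     : detected reads per gene for this cell
    type_means : types x genes mean expression from a scRNA-seq reference
    efficiency : assumed fraction of transcripts detected in situ
    """
    counts = np.asarray(counts, dtype=float)
    lam = efficiency * np.asarray(type_means, dtype=float) + 1e-6  # expected in situ counts
    # Poisson log-likelihood per type; log(counts!) is dropped because it is the same for every type.
    loglik = (counts * np.log(lam) - lam).sum(axis=1)
    if prior is not None:
        loglik = loglik + np.log(prior)
    loglik -= loglik.max()  # stabilise before exponentiating
    post = np.exp(loglik)
    return post / post.sum()
```

A cell with 4 reads of gene 0 and none of the others, scored against three types that each mainly express a different gene, is assigned almost entirely to the first type.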

Journal ArticleDOI
TL;DR: The GRABACh (GPCR-activation-based ACh) sensor was optimized to achieve substantially improved sensitivity in ACh detection, as well as reduced downstream coupling to intracellular pathways.
Abstract: The ability to directly measure acetylcholine (ACh) release is an essential step toward understanding its physiological function. Here we optimized the GRABACh (GPCR-activation-based ACh) sensor to achieve substantially improved sensitivity in ACh detection, as well as reduced downstream coupling to intracellular pathways. The improved version of the ACh sensor retains the subsecond response kinetics, physiologically relevant affinity and precise molecular specificity for ACh of its predecessor. Using this sensor, we revealed compartmental ACh signals in the olfactory center of transgenic flies in response to external stimuli including odor and body shock. With fiber photometry recording and two-photon imaging, our ACh sensor also enabled sensitive detection of single-trial ACh dynamics in multiple brain regions in mice performing a variety of behaviors.
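Single-trial sensor signals such as those described above are commonly expressed as ΔF/F0, the fluorescence change relative to a pre-stimulus baseline. The sketch below is a generic version of that calculation, assuming F0 is the mean over a fixed window before the stimulus; it is not the authors' analysis pipeline, and the function name and parameters are hypothetical.

```python
import numpy as np


def delta_f_over_f(trace, fs_hz, stim_time_s, baseline_s=1.0):
    """Return dF/F0 for one trial, with F0 = mean fluorescence in the window before the stimulus."""
    trace = np.asarray(trace, dtype=float)
    stim_idx = int(round(stim_time_s * fs_hz))
    start = max(0, stim_idx - int(round(baseline_s * fs_hz)))
    if stim_idx <= start:
        raise ValueError("baseline window is empty")
    f0 = trace[start:stim_idx].mean()
    return (trace - f0) / f0
```

For a 10 Hz trace that sits at 100 units and rises to 130 after a stimulus at 2 s, the post-stimulus samples read as a 0.3 (30%) ΔF/F0 response.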

Journal ArticleDOI
TL;DR: Applications of Acr proteins for post-translational control of CRISPR-Cas systems in prokaryotic and mammalian cells, organisms and ecosystems are discussed.
Abstract: Clustered, regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) genes, a diverse family of prokaryotic adaptive immune systems, have emerged as a biotechnological tool and therapeutic. The discovery of protein inhibitors of CRISPR-Cas systems, called anti-CRISPR (Acr) proteins, enables the development of more controllable and precise CRISPR-Cas tools. Here we discuss applications of Acr proteins for post-translational control of CRISPR-Cas systems in prokaryotic and mammalian cells, organisms and ecosystems.

Journal ArticleDOI
TL;DR: Computational methods for analysis and integration of single-cell omics data across different modalities are summarized and their applications, challenges and future directions are discussed.
Abstract: Single-cell omics approaches provide high-resolution data on cellular phenotypes, developmental dynamics and communication networks in diverse tissues and conditions. Emerging technologies now measure different modalities of individual cells, such as genomes, epigenomes, transcriptomes and proteomes, in addition to spatial profiling. Combined with analytical approaches, these data open new avenues for accurate reconstruction of gene-regulatory and signaling networks driving cellular identity and function. Here we summarize computational methods for analysis and integration of single-cell omics data across different modalities and discuss their applications, challenges and future directions.

Journal ArticleDOI
TL;DR: A convolutional neural network, Akita, is presented that accurately predicts genome folding from DNA sequence alone and can be used to perform in silico saturation mutagenesis, interpret eQTLs, make predictions for structural variants, and probe species-specific genome folding.
Abstract: In interphase, the human genome sequence folds in three dimensions into a rich variety of locus-specific contact patterns. Cohesin and CTCF (CCCTC-binding factor) are key regulators; perturbing the levels of either greatly disrupts genome-wide folding as assayed by chromosome conformation capture methods. Still, how a given DNA sequence encodes a particular locus-specific folding pattern remains unknown. Here we present a convolutional neural network, Akita, that accurately predicts genome folding from DNA sequence alone. Representations learned by Akita underscore the importance of an orientation-specific grammar for CTCF binding sites. Akita learns predictive nucleotide-level features of genome folding, revealing effects of nucleotides beyond the core CTCF motif. Once trained, Akita enables rapid in silico predictions. Taking advantage of this, we demonstrate how Akita can be used to perform in silico saturation mutagenesis, interpret eQTLs, make predictions for structural variants and probe species-specific genome folding. Collectively, these results enable decoding genome function from sequence through structure.
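In silico saturation mutagenesis, as mentioned above, means scoring every possible single-nucleotide substitution with a trained model. Below is a minimal sketch with a pluggable `predict` function. Akita itself outputs a contact map, so a real score would be some scalar summary of the predicted map change. Here `toy_predict`, which simply counts a CTCF-like core k-mer, is a hypothetical stand-in for a model.

```python
import numpy as np

BASES = "ACGT"


def saturation_mutagenesis(seq, predict):
    """Score every single-nucleotide substitution as predict(mutant) - predict(reference).

    Returns an array of shape (len(seq), 4) with columns ordered as BASES;
    the entry for the reference base at each position is left at 0.
    """
    ref_score = predict(seq)
    effects = np.zeros((len(seq), len(BASES)))
    for i, ref_base in enumerate(seq):
        for j, alt_base in enumerate(BASES):
            if alt_base == ref_base:
                continue
            mutant = seq[:i] + alt_base + seq[i + 1:]
            effects[i, j] = predict(mutant) - ref_score
    return effects


def toy_predict(seq):
    """Hypothetical stand-in for a trained model: counts a CTCF-like core k-mer."""
    return float(seq.count("CCCTC"))
```

For the sequence `"AACCCTCAA"`, substitutions inside the k-mer (for example C to A at position 2) score -1, while substitutions in the flanks score 0. This is the kind of position-by-base effect map the text describes, here produced with a trivial model.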