scispace - formally typeset
Author

Chris Sander

Bio: Chris Sander is an academic researcher from Harvard University. The author has contributed to research in the topics Large Hadron Collider and protein structure. The author has an h-index of 178 and has co-authored 713 publications receiving 233,287 citations. Previous affiliations of Chris Sander include Purdue University and the University of Leeds.


Papers
Posted Content · DOI
28 Feb 2021-bioRxiv
TL;DR: In this article, the authors propose a deep generative model, adapted from natural language processing, that predicts and designs diverse functional sequences without the need for alignments and achieves state-of-the-art prediction of missense and indel effects.
Abstract: The ability to design functional sequences and predict the effects of variation is central to protein engineering and biotherapeutics. State-of-the-art computational methods rely on models that leverage evolutionary information, but they are inadequate for important applications where multiple sequence alignments are not robust. Such applications include predicting the effects of indels and of variants in disordered proteins, and designing proteins such as antibodies, whose complementarity-determining regions are highly variable. We introduce a deep generative model adapted from natural language processing for prediction and design of diverse functional sequences without the need for alignments. The model performs state-of-the-art prediction of missense and indel effects, and we successfully design and test a diverse 10^5-nanobody library that shows better expression than a 1000-fold larger synthetic library. Our results demonstrate the power of the ‘alignment-free’ autoregressive model in generalizing to regions of sequence space traditionally considered beyond the reach of prediction and design.

22 citations
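The alignment-free idea in the autoregressive-model abstract above can be illustrated with a small sketch. An autoregressive model factorizes a sequence's probability as p(x) = ∏ p(x_i | x_<i), so one common way to use it, assumed here, is to score a variant as the log-likelihood difference between the variant and the wild type; because whole sequences are scored directly, an indel that changes the length needs no alignment. The bigram model below is a hypothetical toy stand-in for the paper's deep neural network (it conditions only on the previous residue), and every function name is illustrative.

```python
import math
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
START, END = "^", "$"  # sentinel tokens marking sequence boundaries


def train_bigram(sequences):
    """Count residue-to-residue transitions (a toy stand-in for a deep model)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        tokens = [START] + list(seq) + [END]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def log_likelihood(model, seq):
    """Sum of log p(x_i | x_{i-1}) with add-one smoothing, including the end token."""
    vocab_size = len(ALPHABET) + 1  # 20 amino acids plus END
    tokens = [START] + list(seq) + [END]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        row = model[prev]
        total += math.log((row[cur] + 1) / (sum(row.values()) + vocab_size))
    return total


def variant_effect(model, wild_type, variant):
    """Log-likelihood ratio; negative values mean the variant looks less 'natural'.
    Substitutions and indels are handled the same way, with no alignment."""
    return log_likelihood(model, variant) - log_likelihood(model, wild_type)


family = ["MKTAYIAK", "MKTAYLAK", "MKSAYIAK"]  # made-up training sequences
model = train_bigram(family)
print(variant_effect(model, "MKTAYIAK", "MKTAWIAK"))   # substitution Y5W
print(variant_effect(model, "MKTAYIAK", "MKTAYIAKK"))  # one-residue insertion
```

In the published model the conditional probabilities come from a neural network that sees the whole prefix x_<i; only the scoring step is meant to carry over from this sketch.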

Journal Article · DOI
S. Chatrchyan, Vardan Khachatryan, Albert M. Sirunyan, A. Tumasyan + 3953 more · Institutions (145)
TL;DR: In this paper, a study of color coherence effects in pp collisions at a center-of-mass energy of 7 TeV is presented, using events in which the two jets with the largest transverse momentum exhibit a back-to-back topology.
Abstract: A study of color coherence effects in pp collisions at a center-of-mass energy of 7 TeV is presented. The data used in the analysis were collected in 2010 with the CMS detector at the LHC and correspond to an integrated luminosity of 36 inverse picobarns. Events are selected that contain at least three jets and where the two jets with the largest transverse momentum exhibit a back-to-back topology. The measured angular correlation between the second- and third-leading jet is shown to be sensitive to color coherence effects, and is compared to the predictions of Monte Carlo models with various implementations of color coherence. None of the models describe the data satisfactorily.

22 citations

Posted Content · DOI
02 Feb 2018-bioRxiv
TL;DR: This paper presents a computational method to generate causal explanations for proteomic profiles using prior mechanistic knowledge in the literature, as recorded in cellular pathway maps, and demonstrates its potential to become a powerful discovery tool as the amount and quality of cellular profiling rapidly expands.
Abstract: Measurement of changes in protein levels and in post-translational modifications, such as phosphorylation, can be highly informative about the phenotypic consequences of genetic differences or about the dynamics of cellular processes. Typically, such proteomic profiles are interpreted intuitively or by simple correlation analysis. Here, we present a computational method to generate causal explanations for proteomic profiles using prior mechanistic knowledge in the literature, as recorded in cellular pathway maps. To demonstrate its potential, we use this method to analyze the cascading events after EGF stimulation of a cell line, to discover new pathways in platelet activation, to identify influential regulators of oncoproteins in breast cancer, to describe signaling characteristics in predefined subtypes of ovarian and breast cancers, and to highlight which pathway relations are most frequently activated across 32 cancer types. Causal pathway analysis, which combines molecular profiles with prior biological knowledge captured in computational form, may become a powerful discovery tool as the amount and quality of cellular profiling rapidly expand. The method is freely available at http://causalpath.org.

22 citations

Journal Article · DOI
Morad Aaboud, Georges Aad, Brad Abbott, Jalal Abdallah + 2881 more · Institutions (200)
TL;DR: In this article, a ratio of cross sections is defined as an observable for events containing jets and large missing transverse momentum in the plane transverse to the proton beams at the Large Hadron Collider; because the measurement is fully corrected for detector effects, it can be used to constrain new-physics models beyond those considered in the paper.
Abstract: Observables sensitive to the anomalous production of events containing hadronic jets and missing momentum in the plane transverse to the proton beams at the Large Hadron Collider are presented. The observables are defined as the ratio of the cross section for events containing jets and large missing transverse momentum to that for events containing jets and a pair of charged leptons from the decay of a $Z/\gamma^*$ boson. This definition minimises experimental and theoretical systematic uncertainties in the measurements. This ratio is measured differentially with respect to a number of kinematic properties of the hadronic system in two phase-space regions, one inclusive single-jet region and one region sensitive to vector-boson-fusion topologies. The data are found to be in agreement with the Standard Model predictions and used to constrain a variety of theoretical models for dark-matter production, including simplified models, effective field theory models, and invisible decays of the Higgs boson. The measurements use 3.2 fb$^{-1}$ of proton–proton collision data recorded by the ATLAS experiment at a centre-of-mass energy of 13 TeV and are fully corrected for detector effects, meaning that the data can be used to constrain new-physics models beyond those shown in this paper.

22 citations
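The ATLAS abstract above states that defining the observable as a ratio minimises systematic uncertainties. A toy calculation shows the mechanism: any effect that scales the jets + missing-momentum yield and the jets + dilepton yield by the same factor cancels bin by bin, while an effect on only one of them does not. The yields and the 5% shift below are made-up illustrative numbers, not ATLAS data, and the helper name is hypothetical.

```python
def ratio_per_bin(numerator, denominator):
    """Bin-by-bin ratio of two event yields."""
    return [n / d for n, d in zip(numerator, denominator)]


# Made-up yields in three bins of some kinematic variable (not ATLAS data).
met_jets = [1200.0, 640.0, 210.0]  # jets + large missing transverse momentum
ll_jets = [150.0, 80.0, 30.0]      # jets + charged-lepton pair from Z/gamma*

nominal = ratio_per_bin(met_jets, ll_jets)

# A shared multiplicative effect (for example a luminosity shift) moves both
# yields by the same factor, so the ratio is unchanged.
shift = 1.05
shifted = ratio_per_bin([n * shift for n in met_jets],
                        [d * shift for d in ll_jets])

# An effect that hits only the numerator does not cancel.
numerator_only = ratio_per_bin([n * shift for n in met_jets], ll_jets)

print(nominal)
print(shifted)
print(numerator_only)
```

Real systematic effects are usually only partially correlated between the two selections, so in practice the cancellation is partial rather than exact.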

Journal Article · DOI
Morad Aaboud, Georges Aad, Brad Abbott, Ovsat Abdinov + 2946 more · Institutions (197)
TL;DR: This Letter presents a search for the production of a long-lived neutral particle (Z_d) decaying within the ATLAS hadronic calorimeter, in association with a standard model (SM) Z boson produced via an intermediate scalar boson, where Z→ℓ⁺ℓ⁻ (ℓ = e, μ).
Abstract: This Letter presents a search for the production of a long-lived neutral particle (Z_d) decaying within the ATLAS hadronic calorimeter, in association with a standard model (SM) Z boson produced via an intermediate scalar boson, where Z→ℓ⁺ℓ⁻ (ℓ = e, μ). The data used were collected by the ATLAS detector during 2015 and 2016 in pp collisions at a center-of-mass energy of √s = 13 TeV at the Large Hadron Collider and correspond to an integrated luminosity of 36.1 ± 0.8 fb⁻¹. No significant excess of events is observed above the expected background. Limits on the production cross section of the scalar boson times its decay branching fraction into the long-lived neutral particle are derived as a function of the mass of the intermediate scalar boson, the mass of the long-lived neutral particle, and its cτ from a few centimeters to one hundred meters. In the case that the intermediate scalar boson is the SM Higgs boson, its decay branching fraction to a long-lived neutral particle with cτ between approximately 0.1 and 7 m is excluded at 95% confidence level up to 10% for m_Zd between 5 and 15 GeV. © 2019 CERN for the ATLAS Collaboration. Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/). Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI. Funded by SCOAP3.

21 citations
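The long-lived-particle search above quotes limits as a function of cτ. The exponential decay law shows why sensitivity is confined to a window of cτ: with lab-frame mean decay length λ = βγcτ, the probability of decaying between distances L1 and L2 along the flight path is exp(−L1/λ) − exp(−L2/λ), which vanishes for both very short and very long lifetimes. The 2 m to 4 m window and βγ = 1 below are assumed round numbers, not the ATLAS calorimeter geometry or the analysis kinematics, and the function name is hypothetical.

```python
import math


def decay_probability(c_tau_m, beta_gamma, l_min_m, l_max_m):
    """Probability that a particle with proper decay length c*tau (metres) and
    boost beta*gamma decays between l_min and l_max along its flight path."""
    lab_length = beta_gamma * c_tau_m  # mean decay length in the lab frame
    return math.exp(-l_min_m / lab_length) - math.exp(-l_max_m / lab_length)


# Assumed illustrative decay window and boost, not the real detector geometry.
L_MIN, L_MAX, BETA_GAMMA = 2.0, 4.0, 1.0

for c_tau in [0.01, 0.1, 1.0, 2.0 / math.log(2.0), 10.0, 100.0]:
    p = decay_probability(c_tau, BETA_GAMMA, L_MIN, L_MAX)
    print(f"c*tau = {c_tau:7.3f} m -> P(decay in window) = {p:.4f}")
```

The probability peaks at λ = (L_max − L_min)/ln(L_max/L_min), here 2/ln 2 ≈ 2.89 m, where it equals exactly 0.25. In a real analysis βγ varies event by event and the flight direction matters, so this is only a qualitative picture.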


Cited by
Journal Article · DOI
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations
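The PSI-BLAST abstract above describes combining significant alignments into a position-specific score matrix (PSSM) and searching with it. The sketch below shows only the core log-odds idea on gap-free aligned sequences. It assumes a uniform background distribution, a flat pseudocount, and no sequence weighting, all simplifications relative to PSI-BLAST itself; the function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """One dict of log2-odds scores per alignment column.
    Assumes gap-free, equal-length sequences and a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(aligned_seqs[0])):
        column = [seq[col] for seq in aligned_seqs]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_segment(pssm, segment):
    """Sum the per-column scores for a segment as long as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


aligned = ["ACDE", "ACDE", "ACEE", "GCDE"]  # made-up aligned hits
pssm = build_pssm(aligned)
print(score_segment(pssm, "ACDE"))  # matches the profile
print(score_segment(pssm, "WWWW"))  # unrelated segment
```

In PSI-BLAST the resulting matrix takes the place of a fixed substitution matrix in the next database search, and the search-and-rebuild cycle repeats; conserved columns (like the all-C column here) dominate the score.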

Journal Article · DOI
TL;DR: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.
Abstract: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W which is freely available.

63,427 citations
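The CLUSTAL W abstract above describes assigning weights that down-weight near-duplicate sequences and up-weight divergent ones. CLUSTAL W derives its weights from the branch lengths of its guide tree; the sketch below swaps in a simpler, hypothetical scheme (each sequence's mean p-distance to the others, normalized to sum to 1) that has the same qualitative effect on equal-length aligned sequences.

```python
def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def sequence_weights(seqs):
    """Hypothetical weighting: mean distance to the other sequences, normalized
    to sum to 1, so near-duplicates share a small weight and outliers get more."""
    n = len(seqs)
    raw = [
        sum(p_distance(seqs[i], seqs[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    total = sum(raw)
    if total == 0:  # all sequences identical: fall back to equal weights
        return [1.0 / n] * n
    return [w / total for w in raw]


seqs = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCAAAAA"]  # two near-duplicates, one outlier
print(sequence_weights(seqs))
```

Without weighting, adding a second copy of a sequence would double its influence on the profile; with weights like these, the copies split a smaller share between them.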

Journal Article · DOI
TL;DR: CLUSTAL X is a new windows interface for the widely used progressive multiple sequence alignment program CLUSTAL W, providing an integrated system for performing multiple sequence and profile alignments and analysing the results.
Abstract: CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W. The new system is easy to use, providing an integrated system for performing multiple sequence and profile alignments and analysing the results. CLUSTAL X displays the sequence alignment in a window on the screen. A versatile sequence colouring scheme allows the user to highlight conserved features in the alignment. Pull-down menus provide all the options required for traditional multiple sequence and profile alignment. New features include: the ability to cut-and-paste sequences to change the order of the alignment, selection of a subset of the sequences to be realigned, and selection of a sub-range of the alignment to be realigned and inserted back into the original alignment. Alignment quality analysis can be performed and low-scoring segments or exceptional residues can be highlighted. Quality analysis and realignment of selected residue ranges provide the user with a powerful tool to improve and refine difficult alignments and to trap errors in input sequences. CLUSTAL X has been compiled on SUN Solaris, IRIX 5.3 on Silicon Graphics, Digital UNIX on DECstations, Microsoft Windows (32-bit) for PCs, Linux ELF for x86 PCs, and Macintosh PowerMac.

38,522 citations

Journal Article · DOI
TL;DR: MUSCLE is a new computer program for creating multiple alignments of protein sequences that includes fast distance estimation using kmer counting, progressive alignment using a new profile function the authors call the log-expectation score, and refinement using tree-dependent restricted partitioning.
Abstract: We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.

37,524 citations
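The MUSCLE abstract above mentions fast distance estimation by k-mer counting, which needs no alignment. A minimal sketch, assuming a distance of the form d = 1 − F, where F sums min(n_x, n_y) over shared k-mers and divides by the number of k-mers in the shorter sequence. It uses the raw amino-acid alphabet and k = 3 as illustrative choices and is not MUSCLE's exact implementation; the function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(x, y, k=3):
    """d = 1 - F, with F = sum over k-mers of min(n_x, n_y), divided by the
    number of k-mers in the shorter sequence. No alignment is needed."""
    denom = min(len(x), len(y)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must be at least k residues long")
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    shared = sum(min(count, cy[kmer]) for kmer, count in cx.items())
    return 1.0 - shared / denom


print(kmer_distance("ACDEFG", "ACDEFG"))  # identical sequences
print(kmer_distance("ACDEFG", "ACDEFW"))  # one substitution at the end
print(kmer_distance("AAAA", "CCCC"))      # no shared 3-mers
```

Counting k-mers is linear in sequence length, so all pairwise distances for thousands of sequences stay cheap compared with pairwise alignment, which is why a measure like this suits the first stage before progressive alignment.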

Journal Article · DOI
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

35,225 citations