Author

Chris Sander

Bio: Chris Sander is an academic researcher from Harvard University. The author has contributed to research in topics: Large Hadron Collider & Protein structure. The author has an h-index of 178 and has co-authored 713 publications receiving 233,287 citations. Previous affiliations of Chris Sander include Purdue University & University of Leeds.


Papers
Journal Article
TL;DR: The Dali/FSSP organization of protein structures provides a map of the currently known regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination.
Abstract: The FSSP database and its new supplement, the Dali Domain Dictionary, present a continuously updated classification of all known 3D protein structures. The classification is derived using an automatic structure alignment program (Dali) for the all-against-all comparison of structures in the Protein Data Bank. From the resulting enumeration of structural neighbours (which form a surprisingly continuous distribution in fold space) we derive a discrete fold classification in three steps: (i) sequence-related families are covered by a representative set of protein chains; (ii) protein chains are decomposed into structural domains based on the recurrence of structural motifs; (iii) folds are defined as tight clusters of domains in fold space. The fold classification, domain definitions and test sets for sequence-structure alignment (threading) are accessible on the web at www.embl-ebi.ac.uk/dali. The web interface provides a rich network of links between neighbours in fold space, between domains and proteins, and between structures and sequences leading, for example, to a database of explicit multiple alignments of protein families in the twilight zone of sequence similarity. The Dali/FSSP organization of protein structures provides a map of the currently known regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination.

697 citations

Journal Article
02 Nov 2017-Cell
TL;DR: This large-scale analysis of 206 adult soft tissue sarcomas reveals previously unappreciated sarcoma-type-specific changes in copy number, methylation, RNA, and protein, providing insights into refining sarcoma therapy and relationships to other cancer types.

684 citations

Journal Article
Emek Demir, Michael P. Cary, Suzanne M. Paley, Ken Fukuda, Christian Lemer, Imre Vastrik, Guanming Wu, Peter D'Eustachio, Carl F. Schaefer, Joanne S. Luciano, Frank Schacherer, Irma Martínez-Flores, Zhenjun Hu, Verónica Jiménez-Jacinto, Geeta Joshi-Tope, Kumaran Kandasamy, Alejandra López-Fuentes, Huaiyu Mi, Elgar Pichler, Igor Rodchenkov, Andrea Splendiani, Sasha Tkachev, Jeremy Zucker, Gopal R. Gopinath, Harsha Rajasimha, Ranjani Ramakrishnan, Imran Shah, Mustafa H Syed, Nadia Anwar, Özgün Babur, Michael L. Blinov, Erik Brauner, Dan Corwin, Sylva L. Donaldson, Frank Gibbons, Robert N. Goldberg, Peter Hornbeck, Augustin Luna, Peter Murray-Rust, Eric K. Neumann, Oliver Reubenacker, Matthias Samwald, Martijn P. van Iersel, Sarala M. Wimalaratne, Keith Allen, Burk Braun, Michelle Whirl-Carrillo, Kei-Hoi Cheung, Kam D. Dahlquist, Andrew Finney, Marc Gillespie, Elizabeth M. Glass, Li Gong, Robin Haw, Michael Honig, Olivier Hubaut, David W. Kane, Shiva Krupa, Martina Kutmon, Julie Leonard, Debbie Marks, David Merberg, Victoria Petri, Alexander R. Pico, Dean Ravenscroft, Liya Ren, Nigam H. Shah, Margot Sunshine, Rebecca Tang, Ryan Whaley, Stan Letovsky, Kenneth H. Buetow, Andrey Rzhetsky, Vincent Schächter, Bruno S. Sobral, Ugur Dogrusoz, Shannon K. McWeeney, Mirit I. Aladjem, Ewan Birney, Julio Collado-Vides, Susumu Goto, Michael Hucka, Nicolas Le Novère, Natalia Maltsev, Akhilesh Pandey, Paul Thomas, Edgar Wingender, Peter D. Karp, Chris Sander, Gary D. Bader
TL;DR: Using BioPAX, millions of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases, and this large amount of pathway data in a computable form will support visualization, analysis and biological discovery.
Abstract: Biological Pathway Exchange (BioPAX) is a standard language to represent biological pathways at the molecular and cellular level and to facilitate the exchange of pathway data. The rapid growth of the volume of pathway data has spurred the development of databases and computational tools to aid interpretation; however, use of these data is hampered by the current fragmentation of pathway information across many databases with incompatible formats. BioPAX, which was created through a community process, solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. Using BioPAX, millions of interactions, organized into thousands of pathways, from many organisms are available from a growing number of databases. This large amount of pathway data in a computable form will support visualization, analysis and biological discovery.

673 citations

Journal Article
Alison M. Taylor, Juliann Shih, Gavin Ha, +729 more
TL;DR: The genomic and phenotypic correlates of cancer aneuploidy are defined and genome engineering is applied to delete 3p in lung cells, causing decreased proliferation rescued in part by chromosome 3 duplication.

660 citations

Journal Article
TL;DR: This work proposes a community standard data model for the representation and exchange of protein interaction data, jointly developed by members of the Proteomics Standards Initiative (PSI) and the Human Proteome Organization (HUPO).
Abstract: A major goal of proteomics is the complete description of the protein interaction network underlying cell physiology. A large number of small-scale and, more recently, large-scale experiments have contributed to expanding our understanding of the nature of the interaction network. However, the necessary data integration across experiments is currently hampered by the fragmentation of publicly available protein interaction data, which exists in different formats in databases, on authors' websites or sometimes only in print publications. Here, we propose a community standard data model for the representation and exchange of protein interaction data. This data model has been jointly developed by members of the Proteomics Standards Initiative (PSI), a work group of the Human Proteome Organization (HUPO), and is supported by major protein interaction data providers, in particular the Biomolecular Interaction Network Database (BIND), Cellzome (Heidelberg, Germany), the Database of Interacting Proteins (DIP), Dana Farber Cancer Institute (Boston, MA, USA), the Human Protein Reference Database (HPRD), Hybrigenics (Paris, France), the European Bioinformatics Institute's (EMBL-EBI, Hinxton, UK) IntAct, the Molecular Interactions (MINT, Rome, Italy) database, the Protein-Protein Interaction Database (PPID, Edinburgh, UK) and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, EMBL, Heidelberg, Germany).

658 citations


Cited by
Journal Article
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations
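
The position-specific score matrix mentioned in the abstract above can be illustrated with a small sketch. This is not PSI-BLAST's actual procedure, which uses sequence weighting and data-dependent pseudocounts; it is a simplified log-odds construction that assumes a gap-free column view, a uniform pseudocount, and a caller-supplied background distribution. The function name `build_pssm` and its parameters are hypothetical.

```python
import math
from collections import Counter


def build_pssm(alignment, background, pseudocount=1.0):
    """Return one dict of log2-odds scores per alignment column.

    Simplified illustration: uniform pseudocounts, no sequence weighting.
    `background` maps each residue to its expected frequency.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    alphabet = sorted(background)
    pssm = []
    for col in range(length):
        # Count residues in this column, skipping gaps and unknown symbols.
        counts = Counter(row[col] for row in alignment if row[col] in background)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pssm.append({
            res: math.log2(((counts[res] + pseudocount) / total) / background[res])
            for res in alphabet
        })
    return pssm


if __name__ == "__main__":
    toy_background = {res: 0.25 for res in "ACGT"}  # toy 4-letter alphabet
    for column in build_pssm(["AC", "AC", "AG"], toy_background):
        print({res: round(score, 2) for res, score in column.items()})
```

Residues that are over-represented in a column relative to the background receive positive scores, and under-represented ones receive negative scores, which is what lets a profile search reward conserved positions.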

Journal Article
TL;DR: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.
Abstract: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W, which is freely available.

63,427 citations

Journal Article
TL;DR: CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W, providing an integrated system for performing multiple sequence and profile alignments and analysing the results.
Abstract: CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W. The new system is easy to use, providing an integrated system for performing multiple sequence and profile alignments and analysing the results. CLUSTAL X displays the sequence alignment in a window on the screen. A versatile sequence colouring scheme allows the user to highlight conserved features in the alignment. Pull-down menus provide all the options required for traditional multiple sequence and profile alignment. New features include: the ability to cut-and-paste sequences to change the order of the alignment, selection of a subset of the sequences to be realigned, and selection of a sub-range of the alignment to be realigned and inserted back into the original alignment. Alignment quality analysis can be performed and low-scoring segments or exceptional residues can be highlighted. Quality analysis and realignment of selected residue ranges provide the user with a powerful tool to improve and refine difficult alignments and to trap errors in input sequences. CLUSTAL X has been compiled on SUN Solaris, IRIX5.3 on Silicon Graphics, Digital UNIX on DECstations, Microsoft Windows (32 bit) for PCs, Linux ELF for x86 PCs, and Macintosh PowerMac.

38,522 citations

Journal Article
TL;DR: MUSCLE is a new computer program for creating multiple alignments of protein sequences that includes fast distance estimation using kmer counting, progressive alignment using a new profile function the authors call the log-expectation score, and refinement using tree-dependent restricted partitioning.
Abstract: We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.

37,524 citations
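
The idea of estimating distances by kmer counting, named in the abstract above, can be sketched as follows. This is a generic alignment-free measure based on the fraction of shared kmers, not MUSCLE's exact formula; the function names and the default `k=3` are assumptions chosen for illustration.

```python
from collections import Counter


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Return 1 minus the fraction of kmers shared by the two sequences.

    The shared count is the multiset intersection of the kmer counts,
    normalised by the number of kmers in the shorter sequence.
    """
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must contain at least k residues")
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / denom


if __name__ == "__main__":
    print(kmer_distance("ACDEFGHIK", "ACDEFGHLK"))
```

Because no alignment is computed, the cost grows roughly linearly with sequence length, which is why counting-based estimates are attractive as a first pass over many sequences.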

Journal Article
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

35,225 citations