Author

Jimin Pei

Other affiliations: Howard Hughes Medical Institute
Bio: Jimin Pei is an academic researcher from University of Texas Southwestern Medical Center. The author has contributed to research in topics: Multiple sequence alignment & Alignment-free sequence analysis. The author has an h-index of 34 and has co-authored 80 publications that have received 6,999 citations. Previous affiliations of Jimin Pei include Howard Hughes Medical Institute.


Papers
Journal Article
TL;DR: This study reveals previously unappreciated roles for lysine acetylation in the regulation of diverse cellular pathways outside of the nucleus, including many longevity regulators and metabolism enzymes.

1,422 citations

Journal Article
TL;DR: This work explores the use of 3D structural information to guide sequence alignments constructed by the MSA program PROMALS. The resulting tool, PROMALS3D, outperforms a number of existing methods for constructing multiple sequence or structural alignments in both reference-dependent and reference-independent evaluations.
Abstract: Although multiple sequence alignments (MSAs) are essential for a wide range of applications from structure modeling to prediction of functional sites, construction of accurate MSAs for distantly related proteins remains a largely unsolved problem. The rapidly increasing database of spatial structures is a valuable source to improve alignment quality. We explore the use of 3D structural information to guide sequence alignments constructed by our MSA program PROMALS. The resulting tool, PROMALS3D, automatically identifies homologs with known 3D structures for the input sequences, derives structural constraints through structure-based alignments and combines them with sequence constraints to construct consistency-based multiple sequence alignments. The output is a consensus alignment that brings together sequence and structural information about input proteins and their homologs. PROMALS3D can also align sequences of multiple input structures, with the output representing a multiple structure-based alignment refined in combination with sequence constraints. The advantage of PROMALS3D is that it gives researchers an easy way to produce high-quality alignments consistent with both sequences and structures of proteins. PROMALS3D outperforms a number of existing methods for constructing multiple sequence or structural alignments using both reference-dependent and reference-independent evaluation methods.

1,204 citations

Journal Article
TL;DR: The first global screening of lysine acetylation is reported, identifying 138 modification sites in 91 proteins from Escherichia coli, showing an intimate link of this modification to energy metabolism and implying that functions of lysine acetylation beyond regulation of gene expression are evolutionarily conserved from bacteria to mammals.

448 citations

Journal Article
TL;DR: The authors developed a program that calculates a conservation index at each position in a multiple sequence alignment using several methods. The results suggest that conservation indices should be a valuable tool for alignment quality assessment and might be used as an objective function for refinement of multiple alignments.
Abstract: Motivation: Amino acid sequence alignments are widely used in the analysis of protein structure, function and evolutionary relationships. Proteins within a superfamily usually share the same fold and possess related functions. These structural and functional constraints are reflected in the alignment conservation patterns. Positions of functional and/or structural importance tend to be more conserved. Conserved positions are usually clustered in distinct motifs surrounded by sequence segments of low conservation. Poorly conserved regions might also arise from the imperfections in multiple alignment algorithms and thus indicate possible alignment errors. Quantification of conservation by attributing a conservation index to each aligned position makes motif detection more convenient. Mapping these conservation indices onto a protein spatial structure helps to visualize spatial conservation features of the molecule and to predict functionally and/or structurally important sites. Analysis of conservation indices could be a useful tool in detection of potentially misaligned regions and will aid in improvement of multiple alignments. Results: We developed a program to calculate a conservation index at each position in a multiple sequence alignment using several methods. Namely, amino acid frequencies at each position are estimated and the conservation index is calculated from these frequencies. We utilize both unweighted frequencies and frequencies weighted using two different strategies. Three conceptually different approaches (entropy-based, variance-based and matrix score-based) are implemented in the algorithm to define the conservation index. Calculating conservation indices for 35522 positions in 284 alignments from SMART database we demonstrate that different methods result in highly correlated (correlation coefficient more than 0.85) conservation indices. Conservation indices show statistically significant correlation between sequentially adjacent positions i and i + j, where j < 13, and averaging of the indices over the window of three positions is optimal for motif detection. Positions with gaps display substantially lower conservation properties. We compare conservation properties of the SMART alignments or FSSP structural alignments to those of the ClustalW alignments. The results suggest that conservation indices should be a valuable tool of alignment quality assessment and might be used as an objective function for refinement of multiple alignments. Availability: The C code of the AL2CO program and its pre-compiled versions for several platforms as well as the details of the analysis are freely available at ftp://iole.swmed.edu/pub/al2co/.
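The entropy-based approach with unweighted frequencies and the three-position window mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the AL2CO C code: the function names `entropy_conservation` and `window_average`, the choice to ignore gap characters when estimating frequencies, the floor value assigned to all-gap columns, and the edge handling of the window are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy_conservation(alignment, gap="-"):
    """Score each column as sum(f_a * ln f_a) over unweighted residue frequencies.

    Scores are <= 0; a fully conserved column scores 0.0.
    Assumption: gaps are ignored when estimating frequencies, and a column
    made only of gaps gets the floor value -ln(20).
    """
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(n_cols):
        residues = [seq[col] for seq in alignment if seq[col] != gap]
        if not residues:
            scores.append(-math.log(20))
            continue
        counts = Counter(residues)
        total = len(residues)
        scores.append(sum((c / total) * math.log(c / total) for c in counts.values()))
    return scores


def window_average(scores, window=3):
    """Average each score with its neighbours; windows are truncated at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


# Toy alignment (hypothetical data): column 0 is invariant, column 3 is fully variable.
toy_alignment = ["ACDE", "ACDF", "ACEG", "AGEH"]
toy_scores = entropy_conservation(toy_alignment)
toy_smoothed = window_average(toy_scores, window=3)
```

Smoothing trades positional resolution for robustness, which is why a short window is a natural fit for picking out contiguous motifs; the weighted-frequency, variance-based and matrix score-based variants described in the abstract are not covered by this sketch.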

427 citations

Journal Article
TL;DR: This work developed PROMALS, a multiple alignment method that shows promising results for protein homologs with sequence identity below 10%, aligning close to half of the amino acid residues correctly on average, about three times more accurate than traditional pairwise sequence alignment methods.
Abstract: Motivation: Accurate multiple sequence alignments are essential in protein structure modeling, functional prediction and efficient planning of experiments. Although the alignment problem has attracted considerable attention, preparation of high-quality alignments for distantly related sequences remains a difficult task. Results: We developed PROMALS, a multiple alignment method that shows promising results for protein homologs with sequence identity below 10%, aligning close to half of the amino acid residues correctly on average. This is about three times more accurate than traditional pairwise sequence alignment methods. The PROMALS algorithm derives its strength from several sources: (i) sequence database searches to retrieve additional homologs; (ii) accurate secondary structure prediction; (iii) a hidden Markov model that uses a novel combined scoring of amino acids and secondary structures; (iv) probabilistic consistency-based scoring applied to progressive alignment of profiles. Compared to the best alignment methods that do not use secondary structure prediction and database searches (e.g. MUMMALS, ProbCons and MAFFT), PROMALS is up to 30% more accurate, with improvement being most prominent for highly divergent homologs. Compared to SPEM and HHalign, which also employ database searches and secondary structure prediction, PROMALS shows an accuracy improvement of several percent. Availability: The PROMALS web server is available at:

346 citations


Cited by
Journal Article
TL;DR: This version of MAFFT has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update.
Abstract: We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations.

27,771 citations

01 Aug 2000
TL;DR: A Bioentrepreneur course on assessing medical technology in the context of commercialization, covering many issues unique to biomedical products.
Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.

4,833 citations

Journal Article
TL;DR: ENDscript 2 has been fully re-engineered to enhance speed, accuracy and usability with interactive 3D visualization, and it takes advantage of ESPript 3, which was improved to handle large amounts of data with reduced computation time.
Abstract: ENDscript 2 is a friendly Web server for extracting and rendering a comprehensive analysis of primary to quaternary protein structure information in an automated way. This major upgrade has been fully re-engineered to enhance speed, accuracy and usability with interactive 3D visualization. It takes advantage of the new version 3 of ESPript, our well-known sequence alignment renderer, improved to handle a large number of data with reduced computation time. From a single PDB entry or file, ENDscript produces high quality figures displaying multiple sequence alignment of proteins homologous to the query, colored according to residue conservation. Furthermore, the experimental secondary structure elements and a detailed set of relevant biophysical and structural data are depicted. All this information and more are now mapped on interactive 3D PyMOL representations. Thanks to its adaptive and rigorous algorithm, beginner to expert users can modify settings to fine-tune ENDscript to their needs. ENDscript has also been upgraded as an open platform for the visualization of multiple biochemical and structural data coming from external biotool Web servers, with both 2D and 3D representations. ENDscript 2 and ESPript 3 are freely available at http://endscript.ibcp.fr and http://espript.ibcp.fr, respectively.

4,722 citations

Journal Article
14 Aug 2009-Science
TL;DR: A proteomic-scale analysis of protein acetylation suggests that it is an important biological regulatory mechanism whose regulatory scope is broad and comparable with that of other major posttranslational modifications.
Abstract: Lysine acetylation is a reversible posttranslational modification of proteins and plays a key role in regulating gene expression. Technological limitations have so far prevented a global analysis of lysine acetylation's cellular roles. We used high-resolution mass spectrometry to identify 3600 lysine acetylation sites on 1750 proteins and quantified acetylation changes in response to the deacetylase inhibitors suberoylanilide hydroxamic acid and MS-275. Lysine acetylation preferentially targets large macromolecular complexes involved in diverse cellular processes, such as chromatin remodeling, cell cycle, splicing, nuclear transport, and actin nucleation. Acetylation impaired phosphorylation-dependent interactions of 14-3-3 and regulated the yeast cyclin-dependent kinase Cdc28. Our data demonstrate that the regulatory scope of lysine acetylation is broad and comparable with that of other major posttranslational modifications.

3,787 citations