Author

Li-Yun Xiu

Bio: Li-Yun Xiu is an academic researcher at the Chinese Academy of Sciences. The author has contributed to research on the topics of mass spectra and top-down proteomics. The author has an h-index of 6 and has co-authored 8 publications receiving 637 citations.

Papers
Journal Article
TL;DR: pLink is software for analyzing mass spectrometry data from cross-linked proteins. It estimates the false discovery rate of cross-link identifications and is compatible with multiple homo- or hetero-bifunctional cross-linkers.
Abstract: pLink, software for data analysis of cross-linked proteins coupled with mass spectrometry, estimates false discovery rate and enables analysis of protein complexes without extensive purification. We have developed pLink, software for data analysis of cross-linked proteins coupled with mass-spectrometry analysis. pLink reliably estimates false discovery rate in cross-link identification and is compatible with multiple homo- or hetero-bifunctional cross-linkers. We validated the program with proteins of known structures, and we further tested it on protein complexes, crude immunoprecipitates and whole-cell lysates. We show that it is a robust tool for protein-structure and protein-protein interaction studies.

528 citations

Journal Article
TL;DR: This paper proposes pParse, a method that exports the most probable monoisotopic peaks for precursors, including co-eluted precursors. It detects candidate isotopic clusters from the relationship between the position of the highest peak and the mass of the first peak.
Abstract: Determining the monoisotopic peak of a precursor is a first step in interpreting mass spectra, which is basic but non-trivial. The reason is that in the isolation window of a precursor, other peaks interfere with the determination of the monoisotopic peak, leading to wrong mass-to-charge ratio or charge state. Here we propose a method, named pParse, to export the most probable monoisotopic peaks for precursors, including co-eluted precursors. We use the relationship between the position of the highest peak and the mass of the first peak to detect candidate clusters. Then, we extract three features to sort the candidate clusters: (i) the sum of the intensity, (ii) the similarity of the experimental and the theoretical isotopic distribution, and (iii) the similarity of elution profiles. We showed that the recall of pParse, MaxQuant, and BioWorks was 98-98.8%, 0.5-17%, and 1.8-36.5% at the same precision, respectively. About 50% of tandem mass spectra are triggered by multiple precursors which are difficult to identify. Then we design a new scoring function to identify the co-eluted precursors. About 26% of all identified peptides were exclusively from co-eluted peptides. Therefore, accurately determining monoisotopic peaks, including co-eluted precursors, can greatly increase peptide identification rate.

66 citations
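
The pParse abstract above ranks candidate clusters partly by the similarity of the experimental and the theoretical isotopic distribution. The abstract does not say which similarity measure is used, so the following is only a minimal sketch that assumes cosine similarity. The function names, the dictionary layout, and the example intensities are hypothetical and are not taken from pParse.

```python
import math


def cosine_similarity(observed, theoretical):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(o * t for o, t in zip(observed, theoretical))
    norm = math.sqrt(sum(o * o for o in observed)) * math.sqrt(
        sum(t * t for t in theoretical)
    )
    # Return 0.0 for an all-zero vector instead of dividing by zero.
    return dot / norm if norm else 0.0


def rank_clusters(candidates, theoretical):
    """Sort candidate clusters from most to least similar to the theoretical pattern.

    Each candidate is a dict with hypothetical keys "name" and "intensities".
    """
    return sorted(
        candidates,
        key=lambda c: cosine_similarity(c["intensities"], theoretical),
        reverse=True,
    )


# Illustrative numbers only: a made-up theoretical pattern and two candidates.
theoretical_pattern = [1.0, 0.6, 0.2]
candidate_clusters = [
    {"name": "b", "intensities": [20.0, 60.0, 100.0]},
    {"name": "a", "intensities": [100.0, 60.0, 20.0]},
]
ranked = rank_clusters(candidate_clusters, theoretical_pattern)
```

In this toy input, candidate "a" has the same shape as the theoretical pattern and is ranked first. A fuller scorer would combine this feature with the summed intensity and the elution-profile similarity that the abstract also lists.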

Journal Article
TL;DR: The study has revealed that different charge states of precursors result in different hydrogen rearrangement patterns, which should be valuable for automated database search, de novo peptide sequencing, and manual spectral validation.
Abstract: In recent years, electron transfer dissociation (ETD) has enjoyed widespread applications from sequencing of peptides with or without post-translational modifications to top-down analysis of intact proteins. However, peptide identification rates from ETD spectra compare poorly with those from collision induced dissociation (CID) spectra, especially for doubly charged precursors. This is in part due to an insufficient understanding of the characteristics of ETD and consequently a failure of database search engines to make use of the rich information contained in the ETD spectra. In this study, we statistically characterized ETD fragmentation patterns from a collection of 461 440 spectra and subsequently implemented our findings into pFind, a database search engine developed earlier for CID data. From ETD spectra of doubly charged precursors, pFind 2.1 identified 63−122% more unique peptides than Mascot 2.2 under the same 1% false discovery rate. For higher charged peptides as well as phosphopeptides, pFind...

38 citations
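
The comparison above is made at a fixed 1% false discovery rate. The abstract does not describe how that rate is estimated, so the sketch below assumes the common target-decoy approach, in which the FDR at a score cutoff is approximated by the ratio of decoy to target matches above that cutoff. The function name and the input layout are hypothetical and do not reflect the pFind or Mascot implementations.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is within max_fdr.

    psms is a list of (score, is_decoy) pairs. The FDR at a cutoff is
    estimated as decoys / targets among matches scoring at or above it.
    Returns None if no cutoff satisfies the limit.
    """
    targets = 0
    decoys = 0
    best = None
    # Walk from the best score downwards, updating the running counts.
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = score
    return best


# Toy example: three target matches and one decoy match.
example_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
strict_cutoff = score_threshold_at_fdr(example_psms, max_fdr=0.01)
loose_cutoff = score_threshold_at_fdr(example_psms, max_fdr=0.4)
```

With the strict limit, the cutoff stops above the first decoy. With the looser limit, the remaining target match is also accepted because one decoy among three targets stays under 40%.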

Journal Article
TL;DR: A statistical algorithm named DeltAMT (Delta Accurate Mass and Time) for fast detection of abundant protein modifications from tandem mass spectra with high-accuracy precursor masses is presented, which is highly efficient while being accurate and sensitive.

33 citations

Journal Article
TL;DR: This paper describes an efficient and sequence database-independent approach to detecting abundant post-translational modifications in high-accuracy peptide mass spectra using a bivariate Gaussian mixture model to discriminate modification-related spectral pairs from random ones.
Abstract: Peptide identification via tandem mass spectrometry is the basic task of current proteomics research. Due to the complexity of mass spectra, the majority of mass spectra cannot be interpreted at present. The existence of unexpected or unknown protein post-translational modifications is a major reason. This paper describes an efficient and sequence database-independent approach to detecting abundant post-translational modifications in high-accuracy peptide mass spectra. The approach is based on the observation that the spectra of a modified peptide and its unmodified counterpart are correlated with each other in their peptide masses and retention time. Frequently occurring peptide mass differences in a data set imply possible modifications, while small and consistent retention time differences provide orthogonal supporting evidence. We propose to use a bivariate Gaussian mixture model to discriminate modification-related spectral pairs from random ones. Due to the use of two-dimensional information, accurate modification masses and confident spectral pairs can be determined as well as the quantitative influences of modifications on peptide retention time. Experiments on two glycoprotein data sets demonstrate that our method can effectively detect abundant modifications and spectral pairs. By including the discovered modifications into database search or by propagating peptide assignments between paired spectra, an average of 10% more spectra are interpreted.

19 citations
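
The abstract above observes that frequently occurring peptide mass differences in a data set imply possible modifications. The sketch below illustrates only that first step by counting binned pairwise mass differences. It does not implement the retention-time evidence or the bivariate Gaussian mixture model described by the authors. The function name, the bin width, the maximum difference, and the example masses are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_mass_differences(masses, bin_width=0.01, max_delta=200.0, top_n=5):
    """Count binned pairwise mass differences and return the most frequent ones.

    Returns a list of (approximate_delta, count) pairs, most frequent first.
    """
    counts = Counter()
    for m1, m2 in combinations(sorted(masses), 2):
        delta = m2 - m1  # non-negative because the masses are sorted
        if 0.0 < delta <= max_delta:
            # Use an integer bin index to avoid floating-point keys.
            counts[round(delta / bin_width)] += 1
    return [(index * bin_width, n) for index, n in counts.most_common(top_n)]


# Made-up peptide masses: three pairs that differ by the same amount.
example_masses = [1000.0, 1079.966, 1500.0, 1579.966, 2000.0, 2079.966]
top_deltas = frequent_mass_differences(example_masses)
```

In this toy input the only difference within the 200 Da window is about 79.97 Da, and it occurs three times. On real data the candidate differences would then need the orthogonal retention-time check that the abstract describes.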


Cited by
Journal Article
TL;DR: The progress of proteomics has been driven by the development of new technologies for peptide/protein separation, mass spectrometry analysis, isotope labeling for quantification, and bioinformatics data analysis.
Abstract: According to Genome Sequencing Project statistics (http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html), as of Feb 16, 2012, complete gene sequences have become available for 2816 viruses, 1117 prokaryotes, and 36 eukaryotes.1–2 The availability of full genome sequences has greatly facilitated biological research in many fields, and has greatly contributed to the growth of proteomics. Proteins are important because they are the direct bio-functional molecules in the living organisms. The term “proteomics” was coined from merging “protein” and “genomics” in the 1990s.3–4 As a post-genomic discipline, proteomics encompasses efforts to identify and quantify all the proteins of a proteome, including expression, cellular localization, interactions, post-translational modifications (PTMs), and turnover as a function of time, space and cell type, thus making the full investigation of a proteome more challenging than sequencing a genome. There are possibly 100,000 protein forms encoded by the approximate 20,235 genes of the human genome,5 and determining the explicit function of each form will be a challenge. The progress of proteomics has been driven by the development of new technologies for peptide/protein separation, mass spectrometry analysis, isotope labeling for quantification, and bioinformatics data analysis. Mass spectrometry has emerged as a core tool for large-scale protein analysis. In the past decade, there has been a rapid advance in the resolution, mass accuracy, sensitivity and scan rate of mass spectrometers used to analyze proteins. In addition, hybrid mass analyzers have been introduced recently (e.g. Linear Ion Trap-Orbitrap series6–7) which have significantly improved proteomic analysis. “Bottom-up” protein analysis refers to the characterization of proteins by analysis of peptides released from the protein through proteolysis. 
When bottom-up is performed on a mixture of proteins it is called shotgun proteomics,8–10 a name coined by the Yates lab because of its analogy to shotgun genomic sequencing.11 Shotgun proteomics provides an indirect measurement of proteins through peptides derived from proteolytic digestion of intact proteins. In a typical shotgun proteomics experiment, the peptide mixture is fractionated and subjected to LC-MS/MS analysis. Peptide identification is achieved by comparing the tandem mass spectra derived from peptide fragmentation with theoretical tandem mass spectra generated from in silico digestion of a protein database. Protein inference is accomplished by assigning peptide sequences to proteins. Because peptides can be either uniquely assigned to a single protein or shared by more than one protein, the identified proteins may be further scored and grouped based on their peptides. In contrast, another strategy, termed ‘top-down’ proteomics, is used to characterize intact proteins (Figure 1). The top-down approach has some potential advantages for PTM and protein isoform determination and has achieved notable success. Intact proteins have been measured up to 200 kDa,12 and a large scale study has identified more than 1,000 proteins by multi-dimensional separations from complex samples.13 However, the top-down method has significant limitations compared with shotgun proteomics due to difficulties with protein fractionation, protein ionization and fragmentation in the gas phase. By relying on the analysis of peptides, which are more easily fractionated, ionized and fragmented, shotgun proteomics can be more universally adopted for protein analysis. In fact, a hybrid of bottom-up and top-down methodologies and instrumentation has been introduced as middle-down proteomics.14 Essentially, middle-down proteomics analyzes larger peptide fragments than bottom-up proteomics, minimizing peptide redundancy between proteins. 
Additionally, the large peptide fragments yield advantages similar to those of top-down proteomics, such as further insight into post-translational modifications, without the analytical challenges of analyzing intact proteins. Shotgun proteomics has become a workhorse for the analysis of proteins and their modifications and will be increasingly combined with top-down methods in the future. In the past decade shotgun proteomics has been widely used by biologists for many different research experiments, advancing biological discoveries. Some applications include, but are not limited to, proteome profiling, protein quantification, protein modification, and protein-protein interaction. There have been several reviews nicely summarizing mass spectrometry history,15 protein quantification with mass spectrometry,16 its biological applications,5,17–26 and many recent advances in methodology.27–32 In this review, we try to provide a full and updated survey of shotgun proteomics, including the fundamental techniques and applications that laid the foundation along with those developed and greatly improved in the past several years.
Figure 1. Proteomic strategies: bottom-up vs. top-down vs. middle-down. The bottom-up approach analyzes proteolytic peptides. The top-down method measures the intact proteins. The middle-down strategy analyzes larger peptides resulting from limited digestion or ...

1,184 citations
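
The review abstract above notes that peptide identification relies on theoretical spectra generated from in silico digestion of a protein database. As a minimal sketch of the digestion step only, the code below assumes the commonly used trypsin rule: cleave after K or R unless the next residue is P. The function name, the missed-cleavage parameter, and the example sequence are hypothetical and are not taken from any specific search engine.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from an in silico tryptic digest of a protein sequence.

    Cleaves C-terminal to K or R, except when the next residue is P.
    Peptides spanning up to missed_cleavages skipped sites are included.
    """
    if not sequence:
        return []
    # Collect cleavage positions, including both ends of the sequence.
    sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for start in range(len(sites) - 1):
        last_end = min(start + 1 + missed_cleavages, len(sites) - 1)
        for end in range(start + 1, last_end + 1):
            peptides.append(sequence[sites[start]:sites[end]])
    return peptides


# Made-up sequence: the R at position 6 is followed by P, so it is not cleaved.
fully_cleaved = tryptic_digest("MAKGGRPLLKTT")
with_one_missed = tryptic_digest("MAKGGRPLLKTT", missed_cleavages=1)
```

A real search engine would go on to compute peptide masses and theoretical fragment ions for each peptide. This sketch stops at the peptide list.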

Journal Article
TL;DR: A new database search tool, PEAKS DB, has been developed by incorporating the de novo sequencing results into the database search, and achieves significantly improved accuracy and sensitivity over two other commonly used software packages.

815 citations

Book
01 Nov 2005
TL;DR: In this article, the authors present an efficient reduction from constrained to unconstrained maximum agreement subtree for the maximum quartet consistency problem, which can be solved by using semi-definite programming.
Abstract: Expression.- Spectral Clustering Gene Ontology Terms to Group Genes by Function.- Dynamic De-Novo Prediction of microRNAs Associated with Cell Conditions: A Search Pruned by Expression.- Clustering Gene Expression Series with Prior Knowledge.- A Linear Time Biclustering Algorithm for Time Series Gene Expression Data.- Time-Window Analysis of Developmental Gene Expression Data with Multiple Genetic Backgrounds.- Phylogeny.- A Lookahead Branch-and-Bound Algorithm for the Maximum Quartet Consistency Problem.- Computing the Quartet Distance Between Trees of Arbitrary Degree.- Using Semi-definite Programming to Enhance Supertree Resolvability.- An Efficient Reduction from Constrained to Unconstrained Maximum Agreement Subtree.- Pattern Identification in Biogeography.- On the Complexity of Several Haplotyping Problems.- A Hidden Markov Technique for Haplotype Reconstruction.- Algorithms for Imperfect Phylogeny Haplotyping (IPPH) with a Single Homoplasy or Recombination Event.- Networks.- A Faster Algorithm for Detecting Network Motifs.- Reaction Motifs in Metabolic Networks.- Reconstructing Metabolic Networks Using Interval Analysis.- Genome Rearrangements.- A 1.375-Approximation Algorithm for Sorting by Transpositions.- A New Tight Upper Bound on the Transposition Distance.- Perfect Sorting by Reversals Is Not Always Difficult.- Minimum Recombination Histories by Branch and Bound.- Sequences.- A Unifying Framework for Seed Sensitivity and Its Application to Subset Seeds.- Generalized Planted (l,d)-Motif Problem with Negative Set.- Alignment of Tandem Repeats with Excision, Duplication, Substitution and Indels (EDSI).- The Peres-Shields Order Estimator for Fixed and Variable Length Markov Models with Applications to DNA Sequence Similarity.- Multiple Structural RNA Alignment with Lagrangian Relaxation.- Faster Algorithms for Optimal Multiple Sequence Alignment Based on Pairwise Comparisons.- Ortholog Clustering on a Multipartite Graph.- Linear Time Algorithm 
for Parsing RNA Secondary Structure.- A Compressed Format for Collections of Phylogenetic Trees and Improved Consensus Performance.- Structure.- Optimal Protein Threading by Cost-Splitting.- Efficient Parameterized Algorithm for Biopolymer Structure-Sequence Alignment.- Rotamer-Pair Energy Calculations Using a Trie Data Structure.- Improved Maintenance of Molecular Surfaces Using Dynamic Graph Connectivity.- The Main Structural Regularities of the Sandwich Proteins.- Discovery of Protein Substructures in EM Maps.

492 citations

Journal Article
14 Aug 2014-Nature
TL;DR: A strategy for forming and purifying a functional human β2AR–β-arrestin-1 complex is devised that provides a framework for better understanding the basis of GPCR regulation by arrestins.
Abstract: Single-particle electron microscopy and hydrogen–deuterium exchange mass spectrometry are used to characterize the structure and dynamics of a G-protein-coupled receptor–arrestin complex. Much has been learned about the structure of G-protein-coupled receptors (GPCRs) over the past seven years, but we still don't know what an activated GPCR looks like when it is bound to a β-arrestin. (Arrestins are cellular mediators with a broad range of functions, many of them involving GPCRs.) In this study the authors use single-particle electron microscopy and hydrogen–deuterium exchange mass spectrometry to characterize the structure and dynamics of a GPCR–arrestin complex. Their data support a 'biphasic' mechanism, in which the arrestin initially interacts with the phosphorylated carboxy terminus of the GPCR before re-arranging to more fully engage the membrane protein in a signalling-competent conformation. G-protein-coupled receptors (GPCRs) are critically regulated by β-arrestins, which not only desensitize G-protein signalling but also initiate a G-protein-independent wave of signalling1,2,3,4,5. A recent surge of structural data on a number of GPCRs, including the β2 adrenergic receptor (β2AR)–G-protein complex, has provided novel insights into the structural basis of receptor activation6,7,8,9,10,11. However, complementary information has been lacking on the recruitment of β-arrestins to activated GPCRs, primarily owing to challenges in obtaining stable receptor–β-arrestin complexes for structural studies. Here we devised a strategy for forming and purifying a functional human β2AR–β-arrestin-1 complex that allowed us to visualize its architecture by single-particle negative-stain electron microscopy and to characterize the interactions between β2AR and β-arrestin 1 using hydrogen–deuterium exchange mass spectrometry (HDX-MS) and chemical crosslinking. 
Electron microscopy two-dimensional averages and three-dimensional reconstructions reveal bimodal binding of β-arrestin 1 to the β2AR, involving two separate sets of interactions, one with the phosphorylated carboxy terminus of the receptor and the other with its seven-transmembrane core. Areas of reduced HDX together with identification of crosslinked residues suggest engagement of the finger loop of β-arrestin 1 with the seven-transmembrane core of the receptor. In contrast, focal areas of raised HDX levels indicate regions of increased dynamics in both the N and C domains of β-arrestin 1 when coupled to the β2AR. A molecular model of the β2AR–β-arrestin signalling complex was made by docking activated β-arrestin 1 and β2AR crystal structures into the electron microscopy map densities with constraints provided by HDX-MS and crosslinking, allowing us to obtain valuable insights into the overall architecture of a receptor–arrestin complex. The dynamic and structural information presented here provides a framework for better understanding the basis of GPCR regulation by arrestins.

424 citations

Journal Article
15 Nov 2018-Cell
TL;DR: This work elucidates the architecture and assembly pathway across three classes of mSWI/SNF complexes: canonical BRG1/BRM-associated factor (BAF), polybromo-associated BAF (PBAF), and newly defined ncBAF complexes. It also defines the requirement of each subunit for complex formation and stability.

418 citations