Author

Minzhu Xie

Other affiliations: University of California, Riverside

Bio: Minzhu Xie is an academic researcher at Hunan Normal University. The author has contributed to research on topics including chromosome (genetic algorithm) and genome-wide association studies. The author has an h-index of 6 and has co-authored 10 publications receiving 222 citations. Previous affiliations of Minzhu Xie include the University of California, Riverside.

Papers

Journal Article

XGBFEMF: An XGBoost-Based Framework for Essential Protein Prediction

Jiancheng Zhong¹, Yusui Sun¹, Wei Peng², Minzhu Xie¹, Jiahong Yang¹, Xiwei Tang³

Hunan Normal University¹, Kunming University of Science and Technology², Hunan First Normal University³

31 May 2018-IEEE Transactions on Nanobioscience

TL;DR: The paper proposes a prediction framework named XGBFEMF for identifying essential proteins. It includes a SUB-EXPAND-SHRINK method that constructs composite features from the original features and selects a better feature subset, and a model fusion method that yields a more effective prediction model.

Abstract: Essential proteins as a vital part of maintaining the cells’ life play an important role in the study of biology and drug design. With the generation of large amounts of biological data related to essential proteins, an increasing number of computational methods have been proposed. Different from the methods which adopt a single machine learning method or an ensemble machine learning method, this paper proposes a predicting framework named by XGBFEMF for identifying essential proteins, which includes a SUB-EXPAND-SHRINK method for constructing the composite features with original features and obtaining the better subset of features for essential protein prediction, and also includes a model fusion method for getting a more effective prediction model. We carry out experiments on Yeast data to assess the performance of the XGBFEMF with ROC analysis, accuracy analysis, and top analysis. Meanwhile, we set up experiments on E. coli data for the validation of performance. The test results show that the XGBFEMF framework can effectively improve many essential indicators. In addition, we analyze each step in the XGBFEMF framework; our results show that both each step of the SUB-EXPAND-SHRINK method as well as the step of multi-model fusion can improve prediction performance.

104 citations

Journal Article

Detecting genome-wide epistases based on the clustering of relatively frequent items

Minzhu Xie¹, Jing Li¹, Tao Jiang¹

University of California, Riverside¹

01 Jan 2012-Bioinformatics

TL;DR: This article develops a simple, fast and effective algorithm that detects genome-wide multi-locus epistatic interactions by clustering relatively frequent items; on simulated data it is fast and generally more powerful than some recently proposed methods.

Abstract: Motivation: In genome-wide association studies (GWAS), up to millions of single nucleotide polymorphisms (SNPs) are genotyped for thousands of individuals. However, conventional single locus-based approaches are usually unable to detect gene–gene interactions underlying complex diseases. Due to the huge search space for complicated high order interactions, many existing multi-locus approaches are slow and may suffer from low detection power for GWAS. Results: In this article, we develop a simple, fast and effective algorithm to detect genome-wide multi-locus epistatic interactions based on the clustering of relatively frequent items. Extensive experiments on simulated data show that our algorithm is fast and more powerful in general than some recently proposed methods. On a real genome-wide case–control dataset for age-related macular degeneration (AMD), the algorithm has identified genotype combinations that are significantly enriched in the cases. Availability: http://www.cs.ucr.edu/~minzhux/EDCF.zip Contact: minzhux@cs.ucr.edu; jingli@cwru.edu Supplementary information: Supplementary data are available at Bioinformatics online.

69 citations

Journal Article

H-PoP and H-PoPG: heuristic partitioning algorithms for single individual haplotyping of polyploids.

Minzhu Xie¹, Qiong Wu², Jianxin Wang³, Tao Jiang⁴,⁵

Hunan Normal University¹, Chinese Academy of Sciences², Central South University³, Tsinghua University⁴, University of California, Riverside⁵

16 Aug 2016-Bioinformatics

TL;DR: Two heuristic algorithms, H-PoP and H-PoPG, based on dynamic programming and a strategy of limiting the number of intermediate solutions at each iteration, are proposed to solve two polyploid haplotyping models (without and with a genotype constraint, respectively); they are much faster and more accurate than recent state-of-the-art polyploid haplotyping algorithms.

Abstract: Motivation: Some economically important plants including wheat and cotton have more than two copies of each chromosome. With the decreasing cost and increasing read length of next-generation sequencing technologies, reconstructing the multiple haplotypes of a polyploid genome from its sequence reads becomes practical. However, the computational challenge in polyploid haplotyping is much greater than that in diploid haplotyping, and there are few related methods. Results: This article models the polyploid haplotyping problem as an optimal poly-partition problem of the reads, called the Polyploid Balanced Optimal Partition model. For the reads sequenced from a k-ploid genome, the model tries to divide the reads into k groups such that the difference between the reads of the same group is minimized while the difference between the reads of different groups is maximized. When the genotype information is available, the model is extended to the Polyploid Balanced Optimal Partition with Genotype constraint problem. These models are all NP-hard. We propose two heuristic algorithms, H-PoP and H-PoPG, based on dynamic programming and a strategy of limiting the number of intermediate solutions at each iteration, to solve the two models, respectively. Extensive experimental results on simulated and real data show that our algorithms can solve the models effectively, and are much faster and more accurate than the recent state-of-the-art polyploid haplotyping algorithms. The experiments also show that our algorithms can deal with long reads and deep read coverage effectively and accurately. Furthermore, H-PoP might be applied to help determine the ploidy of an organism. Availability and implementation: https://github.com/MinzhuXie/H-PoPG Contact: xieminzhu@hotmail.com Supplementary information: Supplementary data are available at Bioinformatics online.

50 citations

Journal Article

A fast and accurate algorithm for single individual haplotyping.

Minzhu Xie¹, Jianxin Wang², Tao Jiang³

Hunan Normal University¹, Central South University², University of California, Riverside³

12 Dec 2012-BMC Systems Biology

TL;DR: The paper proposes a new optimization model for single individual haplotyping, called Balanced Optimal Partition (BOP), which generalizes two existing models, Minimum Error Correction (MEC) and Maximum Fragments Cut (MFC), and reduces to either one under extreme parameter values.

Abstract: Due to the difficulty in separating two (paternal and maternal) copies of a chromosome, most published human genome sequences only provide genotype information, i.e., the mixed information of the underlying two haplotypes. However, phased haplotype information is needed to completely understand complex genetic polymorphisms and to increase the power of genome-wide association studies for complex diseases. With the rapid development of DNA sequencing technologies, reconstructing a pair of haplotypes from an individual's aligned DNA fragments by computer algorithms (i.e., Single Individual Haplotyping) has become a practical haplotyping approach. In the paper, we combine two measures "errors corrected" and "fragments cut" and propose a new optimization model, called Balanced Optimal Partition (BOP), for single individual haplotyping. The model generalizes two existing models, Minimum Error Correction (MEC) and Maximum Fragments Cut (MFC), and could be made either model by using some extreme parameter values. To solve the model, we design a heuristic dynamic programming algorithm H-BOP. By limiting the number of intermediate solutions at each iteration to an appropriately chosen small integer k, H-BOP is able to solve the model efficiently. Extensive experimental results on simulated and real data show that when k = 8, H-BOP is generally faster and more accurate than a recent state-of-art algorithm ReFHap in haplotype reconstruction. The running time of H-BOP is linearly dependent on some of the key parameters controlling the input size and H-BOP scales well to large input data. The code of H-BOP is available to the public for free upon request to the corresponding author.
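
To make the Minimum Error Correction (MEC) objective named in this abstract concrete, here is a minimal sketch of how an MEC score is commonly computed for a candidate haplotype pair. It is not the authors' H-BOP algorithm and does not implement the BOP model. It assumes fragments are strings over `0`, `1` and `-` (a position the fragment does not cover); the function names `hamming` and `mec_score` and the sample data are hypothetical.

```python
from typing import Sequence


def hamming(fragment: str, haplotype: str) -> int:
    """Count mismatches, skipping positions the fragment does not cover ('-')."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != "-" and f != h)


def mec_score(fragments: Sequence[str], h1: str, h2: str) -> int:
    """Sum, over all fragments, the mismatches to the closer of the two haplotypes.

    Each fragment is implicitly assigned to whichever haplotype it matches
    better; the total is the number of entries that would need correcting.
    """
    return sum(min(hamming(f, h1), hamming(f, h2)) for f in fragments)


# Illustrative data only: four aligned fragments over five SNP sites.
fragments = ["0101-", "-1010", "01-11", "1-101"]
print(mec_score(fragments, "01010", "10101"))
```

A haplotyping method under the MEC model searches for the pair `(h1, h2)` that minimizes this score; the sketch only evaluates a given pair.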

28 citations

Journal Article

Accurate HLA type inference using a weighted similarity graph

Minzhu Xie¹,², Jing Li³, Tao Jiang²

Hunan Normal University¹, University of California, Riverside², Case Western Reserve University³

14 Dec 2010-BMC Bioinformatics

TL;DR: An accurate HLA gene type inference algorithm is designed that uses SNP genotype data from pedigrees, known HLA gene types of some individuals, and the relationship between inferred SNP haplotypes and HLA gene types, achieving higher accuracy than a recent approach on the same input data.

Abstract: Background: The human leukocyte antigen system (HLA) contains many highly variable genes. HLA genes play an important role in the human immune system, and HLA gene matching is crucial for the success of human organ transplantations. Numerous studies have demonstrated that variation in HLA genes is associated with many autoimmune, inflammatory and infectious diseases. However, typing HLA genes by serology or PCR is time consuming and expensive, which limits large-scale studies involving HLA genes. Since it is much easier and cheaper to obtain single nucleotide polymorphism (SNP) genotype data, accurate computational algorithms to infer HLA gene types from SNP genotype data are in need. To infer HLA types from SNP genotypes, the first step is to infer SNP haplotypes from genotypes. However, for the same SNP genotype data set, the haplotype configurations inferred by different methods are usually inconsistent, and it is often difficult to decide which one is true. Results: In this paper, we design an accurate HLA gene type inference algorithm by utilizing SNP genotype data from pedigrees, known HLA gene types of some individuals and the relationship between inferred SNP haplotypes and HLA gene types. Given a set of haplotypes inferred from the genotypes of a population consisting of many pedigrees, the algorithm first constructs a weighted similarity graph based on a new haplotype similarity measure and derives constraint edges from known HLA gene types. Based on the principle that different HLA gene alleles should have different background haplotypes, the algorithm searches for an optimal labeling of all the haplotypes with unknown HLA gene types such that the total weight among the same HLA gene types is maximized. To deal with ambiguous haplotype solutions, we use a genetic algorithm to select haplotype configurations that tend to maximize the same optimization criterion. 
Our experiments on a previously typed subset of the HapMap data show that the algorithm is highly accurate, achieving an accuracy of 96% for gene HLA-A, 95% for HLA-B, 97% for HLA-C, 84% for HLA-DRB1, 98% for HLA-DQA1 and 97% for HLA-DQB1 in a leave-one-out test. Conclusions: Our algorithm can infer HLA gene types from neighboring SNP genotype data accurately. Compared with a recent approach on the same input data, our algorithm achieved a higher accuracy. The code of our algorithm is available to the public for free upon request to the corresponding authors.

27 citations

Cited by

Journal Article

Imputing Amino Acid Polymorphisms in Human Leukocyte Antigens

Xiaoming Jia¹, Buhm Han²,³, Suna Onengut-Gumuscu⁴, Wei-Min Chen⁴, Patrick Concannon⁴, Stephen S. Rich⁴, Soumya Raychaudhuri, Paul I.W. de Bakker

Massachusetts Institute of Technology¹, Broad Institute², Brigham and Women's Hospital³, University of Virginia⁴

06 Jun 2013-PLOS ONE

TL;DR: The paper describes SNP2HLA, a computational strategy to impute classical alleles and amino acid polymorphisms at class I and class II HLA loci, and shows how imputation and association testing at amino acid resolution can facilitate fine-mapping of primary MHC association signals.

Abstract: DNA sequence variation within human leukocyte antigen (HLA) genes mediate susceptibility to a wide range of human diseases. The complex genetic structure of the major histocompatibility complex (MHC) makes it difficult, however, to collect genotyping data in large cohorts. Long-range linkage disequilibrium between HLA loci and SNP markers across the major histocompatibility complex (MHC) region offers an alternative approach through imputation to interrogate HLA variation in existing GWAS data sets. Here we describe a computational strategy, SNP2HLA, to impute classical alleles and amino acid polymorphisms at class I (HLA-A, -B, -C) and class II (-DPA1, -DPB1, -DQA1, -DQB1, and -DRB1) loci. To characterize performance of SNP2HLA, we constructed two European ancestry reference panels, one based on data collected in HapMap-CEPH pedigrees (90 individuals) and another based on data collected by the Type 1 Diabetes Genetics Consortium (T1DGC, 5,225 individuals). We imputed HLA alleles in an independent data set from the British 1958 Birth Cohort (N=918) with gold standard four-digit HLA types and SNPs genotyped using the Affymetrix GeneChip 500 K and Illumina Immunochip microarrays. We demonstrate that the sample size of the reference panel, rather than SNP density of the genotyping platform, is critical to achieve high imputation accuracy. Using the larger T1DGC reference panel, the average accuracy at four-digit resolution is 94.7% using the low-density Affymetrix GeneChip 500 K, and 96.7% using the high-density Illumina Immunochip. For amino acid polymorphisms within HLA genes, we achieve 98.6% and 99.3% accuracy using the Affymetrix GeneChip 500 K and Illumina Immunochip, respectively. Finally, we demonstrate how imputation and association testing at amino acid resolution can facilitate fine-mapping of primary MHC association signals, giving a specific example from type 1 diabetes.

576 citations

Journal Article

Detecting epistasis in human complex traits

Wenhua Wei¹, Gibran Hemani², Chris Haley¹

Medical Research Council¹, University of Queensland²

09 Sep 2014-Nature Reviews Genetics

TL;DR: The purpose of this Review is to summarize recent directions in methodology for detecting epistasis and to discuss evidence of the role of epistasis in human complex trait variation.

Abstract: Genome-wide association studies (GWASs) have become the focus of the statistical analysis of complex traits in humans, successfully shedding light on several aspects of genetic architecture and biological aetiology. Single-nucleotide polymorphisms (SNPs) are usually modelled as having additive, cumulative and independent effects on the phenotype. Although evidently a useful approach, it is often argued that this is not a realistic biological model and that epistasis (that is, the statistical interaction between SNPs) should be included. The purpose of this Review is to summarize recent directions in methodology for detecting epistasis and to discuss evidence of the role of epistasis in human complex trait variation. We also discuss the relevance of epistasis in the context of GWASs and potential hazards in the interpretation of statistical interaction terms.

391 citations

Journal Article

The emerging landscape of single-molecule protein sequencing technologies

Javier A. Alfaro¹, Peggy R. Bohländer², Mingjie Dai³, Mike Filius², Cecil J Howard⁴, Xander F. van Kooten⁵, Shilo Ohayon⁵, Adam Pomorski², Sonja Schmid⁶, Aleksei Aksimentiev⁷, Eric V. Anslyn⁴, Georges Bedran¹, Chan Cao⁸, Mauro Chinappi⁹, Etienne Coyaud¹⁰, Cees Dekker², Gunnar Dittmar¹¹, Nicholas Drachman¹², Rienk Eelkema², David R. Goodlett¹,¹³, Sebastien Hentz¹⁴, Umesh Kalathiya¹, Neil L. Kelleher¹⁵, Ryan T. Kelly¹⁶, Zvi Kelman¹⁷, Sung Hyun Kim², Bernhard Kuster¹⁸, David Rodriguez-Larrea¹⁹, Stuart Lindsay²⁰, Giovanni Maglia²¹, Edward M. Marcotte⁴, John P. Marino¹⁷, Christophe Masselon¹⁴, Michael Mayer²², Patroklos Samaras¹⁸, Kumar Sarthak⁷, Lusia Sepiashvili²³, Derek Stein¹², Meni Wanunu²⁴, Mathias Wilhelm¹⁸, Peng Yin³, Amit Meller⁵, Chirlmin Joo²

University of Gdańsk¹, Delft University of Technology², Harvard University³, University of Texas at Austin⁴, Technion – Israel Institute of Technology⁵, Wageningen University and Research Centre⁶, University of Illinois at Urbana–Champaign⁷, École Polytechnique Fédérale de Lausanne⁸, University of Rome Tor Vergata⁹, University of Lille¹⁰, University of Luxembourg¹¹, Brown University¹², University of Victoria¹³, University of Grenoble¹⁴, Northwestern University¹⁵, Brigham Young University¹⁶, National Institute of Standards and Technology¹⁷, Technische Universität München¹⁸, University of the Basque Country¹⁹, Arizona State University²⁰, University of Groningen²¹, University of Fribourg²², University of Toronto²³, Northeastern University²⁴

01 Jun 2021-Nature Methods

TL;DR: In this paper, the authors describe new single-molecule protein sequencing and identification technologies alongside innovations in mass spectrometry that will eventually enable broad sequence coverage in single-cell profiling.

Abstract: Single-cell profiling methods have had a profound impact on the understanding of cellular heterogeneity. While genomes and transcriptomes can be explored at the single-cell level, single-cell profiling of proteomes is not yet established. Here we describe new single-molecule protein sequencing and identification technologies alongside innovations in mass spectrometry that will eventually enable broad sequence coverage in single-cell profiling. These technologies will in turn facilitate biological discovery and open new avenues for ultrasensitive disease diagnostics. This Perspective describes new single-molecule protein sequencing and identification technologies alongside innovations in mass spectrometry that will eventually enable broad sequence coverage in single-cell proteomics.

142 citations

TopPIC: a software tool for top-down mass spectrometry-based proteoform identification and characterization

Qiang Kou¹, Likun Xun¹, Xiaowen Liu¹,²

Indiana University – Purdue University Indianapolis¹, Indiana University²

15 Nov 2016

TL;DR: The authors present TopPIC, a tool that identifies and characterizes complex proteoforms with unknown primary structure alterations, such as amino acid mutations and post-translational modifications, by searching top-down tandem mass spectra against a protein database.

Abstract: Top-down mass spectrometry enables the observation of whole complex proteoforms in biological samples and provides crucial information complementary to bottom-up mass spectrometry. Because of the complexity of top-down mass spectra and proteoforms, it is a challenging problem to efficiently interpret top-down tandem mass spectra in high-throughput proteome-level proteomics studies. We present TopPIC, a tool that efficiently identifies and characterizes complex proteoforms with unknown primary structure alterations, such as amino acid mutations and post-translational modifications, by searching top-down tandem mass spectra against a protein database. Availability and implementation: http://proteomics.informatics.iupui.edu/software/toppic/ Contact: xwliu@iupui.edu Supplementary information: Supplementary data are available at Bioinformatics online.

109 citations

Journal Article

Interpretable XGBoost-SHAP Machine-Learning Model for Shear Strength Prediction of Squat RC Walls

De-Cheng Feng¹, Wen-Jie Wang¹, Sujith Mangalathu, Ertugrul Taciroglu²

Southeast University¹, University of California, Los Angeles²

01 Nov 2021-Journal of Structural Engineering (ASCE)

TL;DR: RC shear walls are commonly used as lateral load-resisting elements in seismic regions, and the estimation of their shear strengths can become simultaneously design-critical and complex.

Abstract: RC shear walls are commonly used as lateral load-resisting elements in seismic regions, and the estimation of their shear strengths can become simultaneously design-critical and complex whe...

103 citations
