Author

Zhenping Li

Bio: Zhenping Li is an academic researcher from Beijing Wuzi University. The author has contributed to research in topics including complex networks and bipartite graphs. The author has an h-index of 10 and has co-authored 15 publications receiving 364 citations.

Papers
Journal ArticleDOI
TL;DR: To improve the MEC model for haplotype reconstruction, a new computational model is proposed that additionally employs an individual's genotype information in the process of SNP correction; it is called MEC with genotype information (MEC/GI for short).
Abstract: Motivation: Haplotype reconstruction based on aligned single nucleotide polymorphism (SNP) fragments aims to infer a pair of haplotypes from localized polymorphism data gathered through short genome fragment assembly. An important computational model of this problem is the minimum error correction (MEC) model, which has been discussed in several previous studies. The model retrieves a pair of haplotypes by correcting a minimum number of SNPs in given genome fragments coming from an individual's DNA. Results: In the first part of this paper, an exact algorithm for the MEC model is presented. Owing to the NP-hardness of the MEC model, we also design a genetic algorithm (GA). The designed GA is intended to solve large-size problems and has very good performance. The strengths and weaknesses of the MEC model are shown using experimental results on real data and simulation data. In the second part of this paper, to improve the MEC model for haplotype reconstruction, a new computational model is proposed, which simultaneously employs genotype information of an individual in the process of SNP correction, and is called MEC with genotype information (MEC/GI for short). Computational results on extensive datasets show that the new model has much higher accuracy in haplotype reconstruction than the pure MEC model. Contact: wangrsh@amss.ac.cn
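As a rough illustration of the MEC objective described above (a minimal sketch, not the authors' exact or genetic algorithm; the fragment encoding, function name, and toy data are assumptions for this example), the score of a candidate haplotype pair can be evaluated as follows, assuming fragments are strings over 0/1 with '-' for uncovered sites:

```python
# Minimal sketch of the MEC (minimum error correction) objective: count how
# many SNP calls must be corrected so that every fragment is consistent with
# one of the two candidate haplotypes. '-' marks a site not covered by a fragment.

def mec_score(fragments, h1, h2):
    """Total corrections needed to make each fragment match h1 or h2."""
    total = 0
    for frag in fragments:
        # Mismatches of this fragment against each haplotype (gaps ignored).
        d1 = sum(1 for f, h in zip(frag, h1) if f != '-' and f != h)
        d2 = sum(1 for f, h in zip(frag, h2) if f != '-' and f != h)
        total += min(d1, d2)  # assign the fragment to the closer haplotype
    return total

if __name__ == "__main__":
    fragments = ["01--1", "-110-", "11--0"]
    print(mec_score(fragments, "01101", "10010"))  # -> 1
```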

145 citations

Journal ArticleDOI
TL;DR: A novel algorithm for the haplotype inference problem with the parsimony criterion is developed, based on a parsimonious tree-grow method (PTG), a heuristic algorithm that can find the minimum number of distinct haplotypes based on the criterion of keeping all genotypes resolved during the tree-grow process.
Abstract: Motivation: Haplotype information has become increasingly important in analyzing fine-scale molecular genetics data, such as disease gene mapping and drug design. Parsimony haplotyping is a haplotyping problem belonging to the NP-hard class. Results: In this paper, we aim to develop a novel algorithm for the haplotype inference problem with the parsimony criterion, based on a parsimonious tree-grow method (PTG). PTG is a heuristic algorithm that can find the minimum number of distinct haplotypes based on the criterion of keeping all genotypes resolved during the tree-grow process. In addition, a block-partitioning method is also proposed to improve the computational efficiency. We show that the proposed approach is not only effective with high accuracy, but also very efficient, with computational complexity on the order of O(m²n) for n single nucleotide polymorphism sites in m individual genotypes. Availability: The software is available upon request from the authors, or from http://zhangroup.aporc.org/bioinfo/ptg/ Contact: chen@elec.osaka-sandai.ac.jp Supplementary information: Supplementary materials are available from http://zhangroup.aporc.org/bioinfo/ptg/bti572supplementary.pdf
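For readers unfamiliar with the parsimony criterion, the following toy sketch shows what it means for a small pool of haplotypes to resolve a set of genotypes. It is not the PTG algorithm (which avoids this exponential brute force); the 0/1/2 genotype encoding, function names, and toy data are assumptions for this example.

```python
# Toy illustration of parsimony haplotyping: genotypes use 0/1 for homozygous
# sites and 2 for heterozygous sites; a pair of binary haplotypes resolves a
# genotype when homozygous sites match both haplotypes and heterozygous sites differ.

from itertools import product, combinations

def resolves(h1, h2, genotype):
    """True if the haplotype pair (h1, h2) explains the genotype."""
    for a, b, g in zip(h1, h2, genotype):
        if g == 2 and a == b:              # heterozygous site: alleles must differ
            return False
        if g != 2 and not (a == b == g):   # homozygous site: both alleles equal g
            return False
    return True

def min_haplotype_pool(genotypes, n_sites):
    """Smallest set of distinct haplotypes resolving every genotype (tiny inputs only)."""
    all_haps = list(product([0, 1], repeat=n_sites))
    for k in range(1, len(all_haps) + 1):   # exponential search, for illustration only
        for pool in combinations(all_haps, k):
            if all(any(resolves(h1, h2, g) for h1 in pool for h2 in pool)
                   for g in genotypes):
                return pool
    return None

if __name__ == "__main__":
    genotypes = [(2, 0, 1), (0, 0, 1), (2, 2, 1)]
    print(min_haplotype_pool(genotypes, 3))  # a smallest pool (3 haplotypes here)
```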

35 citations

Posted Content
TL;DR: Zhang et al. propose a new quantitative function for community detection in bipartite networks and demonstrate that it is superior to the widely used Barber's bipartite modularity and other functions.
Abstract: Community detection in complex networks is a topic of high interest in many fields. Bipartite networks are a special type of complex networks in which nodes are decomposed into two disjoint sets, and only nodes between the two sets can be connected. Bipartite networks represent diverse interaction patterns in many real-world systems, such as predator-prey networks, plant-pollinator networks, and drug-target networks. While community detection in unipartite networks has been extensively studied in the past decade, identification of modules or communities in bipartite networks is still in its early stage. Several quantitative functions proposed for evaluating the quality of bipartite network divisions are based on null models and have distinct resolution limits. In this paper, we propose a new quantitative function for community detection in bipartite networks, and demonstrate that this quantitative function is superior to the widely used Barber's bipartite modularity and other functions. Based on the new quantitative function, the bipartite network community detection problem is formulated into an integer programming model. Bipartite networks can be partitioned into reasonable overlapping communities by maximizing the quantitative function. We further develop a heuristic and adapted label propagation algorithm (BiLPA) to optimize the quantitative function in large-scale bipartite networks. BiLPA does not require any prior knowledge about the number of communities in the networks. We apply BiLPA to both artificial networks and real-world networks and demonstrate that this method can successfully identify the community structures of bipartite networks.
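The sketch below shows plain label propagation restricted to a bipartite structure, as a simplified stand-in for the general idea only; the actual BiLPA optimizes the paper's quantitative function and supports overlapping communities, which this sketch does not. Function names and the toy network are assumptions for this example.

```python
# Bare-bones label propagation on a bipartite network: every node repeatedly
# adopts the most frequent community label among its neighbors on the other side.

import random
from collections import Counter

def label_propagation_bipartite(edges, n_left, n_right, n_iter=20, seed=0):
    """edges: list of (u, v) with u in [0, n_left) and v in [0, n_right)."""
    rng = random.Random(seed)
    # Adjacency lists; right-node ids are offset so all nodes share one label array.
    adj = {i: [] for i in range(n_left + n_right)}
    for u, v in edges:
        adj[u].append(n_left + v)
        adj[n_left + v].append(u)
    labels = list(range(n_left + n_right))  # every node starts in its own community
    nodes = list(adj)
    for _ in range(n_iter):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            if not adj[node]:
                continue
            counts = Counter(labels[nb] for nb in adj[node])
            best = max(counts.values())
            choice = rng.choice([l for l, c in counts.items() if c == best])
            if labels[node] != choice:
                labels[node], changed = choice, True
        if not changed:
            break
    return labels

if __name__ == "__main__":
    # Two natural communities: left {0,1} with right {0,1}, left {2,3} with right {2,3}.
    edges = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 3), (3, 2), (3, 3)]
    print(label_propagation_bipartite(edges, n_left=4, n_right=4))
```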

32 citations

Journal ArticleDOI
TL;DR: A new quantitative function for community detection in bipartite networks is proposed, it is demonstrated to be superior to the widely used Barber's bipartite modularity and other functions, and it is applied to both artificial and real-world networks.

27 citations

Journal ArticleDOI
TL;DR: A novel approach was proposed to exactly formulate this drug target detection problem as an integer linear programming model, which ensures that optimal solutions can be found efficiently without any heuristic manipulations and can be applied to large-scale networks, including whole metabolic networks from most organisms.
Abstract: High-throughput techniques produce massive data on a genome-wide scale, which facilitates pharmaceutical research. Drug target discovery is a crucial step in the drug discovery process and also plays a vital role in therapeutics. In this study, the problem of detecting drug targets was addressed: finding a set of enzymes whose inhibition stops the production of a given set of target compounds while minimally eliminating non-target compounds in the context of metabolic networks. The model aims to make the side effects of drugs as small as possible and thus has practical significance for potential pharmaceutical applications. Specifically, by exploiting special features of metabolic systems, a novel approach was proposed to exactly formulate this drug target detection problem as an integer linear programming model, which ensures that optimal solutions can be found efficiently without any heuristic manipulations. To verify the effectiveness of our approach, computational experiments on both Escherichia coli and Homo sapiens metabolic pathways were conducted. The results show that our approach can identify the optimal drug targets in an exact and efficient manner. In particular, it can be applied to large-scale networks, including whole metabolic networks from most organisms.
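To give a flavour of this kind of formulation, here is a toy integer linear program in the same spirit, not the paper's exact model: the compound-to-enzyme map, variable names, and the small tie-breaking penalty are assumptions for this example, and the PuLP library is used only for convenience.

```python
# Toy ILP: inhibit enzymes so that every target compound loses all of its
# producing enzymes, while eliminating as few non-target compounds as possible.

from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, lpSum

# Hypothetical toy network: compound -> set of enzymes that can produce it.
producers = {
    "T1": {"e1", "e2"},   # target compound
    "C1": {"e2", "e3"},   # non-target compounds
    "C2": {"e3"},
    "C3": {"e1", "e4"},
}
targets = {"T1"}
enzymes = sorted(set().union(*producers.values()))

prob = LpProblem("drug_target_toy", LpMinimize)
inhibit = {e: LpVariable(f"inhibit_{e}", cat=LpBinary) for e in enzymes}
eliminated = {c: LpVariable(f"elim_{c}", cat=LpBinary) for c in producers}

# Objective: minimize eliminated non-target compounds, with a tiny penalty on
# the number of inhibited enzymes to break ties.
prob += lpSum(eliminated[c] for c in producers if c not in targets) \
        + 0.01 * lpSum(inhibit.values())

for c, enz in producers.items():
    if c in targets:
        # Every enzyme able to produce a target must be inhibited.
        for e in enz:
            prob += inhibit[e] == 1
    else:
        # A non-target counts as eliminated once all of its producers are inhibited.
        prob += eliminated[c] >= lpSum(inhibit[e] for e in enz) - (len(enz) - 1)

prob.solve()
print({e: int(inhibit[e].value()) for e in enzymes})
print({c: int(eliminated[c].value()) for c in producers if c not in targets})
```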

23 citations


Cited by
Journal ArticleDOI
TL;DR: Modelling methods, such as supervised classification, clustering and probabilistic graphical models for knowledge discovery, as well as deterministic and stochastic heuristics for optimization, are presented.
Abstract: This article reviews machine learning methods for bioinformatics. It presents modelling methods, such as supervised classification, clustering and probabilistic graphical models for knowledge discovery, as well as deterministic and stochastic heuristics for optimization. Applications in genomics, proteomics, systems biology, evolution and text mining are also shown.

805 citations

Journal ArticleDOI
15 Aug 2008
TL;DR: A novel combinatorial approach based on computing max-cuts in graphs derived from sequenced fragments is used to infer the haplotypes of a human individual, and the haplotypes inferred using HapCUT are demonstrated to be significantly more accurate than those from the greedy heuristic and a previously published method, Fast Hare.
Abstract: Motivation: The goal of the haplotype assembly problem is to reconstruct the two haplotypes (chromosomes) for an individual using a mix of sequenced fragments from the two chromosomes. This problem has been shown to be computationally intractable for various optimization criteria. Polynomial time algorithms have been proposed for restricted versions of the problem. In this article, we consider the haplotype assembly problem in the most general setting, i.e. fragments of any length and with an arbitrary number of gaps. Results: We describe a novel combinatorial approach for the haplotype assembly problem based on computing max-cuts in certain graphs derived from the sequenced fragments. Levy et al. have sequenced the complete genome of a human individual and used a greedy heuristic to assemble the haplotypes for this individual. We have applied our method HapCUT to infer haplotypes from this data and demonstrate that the haplotypes inferred using HapCUT are significantly more accurate (20–25% lower maximum error correction scores for all chromosomes) than the greedy heuristic and a previously published method, Fast Hare. We also describe a maximum likelihood based estimator of the absolute accuracy of the sequence-based haplotypes using population haplotypes from the International HapMap project. Availability: A program implementing HapCUT is available on
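The combinatorial core here is a max-cut computation. The following generic single-flip local search is a minimal sketch of that idea only, not HapCUT's graph construction or cut procedure; the function name and toy edge weights are assumptions for this example.

```python
# Generic single-flip local search for max-cut: repeatedly move a node to the
# other side of the cut whenever that increases the total weight of cut edges.

def greedy_max_cut(n_nodes, weighted_edges, n_passes=10):
    """weighted_edges: list of (u, v, w). Returns a 0/1 side assignment per node."""
    side = [i % 2 for i in range(n_nodes)]  # arbitrary starting cut
    adj = {i: [] for i in range(n_nodes)}
    for u, v, w in weighted_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    for _ in range(n_passes):
        improved = False
        for node in range(n_nodes):
            # Gain from flipping `node`: edges that would newly cross the cut
            # minus edges that would stop crossing it.
            gain = sum(w if side[nb] == side[node] else -w for nb, w in adj[node])
            if gain > 0:
                side[node] ^= 1
                improved = True
        if not improved:
            break
    return side

if __name__ == "__main__":
    edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 2.0), (3, 0, 1.0), (0, 2, -1.0)]
    print(greedy_max_cut(4, edges))
```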

304 citations

Journal ArticleDOI
TL;DR: WhatsHap is the first approach that yields provably optimal solutions to the weighted minimum error correction problem in runtime linear in the number of SNPs; it is demonstrated to handle datasets of coverage up to 20×, with 15× generally being enough to reliably phase long reads, even at significantly elevated sequencing error rates.
Abstract: The human genome is diploid, which requires assigning heterozygous single nucleotide polymorphisms (SNPs) to the two copies of the genome. The resulting haplotypes, lists of SNPs belonging to each copy, are crucial for downstream analyses in population genetics. Currently, statistical approaches, which are oblivious to direct read information, constitute the state-of-the-art. Haplotype assembly, which addresses phasing directly from sequencing reads, suffers from the fact that sequencing reads of the current generation are too short to serve the purposes of genome-wide phasing. While future-technology sequencing reads will contain sufficient amounts of SNPs per read for phasing, they are also likely to suffer from higher sequencing error rates. Currently, no haplotype assembly approaches exist that allow for taking both increasing read length and sequencing error information into account. Here, we suggest WhatsHap, the first approach that yields provably optimal solutions to the weighted minimum error correction problem in runtime linear in the number of SNPs. WhatsHap is a fixed parameter tractable (FPT) approach with coverage as the parameter. We demonstrate that WhatsHap can handle datasets of coverage up to 20×, and that 15× are generally enough for reliably phasing long reads, even at significantly elevated sequencing error rates. We also find that the switch and flip error rates of the haplotypes we output are favorable when comparing them with state-of-the-art statistical phasers.
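As a side note on the evaluation metrics mentioned at the end, a simplified switch-error count between a predicted and a true haplotype can be computed as follows. This assumes every site is heterozygous, so a haplotype is just a 0/1 string and its complement is the partner; it is an illustrative sketch, not WhatsHap's own evaluation code.

```python
# Simplified switch-error metric: count positions where the predicted phase
# flips relative to the true haplotype at consecutive heterozygous sites.

def switch_errors(true_hap, pred_hap):
    """Number of phase flips between the predicted and true haplotype."""
    agree = [t == p for t, p in zip(true_hap, pred_hap)]
    return sum(1 for i in range(len(agree) - 1) if agree[i] != agree[i + 1])

if __name__ == "__main__":
    true_hap = "010101"
    pred_hap = "010010"   # phase flips between the third and fourth site
    print(switch_errors(true_hap, pred_hap))  # -> 1
```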

290 citations

Journal ArticleDOI
TL;DR: Further development of the self-organizing map (SOM) is discussed regarding network architecture, spatio-temporal patterning, and the presentation of model results in the ecological sciences.

173 citations