
Showing papers by "Jun Liu" published in 2004


Journal ArticleDOI
TL;DR: Evidence is presented that repression and activation contribute to proper morphogenesis and the logic of the program is that of a linked series of feed-forward loops which generate successive pulses of gene transcription.
Abstract: Asymmetric division during sporulation by Bacillus subtilis generates a mother cell that undergoes a 5-h program of differentiation. The program is governed by a hierarchical cascade consisting of the transcription factors: σE, σK, GerE, GerR, and SpoIIID. The program consists of the activation and repression of 383 genes. The σE factor turns on 262 genes, including those for GerR and SpoIIID. These DNA-binding proteins downregulate almost half of the genes in the σE regulon. In addition, SpoIIID turns on ten genes, including genes involved in the appearance of σK. Next, σK activates 75 additional genes, including that for GerE. This DNA-binding protein, in turn, represses half of the genes that had been activated by σK while switching on a final set of 36 genes. Evidence is presented that repression and activation contribute to proper morphogenesis. The program of gene expression is driven forward by its hierarchical organization and by the repressive effects of the DNA-binding proteins. The logic of the program is that of a linked series of feed-forward loops, which generate successive pulses of gene transcription. Similar regulatory circuits could be a common feature of other systems of cellular differentiation.
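
A minimal toy simulation (not the authors' model; all regulator names map to the cascade above, but timings and rates are invented for illustration) shows how a linked feed-forward arrangement of activators and later-arriving repressors yields successive pulses of target-gene expression:

```python
import numpy as np

# Toy cascade sigE -> SpoIIID -> sigK -> GerE (timings invented).
# A target regulon pulses ON while its activator is present and the
# downstream repressor has not yet accumulated.
T = 100
t = np.arange(T)

def rises_at(t0, tau=5.0):
    """Smooth step: regulator level goes 0 -> 1 around time t0."""
    return 1.0 / (1.0 + np.exp(-(t - t0) / tau))

sigE = rises_at(10)
spoIIID = rises_at(25)   # turned on by sigE
sigK = rises_at(45)      # appearance promoted by SpoIIID
gerE = rises_at(65)      # turned on by sigK

pulse_sigE_regulon = sigE * (1 - spoIIID)  # later repressed by SpoIIID/GerR
pulse_sigK_regulon = sigK * (1 - gerE)     # later repressed by GerE

for name, y in [("sigma-E regulon", pulse_sigE_regulon),
                ("sigma-K regulon", pulse_sigK_regulon)]:
    print(f"{name} peaks at t={t[np.argmax(y)]}")  # successive pulses
```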

359 citations


Journal ArticleDOI
TL;DR: It is found that a modest number of haplotype or genotype samples will result in consistent block partitions and tag SNP selection and the power of association studies based on tag SNPs using genotype data is similar to that using haplotype data.
Abstract: Recent studies have revealed that linkage disequilibrium (LD) patterns vary across the human genome with some regions of high LD interspersed by regions of low LD. A small fraction of SNPs (tag SNPs) is sufficient to capture most of the haplotype structure of the human genome. In this paper, we develop a method to partition haplotypes into blocks and to identify tag SNPs based on genotype data by combining a dynamic programming algorithm for haplotype block partitioning and tag SNP selection based on haplotype data with a variation of the expectation maximization (EM) algorithm for haplotype inference. We assess the effects of using either haplotype or genotype data in haplotype block identification and tag SNP selection as a function of several factors, including sample size, density or number of SNPs studied, allele frequencies, fraction of missing data, and genotyping error rate, using extensive simulations. We find that a modest number of haplotype or genotype samples will result in consistent block partitions and tag SNP selection. The power of association studies based on tag SNPs using genotype data is similar to that using haplotype data.
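
The dynamic-programming step can be sketched as follows; this is a generic minimal-tag-SNP partition, with a placeholder cost function `n_tags` standing in for the paper's haplotype-diversity criterion:

```python
def n_tags(haplotypes, start, end):
    """Cost of making SNPs [start, end) one block, i.e. the number of
    tag SNPs it needs. Placeholder: we let the cost grow with the
    number of distinct haplotypes seen in the segment."""
    return max(1, len({h[start:end] for h in haplotypes}) - 1)

def partition(haplotypes, n_snps):
    """Dynamic program: best[j] = minimal total tag SNPs for the first
    j SNPs; back[j] remembers where the last block starts."""
    best = [0] + [float("inf")] * n_snps
    back = [0] * (n_snps + 1)
    for j in range(1, n_snps + 1):
        for i in range(j):
            cost = best[i] + n_tags(haplotypes, i, j)
            if cost < best[j]:
                best[j], back[j] = cost, i
    blocks, j = [], n_snps
    while j > 0:
        blocks.append((back[j], j))
        j = back[j]
    return blocks[::-1], best[n_snps]

haps = ["00110", "00111", "11001", "11000"]
print(partition(haps, 5))   # block boundaries and total tag count
```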

182 citations


Journal ArticleDOI
Qing Zhou, Jun Liu
TL;DR: This work extends the PWM model to include pairs of correlated positions and design a Markov chain Monte Carlo algorithm to sample in the model space and shows that the new de novo motif-finding algorithm can infer the true correlated position pairs accurately and is more precise in finding putative transcription factor binding sites than the standard Gibbs sampling algorithms.
Abstract: Motivation: The position-specific weight matrix (PWM) model, which assumes that each position in the DNA site contributes independently to the overall protein–DNA interaction, has been the primary means to describe transcription factor binding site motifs. Recent biological experiments, however, suggest that there exists interdependence among positions in the binding sites. In order to exploit this interdependence to aid motif discovery, we extend the PWM model to include pairs of correlated positions and design a Markov chain Monte Carlo algorithm to sample in the model space. We then combine the model sampling step with the Gibbs sampling framework for de novo motif discovery. Results: Testing on experimentally validated binding sites, we find that about 25% of the transcription factor binding motifs show significant within-site position correlations, and 80% of these motif models can be improved by considering the correlated positions. Using both simulated data and real promoter sequences, we show that the new de novo motif-finding algorithm can infer the true correlated position pairs accurately and is more precise in finding putative transcription factor binding sites than the standard Gibbs sampling algorithms. Availability: The program is available at http://www.people.fas.harvard.edu/~junliu/
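
The model extension can be illustrated with a sketch of the site-scoring step, assuming a single correlated pair whose joint 4x4 table replaces the two independent PWM columns (the paper's MCMC sampling over which pairs to couple is omitted):

```python
import numpy as np

BASES = "ACGT"
IDX = {b: i for i, b in enumerate(BASES)}

def score_site(site, pwm, pair=None, pair_table=None):
    """Log-probability of `site` under a PWM, optionally replacing one
    correlated pair of positions (i, j) by a joint 4x4 frequency
    table instead of two independent columns."""
    logp = 0.0
    for k, base in enumerate(site):
        if pair and k in pair:
            continue  # handled jointly below
        logp += np.log(pwm[k][IDX[base]])
    if pair:
        i, j = pair
        logp += np.log(pair_table[IDX[site[i]], IDX[site[j]]])
    return logp

w = 4
pwm = np.full((w, 4), 0.25)
joint = np.full((4, 4), 1 / 16)
joint[IDX["A"], IDX["T"]] = 0.2   # an A at position 1 favors T at position 3
joint /= joint.sum()
print(score_site("AATG", pwm))                                # independent model
print(score_site("AATG", pwm, pair=(1, 3), pair_table=joint)) # correlated pair
```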

167 citations


Journal ArticleDOI
TL;DR: A novel algorithm, based on Gibbs sampling, is presented, which locates, de novo, the cis features of these CRMs, their component TFBSs, and the properties of their spatial distribution, and demonstrates the applicability of the method to genome-scale data.
Abstract: Technologies for large-scale assessment of gene expression have become a mainstay of the postgenome era. Such profiling studies in yeast have been analyzed to gain insights into the regulatory program of this organism (Segal et al. 2003). Unfortunately, however, application of profiling technologies in higher eukaryotes all too often yields little more than a laundry list of genes that are differentially expressed along with speculation about their potential common functions. A greater focus on mechanistic connections would be useful to address this deficiency, but the means to identify these are currently limited. Some progress towards this end has been achieved when prior models of the binding patterns of cognate transcription factors are known. Progress has been more limited when such patterns are not available. Here we describe a two-step procedure that identifies cis-regulatory modules (CRMs) de novo, and uses the resulting models as the basis of a discriminant procedure to identify additional genes in the regulon. The CRM can be viewed as a circuit translating input signals from diverse pathways into an output, gene activity, through the binding of multiple transcription factors in a combinatorial fashion. Though regulatory circuits can be defined through extensive laboratory effort, most tissues and contexts are insufficiently characterized to allow such approaches. Although pattern discovery techniques have proven effective in the identification of transcription factor binding sites (TFBSs) for several single-celled organisms (McCue et al. 2000, 2002; Rajewsky et al. 2002a), successful applications in higher eukaryotes have been sparse and only partially effective (Aerts et al. 2003). Transcription factors can tolerate widely varying target sequences, resulting in computational binding profiles of low specificity. Such weak patterns become impossible to distinguish when regulatory regions are embedded within long candidate regions. Cross-species comparison of sequences from orthologous genes, or phylogenetic footprinting, shortens the amount of sequence under consideration by focusing attention on conserved regions that are more likely to serve a biological function (Wasserman et al. 2000; Boffelli et al. 2003). Although such methods can increase binding-site densities by fivefold, only the strongest sites are detected at this level (Wasserman et al. 2000). Recently, based on the synergy arising from clusters of TFBSs with known binding patterns, a variety of computational methods have been created for the discrimination of CRMs. These include composite site models and statistical models of TFBSs (Wasserman and Krivan 2003). It is often the case that no prior information exists on binding patterns of any relevant transcription factors for sets of genes identified in large-scale expression studies. One approach is a method for identification of modules using known motifs, but it includes a preliminary step of motif identification using either a Gibbs sampling algorithm or an algorithm based on overrepresented oligonucleotide sequences (Rajewsky et al. 2002b). Another approach uses suffix-trees and a word consensus approach rather than a statistical model to locate ordered collections of motifs (Marsan and Sagot 2000). In this method, sites of each motif type are assumed to occur exactly once in each module. An expectation-maximization algorithm based on a discriminant model with multiple iterative optimization steps has also been described (Segal and Sharan 2004). 
Although these approaches are promising, computational identification of CRMs and TFBSs without prior knowledge of binding patterns remains elusive. Protein interactions provide the mechanistic basis for much of gene regulation in all organisms (Wei et al. 2004). The activity of a particular transcription factor (TF) cannot be considered in isolation. Often, a particular TF can be stimulated either positively or negatively by its interaction with either cis-binding factors or coactivators (Latchman 1998). There is considerable evidence that cis-elements occur in clusters, in which the weak individual signals provide a collectively strong signal (Frith et al. 2002). For example, it has been shown that for proper spatial expression in the endoderm of the sea urchin, one particular pairing of Gata sites is essential and that these function synergistically with an adjacent Otx site (Yuh et al. 2004). To model modules of cis-elements, one must determine the essential spatial and ordering properties. Because synergy-based discrimination functions surpass the performance of models for individual TFBSs (Halfon and Michelson 2002), it is reasonable to expect that de novo pattern discovery methods based on regulatory modules will perform better than methods for detection of individual motifs. In the present study, we developed a synergy-based de novo algorithm that models neighbor interactions among TFBSs. We also explored the utility of using aligned human-mouse sequences as an input data set for training the algorithm. We found that the use of aligned human-mouse sequences and the use of neighboring interactions both enhance the specificity of site and module predictions. We show that this model can be used to specifically discriminate regulatory sequences from control sequences in an independent test set, and we use the resulting discrimination procedure to predict additional genes that are likely to be regulated in a manner similar to those in the study set.
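
To make the "weak individual signals, collectively strong cluster" idea concrete, here is a toy window-clustering score over motif hits; `crm_score`, the window size, and the log-odds values are all hypothetical illustrations, not the paper's Gibbs model:

```python
def crm_score(hits, window=200):
    """Best sum of per-site log-odds over hits falling within one
    window-bp stretch (two-pointer sweep over sorted positions).
    `hits` = list of (position, log_odds) from any motif scanner."""
    hits = sorted(hits)
    best = running = 0.0
    lo = 0
    for hi in range(len(hits)):
        running += hits[hi][1]
        while hits[hi][0] - hits[lo][0] > window:
            running -= hits[lo][1]
            lo += 1
        best = max(best, running)
    return best

# Three weak sites clustered within 200 bp outscore one strong lone site.
hits = [(120, 1.2), (180, 0.9), (260, 1.1), (900, 1.5)]
print(crm_score(hits))   # 3.2
```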

116 citations


Journal ArticleDOI
TL;DR: The accuracy of BioOptimizer, which can be used in conjunction with several existing programs, is shown to be superior to using any of these motif-finding programs alone when evaluated by both simulation studies and application to sets of co-regulated genes in bacteria.
Abstract: Motivation: Transcription factors (TFs) bind directly to short segments on the genome, often within hundreds to thousands of base pairs upstream of gene transcription start sites, to regulate gene expression. The experimental determination of TF binding sites is expensive and time-consuming. Many motif-finding programs have been developed, but no program is clearly superior in all situations. Practitioners often find it difficult to judge which of the motifs predicted by these algorithms are more likely to be biologically relevant. Results: We derive a comprehensive scoring function based on a full Bayesian model that can handle unknown site abundance, unknown motif width and two-block motifs with variable-length gaps. An algorithm called BioOptimizer is proposed to optimize this scoring function so as to reduce noise in the motif signal found by any motif-finding program. The accuracy of BioOptimizer, which can be used in conjunction with several existing programs, is shown to be superior to using any of these motif-finding programs alone when evaluated by both simulation studies and application to sets of co-regulated genes in bacteria. In addition, this scoring function formulation enables us to compare objectively different predicted motifs and select the optimal ones, effectively combining the strengths of existing programs. Availability: BioOptimizer is available for download at www.fas.harvard.edu/~junliu/BioOptimizer/
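
As a flavor of the kind of Bayesian score being optimized, here is a simplified per-column score, a Dirichlet-multinomial evidence term contrasted against a background model (the paper's full score additionally handles unknown site abundance, motif width, and gapped two-block motifs):

```python
import numpy as np
from scipy.special import gammaln

def column_score(counts, alpha=0.5, bg=(0.25, 0.25, 0.25, 0.25)):
    """Log Dirichlet-multinomial evidence for one motif column minus
    the background log-likelihood of the same letters."""
    counts = np.asarray(counts, float)
    n = counts.sum()
    log_evidence = (gammaln(4 * alpha) - gammaln(4 * alpha + n)
                    + np.sum(gammaln(alpha + counts) - gammaln(alpha)))
    log_background = np.sum(counts * np.log(bg))
    return log_evidence - log_background

# A sharp (informative) column scores far above a flat one.
print(column_score([9, 1, 0, 0]))   # positive: real motif signal
print(column_score([3, 3, 2, 2]))   # negative: looks like background
```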

103 citations


Journal ArticleDOI
TL;DR: Two Poisson-based distances are developed and shown to be more appropriate and reliable for analyzing SAGE data compared to other commonly used distances or similarity measures such as Pearson correlation or Euclidean distance.
Abstract: Serial analysis of gene expression (SAGE) data have been poorly exploited by clustering analysis owing to the lack of appropriate statistical methods that consider their specific properties. We modeled SAGE data by Poisson statistics and developed two Poisson-based distances. Their application to simulated and experimental mouse retina data shows that the Poisson-based distances are more appropriate and reliable for analyzing SAGE data compared to other commonly used distances or similarity measures such as Pearson correlation or Euclidean distance.
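
A sketch of one Poisson likelihood-ratio style distance (illustrative only; the two distances defined in the paper may differ in detail):

```python
import numpy as np

def poisson_distance(x, y, eps=1e-9):
    """Distance between two SAGE tag-count profiles: the Poisson
    log-likelihood lost by forcing both libraries to share one rate
    profile, after library-size scaling. Counts enter as raw tallies,
    which is the point of a Poisson (rather than Euclidean) view."""
    x = np.asarray(x, float); y = np.asarray(y, float)
    nx, ny = x.sum(), y.sum()
    p = (x + y) / (nx + ny)            # pooled per-tag rate under H0
    mx0, my0 = nx * p, ny * p          # expected counts under H0

    def loglik(obs, mu):
        mu = np.maximum(mu, eps)
        return np.sum(obs * np.log(mu) - mu)

    # log likelihood ratio of separate rates vs shared rates (>= 0)
    return (loglik(x, np.maximum(x, eps)) + loglik(y, np.maximum(y, eps))
            - loglik(x, mx0) - loglik(y, my0))

a = [30, 5, 0, 12]
b = [28, 6, 1, 14]   # similar profile  -> small distance
c = [2, 40, 9, 1]    # different profile -> large distance
print(poisson_distance(a, b), poisson_distance(a, c))
```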

99 citations


Journal ArticleDOI
TL;DR: It is observed that scoring functions resulting from proper posterior distributions, or approximations to such distributions, showed the best performance and can be used to improve upon existing motif-finding programs.
Abstract: The Bayesian approach together with Markov chain Monte Carlo techniques has provided an attractive solution to many important bioinformatics problems such as multiple sequence alignment, microarray analysis and the discovery of gene regulatory binding motifs. The employment of such methods and, more broadly, explicit statistical modeling, has revolutionized the field of computational biology. After reviewing several heuristics-based computational methods, this article presents a systematic account of Bayesian formulations and solutions to the motif discovery problem. Generalizations are made to further enhance the Bayesian approach. Motivated by the need for a speedy algorithm, we also provide a perspective of the problem from the viewpoint of optimizing a scoring function. We observe that scoring functions resulting from proper posterior distributions, or approximations to such distributions, showed the best performance and can be used to improve upon existing motif-finding programs. Simulation analyses and a real-data example are used to support our observation.

98 citations


Journal ArticleDOI
TL;DR: Gene ontology annotation analysis shows that many of the 822 genes are involved in important cell cycle-related processes, functions and components, and the close phase agreement across experiments suggests the three synchronization methods brought cells to the same phase at the time of release.
Abstract: We propose a periodic-normal mixture (PNM) model to fit transcription profiles of periodically expressed (PE) genes in cell cycle microarray experiments. The model leads to a principled statistical estimation procedure that produces more accurate estimates of the mean cell cycle length and the gene expression periodicity than existing heuristic approaches. A central component of the proposed procedure is the resynchronization of the observed transcription profile of each PE gene according to the PNM with estimated periodicity parameters. By using a two-component mixture-Beta model to approximate the PNM fitting residuals, we employ an empirical Bayes method to detect PE genes. We estimate that about one-third of the genes in the genome of Saccharomyces cerevisiae are likely to be transcribed periodically, and identify 822 genes whose posterior probabilities of being PE are greater than 0.95. Among these 822 genes, 540 are also in the list of 800 genes detected by Spellman. Gene ontology annotation analysis shows that many of the 822 genes were involved in important cell cycle-related processes, functions and components. When matching the 822 resynchronized expression profiles of three independent experiments, little phase shift was observed, indicating that the three synchronization methods might have brought cells to the same phase at the time of release.
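
A bare-bones stand-in for the periodicity-estimation step, fitting period and phase by grid-search least squares on a sinusoid; the paper's mixture residual model and empirical Bayes detection step are omitted:

```python
import numpy as np

rng = np.random.default_rng(9)

def fit_period(t, y, periods):
    """Least-squares fit of y ~ a*cos(2*pi*t/T) + b*sin(2*pi*t/T) + c
    over a grid of candidate periods T; returns the best period,
    phase, and residual sum of squares."""
    best = None
    for T in periods:
        X = np.column_stack([np.cos(2 * np.pi * t / T),
                             np.sin(2 * np.pi * t / T),
                             np.ones_like(t)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = np.sum((y - X @ beta) ** 2)
        if best is None or rss < best[0]:
            best = (rss, T, beta)
    rss, T, (a, b, _) = best
    return T, np.arctan2(b, a), rss   # a*cos + b*sin = R*cos(wt - phase)

t = np.arange(0, 120, 7.0)   # one microarray sample every 7 minutes
y = 1.3 * np.cos(2 * np.pi * t / 62 + 0.8) + rng.normal(0, 0.2, t.size)
print(fit_period(t, y, periods=np.arange(40.0, 90.0)))  # period near 62
```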

75 citations


Journal ArticleDOI
TL;DR: A novel three-step approach using surface-enhanced laser desorption/ionization technology enables the discovery of new biomarkers for early detection of liver cirrhosis and classification of liver diseases.
Abstract: Liver cirrhosis is a worldwide health problem. Reliable, noninvasive methods for early detection of liver cirrhosis are not available. Using a three-step approach, we classified sera from rats with liver cirrhosis following different treatment insults. The approach consisted of: (i) protein profiling using surface-enhanced laser desorption/ionization (SELDI) technology; (ii) selection of a statistically significant serum biomarker set using machine learning algorithms; and (iii) identification of selected serum biomarkers by peptide sequencing. We generated serum protein profiles from three groups of rats: (i) normal (n = 8), (ii) thioacetamide-induced liver cirrhosis (n = 22), and (iii) bile duct ligation-induced liver fibrosis (n = 5) using a weak cation exchanger surface. Profiling data were further analyzed by a recursive support vector machine algorithm to select a panel of statistically significant biomarkers for class prediction. Sensitivity and specificity of classification using the selected protein marker set were higher than 92%. A consistently down-regulated 3495 Da protein in cirrhosis samples was one of the selected significant biomarkers. This 3495 Da protein was purified on-chip and trypsin digested. Further structural characterization of this biomarker candidate was done by using cross-platform matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) peptide mass fingerprinting (PMF) and matrix-assisted laser desorption/ionization time of flight/time of flight (MALDI-TOF/TOF) tandem mass spectrometry (MS/MS). Combined data from PMF and MS/MS spectra of two tryptic peptides suggested that this 3495 Da protein shared homology with a histidine-rich glycoprotein. These results demonstrated a novel approach to discovery of new biomarkers for early detection of liver cirrhosis and classification of liver diseases.
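
Step (ii) can be sketched with scikit-learn's recursive feature elimination as a stand-in for the paper's recursive support vector machine procedure; the spectra below are synthetic:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
n_spectra, n_peaks = 35, 200
X = rng.normal(size=(n_spectra, n_peaks))        # peak intensities
y = np.array([0] * 8 + [1] * 22 + [2] * 5)       # normal / TAA / BDL groups
X[y == 1, 7] -= 2.0   # pretend peak 7 (a stand-in for the 3495 Da
                      # protein) is down-regulated in the cirrhosis group

# Recursively drop the least informative peaks until 10 remain.
selector = RFE(LinearSVC(C=1.0, max_iter=5000), n_features_to_select=10)
selector.fit(X, y)
print("selected peak indices:", np.where(selector.support_)[0])
```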

52 citations


Journal ArticleDOI
TL;DR: A p-value-based scoring scheme using probability generating functions evaluates the statistical significance of potential TFBSs, and the local genomic context is introduced into the model so that candidate sites are evaluated based both on their similarities to known binding sites and on their contrasts against their respective local genomic contexts.
Abstract: High-level eukaryotic genomes present a particular challenge to the computational identification of transcription factor binding sites (TFBSs) because of their long noncoding regions and large numbers of repeat elements. This is evidenced by the noisy results generated by most current methods. In this paper, we present a p-value-based scoring scheme using probability generating functions to evaluate the statistical significance of potential TFBSs. Furthermore, we introduce the local genomic context into the model so that candidate sites are evaluated based both on their similarities to known binding sites and on their contrasts against their respective local genomic contexts. We demonstrate that our approach is advantageous in the prediction of myogenin and MEF2 binding sites in the human genome. We also applied LMM to large-scale human binding site sequences in situ and found that, compared to current popular methods, LMM analysis can reduce false positive errors by more than 50% without compromising sensitivity.
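
The probability-generating-function idea computes the exact null distribution of a match score as a product of per-position polynomials, i.e., repeated convolution. A sketch with integerized scores (the local-context layer is omitted):

```python
import numpy as np

def match_pvalues(int_scores, bg=(0.25, 0.25, 0.25, 0.25)):
    """Exact tail probabilities of a PWM match score under the
    background model. `int_scores[k][b]` is the integerized score of
    base b at motif position k; the score PGF is the product of the
    per-position polynomials, computed by convolution."""
    dist = np.array([1.0])   # P(total score = offset + index)
    offset = 0
    for col in int_scores:
        col = np.asarray(col, int)
        lo = col.min()
        poly = np.zeros(col.max() - lo + 1)
        for b, s in enumerate(col):
            poly[s - lo] += bg[b]
        dist = np.convolve(dist, poly)
        offset += lo
    tail = np.cumsum(dist[::-1])[::-1]   # upper-tail sums
    return offset, tail                  # P(score >= offset + i) = tail[i]

scores = [[2, 0, 0, 1], [0, 3, 0, 0], [1, 0, 2, 0]]  # toy 3-column PWM
offset, tail = match_pvalues(scores)
print("P(score >= 5) =", tail[5 - offset])
```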

51 citations


Journal ArticleDOI
TL;DR: A novel genotype clustering algorithm, GeneScore, is proposed, based on a bivariate t-mixture model, which assigns a set of probabilities for each data point belonging to the candidate genotype clusters, and an expectation-maximization algorithm for haplotype phasing, GenoSpectrum (GS)-EM, which can use probabilistic multilocus genotype matrices as inputs.
Abstract: The accuracy of the vast amount of genotypic information generated by high-throughput genotyping technologies is crucial in haplotype analyses and linkage-disequilibrium mapping for complex diseases. To date, most automated programs lack quality measures for the allele calls; therefore, human interventions, which are both labor intensive and error prone, have to be performed. Here, we propose a novel genotype clustering algorithm, GeneScore, based on a bivariate t-mixture model, which assigns a set of probabilities for each data point belonging to the candidate genotype clusters. Furthermore, we describe an expectation-maximization (EM) algorithm for haplotype phasing, GenoSpectrum (GS)-EM, which can use probabilistic multilocus genotype matrices (called “GenoSpectrum”) as inputs. Combining these two model-based algorithms, we can perform haplotype inference directly on raw readouts from a genotyping machine, such as the TaqMan assay. By using both simulated and real data sets, we demonstrate the advantages of our probabilistic approach over the current genotype scoring methods, in terms of both the accuracy of haplotype inference and the statistical power of haplotype-based association analyses.
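
A sketch of the clustering step, simplified to a bivariate Gaussian mixture fitted by EM rather than the paper's t-mixture (the t distribution's heavier tails make the real method more robust to outlying readouts); centers and data are synthetic:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_mixture(X, means, n_iter=50):
    """EM for a bivariate Gaussian mixture; returns per-point cluster
    membership probabilities, i.e. soft genotype calls."""
    K = len(means)
    means = [np.array(m, float) for m in means]
    covs = [np.eye(2) for _ in range(K)]
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k]
        r = np.column_stack([pi[k] * multivariate_normal.pdf(X, means[k], covs[k])
                             for k in range(K)])
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, covariances
        nk = r.sum(axis=0)
        pi = nk / len(X)
        for k in range(K):
            means[k] = (r[:, k] @ X) / nk[k]
            d = X - means[k]
            covs[k] = (r[:, k, None] * d).T @ d / nk[k] + 1e-6 * np.eye(2)
    return r   # rows: P(point i belongs to cluster AA / AB / BB)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([1, 5], 0.3, (40, 2)),   # allele-A homozygotes
               rng.normal([3, 3], 0.3, (40, 2)),   # heterozygotes
               rng.normal([5, 1], 0.3, (40, 2))])  # allele-B homozygotes
r = em_mixture(X, means=[[1, 5], [3, 3], [5, 1]])
print(r[:2].round(3))   # probabilistic genotype calls for two points
```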

Journal ArticleDOI
TL;DR: A comprehensive MIAME-supportive infrastructure for gene expression data management that makes extensive use of ontologies is provided, enabling a large variety of queries that incorporate visualization and analysis tools and have been tailored to serve the specific needs of projects focusing on particular organisms or biological systems.
Abstract: Motivation: Gene expression array technology has become increasingly widespread among researchers who recognize its numerous promises. At the same time, bench biologists and bioinformaticians have increasingly come to appreciate the importance of establishing a collaborative dialog from the onset of a study and of collecting and exchanging detailed information on the many experimental and computational procedures using a structured mechanism. This is crucial for adequate analyses of this kind of data. Results: The RNA Abundance Database (RAD; http://www.cbil.upenn.edu/RAD) provides a comprehensive MIAME-supportive infrastructure for gene expression data management and makes extensive use of ontologies. Specific details on protocols, biomaterials, study designs, etc. are collected through a user-friendly suite of web annotation forms. Software has been developed to generate MAGE-ML documents to enable easy export of studies stored in RAD to any other database accepting data in this format (e.g. ArrayExpress). RAD is part of a more general Genomics Unified Schema (http://www.gusdb.org), which includes a richly annotated gene index (http://www.allgenes.org), thus providing a platform that integrates genomic and transcriptomic data from multiple organisms. This infrastructure enables a large variety of queries that incorporate visualization and analysis tools and have been tailored to serve the specific needs of projects focusing on particular organisms or biological systems. Availability: The system is freely available. Information on how to obtain it and how to install it can be found at http://www.cbil.upenn.edu/RAD/RAD-installation.htm

Journal ArticleDOI
TL;DR: A suite of programs developed to search for regulatory sequence motifs is presented, including a Gibbs-sampling-based program for predicting regulatory motifs from co-regulated genes in prokaryotes or lower eukaryotes, and an extension to BioProspector which incorporates comparative genomics features to be used for higher eukaryotes.
Abstract: The identification of regulatory motifs is important for the study of gene expression. Here we present a suite of programs that we have developed to search for regulatory sequence motifs: (i) BioProspector, a Gibbs-sampling-based program for predicting regulatory motifs from co-regulated genes in prokaryotes or lower eukaryotes; (ii) CompareProspector, an extension to BioProspector which incorporates comparative genomics features to be used for higher eukaryotes; (iii) MDscan, a program for finding protein-DNA interaction sites from ChIP-on-chip targets. All three programs examine a group of sequences that may share common regulatory motifs and output a list of putative motifs as position-specific probability matrices, the individual sites used to construct the motifs and the location of each site on the input sequences. The web servers and executables can be accessed at http://seqmotifs.stanford.edu.

Journal ArticleDOI
TL;DR: While not entirely replacing PSI-BLAST based alignments, which likewise may be optimized for CHAIN analysis using this approach, these motif-based methods often more accurately align very distantly related sequences and thus can provide a better measure of selective constraints.
Abstract: Certain protein families are highly conserved across distantly related organisms and belong to large and functionally diverse superfamilies. The patterns of conservation present in these protein sequences presumably are due to selective constraints maintaining important but unknown structural mechanisms with some constraints specific to each family and others shared by a larger subset or by the entire superfamily. To exploit these patterns as a source of functional information, we recently devised a statistically based approach called contrast hierarchical alignment and interaction network (CHAIN) analysis, which infers the strengths of various categories of selective constraints from co-conserved patterns in a multiple alignment. The power of this approach strongly depends on the quality of the multiple alignments, which thus motivated development of theoretical concepts and strategies to improve alignment of conserved motifs within large sets of distantly related sequences. Here we describe a hidden Markov model (HMM), an algebraic system, and Markov chain Monte Carlo (MCMC) sampling strategies for alignment of multiple sequence motifs. The MCMC sampling strategies are useful both for alignment optimization and for adjusting position specific background amino acid frequencies for alignment uncertainties. Associated statistical formulations provide an objective measure of alignment quality as well as automatic gap penalty optimization. Improved alignments obtained in this way are compared with PSI-BLAST based alignments within the context of CHAIN analysis of three protein families: Giα subunits, prolyl oligopeptidases, and transitional endoplasmic reticulum (p97) AAA+ ATPases. While not entirely replacing PSI-BLAST based alignments, which likewise may be optimized for CHAIN analysis using this approach, these motif-based methods often more accurately align very distantly related sequences and thus can provide a better measure of selective constraints. In some instances, these new approaches also provide a better understanding of family-specific constraints, as we illustrate for p97 ATPases. Programs implementing these procedures and supplementary information are available from the authors.

Journal ArticleDOI
TL;DR: This paper proposes a simple inference procedure via the importance sampling technique, which provides a consistent root of the estimating equation and also an approximation to its distribution without solving any equations or involving nonparametric function estimates.
Abstract: When the estimating function for a vector of parameters is not smooth, it is often rather difficult, if not impossible, to obtain a consistent estimator by solving the corresponding estimating equation using standard numerical techniques. In this paper, we propose a simple inference procedure via the importance sampling technique, which provides a consistent root of the estimating equation and also an approximation to its distribution without solving any equations or involving nonparametric function estimates. The new proposal is illustrated and evaluated via two extensive examples with real and simulated datasets.
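
A caricature of the idea on a non-smooth estimating function (the paper's actual construction differs in detail): candidate roots are drawn from a trial density and weighted by how nearly the estimating function vanishes, giving both a point estimate and a spread without solving any equation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy non-smooth estimating function: S(theta) = sum sign(y_i - theta),
# whose root is the sample median.
y = rng.standard_normal(200) * 2 + 1.0
S = lambda theta: np.sum(np.sign(y - theta))

thetas = rng.normal(np.median(y), 1.0, 5000)   # trial draws
h = np.sqrt(len(y))                            # kernel scale (heuristic)
w = np.exp(-0.5 * (np.array([S(t) for t in thetas]) / h) ** 2)
w /= w.sum()                                   # importance weights
est = np.sum(w * thetas)
se = np.sqrt(np.sum(w * (thetas - est) ** 2))
print(f"IS root: {est:.3f} +/- {se:.3f}; sample median: {np.median(y):.3f}")
```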

Journal ArticleDOI
TL;DR: A hierarchical Bayes model for determining genes in the sporulation pathway of Bacillus subtilis is presented and 181 genes that had not been previously described as controlled by σE are found.

Book ChapterDOI
Jun Liu
01 Jan 2004
TL;DR: This chapter illustrates how generic strategies based on the sequential buildup approach and the resampling method are applied to various application problems.
Abstract: The previous chapter outlines a general Monte Carlo framework based on the sequential buildup strategy. Several essential elements are (a) the choice of the trial densities, (b) the resampling method, (c) the marginalization strategy, and (d) the rejection control. This chapter will illustrate how these generic strategies are applied to various application problems.

Book ChapterDOI
Jun Liu
01 Jan 2004
TL;DR: Molecular dynamics simulation is a deterministic procedure to integrate the equations of motion based on the classical mechanics principles (Hamiltonian equations) and has become one of the most widely used research tools for complex physical systems.
Abstract: Molecular dynamics (MD) simulation is a deterministic procedure to integrate the equations of motion based on the classical mechanics principles (Hamiltonian equations). This method was first proposed by Alder and Wainwright (1959) and has become one of the most widely used research tools for complex physical systems. In a typical MD simulation study, one first sets up the quantitative system (model) of interest under a given condition (e.g., fixed number of particles and constant total energy). Then, successive configurations of the system, as a function of time, are generated by following Newton’s laws of motion. After a period of time for “equilibration,” one can start to collect “data” from this computer experiment — the data consist of a sequence of snapshots that record the positions and velocities of the particles in the system during a period of time. Based on these records, one can estimate “typical characteristics,” which can often be expressed as the time average of a function of the realized configurations, of the simulated physical system.
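
The core deterministic update can be illustrated with a velocity-Verlet integrator for a single 1D harmonic oscillator (real MD uses many particles and empirical force fields):

```python
import numpy as np

def force(x, k=1.0):
    """Harmonic restoring force F = -kx."""
    return -k * x

def velocity_verlet(x, v, dt=0.01, steps=1000, m=1.0):
    """Integrate Newton's equations with the velocity-Verlet scheme,
    recording the (position, velocity) snapshots."""
    traj = []
    a = force(x) / m
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt ** 2
        a_new = force(x) / m
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        traj.append((x, v))
    return np.array(traj)

traj = velocity_verlet(x=1.0, v=0.0)
energy = 0.5 * traj[:, 1] ** 2 + 0.5 * traj[:, 0] ** 2
print("energy drift:", energy.max() - energy.min())   # ~0: stable integrator
```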

Book ChapterDOI
Jun Liu
01 Jan 2004
TL;DR: The proposal transition T(x,y) in a Metropolis sampler is often an arbitrary choice made out of convenience, but in many applications the proposal is chosen to be a locally uniform move.
Abstract: The proposal transition T(x,y) in a Metropolis sampler is often an arbitrary choice out of convenience. In many applications, the proposal is chosen to be a locally uniform move. In fact, the use of symmetric and locally uniform proposals is so prevalent that these are often referred to as "unbiased proposals" in the literature.
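
A minimal random-walk Metropolis sampler with exactly such a locally uniform proposal; because the proposal is symmetric, the acceptance ratio reduces to π(y)/π(x):

```python
import numpy as np

rng = np.random.default_rng(3)

def metropolis(logpi, x0, half_width=0.5, n=20000):
    """Random-walk Metropolis with a locally uniform proposal:
    y ~ Uniform(x - d, x + d). Accept with probability
    min(1, pi(y)/pi(x)) since the proposal is symmetric."""
    x, out = x0, []
    lp = logpi(x)
    for _ in range(n):
        y = x + rng.uniform(-half_width, half_width)
        lq = logpi(y)
        if np.log(rng.random()) < lq - lp:
            x, lp = y, lq
        out.append(x)
    return np.array(out)

samples = metropolis(lambda x: -0.5 * x ** 2, x0=0.0)  # standard normal target
print(samples.mean(), samples.std())                    # ~0, ~1
```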

Journal ArticleDOI
TL;DR: A Bayesian method using Markov chain Monte Carlo techniques to compute all posterior quantities of interest and allow inferences to be made regarding the number of segment types and the order of Markov dependence in the DNA sequence is described.

Book ChapterDOI
Jun Liu
01 Jan 2004
TL;DR: The previous chapters discussed the important role of Monte Carlo methods in evaluating integrals and simulating stochastic systems; the most critical step in developing an efficient Monte Carlo algorithm is the simulation (sampling) from an appropriate probability distribution π(x).
Abstract: We have discussed in the previous chapters the important role of Monte Carlo methods in evaluating integrals and simulating stochastic systems. The most critical step in developing an efficient Monte Carlo algorithm is the simulation (sampling) from an appropriate probability distribution π(x). When directly generating independent samples from π(x) is not possible, we have to either opt for an importance sampling strategy, in which random samples are generated from a trial distribution different from (but close to) the target one and then weighted according to the importance ratio; or produce statistically dependent samples based on the idea of Markov chain Monte Carlo sampling.
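
The importance sampling alternative in miniature: draw from a trial density g close to the target π, then reweight by the (self-normalized) ratio π/g. A small sketch with a normal target, known only up to its constant, and a wider normal trial:

```python
import numpy as np

rng = np.random.default_rng(4)

x = rng.normal(0.0, 2.0, 100000)     # draws from the trial g = N(0, 2^2)
# log importance ratio: unnormalized log pi minus unnormalized log g
log_w = (-0.5 * x ** 2) - (-0.5 * (x / 2.0) ** 2 - np.log(2.0))
w = np.exp(log_w - log_w.max())      # stabilize before exponentiating
w /= w.sum()                         # self-normalize
print("E[x^2] under pi ~=", np.sum(w * x ** 2))   # close to 1
```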

Book ChapterDOI
Jun Liu
01 Jan 2004
TL;DR: The previous chapter introduced the basic framework of sequential importance sampling (SIS), in which one builds up the trial sampling distribution sequentially and computes the importance weights recursively.
Abstract: In the previous chapter, we introduced the basic framework of sequential importance sampling (SIS), in which one builds up the trial sampling distribution sequentially and computes the importance weights recursively.
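
The SIS recursion in miniature, for a toy linear-Gaussian state-space model: the trial distribution is extended one step at a time by sampling from the transition, and each weight is updated recursively by the likelihood of the new observation (no resampling here, so the weights eventually degenerate, which is what motivates the refinements in this chapter):

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy model: x_t = 0.9 x_{t-1} + noise,  y_t = x_t + noise.
T, n = 50, 2000
xs_true = np.zeros(T); ys = np.zeros(T)
for t in range(1, T):
    xs_true[t] = 0.9 * xs_true[t - 1] + rng.normal(0, 0.5)
    ys[t] = xs_true[t] + rng.normal(0, 0.5)

particles = np.zeros(n)
logw = np.zeros(n)
for t in range(1, T):
    particles = 0.9 * particles + rng.normal(0, 0.5, n)  # sequential build-up
    logw += -0.5 * ((ys[t] - particles) / 0.5) ** 2      # recursive weight update
w = np.exp(logw - logw.max()); w /= w.sum()
print("filtered mean at T:", np.sum(w * particles), "truth:", xs_true[-1])
```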

Book ChapterDOI
Jun Liu
01 Jan 2004
TL;DR: An approximation to an integral over a high-dimensional region D can be obtained by drawing independent and identically distributed (i.i.d.) random samples uniformly from D and averaging the values of the target function g(x).
Abstract: An essential part of many scientific problems is the computation of the integral $$I = \int_D g(x)\,dx,$$ where D is often a region in a high-dimensional space and g(x) is the target function of interest. If we can draw independent and identically distributed (i.i.d.) random samples $x^{(1)}, \ldots, x^{(m)}$ uniformly from D (by a computer), an approximation to I can be obtained as $$\hat{I}_m = \frac{1}{m}\left\{ g\left(x^{(1)}\right) + \cdots + g\left(x^{(m)}\right) \right\}.$$
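
The estimator in code (the region D = [0,1]^2 and the integrand g are chosen only for illustration; since the uniform density on the unit square has total volume 1, no extra volume factor is needed):

```python
import numpy as np

rng = np.random.default_rng(6)

m = 100000
x = rng.random((m, 2))                 # i.i.d. uniform draws on the unit square
g = np.exp(-np.sum(x ** 2, axis=1))    # g(x) = exp(-|x|^2)
I_hat = g.mean()                       # the sample-average estimator above
se = g.std() / np.sqrt(m)              # Monte Carlo standard error
print(f"I ~= {I_hat:.4f} +/- {se:.4f}")
```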

Book ChapterDOI
Jun Liu
01 Jan 2004
TL;DR: Section 1.3 introduced the Ising model, which is used by physicists to model the magnetization phenomenon and has been studied extensively in the statistical physics literature; a closely related model is the Potts model.
Abstract: In Section 1.3, we introduced the Ising model, which is used by physicists to model the magnetization phenomenon and has been studied extensively in the statistical physics literature. A closely related model is the Potts model.
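
For concreteness, a single-spin-flip Metropolis sweep for the 2D Ising model with periodic boundaries (lattice size, temperature, and sweep count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)

def sweep(s, beta):
    """One Metropolis sweep: attempt L*L single-spin flips; a flip
    costing energy dE = 2*s*h (h = sum of 4 neighbors) is accepted
    with probability min(1, exp(-beta*dE))."""
    L = s.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(L, size=2)
        h = (s[(i + 1) % L, j] + s[(i - 1) % L, j]
             + s[i, (j + 1) % L] + s[i, (j - 1) % L])
        dE = 2.0 * s[i, j] * h
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            s[i, j] *= -1
    return s

L, beta = 32, 0.6    # beta above the critical ~0.44: ordered phase
s = rng.choice([-1, 1], size=(L, L))
for _ in range(200):
    sweep(s, beta)
print("magnetization per spin:", abs(s.mean()))   # near 1 when ordered
```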


Book ChapterDOI
Jun Liu
01 Jan 2004
TL;DR: In this chapter, a few innovative ideas in using auxiliary distributions and multiple Markov chains (in parallel) to improve the efficiency of Monte Carlo simulations are described.
Abstract: In this chapter, we describe a few innovative ideas in using auxiliary distributions and multiple Markov chains (in parallel) to improve the efficiency of Monte Carlo simulations. Roughly speaking, in order to improve the mixing property of an underlying Monte Carlo Markov chain, one can build a few "companion chains" whose sole purpose is to help bridge parts of the sample space that are separated by very high energy (or low probability) barriers in the original distribution.
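
A small parallel-tempering sketch of the companion-chain idea: hot chains flatten the target and cross barriers easily, and occasional state swaps let the cold chain inherit those moves. The bimodal target, temperature ladder, and step sizes below are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(8)

def logpi(x):
    """Bimodal target with well-separated modes at -4 and +4."""
    return np.logaddexp(-0.5 * ((x - 4) / 0.3) ** 2,
                        -0.5 * ((x + 4) / 0.3) ** 2)

temps = [1.0, 3.0, 9.0, 27.0]    # chain k targets pi(x)^(1/temps[k])
xs = np.zeros(len(temps))
cold = []
for it in range(20000):
    for k, T in enumerate(temps):               # one Metropolis step per chain
        y = xs[k] + rng.normal(0, 1.0)
        if np.log(rng.random()) < (logpi(y) - logpi(xs[k])) / T:
            xs[k] = y
    k = rng.integers(len(temps) - 1)            # propose swapping chains k, k+1
    delta = (logpi(xs[k + 1]) - logpi(xs[k])) * (1 / temps[k] - 1 / temps[k + 1])
    if np.log(rng.random()) < delta:
        xs[k], xs[k + 1] = xs[k + 1], xs[k]
    cold.append(xs[0])
cold = np.array(cold)
print("cold-chain mass near each mode:", np.mean(cold > 0), np.mean(cold < 0))
```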