Showing papers presented at "International Conference on Bioinformatics in 2002"


Proceedings Article
01 Aug 2002
TL;DR: Gene harvesting, without additional constraints, is shown to yield artifactual solutions; the methods are evaluated using a microarray-based study of cardiomyopathy in transgenic mice.
Abstract: A variety of new procedures have been devised to handle the two-sample comparison (e.g., tumor versus normal tissue) of gene expression values as measured with microarrays. Such new methods are required in part because of some defining characteristics of microarray-based studies: (i) the very large number of genes contributing expression measures, which far exceeds the number of samples (observations) available, and (ii) the fact that, by virtue of pathway/network relationships, the gene expression measures tend to be highly correlated. These concerns are exacerbated in the regression setting, where the objective is to relate gene expression, simultaneously for multiple genes, to some external outcome or phenotype. Correspondingly, several methods have recently been proposed for addressing these issues. We briefly critique some of these methods prior to a detailed evaluation of gene harvesting. This reveals that gene harvesting, without additional constraints, can yield artifactual solutions. Results obtained employing such constraints motivate the use of regularized regression procedures such as the lasso, least angle regression, and support vector machines. Model selection and solution multiplicity issues are also discussed. The methods are evaluated using a microarray-based study of cardiomyopathy in transgenic mice.

176 citations
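
The abstract of the entry above names the lasso as one of the regularized regression procedures motivated by its results. As a rough illustration of what such a procedure does, here is a minimal sketch of lasso regression by cyclic coordinate descent. It is not taken from the paper: the objective scaling (1/(2n) times the squared error plus lam times the L1 norm), the function names, and the toy data are all assumptions made for this example.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (assumed scaling).

    X is a list of rows, y a list of outcomes. Hypothetical helper,
    written for illustration only.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Toy data (made up): the second coefficient is small enough to be zeroed.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5]
print(lasso_coordinate_descent(X, y, lam=0.5))  # [2.0, 0.0]
```

The L1 penalty sets weak coefficients exactly to zero, which is why this family of methods is attractive when genes far outnumber samples.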


Proceedings Article
01 Jan 2002

90 citations


Proceedings Article
01 Jan 2002
TL;DR: The method of spot intensity ratio computation is derived from the biochemical model of differential gene expression experiments, which yields comparable or even more accurate results than standard methods under poor hybridisation and scan quality conditions.
Abstract: Parallel expression analysis of many genes by microarray hybridisation is one of the most promising techniques in functional genomics. The method has been successfully applied many times in medical and biological research. Our work is about automatic methods for the first stages of a microarray data analysis pipeline. Expression analysis by microarray hybridisation is a high throughput technique. While interactive, semi-automatic software is still frequently used for the analysis of scanned array images, it is highly desirable to have automatic procedures which yield better repeatability and constant quality of the expression data for later cluster analyses. Automatic methods must handle noise and the frequently occurring contaminations on microarrays. In large scale microarray experiments, automatic image analysis can save substantial amounts of work. We describe robust image processing methods that find the printed grids of spots in the scanned microarray images without the requirement of special guide spots or specially calibrated equipment. Processing of many slides from the same print batch helps to minimize the need for human intervention. We derive our method of spot intensity ratio computation from the biochemical model of differential gene expression experiments and finally discuss how different ratio computation methods can be compared. We compare results of our method to results of manual analyses using the well-known Scanalyze (M. Eisen, LBNL Berkeley) as well as recently published methods (Brown et al. (2001), PNAS 92, 8944-8949). Our automatic method yields comparable or even more accurate results than standard methods under poor hybridisation and scan quality conditions.

14 citations


Proceedings Article
01 Jan 2002

4 citations



Proceedings Article
01 Jan 2002

2 citations


Proceedings Article
01 Jan 2002

2 citations


Proceedings Article
01 Jun 2002

2 citations


Proceedings Article
01 Jan 2002
TL;DR: This article proposes a simple approximation to the electrostatic potential in solution: the partial atomic charges are reparametrised so that a simple Coulomb potential can still be used.
Abstract: The rate constant of an enzyme-catalysed reaction is one of the major target properties for understanding protein function. Atomic-detail computer simulations can in principle be used to estimate rate constants from the energy profile along the reaction coordinate. For such simulations, molecular mechanics is combined with a quantum description of the reaction process. In molecular mechanics calculations, the electrostatic field is represented by the Coulomb potential of partial atomic charges which have been parametrised for small building blocks in vacuum and transferred to the macromolecule. In aqueous solution, however, the electrostatic interactions are affected by the solvent polarization. While this can be described by numerically solving the Poisson-Boltzmann equation, doing so is computationally expensive. A simple approximation is to reproduce the electrostatic potential in solution as closely as possible by reparametrising the partial atomic charges in such a way that a simple Coulomb potential can still be used. Such a procedure would allow fast calculations of reaction processes in proteins while accounting for the solvent screening effect. Here, this method is tested on myosin, a motor protein that is an enzyme and also exists in very different conformations.

1 citation
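
To make the idea in the entry above concrete, the following sketch evaluates the Coulomb potential of a set of partial charges and then fits a single scale factor so that the scaled charges best reproduce a given target potential at some probe points. This is a deliberately crude stand-in, not the paper's procedure: the abstract does not say how the charges are reparametrised, and fitting one global factor is an assumption made here for brevity. The function names, the toy geometry, and the target values (vacuum potential divided by 80) are hypothetical.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2), for charges in units of e
# and distances in Angstrom.
COULOMB_K = 332.0637


def coulomb_potential(charges, positions, point):
    """Potential at point from point charges, in kcal/(mol*e)."""
    total = 0.0
    for q, pos in zip(charges, positions):
        total += COULOMB_K * q / math.dist(pos, point)
    return total


def fit_charge_scale(charges, positions, probe_points, target_potentials):
    """Least-squares factor s so that s * (Coulomb potential) matches targets.

    Assumption for illustration: all charges are scaled by the same factor.
    """
    vacuum = [coulomb_potential(charges, positions, p) for p in probe_points]
    numerator = sum(v * t for v, t in zip(vacuum, target_potentials))
    denominator = sum(v * v for v in vacuum)
    return numerator / denominator


# Toy dipole and probe points (made-up numbers).
charges = [0.4, -0.4]
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
probes = [(3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (-2.0, 1.0, 0.0)]

# Hypothetical target: the vacuum potential uniformly screened by 80.
targets = [coulomb_potential(charges, positions, p) / 80.0 for p in probes]

scale = fit_charge_scale(charges, positions, probes, targets)
scaled_charges = [scale * q for q in charges]
print(scale, scaled_charges)
```

In a realistic setting the target potentials would come from a continuum-solvent calculation and each charge would typically get its own fitted value; the single-factor fit only shows the shape of the problem.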



Proceedings Article
11 Oct 2002
TL;DR: Formulae for various measures of the expected progress such as expected number and size of scaffolds are derived and assessed by Monte Carlo simulations for parameter sets used in the human genome project.
Abstract: Paired-end shotgun sequencing has become widely used for large-scale sequencing projects in recent years, including whole-genome shotgun sequencing and map-based BAC clone sequencing. Under this scheme, sequences from both ends of random clones are determined and assembled into sequence contigs. The sequence data and their linking information are used to construct clone maps in the form of scaffolds. In order to plan a cost-effective sequencing project utilizing such an approach, it is crucial to know the expected project progress in relation to parameters such as insert size, clone lengths and redundancy. Theoretical analysis of the paired-end sequencing strategy has been lacking because the two ends of a clone are correlated, which makes the problem difficult. Here we present a mathematical analysis of the progress of a sequencing project employing such a scheme. Formulae for various measures of the expected progress, such as the expected number and size of scaffolds, are derived and assessed by Monte Carlo simulations for parameter sets used in the human genome project.
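
The kind of Monte Carlo check mentioned in the entry above can be sketched as follows. This is a simplified model written for illustration, not the authors' simulation: clones are placed uniformly at random on a linear genome, any two reads that overlap by at least one base are merged into a contig, and a scaffold is taken to be a set of contigs connected through the two end reads of shared clones. All names and parameter values are assumptions.

```python
import random


def simulate_paired_end(genome_len, n_clones, insert_len, read_len, rng):
    """One trial; returns (number of contigs, number of scaffolds)."""
    reads = []  # (start, end, clone_id), end exclusive
    for clone_id in range(n_clones):
        start = rng.randrange(genome_len - insert_len + 1)
        reads.append((start, start + read_len, clone_id))
        reads.append((start + insert_len - read_len, start + insert_len, clone_id))
    reads.sort()

    # Sweep left to right, merging overlapping reads into contigs and
    # recording which contig each clone end falls into.
    contigs_of_clone = {}
    n_contigs = 0
    current_end = -1
    for start, end, clone_id in reads:
        if start >= current_end:  # no overlap: a new contig begins
            n_contigs += 1
            current_end = end
        else:
            current_end = max(current_end, end)
        contigs_of_clone.setdefault(clone_id, []).append(n_contigs - 1)

    # Union-find over contigs: the two ends of a clone link their contigs.
    parent = list(range(n_contigs))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for first, second in contigs_of_clone.values():
        root_a, root_b = find(first), find(second)
        if root_a != root_b:
            parent[root_a] = root_b

    n_scaffolds = len({find(c) for c in range(n_contigs)})
    return n_contigs, n_scaffolds


def mean_counts(n_trials, seed, **params):
    """Average contig and scaffold counts over n_trials independent trials."""
    rng = random.Random(seed)
    results = [simulate_paired_end(rng=rng, **params) for _ in range(n_trials)]
    mean_contigs = sum(r[0] for r in results) / n_trials
    mean_scaffolds = sum(r[1] for r in results) / n_trials
    return mean_contigs, mean_scaffolds


# Illustrative parameters only, not those of any real project.
print(mean_counts(200, seed=1, genome_len=100_000, n_clones=300,
                  insert_len=2_000, read_len=500))
```

Averages from such trials are what one would compare against closed-form expectations; a more faithful model would also require a minimum overlap length before two reads are merged.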

Proceedings Article
01 Jan 2002