
Showing papers by "Hanchuan Peng published in 2007"


Journal ArticleDOI
TL;DR: These automatic image analysis methods recapitulate known co-regulated genes and give correct developmental-stage classifications with 99+% accuracy, despite variations in morphology, orientation, and focal plane, suggesting that these techniques form a set of useful tools for the large-scale computational analysis of fly embryonic gene expression patterns.
Abstract: Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a D. melanogaster embryo delivers the detailed spatio-temporal pattern of expression of the gene. Many biological problems such as the detection of co-expressed genes, co-regulated genes, and transcription factor binding motifs rely heavily on the analyses of these image patterns. The increasing availability of ISH image data motivates the development of automated computational approaches to the analysis of gene expression patterns. We have developed algorithms and associated software that extracts a feature representation of a gene expression pattern from an ISH image, that clusters genes sharing the same spatio-temporal pattern of expression, that suggests transcription factor binding (TFB) site motifs for genes that appear to be co-regulated (based on the clustering), and that automatically identifies the anatomical regions that express a gene given a training set of annotations. In fact, we developed three different feature representations, based on Gaussian Mixture Models (GMM), Principal Component Analysis (PCA), and wavelet functions, each having different merits with respect to the tasks above. For clustering image patterns, we developed a minimum spanning tree method (MSTCUT), and for proposing TFB sites we used standard motif finders on clustered/co-expressed genes with the added twist of requiring conservation across the genomes of 8 related fly species. Lastly, we trained a suite of binary classifiers, one for each anatomical annotation term in a controlled vocabulary or ontology, that operate on the wavelet feature representation. We report the results of applying these methods to the Berkeley Drosophila Genome Project (BDGP) gene expression database.
Our automatic image analysis methods recapitulate known co-regulated genes and give correct developmental-stage classifications with 99+% accuracy, despite variations in morphology, orientation, and focal plane, suggesting that these techniques form a set of useful tools for the large-scale computational analysis of fly embryonic gene expression patterns.
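The MST-based clustering idea (MSTCUT) can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: build a minimum spanning tree over pairwise distances between feature vectors, then delete the heaviest edges so the remaining connected components form clusters. All function and parameter names here are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

def mst_cut_clusters(features, n_clusters):
    """Cluster feature vectors by building a minimum spanning tree (MST)
    over pairwise distances and cutting its heaviest edges."""
    d = squareform(pdist(features))                      # dense pairwise distances
    mst = minimum_spanning_tree(csr_matrix(d)).tocoo()   # the n-1 MST edges
    order = np.argsort(mst.data)                         # edge weights, ascending
    keep = order[:len(order) - (n_clusters - 1)]         # drop the k-1 heaviest edges
    pruned = csr_matrix((mst.data[keep], (mst.row[keep], mst.col[keep])),
                        shape=mst.shape)
    # Each connected component of the pruned MST is one cluster.
    _, labels = connected_components(pruned, directed=False)
    return labels
```

Cutting the k-1 heaviest MST edges is the classic single-linkage view of MST clustering; the paper's MSTCUT uses its own cut criterion on image-pattern features.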

74 citations


Journal ArticleDOI
TL;DR: A system is developed to automatically annotate the embryonic tissues of the fruit fly in which a gene is expressed; embryo features are identified by a multi-resolution 2D discrete wavelet transform followed by min-redundancy max-relevance feature selection, which yields optimal distinguishing features for an annotation.
Abstract: Motivation: Gene expression patterns obtained by in situ mRNA hybridization provide important information about different genes during Drosophila embryogenesis. So far, annotations of these images are done by manually assigning a subset of anatomy ontology terms to an image. This time-consuming process depends heavily on the consistency of experts. Results: We develop a system to automatically annotate the embryonic tissues of the fruit fly in which a gene is expressed. We formulate the task as an image pattern recognition problem. For a new fly embryo image, our system answers two questions: (1) Which stage range does an image belong to? (2) Which annotations should be assigned to an image? We propose to identify the wavelet embryo features by a multi-resolution 2D discrete wavelet transform, followed by min-redundancy max-relevance feature selection, which yields optimal distinguishing features for an annotation. We then construct a series of parallel bi-class predictors to solve the multi-objective annotation problem since each image may correspond to multiple annotations. Supplementary information: The complete annotation prediction results are available at: http://www.cs.niu.edu/~jzhou/papers/fruitfly and http://research.janelia.org/peng/proj/fly_embryo_annotation/. The datasets used in experiments will be available upon request to the corresponding author. Contact: jzhou@cs.niu.edu and pengh@janelia.hhmi.org
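The min-redundancy max-relevance (mRMR) selection step can be sketched as a greedy search: repeatedly pick the feature whose mutual information with the label (relevance) most exceeds its mean mutual information with the features already chosen (redundancy). This is a minimal sketch under the standard mRMR difference criterion, not the authors' code; names are illustrative, and mutual information is estimated here by simple histogram binning.

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Histogram estimate of mutual information between two variables (nats)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def mrmr_select(X, y, k):
    """Greedy mRMR: maximize relevance to the label minus mean
    redundancy with the already-selected features."""
    n_feat = X.shape[1]
    relevance = np.array([mutual_info(X[:, j], y) for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]        # start with most relevant
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

In the paper's pipeline, `X` would hold wavelet coefficients of embryo images and `y` a binary annotation label, with one such selection run per anatomy term.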

63 citations


Proceedings ArticleDOI
12 Apr 2007
TL;DR: A new method based on 3D watershed algorithm to segment nuclei in 3D microscopy images is presented, which is robust to intensity fluctuation within nuclei and at the same time sensitive to the intensity and geometrical cues between nuclei.
Abstract: Automatic segmentation of nuclei in 3D microscopy images is essential for many biological studies, including high-throughput analysis of gene expression level, morphology, and phenotypes at the single-cell level. The complexity and variability of the microscopy images present many difficulties to traditional image segmentation methods. In this paper, we present a new method based on the 3D watershed algorithm to segment such images. By using both the intensity information of the image and the geometry information of the appropriately detected foreground mask, our method is robust to intensity fluctuation within nuclei and at the same time sensitive to the intensity and geometrical cues between nuclei. In addition, the method can automatically correct potential segmentation errors by using several post-processing steps. We tested this algorithm on 3D confocal images of C. elegans, an organism that has been widely used in biological studies. Our results show that the algorithm can segment nuclei with high accuracy despite the non-uniform background, tightly clustered nuclei with different sizes and shapes, fluctuating intensities, and hollow-shaped staining patterns in the images.

35 citations


Journal ArticleDOI
TL;DR: In this article, the authors describe a method that creates a hierarchical tree of given cell phenotypes and calculates the statistical significance between them based on clustering analysis of nuclear protein distributions; cluster histograms show how cells of any one phenotype are distributed across the consensus clusters.
Abstract: The distribution of chromatin-associated proteins plays a key role in directing nuclear function. Previously, we developed an image-based method to quantify the nuclear distributions of proteins and showed that these distributions depended on the phenotype of human mammary epithelial cells. Here we describe a method that creates a hierarchical tree of the given cell phenotypes and calculates the statistical significance between them, based on the clustering analysis of nuclear protein distributions. Nuclear distributions of nuclear mitotic apparatus protein were previously obtained for non-neoplastic S1 and malignant T4-2 human mammary epithelial cells cultured for up to 12 days. Cell phenotype was defined as S1 or T4-2 and the number of days in culture. A probabilistic ensemble approach was used to define a set of consensus clusters from the results of multiple traditional cluster analysis techniques applied to the nuclear distribution data. Cluster histograms were constructed to show how cells in any one phenotype were distributed across the consensus clusters. Grouping various phenotypes allowed us to build phenotype trees and calculate the statistical difference between each group. The results showed that non-neoplastic S1 cells could be distinguished from malignant T4-2 cells with 94.19% accuracy; that proliferating S1 cells could be distinguished from differentiated S1 cells with 92.86% accuracy; and showed no significant difference between the various phenotypes of T4-2 cells corresponding to increasing tumor sizes. This work presents a cluster analysis method that can identify significant cell phenotypes, based on the nuclear distribution of specific proteins, with high accuracy.
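A common way to realize the "probabilistic ensemble" idea is co-association (evidence-accumulation) clustering: run a base clusterer many times, record how often each pair of samples lands in the same cluster, then cut a hierarchical tree built on one minus that co-association frequency. The sketch below illustrates that generic scheme only; it is not the authors' ensemble of multiple traditional techniques, and all names are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans

def consensus_clusters(X, n_clusters, n_runs=20, seed=0):
    """Evidence-accumulation consensus: accumulate the fraction of runs in
    which two samples co-cluster, then cut a hierarchical tree on 1 - freq."""
    n = X.shape[0]
    coassoc = np.zeros((n, n))
    for r in range(n_runs):
        labels = KMeans(n_clusters=n_clusters, n_init=1,
                        random_state=seed + r).fit_predict(X)
        coassoc += labels[:, None] == labels[None, :]
    coassoc /= n_runs
    # Samples that rarely co-cluster across runs are far apart.
    dist = 1.0 - coassoc
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method='average')
    return fcluster(tree, t=n_clusters, criterion='maxclust')
```

The same tree can be cut at different heights to group phenotypes hierarchically, which is the role the phenotype trees play in the paper.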

15 citations


Journal ArticleDOI
TL;DR: This editorial compiles a collection of research articles on novel algorithms and enabling techniques for bio- and biomedical image analysis, mining, visualization, and biology applications.
Abstract: The 2006 International Workshop on Multiscale Biological Imaging, Data Mining and Informatics was held in Santa Barbara on September 7–8, 2006. Based on the presentations at the workshop, we selected and compiled this collection of research articles related to novel algorithms and enabling techniques for bio- and biomedical image analysis, mining, visualization, and biology applications.

10 citations


Proceedings ArticleDOI
12 Apr 2007
TL;DR: A worm straightening algorithm (WSA) is developed using a cutting-plane restacking method, which aggregates the linear rotation transforms of a continuous sequence of cutting lines/planes orthogonal to the "backbone" of a worm to best approximate the nonlinearly bent worm body.
Abstract: C. elegans, a soil-dwelling roundworm, is widely used in studying animal development and aging, cell differentiation, etc. Recently, high-resolution fluorescence images of C. elegans have become available, introducing several new image analysis applications. One problem is that worm bodies usually curve greatly in images; it is therefore highly desirable to straighten worms so that they can be compared easily under the same canonical coordinate system. We develop a worm straightening algorithm (WSA) using a cutting-plane restacking method, which aggregates the linear rotation transforms of a continuous sequence of cutting lines/planes orthogonal to the "backbone" of a worm to best approximate the nonlinearly bent worm body. We formulate the backbone as a parametric cubic spline through a series of control points. We develop two minimum-spanning-tree based methods to automatically determine the locations of control points. Our experimental results show that our approach can effectively straighten both 2D and 3D worm images.
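The cutting-line restacking idea can be sketched in 2D: fit a cubic spline through backbone control points, sample lines orthogonal to the spline's tangent, and stack the intensity profiles along those lines into a new image. This is a minimal sketch of the restacking step only (control points are given, not detected by the paper's MST methods); all names are illustrative.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.ndimage import map_coordinates

def straighten_2d(image, control_points, half_width, n_samples=200):
    """Restack intensity profiles sampled along lines orthogonal to a
    cubic-spline backbone, producing a straightened 2D image."""
    cp = np.asarray(control_points, float)       # (k, 2) backbone points (row, col)
    t = np.linspace(0, 1, len(cp))
    spline = CubicSpline(t, cp)                  # backbone as a parametric spline
    ts = np.linspace(0, 1, n_samples)
    pts = spline(ts)                             # points along the backbone
    tangents = spline(ts, 1)                     # first derivative = tangent
    tangents /= np.linalg.norm(tangents, axis=1, keepdims=True)
    normals = np.stack([-tangents[:, 1], tangents[:, 0]], axis=1)
    offsets = np.arange(-half_width, half_width + 1)
    out = np.empty((len(offsets), n_samples))
    for i, (p, nvec) in enumerate(zip(pts, normals)):
        # One cutting line orthogonal to the backbone, restacked as column i.
        line = p[None, :] + offsets[:, None] * nvec[None, :]
        out[:, i] = map_coordinates(image, [line[:, 0], line[:, 1]],
                                    order=1, mode='nearest')
    return out
```

In the straightened output, the backbone maps to the middle row, so worms with different bends share one canonical coordinate system.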

1 citation