Author

Song-Chun Zhu

Other affiliations: Harvard University, Ohio State University, Peking University
Bio: Song-Chun Zhu is an academic researcher from University of California, Los Angeles. The author has contributed to research in topics including Graph (abstract data type) and Parsing. The author has an h-index of 78 and has co-authored 545 publications receiving 24,141 citations. Previous affiliations of Song-Chun Zhu include Harvard University and Ohio State University.


Papers
Journal ArticleDOI
TL;DR: Presents a novel statistical and variational approach to image segmentation based on a new algorithm, named region competition, which is derived by minimizing a generalized Bayes/minimum description length (MDL) criterion using the variational principle.
Abstract: We present a novel statistical and variational approach to image segmentation based on a new algorithm, named region competition. This algorithm is derived by minimizing a generalized Bayes/minimum description length (MDL) criterion using the variational principle. The algorithm is guaranteed to converge to a local minimum and combines aspects of snakes/balloons and region growing. The classic snakes/balloons and region growing algorithms can be directly derived from our approach. We provide theoretical analysis of region competition including accuracy of boundary location, criteria for initial conditions, and the relationship to edge detection using filters. It is straightforward to generalize the algorithm to multiband segmentation and we demonstrate it on gray level images, color images and texture images. The novel color model allows us to eliminate intensity gradients and shadows, thereby obtaining segmentation based on the albedos of objects. It also helps detect highlight regions.
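To make the criterion concrete: the energy minimized by region competition is usually written as a sum, over regions, of a boundary-length term, a region-fit (negative log-likelihood) term, and a per-region code-length penalty. The formula below is a reconstruction for illustration; the symbols μ, λ, and α_i are not taken verbatim from the paper.

```latex
% Generalized Bayes/MDL criterion for a segmentation into regions R_1..R_M
% with region model parameters alpha_i (illustrative reconstruction):
E\big[\{R_i\},\{\alpha_i\}\big] \;=\; \sum_{i=1}^{M}\left(
    \frac{\mu}{2}\oint_{\partial R_i} ds                          % boundary length/smoothness
    \;-\; \iint_{R_i}\log P\big(I(x,y)\mid\alpha_i\big)\,dx\,dy   % region fit
    \;+\; \lambda                                                 % per-region code length
\right)
```

Gradient descent on such an energy moves each boundary point under a competition between the likelihoods of the adjacent regions plus a smoothing term, which is how the snake/balloon and region-growing behaviors emerge as special cases.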

2,181 citations

Journal ArticleDOI
TL;DR: The resulting model, called FRAME (Filters, Random fields And Maximum Entropy), is a Markov random field (MRF) model, but with a much enriched vocabulary and hence much stronger descriptive ability than the previous MRF models used for texture modeling.
Abstract: This article presents a statistical theory for texture modeling. This theory combines filtering theory and Markov random field modeling through the maximum entropy principle, and interprets and clarifies many previous concepts and methods for texture analysis and synthesis from a unified point of view. Our theory characterizes the ensemble of images I with the same texture appearance by a probability distribution f(I) on a random field, and the objective of texture modeling is to make inference about f(I), given a set of observed texture examples. In our theory, texture modeling consists of two steps. (1) A set of filters is selected from a general filter bank to capture features of the texture; these filters are applied to observed texture images, and the histograms of the filtered images are extracted. These histograms are estimates of the marginal distributions of f(I). This step is called feature extraction. (2) The maximum entropy principle is employed to derive a distribution p(I), which is restricted to have the same marginal distributions as those in (1). This p(I) is considered an estimate of f(I). This step is called feature fusion. A stepwise algorithm is proposed to choose filters from a general filter bank. The resulting model, called FRAME (Filters, Random fields And Maximum Entropy), is a Markov random field (MRF) model, but with a much enriched vocabulary and hence much stronger descriptive ability than previous MRF models used for texture modeling. A Gibbs sampler is adopted to synthesize texture images by drawing typical samples from p(I); the model is thus verified by checking whether the synthesized texture images have visual appearances similar to the texture images being modeled. Experiments on a variety of 1D and 2D textures are described to illustrate our theory and to show the performance of our algorithms. These experiments demonstrate that many textures that were previously considered to be from different categories can be modeled and synthesized in a common framework.
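The two steps (feature extraction, then maximum-entropy feature fusion) can be illustrated with a deliberately tiny, single-filter sketch in Python. Everything here is an illustrative assumption rather than the paper's algorithm: one hand-picked difference filter instead of a stepwise-chosen filter bank, an 8-level 16x16 image, and a crude learning rate. It only shows the structure of the learning loop, lam <- lam + eta * (E_p[H] - H_obs), with E_p[H] estimated by Gibbs sampling.

```python
import numpy as np

rng = np.random.default_rng(0)
SIZE, LEVELS, BINS = 16, 8, 15   # tiny image, 8 gray levels, 15 histogram bins

def filter_response(img):
    # One horizontal-difference filter, standing in for a whole filter bank.
    return np.roll(img, -1, axis=1) - img

def histogram(img):
    # Normalized histogram of filter responses: the marginal statistic H(I).
    h, _ = np.histogram(filter_response(img), bins=BINS,
                        range=(-(LEVELS - 1), LEVELS - 1))
    return h / h.sum()

def energy(img, lam):
    # Model: p(I) proportional to exp(-<lam, H(I)>).
    return lam @ histogram(img)

def gibbs_sweep(img, lam):
    # Single-site Gibbs sampling from p(I).
    for y in range(SIZE):
        for x in range(SIZE):
            e = np.empty(LEVELS)
            for v in range(LEVELS):
                img[y, x] = v
                e[v] = energy(img, lam)
            p = np.exp(-(e - e.min()))
            img[y, x] = rng.choice(LEVELS, p=p / p.sum())
    return img

# "Observed" texture: vertical stripes, giving a bimodal filter histogram.
obs = np.tile((np.arange(SIZE) % 2) * (LEVELS - 1), (SIZE, 1))
h_obs = histogram(obs)

lam = np.zeros(BINS)
syn = rng.integers(0, LEVELS, (SIZE, SIZE))
for it in range(30):
    syn = gibbs_sweep(syn, lam)
    # Gradient ascent on log-likelihood: E_p[H] is estimated from the sample.
    lam += 50.0 * (histogram(syn) - h_obs)

print(np.round(histogram(syn), 3))   # should drift toward h_obs
print(np.round(h_obs, 3))
```

After a few dozen sweeps the synthesized histogram should approach the observed bimodal one, which is exactly the sense in which FRAME verifies a model by synthesis.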

746 citations

Journal ArticleDOI
TL;DR: The DDMCMC paradigm provides a unifying framework in which many existing segmentation algorithms are revealed as either realizing Markov chain dynamics or computing importance proposal probabilities, and it combines and generalizes these methods in a principled way.
Abstract: This paper presents a computational paradigm called Data-Driven Markov Chain Monte Carlo (DDMCMC) for image segmentation in the Bayesian statistical framework. The paper contributes to image segmentation in four aspects. First, it designs efficient and well-balanced Markov Chain dynamics to explore the complex solution space and, thus, achieves a nearly global optimal solution independent of initial segmentations. Second, it presents a mathematical principle and a K-adventurers algorithm for computing multiple distinct solutions from the Markov chain sequence and, thus, it incorporates intrinsic ambiguities in image segmentation. Third, it utilizes data-driven (bottom-up) techniques, such as clustering and edge detection, to compute importance proposal probabilities, which drive the Markov chain dynamics and achieve tremendous speedup in comparison to the traditional jump-diffusion methods. Fourth, the DDMCMC paradigm provides a unifying framework in which the roles of many existing segmentation algorithms, such as edge detection, clustering, region growing, split-merge, snake/balloon, and region competition, are revealed as either realizing Markov chain dynamics or computing importance proposal probabilities. Thus, the DDMCMC paradigm combines and generalizes these segmentation methods in a principled way. The DDMCMC paradigm adopts seven parametric and nonparametric image models for intensity and color at various regions. We test the DDMCMC paradigm extensively on both color and gray-level images and some results are reported in this paper.
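The key computational idea, bottom-up cues shaping the proposal distribution of a Markov chain, fits in a few lines. The toy below is my own illustration, far simpler than the paper's jump-diffusion dynamics over full segmentations: it labels a 1-D signal with one of three known Gaussian models, proposes labels from a data-driven distribution q(l | x_i), and corrects for the proposal's bias in the Metropolis-Hastings acceptance ratio.

```python
import numpy as np

rng = np.random.default_rng(1)
MU, SIGMA, BETA = np.array([0.0, 1.0, 2.0]), 0.25, 1.0

# Toy observation: three constant segments plus noise.
x = np.concatenate([np.full(30, m) for m in MU]) + rng.normal(0, SIGMA, 90)

def log_post(labels):
    # Gaussian likelihood plus a Potts prior rewarding smooth labelings.
    ll = -0.5 * ((x - MU[labels]) / SIGMA) ** 2
    return ll.sum() + BETA * (labels[:-1] == labels[1:]).sum()

# Bottom-up importance proposals: q[i, l] proportional to N(x_i; mu_l, sigma).
q = np.exp(-0.5 * ((x[:, None] - MU[None, :]) / SIGMA) ** 2)
q /= q.sum(axis=1, keepdims=True)

labels = rng.integers(0, MU.size, x.size)
lp = log_post(labels)
for _ in range(20000):
    i = rng.integers(x.size)
    new = rng.choice(MU.size, p=q[i])          # data-driven proposal
    old = labels[i]
    if new == old:
        continue
    labels[i] = new
    lp_new = log_post(labels)
    # Metropolis-Hastings: the q terms correct for the proposal's bias.
    if np.log(rng.random()) < lp_new - lp + np.log(q[i, old] / q[i, new]):
        lp = lp_new
    else:
        labels[i] = old
print(labels)   # should recover the three segments up to label noise
```

Proposing from q instead of uniformly concentrates moves where the data support them, which is the source of the speedup the abstract describes.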

638 citations

Journal ArticleDOI
TL;DR: In this paper, the authors review recent studies in understanding neural-network representations and learning neural networks with interpretable/disentangled middle-layer representations, and discuss prospective trends in explainable artificial intelligence.
Abstract: This paper reviews recent studies in understanding neural-network representations and learning neural networks with interpretable/disentangled middle-layer representations. Although deep neural networks have exhibited superior performance in various tasks, interpretability has always been the Achilles' heel of deep neural networks. At present, deep neural networks obtain high discrimination power at the cost of a low interpretability of their black-box representations. We believe that high model interpretability may help people break several bottlenecks of deep learning, e.g., learning from a few annotations, learning via human-computer communications at the semantic level, and semantically debugging network representations. We focus on convolutional neural networks (CNNs), and revisit the visualization of CNN representations, methods of diagnosing representations of pre-trained CNNs, approaches for disentangling pre-trained CNN representations, learning of CNNs with disentangled representations, and middle-to-end learning based on model interpretability. Finally, we discuss prospective trends in explainable artificial intelligence.
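Of the techniques surveyed, the visualization of CNN representations is the easiest to sketch. Below is a generic activation-maximization recipe (gradient ascent on the input to excite one channel), not a method from this paper; it assumes a recent torchvision and its pretrained VGG16, and the layer cut and channel index are arbitrary picks.

```python
import torch
import torchvision.models as models

# Pretrained VGG16 (the weights-enum API assumes a recent torchvision).
model = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
feats, channel = model.features[:17], 42   # arbitrary layer cut and channel

img = torch.randn(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    act = feats(img)[0, channel]
    (-act.mean()).backward()   # ascend the mean activation of one channel
    opt.step()
# img now (roughly) depicts the pattern this channel responds to.
```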

576 citations

Journal ArticleDOI
TL;DR: Reviews recent results in statistical modeling of natural images that attempt to explain the non-Gaussian behavior of image statistics, i.e., high kurtosis, heavy tails, and sharp central cusps.
Abstract: Statistical analysis of images reveals two interesting properties: (i) invariance of image statistics to scaling of images, and (ii) non-Gaussian behavior of image statistics, i.e. high kurtosis, heavy tails, and sharp central cusps. In this paper we review some recent results in statistical modeling of natural images that attempt to explain these patterns. Two categories of results are considered: (i) studies of probability models of images or image decompositions (such as Fourier or wavelet decompositions), and (ii) discoveries of underlying image manifolds when attention is restricted to natural images. Applications of these models in areas such as texture analysis, image classification, compression, and denoising are also considered.
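The high-kurtosis claim is easy to verify empirically. A quick check (my own demo, assuming scikit-image and SciPy are available): compute a derivative filter's responses on a natural image and compare their excess kurtosis to a Gaussian control of the same variance.

```python
import numpy as np
from scipy.stats import kurtosis
from skimage import data

img = data.camera().astype(float)       # any natural grayscale image works
dx = np.diff(img, axis=1).ravel()       # horizontal-derivative responses

print("image excess kurtosis:", kurtosis(dx))        # far above 0
g = np.random.default_rng(0).normal(0.0, dx.std(), dx.size)
print("gaussian control:", kurtosis(g))              # close to 0
```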

561 citations


Cited by
Proceedings ArticleDOI
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei
20 Jun 2009
TL;DR: Introduces a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure that is much larger in scale and diversity, and much more accurate, than current image datasets.
Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.
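The WordNet backbone that ImageNet populates can be explored directly, for example with NLTK's WordNet interface. This is an illustration of the hierarchy, not ImageNet tooling:

```python
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

dog = wn.synsets('dog')[0]                    # Synset('dog.n.01')
print(dog.definition())
print([h.name() for h in dog.hypernyms()])    # parent synsets
print(len(dog.hyponyms()))                    # children a dataset could populate
```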

49,639 citations

Journal ArticleDOI
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images; it has been run annually from 2010 to the present, attracting participation from more than fifty institutions.
Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the 5 years of the challenge, and propose future directions and improvements.
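The challenge's headline classification metric is top-5 error: a prediction counts as correct if the ground-truth label appears among the model's five highest-scoring classes. A minimal NumPy version (my sketch, not the official evaluation code):

```python
import numpy as np

def top5_error(scores, labels):
    # scores: (N, C) class scores; labels: (N,) ground-truth indices.
    top5 = np.argsort(scores, axis=1)[:, -5:]
    return 1.0 - (top5 == labels[:, None]).any(axis=1).mean()

scores = np.random.default_rng(0).normal(size=(4, 1000))
print(top5_error(scores, np.array([3, 1, 4, 1])))   # ~0.995 for random scores
```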

30,811 citations

Book
01 Jan 1998
TL;DR: A textbook tour of wavelet signal processing, moving from the Fourier and time-frequency worlds through frames, wavelet bases, and wavelet packet and local cosine bases to approximation, estimation, and transform coding.
Abstract: (Table of contents) Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.
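The simplest of the wavelet bases the book develops is the Haar basis, and one analysis/synthesis step fits in a few lines of NumPy (a self-contained sketch, unrelated to the software toolboxes in Appendix B):

```python
import numpy as np

def haar_step(x):
    # Split a length-2n signal into n coarse averages and n details.
    x = np.asarray(x, dtype=float)
    avg = (x[0::2] + x[1::2]) / np.sqrt(2)
    det = (x[0::2] - x[1::2]) / np.sqrt(2)
    return avg, det

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
avg, det = haar_step(x)

# Perfect reconstruction from the two half-rate channels.
rec = np.empty_like(x)
rec[0::2] = (avg + det) / np.sqrt(2)
rec[1::2] = (avg - det) / np.sqrt(2)
assert np.allclose(rec, x)
```

Recursing haar_step on the averages yields the full multiresolution decomposition the book's "Wavelet Zoom" refers to.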

17,693 citations

Journal ArticleDOI
TL;DR: Reviews the state of the art in evaluated methods for both classification and detection, analysing whether the methods are statistically different, what they are learning from the images, and what the methods find easy or confuse.
Abstract: The Pascal Visual Object Classes (VOC) challenge is a benchmark in visual object category recognition and detection, providing the vision and machine learning communities with a standard dataset of images and annotation, and standard evaluation procedures. Organised annually from 2005 to present, the challenge and its associated dataset have become accepted as the benchmark for object detection. This paper describes the dataset and evaluation procedure. We review the state-of-the-art in evaluated methods for both classification and detection, analyse whether the methods are statistically different, what they are learning from the images (e.g. the object or its context), and what the methods find easy or confuse. The paper concludes with lessons learnt in the three-year history of the challenge, and proposes directions for future improvement and extension.
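The challenge's standard detection metric, in its classic form, is 11-point interpolated average precision: the mean, over recall levels {0, 0.1, ..., 1}, of the maximum precision achieved at recall at or above that level. A NumPy sketch (mine, not the official development kit):

```python
import numpy as np

def voc_ap_11pt(recall, precision):
    # Mean of max precision at recall >= r, for r in {0, 0.1, ..., 1}.
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recall >= r
        ap += precision[mask].max() if mask.any() else 0.0
    return ap / 11.0

# Toy (recall, precision) curve for one class:
rec = np.array([0.1, 0.2, 0.4, 0.6, 0.8])
prec = np.array([1.0, 0.9, 0.7, 0.6, 0.5])
print(voc_ap_11pt(rec, prec))
```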

15,935 citations