
Showing papers by "Hao Su" published in 2014


Posted Content
TL;DR: The creation of this benchmark dataset and the advances in object recognition that have been possible as a result are described, and the state-of-the-art computer vision accuracy with human accuracy is compared.
Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.

519 citations


Journal Article
TL;DR: This paper analyzes the novel concept of object bank, a high-level image representation encoding object appearance and spatial location information in images, and demonstrates that object bank is a high level representation, from which it can easily discover semantic information of unknown images.
Abstract: It is a remarkable fact that images are related to objects constituting them. In this paper, we propose to represent images by using objects appearing in them. We introduce the novel concept of object bank (OB), a high-level image representation encoding object appearance and spatial location information in images. OB represents an image based on its response to a large number of pre-trained object detectors, or `object filters', blind to the testing dataset and visual recognition task. Our OB representation demonstrates promising potential in high level image recognition tasks. It significantly outperforms traditional low level image representations in image classification on various benchmark image datasets by using simple, off-the-shelf classification algorithms such as linear SVM and logistic regression. In this paper, we analyze OB in detail, explaining our design choice of OB for achieving its best potential on different types of datasets. We demonstrate that object bank is a high level representation, from which we can easily discover semantic information of unknown images. We provide guidelines for effectively applying OB to high level image recognition tasks where it could be easily compressed for efficient computation in practice and is very robust to various classifiers.

149 citations


Journal Article
27 Jul 2014
TL;DR: This paper considers the problem of adding depth to an image of an object, effectively 'lifting' it back to 3D, by exploiting a collection of aligned 3D models of related objects, and concludes that the network of shapes implicitly characterizes a shape-specific deformation subspace that regularizes the problem and enables robust diffusion of depth information from the shape collection to the input image.
Abstract: Images, while easy to acquire, view, publish, and share, lack critical depth information. This poses a serious bottleneck for many image manipulation, editing, and retrieval tasks. In this paper we consider the problem of adding depth to an image of an object, effectively 'lifting' it back to 3D, by exploiting a collection of aligned 3D models of related objects. Our key insight is that, even when the imaged object is not contained in the shape collection, the network of shapes implicitly characterizes a shape-specific deformation subspace that regularizes the problem and enables robust diffusion of depth information from the shape collection to the input image. We evaluate our fully automatic approach on diverse and challenging input images, validate the results against Kinect depth readings, and demonstrate several imaging applications including depth-enhanced image editing and image relighting.

114 citations


Journal Article
TL;DR: In this paper, the authors describe the facile preparation of a library of mono and di-functional polyhedral oligomeric silsesquioxane (POSS) building blocks with different symmetries using thiol-ene chemistry.
Abstract: The convenient synthesis of nano-building blocks with strategically placed functional groups constitutes a fundamental challenge in nano-science. Here, we describe the facile preparation of a library of mono- and di-functional (containing three isomers) polyhedral oligomeric silsesquioxane (POSS) building blocks with different symmetries (C3v, C2v, and D3d) using thiol-ene chemistry. The method is straightforward and general, possessing many advantages including minimum set-up, simple work-up, and a short reaction time (about 0.5 h). It facilitates the precise introduction of a large variety of functional groups to desired sites of the POSS cage. The yields of the monoadducts increase significantly using stoichiometric amounts of bulky ligands. Regio-selective di-functionalization of the POSS cage was also attempted using bulky thiol ligands, such as a thiol-functionalized POSS. Electrospray ionization (ESI) mass spectrometry coupled with travelling wave ion mobility (TWIM) separation revealed that the majority of diadducts are para-compounds (∼59%), although meta-compounds (∼20%) and ortho-compounds (∼21%) are also present. Therefore, the thiol-ene reaction provides a robust approach for the convenient synthesis of mono-functional POSS derivatives and, potentially, of regio-selective multi-functionalized POSS derivatives as versatile nano-building blocks.

63 citations


Journal Article
01 Jan 2014
TL;DR: This work proposes an indexing technique, paired with an on-line reverse top-k search algorithm, that is efficient and has manageable storage requirements even when applied on very large graphs.
Abstract: With the increasing popularity of social networks, large volumes of graph data are becoming available. Large graphs are also derived by structure extraction from relational, text, or scientific data (e.g., relational tuple networks, citation graphs, ontology networks, protein-protein interaction graphs). Node-to-node proximity is the key building block for many graph-based applications that search or analyze the data. Among various proximity measures, random walk with restart (RWR) is widely adopted because of its ability to consider the global structure of the whole network. Although RWR-based similarity search has been well studied before, there is no prior work on reverse top-k proximity search in graphs based on RWR. We discuss the applicability of this query and show that its direct evaluation using existing methods on RWR-based similarity search has very high computational and storage demands. To address this issue, we propose an indexing technique, paired with an on-line reverse top-k search algorithm. Our experiments show that our technique is efficient and has manageable storage requirements even when applied on very large graphs.

35 citations
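
The random walk with restart measure named in the abstract above can be sketched with a plain power iteration. This is only an illustration under stated assumptions, not the indexing technique or the on-line algorithm the paper proposes: the function names, the restart probability of 0.15, the column-normalized transition matrix, and the choice to exclude a node from its own top-k list are all hypothetical. The second function is the direct, brute-force evaluation of a reverse top-k query, which the abstract describes as expensive.

```python
import numpy as np


def rwr_scores(adjacency, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart proximities from `seed` to every node.

    adjacency[i][j] is the weight of the edge from node j to node i
    (an assumed convention), so columns are normalized to sum to 1.
    """
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    if np.any(col_sums == 0):
        raise ValueError("every node needs at least one outgoing edge")
    W = A / col_sums                      # column-stochastic transitions
    e = np.zeros(A.shape[0])
    e[seed] = 1.0                         # restart distribution
    r = e.copy()
    for _ in range(max_iter):
        # Follow an edge with probability 1 - restart, else jump to seed.
        r_next = (1.0 - restart) * (W @ r) + restart * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r


def reverse_top_k_brute_force(adjacency, query, k):
    """Nodes u (other than `query`) whose k closest nodes include `query`.

    Runs one full RWR computation per node, which is why this direct
    evaluation does not scale to large graphs.
    """
    hits = []
    for u in range(len(adjacency)):
        if u == query:
            continue
        r = rwr_scores(adjacency, u)
        r[u] = -np.inf                    # assumption: u is not in its own list
        top_k = np.argsort(-r)[:k]
        if query in top_k:
            hits.append(u)
    return hits
```

For example, `reverse_top_k_brute_force(A, query=1, k=1)` on a small symmetric adjacency matrix `A` returns every node whose single closest neighbor under RWR is node 1.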


Book Chapter
01 Jan 2014
TL;DR: A modular surgical system is presented that is designed to facilitate the development of MRI-compatible intervention devices; phantom and human imaging experiments validate the capability of delineating anatomical structures in 3T MRI during robot motion.
Abstract: Direct magnetic resonance imaging (MRI) guidance during surgical intervention would provide many benefits; most significantly, interventional MRI can be used for planning, monitoring of tissue deformation, real-time visualization of manipulation, and confirmation of procedure success. Direct MR guidance has not yet taken hold because it is often confounded by a number of issues, including MRI-compatibility of existing surgery equipment and patient access in the scanner bore. This paper presents a modular surgical system designed to facilitate the development of MRI-compatible intervention devices. Deep brain stimulation and prostate brachytherapy robots are two examples that successfully deploy these surgical modules. Phantom and human imaging experiments validate the capability of delineating anatomical structures in 3T MRI during robot motion.

34 citations


Journal Article
TL;DR: In this article, the authors report the rational design and tandem synthesis of three asymmetric giant gemini surfactants (AGGSs) of complex macromolecular structures based on polyhedral oligomeric silsesquioxane (POSS).

33 citations


Journal Article
TL;DR: The mild conditions, high efficiency, and broad functional group tolerance of thiol-Michael chemistry should further expand the scope of POSS-based giant surfactants with unparalleled possibilities for head surface chemistry manipulation, which provides numerous opportunities for nanofabrication by the direct self-assembly of giant surfactants.

32 citations


Posted Content
TL;DR: Experimental results show that the synthesized features of this paper enable view-independent comparison between images and perform significantly better than traditional image features in this respect.
Abstract: Comparing two images in a view-invariant way has been a challenging problem in computer vision for a long time, as visual features are not stable under large viewpoint changes. In this paper, given a single input image of an object, we synthesize new features for other views of the same object. To accomplish this, we introduce an aligned set of 3D models in the same class as the input object image. Each 3D model is represented by a set of views, and we study the correlation of image patches between different views, seeking what we call surrogates: patches in one view whose feature content predicts well the features of a patch in another view. In particular, for each patch in the novel desired view, we seek surrogates from the observed view of the given image. For a given surrogate, we predict that surrogate using a linear combination of the corresponding patches of the 3D model views, learn the coefficients, and then transfer these coefficients on a per-patch basis to synthesize the features of the patch in the novel view. In this way we can create feature sets for all views of the latent object, providing us with a multi-view representation of the object. View-invariant object comparisons are achieved simply by computing the $L^2$ distances between the features of corresponding views. We provide theoretical and empirical analysis of the feature synthesis process, and evaluate the proposed view-agnostic distance (VAD) in fine-grained image retrieval (100 object classes) and classification tasks. Experimental results show that our synthesized features do enable view-independent comparison between images and perform significantly better than traditional image features in this respect.

26 citations
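
The final comparison step described in the abstract above, computing $L^2$ distances between the features of corresponding views, can be sketched as follows. This is a minimal illustration, not the paper's view-agnostic distance: it assumes the per-view feature sets have already been synthesized and stacked into arrays of shape `(n_views, n_features)`, and averaging the per-view distances into one number is a hypothetical aggregation choice. The function name is invented for the example.

```python
import numpy as np


def view_aligned_distance(features_a, features_b):
    """Mean L2 distance between features of corresponding views.

    Both inputs have shape (n_views, n_features); row v of each array
    is assumed to describe the same viewpoint of the two objects.
    """
    a = np.asarray(features_a, dtype=float)
    b = np.asarray(features_b, dtype=float)
    if a.ndim != 2 or a.shape != b.shape:
        raise ValueError("expected two arrays of identical 2-D shape")
    per_view = np.linalg.norm(a - b, axis=1)   # one L2 distance per view
    return float(per_view.mean())              # assumed aggregation: mean
```

Because each row is compared only with the row for the same view, the result does not depend on which view the original photograph happened to show, provided the synthesized features are accurate.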


Journal Article
TL;DR: This study expands the library of POSS-based shape amphiphiles with numerous possibilities for head manipulations, offering an important step toward new shape amphiphiles beyond traditional hydrophobic/hydrophilic nature for potential applications in giant molecule-based nanoscience and technology.
Abstract: Head diversification of shape amphiphiles not only broadens the scope of supramolecular engineering for new self-organizing materials but also facilitates their potential applications in high technologies. In this letter, T10 azido-functionalized polyhedral oligomeric silsesquioxane (POSS) nanoparticle was used to construct new shape amphiphiles via sequential “click” chemistry for addressing two issues: (1) new symmetry of T10 POSS head could enrich the self-assembly behaviors of shape amphiphiles, and (2) copper-catalyzed azide–alkyne cycloaddition (CuAAC)-based head functionalization strategy allows the introduction of diverse functionalities onto POSS heads, including bulky ligands (i.e., isobutyl POSS) and UV-attenuating ones (i.e., ferrocene and 4-cyano-4′-biphenyl). This study expands the library of POSS-based shape amphiphiles with numerous possibilities for head manipulations, offering an important step toward new shape amphiphiles beyond traditional hydrophobic/hydrophilic nature for potential a...

26 citations


Journal Article
TL;DR: Three-dimensional reconstruction images of neovascularization of the soft tissues surrounding the fracture with vascular perfusion and micro-computer tomography (micro-CT) imaging indicate that stable fixation can promote longitudinal vascularity pattern formation, which tends to be similar to the natural vascularity pattern, and this benefits the inter-fragmentary blood fluid connectivity during the bone healing process.

Posted Content
TL;DR: A Bayesian model is proposed to characterize the discrepancy of two samples to estimate the divergence or distance of their underlying distributions; it offers a unified way to estimate various types of discrepancies between samples and enjoys convincing accuracy.
Abstract: A Bayesian model is proposed to characterize the discrepancy of two samples, e.g., to estimate the divergence or distance of their underlying distributions. The core idea of this framework is to learn a partition of the sample space that best captures the landscapes of their distributions. In order to avoid the pitfalls of plug-in methods that estimate each sample density independently with respect to the Lebesgue measure, we make direct inference on the two distributions simultaneously from a joint prior, i.e., the coupled binary partition prior. Our prior leverages the class of piecewise constant functions built upon binary partitions of the domain. Our model offers a unified way to estimate various types of discrepancies between samples and enjoys convincing accuracy. We demonstrate its effectiveness through simulations and comparisons.
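
One reason a shared partition is convenient: when both densities are piecewise constant on the same cells, the cell volumes cancel and the Kullback-Leibler divergence reduces to a sum over cell masses, KL = sum_i P_i log(P_i / Q_i). The sketch below illustrates only that identity for one-dimensional samples. It is not the Bayesian model from the abstract: the partition is fixed by hand rather than learned, and the function name, the `pseudo_count` smoothing, and the use of fixed bin edges are all assumptions made for the example.

```python
import numpy as np


def kl_on_shared_partition(x, y, edges, pseudo_count=0.5):
    """KL divergence between two samples on one shared 1-D partition.

    Both samples are binned with the same `edges`; points outside the
    edges are ignored by np.histogram. `pseudo_count` is added to every
    cell so that no cell mass is zero (an assumed smoothing choice).
    """
    counts_x, _ = np.histogram(x, bins=edges)
    counts_y, _ = np.histogram(y, bins=edges)
    p = counts_x + pseudo_count
    q = counts_y + pseudo_count
    p = p / p.sum()                       # cell masses P_i
    q = q / q.sum()                       # cell masses Q_i
    # Cell volumes cancel inside the log, leaving only the masses.
    return float(np.sum(p * np.log(p / q)))
```

With a fixed partition the estimate depends heavily on where the edges fall, which is the kind of sensitivity that motivates learning the partition jointly from both samples.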

Journal Article
TL;DR: This work formulates a Bayes Net framework and introduces a quantity derived from posterior distribution to measure the convergence of crowd opinions, and empirically demonstrates the effectiveness of the designed strategy by building a challenging fine-grained image annotation task on Amazon Mechanical Turk.
Abstract: Crowdsourcing has become an important tool to aggregate the wisdom of the crowd in this Internet age. A central problem in building an online crowdsourcing system is to determine the appropriate number of workers to assign tasks to. We study this problem by formulating a Bayes Net framework and introduce a quantity derived from posterior distribution to measure the convergence of crowd opinions. Using this quantity, our algorithm could stop soliciting opinions from more workers if the distribution of opinions is unlikely to change in future predictions. We empirically demonstrate the effectiveness of the designed strategy by building a challenging fine-grained image annotation task on Amazon Mechanical Turk. Experiment results show that our approach not only saves annotation cost but also guarantees high annotation quality.

Posted Content
TL;DR: This paper proposes a Bayesian model, co-BPM, to characterize the discrepancy of two sample sets, i.e., to estimate the divergence of their underlying distributions, and attempts to learn a coupled binary partition of the sample space that best captures the landscapes of both distributions.
Abstract: Divergence is not only an important mathematical concept in information theory, but is also applied to machine learning problems such as low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection. We propose a Bayesian model, co-BPM, to characterize the discrepancy of two sample sets, i.e., to estimate the divergence of their underlying distributions. In order to avoid the pitfalls of plug-in methods that estimate each density independently, our Bayesian model attempts to learn a coupled binary partition of the sample space that best captures the landscapes of both distributions, and then makes direct inference on their divergences. The prior is constructed by leveraging the sequential buildup of the coupled binary partitions and the posterior is sampled via our specialized MCMC. Our model provides a unified way to estimate various types of divergences and enjoys convincing accuracy. We demonstrate its effectiveness through simulations, comparisons with the state-of-the-art and a real data example.

Journal Article
TL;DR: This work proposes a novel framework that automatically learns object groups and uses them to build an image representation for scene recognition tasks that could achieve state-of-the-art performance for both scene discovery and scene classification tasks.
Abstract: Scene recognition is an important task for many computer vision and robotics applications. Recent progress in high-level object-based image representation has shown superior performance on scene classification tasks. In this work, we make an observation that groups of objects tend to co-occur frequently in a scene. We therefore propose a novel framework that automatically learns object groups and uses them to build an image representation for scene recognition tasks. We model each object group as a template that explicitly encodes the spatial configurations of objects. To encourage informativeness and discriminability, we learn the object group templates in a sparse filtering framework. Experimental results show that our object group representation could achieve state-of-the-art performance for both scene discovery and scene classification tasks.