Other affiliations: University of California, Berkeley, University of California, University of Texas at Austin
Bio: Yuan-Fang Wang is an academic researcher from University of California, Santa Barbara. The author has contributed to research in topics: Reinforcement learning & Closed captioning. The author has an hindex of 37, co-authored 123 publications receiving 4801 citations. Previous affiliations of Yuan-Fang Wang include University of California, Berkeley & University of California.
Papers published on a yearly basis
••13 Jun 2004
TL;DR: This work demonstrates the flexibility and effectiveness of using the Kalman Filter as a solution for managing trade-offs between precision of results and resources in satisfying stream queries.
Abstract: To answer user queries efficiently, a stream management system must handle continuous, high-volume, possibly noisy, and time-varying data streams. One major research area in stream management seeks to allocate resources (such as network bandwidth and memory) to query plans, either to minimize resource usage under a precision requirement, or to maximize precision of results under resource constraints. To date, many solutions have been proposed; however, most solutions are ad hoc with hard-coded heuristics to generate query plans. In contrast, we perceive stream resource management as fundamentally a filtering problem, in which the objective is to filter out as much data as possible to conserve resources, provided that the precision standards can be met. We select the Kalman Filter as a general and adaptive filtering solution for conserving resources. The Kalman Filter has the ability to adapt to various stream characteristics, sensor noise, and time variance. Furthermore, we realize a significant performance boost by switching from traditional methods of caching static data (which can soon become stale) to our method of caching dynamic procedures that can predict data reliably at the server without the clients' involvement. In this work we focus on minimization of communication overhead for both synthetic and real-world streams. Through examples and empirical studies, we demonstrate the flexibility and effectiveness of using the Kalman Filter as a solution for managing trade-offs between precision of results and resources in satisfying stream queries.
15 Jun 2019
TL;DR: In this paper, a reinforcement learning-based approach is proposed to enforce cross-modal grounding both locally and globally via reinforcement learning (RL), where a matching critic is used to provide an intrinsic reward to encourage global matching between instructions and trajectories.
Abstract: Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments. In this paper, we study how to address three critical challenges for this task: the cross-modal grounding, the ill-posed feedback, and the generalization problems. First, we propose a novel Reinforced Cross-Modal Matching (RCM) approach that enforces cross-modal grounding both locally and globally via reinforcement learning (RL). Particularly, a matching critic is used to provide an intrinsic reward to encourage global matching between instructions and trajectories, and a reasoning navigator is employed to perform cross-modal grounding in the local visual scene. Evaluation on a VLN benchmark dataset shows that our RCM model significantly outperforms previous methods by 10% on SPL and achieves the new state-of-the-art performance. To improve the generalizability of the learned policy, we further introduce a Self-Supervised Imitation Learning (SIL) method to explore unseen environments by imitating its own past, good decisions. We demonstrate that SIL can approximate a better and more efficient policy, which tremendously minimizes the success rate performance gap between seen and unseen environments (from 30.7% to 11.7%).
••01 Oct 2019
TL;DR: This work presents a new large-scale multilingual video description dataset, VATEX, which contains over 41,250 videos and 825,000 captions in both English and Chinese and demonstrates that the spatiotemporal video context can be effectively utilized to align source and target languages and thus assist machine translation.
Abstract: We present a new large-scale multilingual video description dataset, VATEX, which contains over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions, there are over 206,000 English-Chinese parallel translation pairs. Compared to the widely-used MSR-VTT dataset, \vatex is multilingual, larger, linguistically complex, and more diverse in terms of both video and natural language descriptions. We also introduce two tasks for video-and-language research based on \vatex: (1) Multilingual Video Captioning, aimed at describing a video in various languages with a compact unified captioning model, and (2) Video-guided Machine Translation, to translate a source language description into the target language using the video information as additional spatiotemporal context. Extensive experiments on the \vatex dataset show that, first, the unified multilingual model can not only produce both English and Chinese descriptions for a video more efficiently, but also offer improved performance over the monolingual models. Furthermore, we demonstrate that the spatiotemporal video context can be effectively utilized to align source and target languages and thus assist machine translation. In the end, we discuss the potentials of using \vatex for other video-and-language research.
TL;DR: An efficient text categorization algorithm that generates bigrams selectively by looking for ones that have an especially good chance of being useful by using the information gain metric, combined with various frequency thresholds is presented.
Abstract: In this paper, we present an efficient text categorization algorithm that generates bigrams selectively by looking for ones that have an especially good chance of being useful. The algorithm uses the information gain metric, combined with various frequency thresholds. The bigrams, along with unigrams, are then given as features to two different classifiers: Naive Bayes and maximum entropy. The experimental results suggest that the bigrams can substantially raise the quality of feature sets, showing increases in the break-even points and F1 measures. The McNemar test shows that in most categories the increases are very significant. Upon close examination of the algorithm, we concluded that the algorithm is most successful in correctly classifying more positive documents, but may cause more negative documents to be classified incorrectly.
TL;DR: A new updating scheme for low numerical rank matrices that can be shown to be numerically stable and fast is discussed and a comparison with a nonadaptive SVD scheme shows that this algorithm achieves similar accuracy levels for image reconstruction and recognition at a significantly lower computational cost.
Abstract: During the past few years several interesting applications of eigenspace representation of images have been proposed. These include face recognition, video coding, and pose estimation. However, the vision research community has largely overlooked parallel developments in signal processing and numerical linear algebra concerning efficient eigenspace updating algorithms. These new developments are significant for two reasons: Adopting them will make some of the current vision algorithms more robust and efficient. More important is the fact that incremental updating of eigenspace representations will open up new and interesting research applications in vision such as active recognition and learning. The main objective of this paper is to put these in perspective and discuss a new updating scheme for low numerical rank matrices that can be shown to be numerically stable and fast. A comparison with a nonadaptive SVD scheme shows that our algorithm achieves similar accuracy levels for image reconstruction and recognition at a significantly lower computational cost. We also illustrate applications to adaptive view selection for 3D object representation from projections.
TL;DR: The working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap are discussed, as well as aspects of system engineering: databases, system architecture, and evaluation.
Abstract: Presents a review of 200 references in content-based image retrieval. The paper starts with discussing the working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap. Subsequent sections discuss computational steps for image retrieval systems. Step one of the review is image processing for retrieval sorted by color, texture, and local geometry. Features for retrieval are discussed next, sorted by: accumulative and global features, salient points, object and shape features, signs, and structural combinations thereof. Similarity of pictures and objects in pictures is reviewed for each of the feature types, in close connection to the types and means of feedback the user of the systems is capable of giving by interaction. We briefly discuss aspects of system engineering: databases, system architecture, and evaluation. In the concluding section, we present our view on: the driving force of the field, the heritage from computer vision, the influence on computer vision, the role of similarity and of interaction, the need for databases, the problem of evaluation, and the role of the semantic gap.
01 Jan 2004
TL;DR: Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance and describes numerous important application areas such as image based rendering and digital libraries.
Abstract: From the Publisher: The accessible presentation of this book gives both a general view of the entire computer vision enterprise and also offers sufficient detail to be able to build useful applications. Users learn techniques that have proven to be useful by first-hand experience and a wide range of mathematical methods. A CD-ROM with every copy of the text contains source code for programming practice, color images, and illustrative movies. Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance. Topics are discussed in substantial and increasing depth. Application surveys describe numerous important application areas such as image based rendering and digital libraries. Many important algorithms broken down and illustrated in pseudo code. Appropriate for use by engineers as a comprehensive reference to the computer vision enterprise.
TL;DR: In this article, the authors proposed a shape model based on the Hamilton-Jacobi approach to shape modeling, which retains some of the attractive features of existing methods and overcomes some of their limitations.
Abstract: Shape modeling is an important constituent of computer vision as well as computer graphics research. Shape models aid the tasks of object representation and recognition. This paper presents a new approach to shape modeling which retains some of the attractive features of existing methods and overcomes some of their limitations. The authors' techniques can be applied to model arbitrarily complex shapes, which include shapes with significant protrusions, and to situations where no a priori assumption about the object's topology is made. A single instance of the authors' model, when presented with an image having more than one object of interest, has the ability to split freely to represent each object. This method is based on the ideas developed by Osher and Sethian (1988) to model propagating solid/liquid interfaces with curvature-dependent speeds. The interface (front) is a closed, nonintersecting, hypersurface flowing along its gradient field with constant speed or a speed that depends on the curvature. It is moved by solving a "Hamilton-Jacobi" type equation written for a function in which the interface is a particular level set. A speed term synthesized from the image is used to stop the interface in the vicinity of object boundaries. The resulting equation of motion is solved by employing entropy-satisfying upwind finite difference schemes. The authors present a variety of ways of computing the evolving front, including narrow bands, reinitializations, and different stopping criteria. The efficacy of the scheme is demonstrated with numerical experiments on some synthesized images and some low contrast medical images. >
01 Jan 2006
••01 May 2009
TL;DR: This paper breaks down the energy consumption for the components of a typical sensor node, and discusses the main directions to energy conservation in WSNs, and presents a systematic and comprehensive taxonomy of the energy conservation schemes.
Abstract: In the last years, wireless sensor networks (WSNs) have gained increasing attention from both the research community and actual users. As sensor nodes are generally battery-powered devices, the critical aspects to face concern how to reduce the energy consumption of nodes, so that the network lifetime can be extended to reasonable times. In this paper we first break down the energy consumption for the components of a typical sensor node, and discuss the main directions to energy conservation in WSNs. Then, we present a systematic and comprehensive taxonomy of the energy conservation schemes, which are subsequently discussed in depth. Special attention has been devoted to promising solutions which have not yet obtained a wide attention in the literature, such as techniques for energy efficient data acquisition. Finally we conclude the paper with insights for research directions about energy conservation in WSNs.