Author

Shannon M. Hughes

Bio: Shannon M. Hughes is an academic researcher from the University of Colorado Boulder. The author has contributed to research in topics including stylometry and inpainting, has an h-index of 14, and has co-authored 24 publications receiving 696 citations. Previous affiliations of Shannon M. Hughes include Rice University and Princeton University.

Papers
Journal ArticleDOI
TL;DR: Describes the approaches to brushwork analysis and artist identification developed by three research groups, within the framework of a data set of 101 high-resolution gray-scale scans of paintings from the Van Gogh and Kröller-Müller museums.
Abstract: A survey of the literature reveals that image processing tools aimed at supplementing the art historian's toolbox are currently in the earliest stages of development. To jump-start the development of such methods, the Van Gogh and Kröller-Müller museums in The Netherlands agreed to make a data set of 101 high-resolution gray-scale scans of paintings within their collections available to groups of image processing researchers from several different universities. This article describes the approaches to brushwork analysis and artist identification developed by three research groups, within the framework of this data set.

300 citations

Proceedings ArticleDOI
01 Sep 2012
TL;DR: Under certain conditions, ordinary principal component analysis on low-dimensional random projections of data returns the same result that PCA on the original data set would, even if the dimension of each random subspace used is very low.
Abstract: Algorithms that can efficiently recover principal components of high-dimensional data from compressive sensing measurements (e.g. low-dimensional random projections) of it have been an important topic of recent interest in the literature. In this paper, we show that, under certain conditions, normal principal component analysis (PCA) on such low-dimensional random projections of data actually returns the same result as PCA on the original data set would. In particular, as the number of data samples increases, the center of the randomly projected data converges to the true center of the original data (up to a known scaling factor) and the principal components converge to the true principal components of the original data as well, even if the dimension of each random subspace used is very low. Indeed, experimental results verify that this approach does estimate the original center and principal components very well for both synthetic and real-world datasets, including hyperspectral data. Its performance is even superior to that of other algorithms recently developed in the literature for this purpose.
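
To make the convergence claim concrete, here is a minimal NumPy sketch (the subspace dimension, noise level, and per-sample Gaussian projections are illustrative assumptions, not the paper's exact setup): each sample is observed only through its own low-dimensional random projection, lifted back to the ambient space, and ordinary PCA is run on the lifted data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n samples in d dimensions lying near a k-dimensional subspace.
n, d, k, m = 5000, 100, 3, 10            # m = dimension of each random projection
basis = np.linalg.qr(rng.standard_normal((d, k)))[0]
X = rng.standard_normal((n, k)) @ basis.T + 0.01 * rng.standard_normal((n, d))

# Each sample is seen only through its own m-dimensional random projection,
# then lifted back to the ambient space by simple backprojection.
X_lift = np.empty_like(X)
for i in range(n):
    R = rng.standard_normal((d, m)) / np.sqrt(m)   # per-sample projection matrix
    X_lift[i] = R @ (R.T @ X[i])                   # project, then backproject

def top_pcs(A, k):
    """Top-k right singular vectors of the centered data matrix."""
    A = A - A.mean(axis=0)
    return np.linalg.svd(A, full_matrices=False)[2][:k]

# Cosines of principal angles between the two subspace estimates (near 1 = agreement).
cos = np.linalg.svd(top_pcs(X, k) @ top_pcs(X_lift, k).T, compute_uv=False)
print("cosines of principal angles:", np.round(cos, 3))
```

With many samples, the printed cosines approach 1 even though m is far smaller than d, mirroring the convergence result stated above.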

71 citations

Proceedings ArticleDOI
07 Nov 2009
TL;DR: It is demonstrated that supervised machine learning on features derived from hidden Markov tree modeling of the paintings' wavelet coefficients has the potential to distinguish copies from originals in the new dataset.
Abstract: This paper examines whether machine learning and image analysis tools can be used to assist art experts in the authentication of unknown or disputed paintings. Recent work on this topic [1] has presented some promising initial results. Our reexamination of some of these recently successful experiments shows that variations in image clarity in the experimental datasets were correlated with authenticity, and may have acted as a confounding factor, artificially improving the results. To determine the extent of this factor's influence on previous results, we provide a new “ground truth” data set in which originals and copies are known and image acquisition conditions are uniform. Multiple previously-successful methods are found ineffective on this new confounding-factor-free dataset, but we demonstrate that supervised machine learning on features derived from hidden Markov tree modeling of the paintings' wavelet coefficients has the potential to distinguish copies from originals in the new dataset.
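
As a rough illustration of this pipeline (not the paper's full hidden Markov tree model, which ties coefficient states across scales), the sketch below uses the PyWavelets and scikit-learn libraries to extract simple per-subband wavelet statistics and feed them to a supervised classifier; `patches` and `labels` are hypothetical inputs.

```python
import numpy as np
import pywt                               # PyWavelets
from sklearn.svm import SVC

def wavelet_features(image, wavelet="db4", levels=3):
    """Per-subband wavelet statistics -- a simplified stand-in for the
    hidden-Markov-tree parameters used as features in the paper."""
    coeffs = pywt.wavedec2(image, wavelet, level=levels)
    feats = []
    for detail in coeffs[1:]:             # (cH, cV, cD) subbands at each scale
        for band in detail:
            band = band.ravel()
            feats += [np.log(band.var() + 1e-12), np.mean(np.abs(band))]
    return np.array(feats)

# Hypothetical usage: `patches` are grayscale painting patches and `labels`
# mark originals (1) vs. copies (0); both would come from the dataset.
# X = np.stack([wavelet_features(p) for p in patches])
# clf = SVC(kernel="rbf").fit(X, labels)
```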

65 citations

Proceedings Article
21 Jun 2014
TL;DR: This paper proposes an approach to principal component estimation that utilizes projections onto very sparse random vectors with Bernoulli-generated nonzero entries; the approach is simultaneously efficient in memory/storage space, efficient in computation, and produces accurate PC estimates, while also allowing for rigorous theoretical performance analysis.
Abstract: Algorithms that can efficiently recover principal components in very high-dimensional, streaming, and/or distributed data settings have become an important topic in the literature. In this paper, we propose an approach to principal component estimation that utilizes projections onto very sparse random vectors with Bernoulli-generated nonzero entries. Indeed, our approach is simultaneously efficient in memory/storage space, efficient in computation, and produces accurate PC estimates, while also allowing for rigorous theoretical performance analysis. Moreover, one can tune the sparsity of the random vectors deliberately to achieve a desired point on the tradeoffs between memory, computation, and accuracy. We rigorously characterize these tradeoffs and provide statistical performance guarantees. In addition to these very sparse random vectors, our analysis also applies to more general random projections. We present experimental results demonstrating that this approach allows for simultaneously achieving a substantial reduction of the computational complexity and memory/storage space, with little loss in accuracy, particularly for very high-dimensional data.
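
A hedged sketch of the flavor of this approach, in NumPy (the sparsity level, problem sizes, and the uncorrected surrogate covariance are my simplifications; the paper's estimator includes correction terms and rigorous guarantees): compress each sample with its own very sparse Bernoulli-pattern projection, then estimate principal components from the averaged backprojected outer products.

```python
import numpy as np

rng = np.random.default_rng(1)

def sparse_projection(d, m, s=0.05):
    """d x m matrix with a Bernoulli(s) nonzero pattern and +/-1 signs,
    scaled so that E[R R^T] = I; sparser s means less memory and compute."""
    mask = rng.random((d, m)) < s
    signs = rng.choice([-1.0, 1.0], size=(d, m))
    return mask * signs / np.sqrt(s * m)

# Synthetic data near a 5-dimensional subspace of R^d.
n, d, m = 3000, 200, 20
U = np.linalg.qr(rng.standard_normal((d, 5)))[0]
X = rng.standard_normal((n, 5)) @ U.T + 0.05 * rng.standard_normal((n, d))

# Each sample is compressed by its own sparse projection; accumulate the
# backprojected outer products into a surrogate covariance matrix.
C = np.zeros((d, d))
for x in X:
    R = sparse_projection(d, m)
    z = R @ (R.T @ x)                 # backprojected compressed sample
    C += np.outer(z, z) / n

# Top eigenvectors of the surrogate covariance estimate the true PCs.
eigvals, eigvecs = np.linalg.eigh(C)
pcs = eigvecs[:, -5:]                 # estimated principal components

print(np.round(np.linalg.svd(pcs.T @ U, compute_uv=False), 3))  # cosines near 1
```

Lowering s makes each projection cheaper to store and apply, at some cost in accuracy, which is the memory/computation/accuracy tradeoff the abstract describes.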

59 citations

Proceedings ArticleDOI
22 May 2011
TL;DR: This paper shows how to accurately recover nonlinearly K-sparse signals from approximately 2K measurements, which is often far lower than the number of measurements usually required under the assumption of sparsity in an orthonormal basis (e.g. wavelets).
Abstract: Compressive sensing accurately reconstructs a signal that is sparse in some basis from measurements, generally consisting of the signal's inner products with Gaussian random vectors. The number of measurements needed is based on the sparsity of the signal, allowing for signal recovery from far fewer measurements than is required by the traditional Shannon sampling theorem. In this paper, we show how to apply the kernel trick, popular in machine learning, to adapt compressive sensing to a different type of sparsity. We consider a signal to be “nonlinearly K-sparse” if the signal can be recovered as a nonlinear function of K underlying parameters. Images that lie along a low-dimensional manifold are good examples of this type of nonlinear sparsity, and natural images have also been shown to exhibit it [1]. We show how to accurately recover these nonlinearly K-sparse signals from approximately 2K measurements, which is often far lower than the number of measurements usually required under the assumption of sparsity in an orthonormal basis (e.g. wavelets). In experimental results, we find that we can recover images far better for small numbers of compressive sensing measurements, sometimes reducing the mean square error (MSE) of the recovered image by an order of magnitude or more, with little computation. A bound on the error of our recovered signal is also proved.
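
The paper's kernel-trick recovery machinery is more involved than can be shown here; as a loose illustration of the underlying idea only, the sketch below (the shifted-Gaussian-bump signal family, the scikit-learn kernel ridge regressor, and all parameter values are assumptions of mine) learns a nonlinear map from roughly 2K = 2 compressive measurements back to signals on a one-parameter (K = 1) manifold.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(2)

# A one-parameter (K = 1) signal manifold: Gaussian bumps at varying shifts.
d = 256
t = np.linspace(0.0, 1.0, d)
bump = lambda theta: np.exp(-((t - theta) ** 2) / (2 * 0.05 ** 2))

thetas = rng.uniform(0.2, 0.8, size=400)
X_train = np.stack([bump(th) for th in thetas])

# Roughly 2K = 2 compressive measurements per signal, via Gaussian vectors.
A = rng.standard_normal((2, d)) / np.sqrt(2)
Y_train = X_train @ A.T

# Kernel (RBF) regression from measurement space back to signal space.
model = KernelRidge(kernel="rbf", gamma=0.5, alpha=1e-3)
model.fit(Y_train, X_train)

x_true = bump(0.55)
x_hat = model.predict((A @ x_true).reshape(1, -1))[0]
print("relative MSE:", np.mean((x_hat - x_true) ** 2) / np.mean(x_true ** 2))
```

The point of the demo is only that two generic linear measurements can pin down a signal from a one-parameter family, so a nonlinear (kernelized) decoder can succeed where basis-sparsity arguments would demand many more measurements.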

41 citations


Cited by
Journal ArticleDOI
TL;DR: Surveys almost 300 key theoretical and empirical contributions of the current decade related to image retrieval and automatic image annotation, discusses the spawning of related subfields, and examines the adaptation of existing image retrieval techniques to build systems that can be useful in the real world.
Abstract: We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid foundation to such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger association of weakly related fields. In this article, we survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and in the process discuss the spawning of related subfields. We also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real world. In retrospect of what has been achieved so far, we also conjecture what the future may hold for image retrieval research.

3,433 citations

Journal ArticleDOI
TL;DR: This paper presents a general formulation for supervised dictionary learning adapted to a wide variety of tasks, and presents an efficient algorithm for solving the corresponding optimization problem.
Abstract: Modeling data with linear combinations of a few elements from a learned dictionary has been the focus of much recent research in machine learning, neuroscience, and signal processing. For signals such as natural images that admit such sparse representations, it is now well established that these models are well suited to restoration tasks. In this context, learning the dictionary amounts to solving a large-scale matrix factorization problem, which can be done efficiently with classical optimization tools. The same approach has also been used for learning features from data for other purposes, e.g., image classification, but tuning the dictionary in a supervised way for these tasks has proven to be more difficult. In this paper, we present a general formulation for supervised dictionary learning adapted to a wide variety of tasks, and present an efficient algorithm for solving the corresponding optimization problem. Experiments on handwritten digit classification, digital art identification, nonlinear inverse image problems, and compressed sensing demonstrate that our approach is effective in large-scale settings, and is well suited to supervised and semi-supervised classification, as well as regression tasks for data that admit sparse representations.
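
The paper's task-driven formulation optimizes the dictionary and the classifier jointly; the two-stage stand-in below (unsupervised dictionary learning followed by a classifier on the sparse codes, run with scikit-learn on its bundled digits data) conveys the basic pipeline, though not the supervised coupling that is the paper's contribution.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
Xtr, Xte, ytr, yte = train_test_split(digits.data, digits.target, random_state=0)

# Stage 1: learn a dictionary and sparse-code the images (unsupervised).
dico = MiniBatchDictionaryLearning(n_components=100, alpha=1.0, random_state=0)
Ctr = dico.fit_transform(Xtr)      # sparse codes of training images
Cte = dico.transform(Xte)

# Stage 2: train a classifier on the sparse codes.
clf = LogisticRegression(max_iter=2000).fit(Ctr, ytr)
print("test accuracy on sparse codes:", round(clf.score(Cte, yte), 3))
```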

919 citations

01 Jan 2004
TL;DR: A new algorithm for manifold learning and nonlinear dimensionality reduction is presented: based on a set of unorganized data points sampled with noise from a parameterized manifold, the local geometry of the manifold is learned by constructing an approximation for the tangent space at each point.
Abstract: We present a new algorithm for manifold learning and nonlinear dimensionality reduction. Based on a set of unorganized data points sampled with noise from a parameterized manifold, the local geometry of the manifold is learned by constructing an approximation for the tangent space at each point, and those tangent spaces are then aligned to give the global coordinates of the data points with respect to the underlying manifold. We also present an error analysis of our algorithm showing that reconstruction errors can be quite small in some cases. We illustrate our algorithm using curves and surfaces both in 2D/3D Euclidean spaces and higher dimensional Euclidean spaces. We also address several theoretical and algorithmic issues for further research and improvements.
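
This is the local tangent space alignment (LTSA) algorithm, and scikit-learn ships an implementation; a brief sketch on swiss-roll data (the neighbor count and noise level are illustrative choices):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Noisy samples from a 2-D manifold embedded in 3-D.
X, t = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

# LTSA: approximate the tangent space in each neighborhood, then align the
# local coordinates into a single global 2-D embedding.
ltsa = LocallyLinearEmbedding(n_neighbors=12, n_components=2, method="ltsa")
coords = ltsa.fit_transform(X)     # global coordinates on the manifold
print(coords.shape)                # (1500, 2)
```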

670 citations

Book
28 Feb 2019
TL;DR: In this book, the authors bring together machine learning, engineering mathematics, and mathematical physics to integrate modeling and control of dynamical systems with modern methods in data science, and highlight many of the recent advances in scientific computing that enable data-driven methods to be applied to a diverse range of complex systems, such as turbulence, the brain, climate, epidemiology, finance, robotics, and autonomy.
Abstract: Data-driven discovery is revolutionizing the modeling, prediction, and control of complex systems. This textbook brings together machine learning, engineering mathematics, and mathematical physics to integrate modeling and control of dynamical systems with modern methods in data science. It highlights many of the recent advances in scientific computing that enable data-driven methods to be applied to a diverse range of complex systems, such as turbulence, the brain, climate, epidemiology, finance, robotics, and autonomy. Aimed at advanced undergraduate and beginning graduate students in the engineering and physical sciences, the text presents a range of topics and methods from introductory to state of the art.

563 citations

Journal ArticleDOI
TL;DR: This tutorial defines and discusses key aspects of the problem of computational inference of aesthetics and emotion from images, describes data sets available for performing assessment, and outlines several real-world applications where research in this domain can be employed.
Abstract: In this tutorial, we define and discuss key aspects of the problem of computational inference of aesthetics and emotion from images. We begin with a background discussion on philosophy, photography, paintings, visual arts, and psychology. This is followed by introduction of a set of key computational problems that the research community has been striving to solve and the computational framework required for solving them. We also describe data sets available for performing assessment and outline several real-world applications where research in this domain can be employed. A significant number of papers that have attempted to solve problems in aesthetics and emotion inference are surveyed in this tutorial. We also discuss future directions that researchers can pursue and make a strong case for seriously attempting to solve problems in this research domain.

361 citations