
Showing papers on "Sparse approximation published in 2003"


Journal ArticleDOI
TL;DR: This article obtains parallel results in a more general setting, where the dictionary D can arise from two or several bases, frames, or even less structured systems, and sketches three applications: separating linear features from planar ones in 3D data, noncooperative multiuser encoding, and identification of over-complete independent component models.
Abstract: Given a dictionary D = {dk} of vectors dk, we seek to represent a signal S as a linear combination S = ∑k γ(k)dk, with scalar coefficients γ(k). In particular, we aim for the sparsest representation possible. In general, this requires a combinatorial optimization process. Previous work considered the special case where D is an overcomplete system consisting of exactly two orthobases and has shown that, under a condition of mutual incoherence of the two bases, and assuming that S has a sufficiently sparse representation, this representation is unique and can be found by solving a convex optimization problem: specifically, minimizing the ℓ1 norm of the coefficients γ. In this article, we obtain parallel results in a more general setting, where the dictionary D can arise from two or several bases, frames, or even less structured systems. We sketch three applications: separating linear features from planar ones in 3D data, noncooperative multiuser encoding, and identification of over-complete independent component models.

3,158 citations
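The ℓ1 minimization described above reduces to a linear program. Below is a minimal sketch of that step, assuming a small synthetic dictionary D and signal S (not data from the paper), using the standard split γ = u − v with u, v ≥ 0:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 8, 20                       # signal length, dictionary size (overcomplete)
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)     # unit-norm atoms
gamma_true = np.zeros(m)
gamma_true[[3, 11]] = [1.5, -2.0]  # a 2-sparse ground truth
S = D @ gamma_true

# min ||gamma||_1  s.t.  D gamma = S, via gamma = u - v with u, v >= 0.
res = linprog(c=np.ones(2 * m), A_eq=np.hstack([D, -D]), b_eq=S,
              bounds=[(0, None)] * (2 * m))
gamma = res.x[:m] - res.x[m:]
print(np.flatnonzero(np.abs(gamma) > 1e-6))  # typically recovers {3, 11}
```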


Journal ArticleDOI
TL;DR: It is proved that the result of Donoho and Huo, concerning the replacement of the ℓ0 optimization problem with a linear programming problem when searching for sparse representations, has an analog for dictionaries that may be highly redundant.
Abstract: The purpose of this correspondence is to generalize a result by Donoho and Huo and Elad and Bruckstein on sparse representations of signals in a union of two orthonormal bases for R^N. We consider general (redundant) dictionaries for R^N, and derive sufficient conditions for having unique sparse representations of signals in such dictionaries. The special case where the dictionary is given by the union of L ≥ 2 orthonormal bases for R^N is studied in more detail. In particular, it is proved that the result of Donoho and Huo, concerning the replacement of the ℓ0 optimization problem with a linear programming problem when searching for sparse representations, has an analog for dictionaries that may be highly redundant.

1,049 citations
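Sufficient conditions of this kind are typically stated through the mutual coherence of the dictionary. A hedged sketch of the commonly cited form of the uniqueness bound, ||γ||_0 < (1 + 1/μ(D))/2 (the paper's exact constants may differ):

```python
import numpy as np

def mutual_coherence(D):
    """Largest |inner product| between distinct unit-norm atoms of D."""
    Dn = D / np.linalg.norm(D, axis=0)
    G = np.abs(Dn.T @ Dn)
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(1)
D = rng.standard_normal((16, 48))        # a generic redundant dictionary
mu = mutual_coherence(D)
print(f"mu(D) = {mu:.3f}; uniqueness guaranteed below sparsity {(1 + 1/mu) / 2:.2f}")
```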


Journal ArticleDOI
TL;DR: Algorithms for data-driven learning of domain-specific overcomplete dictionaries are developed to obtain maximum likelihood and maximum a posteriori dictionary estimates based on the use of Bayesian models with concave/Schur-concave negative log priors, showing improved performance over other independent component analysis methods.
Abstract: Algorithms for data-driven learning of domain-specific overcomplete dictionaries are developed to obtain maximum likelihood and maximum a posteriori dictionary estimates based on the use of Bayesian models with concave/Schur-concave (CSC) negative log priors. Such priors are appropriate for obtaining sparse representations of environmental signals within an appropriately chosen (environmentally matched) dictionary. The elements of the dictionary can be interpreted as concepts, features, or words capable of succinct expression of events encountered in the environment (the source of the measured signals). This is a generalization of vector quantization in that one is interested in a description involving a few dictionary entries (the proverbial "25 words or less"), but not necessarily as succinct as one entry. To learn an environmentally adapted dictionary capable of concise expression of signals generated by the environment, we develop algorithms that iterate between a representative set of sparse representations found by variants of FOCUSS and an update of the dictionary using these sparse representations. Experiments were performed using synthetic data and natural images. For complete dictionaries, we demonstrate that our algorithms have improved performance over other independent component analysis (ICA) methods, measured in terms of signal-to-noise ratios of separated sources. In the overcomplete case, we show that the true underlying dictionary and sparse sources can be accurately recovered. In tests with natural images, learned overcomplete dictionaries are shown to have higher coding efficiency than complete dictionaries; that is, images encoded with an overcomplete dictionary have both higher compression (fewer bits per pixel) and higher accuracy (lower mean square error).

892 citations
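A compact sketch of the alternating structure described above, with a FOCUSS-style reweighted step standing in for the paper's sparse-coding variants and a plain least-squares dictionary update (the CSC priors and the exact MAP/ML updates are not reproduced):

```python
import numpy as np

def focuss(D, s, p=0.5, iters=20, eps=1e-8):
    """Approximate argmin ||x||_p subject to D x = s, by iterative reweighting."""
    x = np.linalg.pinv(D) @ s                     # minimum-norm starting point
    for _ in range(iters):
        w = np.abs(x) ** (1 - p / 2) + eps        # FOCUSS weights W = diag(w)
        x = w * (np.linalg.pinv(D * w) @ s)       # x = W (D W)^+ s
    return x

def learn_dictionary(S, m, outer_iters=10, seed=0):
    """Alternate FOCUSS sparse coding with a least-squares dictionary update."""
    rng = np.random.default_rng(seed)
    n, T = S.shape
    D = rng.standard_normal((n, m))
    D /= np.linalg.norm(D, axis=0)
    X = np.zeros((m, T))
    for _ in range(outer_iters):
        X = np.column_stack([focuss(D, S[:, t]) for t in range(T)])
        D = S @ np.linalg.pinv(X)                 # fit dictionary to the codes
        D /= np.linalg.norm(D, axis=0) + 1e-12    # keep atoms at unit norm
    return D, X
```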


Proceedings Article
01 Jan 2003
TL;DR: A method for the sparse greedy approximation of Bayesian Gaussian process regression, featuring a novel heuristic for very fast forward selection, which leads to a sufficiently stable approximation of the log marginal likelihood of the training data, which can be optimised to adjust a large number of hyperparameters automatically.
Abstract: We present a method for the sparse greedy approximation of Bayesian Gaussian process regression, featuring a novel heuristic for very fast forward selection. Our method is essentially as fast as an equivalent one which selects the "support" patterns at random, yet it can outperform random selection on hard curve fitting tasks. More importantly, it leads to a sufficiently stable approximation of the log marginal likelihood of the training data, which can be optimised to adjust a large number of hyperparameters automatically. We demonstrate the model selection capabilities of the algorithm in a range of experiments. In line with the development of our method, we present a simple view on sparse approximations for GP models and their underlying assumptions and show relations to other methods.

487 citations
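As a toy illustration of greedy forward selection of "support" patterns, the sketch below adds, at each step, the candidate with the largest GP posterior variance under the current active set; this is a generic heuristic in the spirit of the method, not the paper's actual selection score:

```python
import numpy as np

def rbf(X, Y, ell=1.0):
    """Squared-exponential kernel matrix between row-wise point sets."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def greedy_active_set(X, noise=0.1, k=10):
    K = rbf(X, X)
    active = [0]                                  # arbitrary seed point
    while len(active) < k:
        Kaa = K[np.ix_(active, active)] + noise**2 * np.eye(len(active))
        Kxa = K[:, active]
        # Posterior variance of every candidate given the current active set.
        var = np.diag(K) - np.einsum("ia,ab,ib->i", Kxa, np.linalg.inv(Kaa), Kxa)
        var[active] = -np.inf                     # never reselect a point
        active.append(int(np.argmax(var)))
    return active

X = np.random.default_rng(2).uniform(-3, 3, size=(200, 1))
print(greedy_active_set(X, k=8))                  # indices of the support set
```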


Dissertation
01 Jul 2003
TL;DR: The tractability and usefulness of simple greedy forward selection with information-theoretic criteria previously used in active learning are demonstrated, and generic schemes for automatic model selection with many (hyper)parameters are developed.
Abstract: Non-parametric models and techniques enjoy a growing popularity in the field of machine learning, and among these Bayesian inference for Gaussian process (GP) models has recently received significant attention. We feel that GP priors should be part of the standard toolbox for constructing models relevant to machine learning in the same way as parametric linear models are, and the results in this thesis help to remove some obstacles on the way towards this goal. In the first main chapter, we provide a distribution-free finite sample bound on the difference between generalisation and empirical (training) error for GP classification methods. While the general theorem (the PAC-Bayesian bound) is not new, we give a much simplified and somewhat generalised derivation and point out the underlying core technique (convex duality) explicitly. Furthermore, the application to GP models is novel (to our knowledge). A central feature of this bound is that its quality depends crucially on task knowledge being encoded faithfully in the model and prior distributions, so there is a mutual benefit between a sharp theoretical guarantee and empirically well-established statistical practices. Extensive simulations on real-world classification tasks indicate an impressive tightness of the bound, in spite of the fact that many previous bounds for related kernel machines fail to give non-trivial guarantees in this practically relevant regime. In the second main chapter, sparse approximations are developed to address the problem of the unfavourable scaling of most GP techniques with large training sets. Due to its high importance in practice, this problem has received a lot of attention recently. We demonstrate the tractability and usefulness of simple greedy forward selection with information-theoretic criteria previously used in active learning (or sequential design) and develop generic schemes for automatic model selection with many (hyper)parameters. We suggest two new generic schemes and evaluate some of their variants on large real-world classification and regression tasks. These schemes and their underlying principles (which are clearly stated and analysed) can be applied to obtain sparse approximations for a wide regime of GP models far beyond the special cases we studied here.

202 citations


Journal ArticleDOI
Heiko Wersing1, Edgar Körner1
TL;DR: This work proposes a feedforward model for recognition that shares components like weight sharing, pooling stages, and competitive nonlinearities with earlier approaches but focuses on new methods for learning optimal feature-detecting cells in intermediate stages of the hierarchical network.
Abstract: There is an ongoing debate over the capabilities of hierarchical neural feedforward architectures for performing real-world invariant object recognition. Although a variety of hierarchical models exists, appropriate supervised and unsupervised learning methods are still an issue of intense research. We propose a feedforward model for recognition that shares components like weight sharing, pooling stages, and competitive nonlinearities with earlier approaches but focuses on new methods for learning optimal feature-detecting cells in intermediate stages of the hierarchical network. We show that principles of sparse coding, which were previously mostly applied to the initial feature detection stages, can also be employed to obtain optimized intermediate complex features. We suggest a new approach to optimize the learning of sparse features under the constraints of a weight-sharing or convolutional architecture that uses pooling operations to achieve gradual invariance in the feature hierarchy. The approach explicitly enforces symmetry constraints like translation invariance on the feature set. This leads to a dimension reduction in the search space of optimal features and allows determining more efficiently the basis representatives, which achieve a sparse decomposition of the input. We analyze the quality of the learned feature representation by investigating the recognition performance of the resulting hierarchical network on object and face databases. We show that a hierarchy with features learned on a single object data set can also be applied to face recognition without parameter changes and is competitive with other recent machine learning recognition approaches. To investigate the effect of the interplay between sparse coding and processing nonlinearities, we also consider alternative feedforward pooling nonlinearities such as presynaptic maximum selection and sum-of-squares integration. The comparison shows that a combination of strong competitive nonlinearities with sparse coding offers the best recognition performance in the difficult scenario of segmentation-free recognition in cluttered surround. We demonstrate that for both learning and recognition, a precise segmentation of the objects is not necessary.

198 citations


Journal ArticleDOI
TL;DR: It is established here that the EB condition is both sufficient and necessary for replacing an ℓ0 optimization by linear programming minimization when searching for the unique sparse representation.
Abstract: In previous work, Elad and Bruckstein (EB) have provided a sufficient condition for replacing an ℓ0 optimization by linear programming minimization when searching for the unique sparse representation. We establish here that the EB condition is both sufficient and necessary.

189 citations


Journal ArticleDOI
TL;DR: It is observed in initial tests that these general adaptive sparse grids allow the identification of the ANOVA structure and thus provide comprehensible models, which is very important for data mining applications.
Abstract: Sparse grids, as studied by Zenger and Griebel over the last 10 years, have been very successful in the solution of partial differential equations, integral equations and classification problems. Adaptive sparse grid functions are elements of a function space lattice. Such lattices allow the generalisation of sparse grid techniques to the fitting of very high-dimensional functions with categorical and continuous variables. We have observed in initial tests that these general adaptive sparse grids allow the identification of the ANOVA structure and thus provide comprehensible models. This is very important for data mining applications. Perhaps the main advantage of these models is that they do not include any spurious interaction terms and thus can deal with very high-dimensional data.

133 citations


Proceedings ArticleDOI
06 Apr 2003
TL;DR: This method generalizes Wiener filtering with locally stationary, non-Gaussian, parametric source models and uses a sparse nonnegative decomposition algorithm of its own to perform the separation of two sound sources from a single sensor.
Abstract: We propose a new method to perform the separation of two sound sources from a single sensor. This method generalizes Wiener filtering with locally stationary, non-Gaussian, parametric source models. The method involves a learning phase for which we propose three different algorithms. In the separation phase, we use a sparse nonnegative decomposition algorithm of our own. The algorithms are evaluated on the separation of real audio data.

116 citations
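The separation phase relies on a sparse nonnegative decomposition. As a generic stand-in (the paper uses an algorithm of its own), here is a multiplicative-update scheme for min ||V − WH||²_F + λ·∑H over H ≥ 0, with a fixed nonnegative basis W learned beforehand:

```python
import numpy as np

def sparse_nn_decomposition(V, W, lam=0.1, iters=200, seed=0):
    """Minimise ||V - W H||_F^2 + lam * sum(H) over H >= 0, W held fixed."""
    rng = np.random.default_rng(seed)
    H = rng.uniform(0.1, 1.0, (W.shape[1], V.shape[1]))
    for _ in range(iters):
        # Multiplicative update; the +lam in the denominator promotes sparsity.
        H *= (W.T @ V) / (W.T @ W @ H + lam + 1e-12)
    return H

# Toy usage: V is a nonnegative spectrogram-like matrix, W a fixed basis.
rng = np.random.default_rng(1)
W = np.abs(rng.standard_normal((64, 8)))
H_true = np.abs(rng.standard_normal((8, 100))) * (rng.random((8, 100)) < 0.2)
V = W @ H_true
H = sparse_nn_decomposition(V, W)
print(np.abs(V - W @ H).mean())                   # small reconstruction error
```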


01 Jan 2003
TL;DR: A sparse decomposition approach for an observed data matrix is presented and then used in blind source separation with fewer sensors than sources; separation is implemented in the time-frequency domain after applying a wavelet packet transform to the observed mixtures.
Abstract: A sparse decomposition approach for an observed data matrix is presented in this paper, and the approach is then used in blind source separation with fewer sensors than sources. First, sparse representation (factorization) of a data matrix is discussed. For a given basis matrix, there generally exist infinitely many coefficient matrices (solutions) such that the data matrix can be represented by the product of the basis matrix and a coefficient matrix. However, the sparse solution with minimum ℓ1-norm is unique with probability one, and can be obtained by using a linear programming algorithm. The basis matrix can be estimated using a gradient-type algorithm or the K-means clustering algorithm. Next, blind source separation is discussed based on the sparse factorization approach. The blind separation technique includes two steps: the first is to estimate the mixing matrix (the basis matrix in the sparse representation); the second is to estimate the sources (the coefficient matrix). If the sources are sufficiently sparse, blind separation can be carried out directly in the time domain. Otherwise, blind separation can be implemented in the time-frequency domain after applying a wavelet packet transformation preprocessing to the observed mixtures. Three simulation examples are presented to illustrate the proposed algorithms and reveal their performance. Finally, concluding remarks review the developed approach and state the open problems for further study.

97 citations
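A hedged sketch of the two-step procedure: estimate the mixing (basis) matrix by clustering normalised observation columns, then recover each source column by a minimum ℓ1-norm linear program. Function names are ours, and scaling, sign handling, and the wavelet packet preprocessing are simplified away:

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.optimize import linprog

def estimate_mixing(X, n_sources, seed=0):
    """Cluster unit-norm observation columns; centroids approximate mixing columns."""
    cols = X / (np.linalg.norm(X, axis=0) + 1e-12)
    cols = cols * np.sign(cols[0] + 1e-12)        # fold antipodal directions together
    centroids, _ = kmeans2(cols.T, n_sources, minit="++", seed=seed)
    A = centroids.T
    return A / np.linalg.norm(A, axis=0)

def l1_sources(A, X):
    """Per column: min ||s||_1 s.t. A s = x, via the split s = u - v, u, v >= 0."""
    m = A.shape[1]
    S = np.empty((m, X.shape[1]))
    for t in range(X.shape[1]):
        res = linprog(np.ones(2 * m), A_eq=np.hstack([A, -A]), b_eq=X[:, t])
        S[:, t] = res.x[:m] - res.x[m:]
    return S
```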


Journal ArticleDOI
TL;DR: The sparse grid approach, based upon a direct higher order discretization on the sparse grid, overcomes this dilemma to some extent, and introduces additional flexibility with respect to both the order of the 1D quadrature rule applied (in the sense of Smolyak's tensor product decomposition) and the placement of grid points.
Abstract: In this paper, we study the potential of adaptive sparse grids for multivariate numerical quadrature in the moderate or high dimensional case, i.e. for a number of dimensions beyond three and up to several hundred. There, conventional methods typically suffer from the curse of dimension or are unsatisfactory with respect to accuracy. Our sparse grid approach, based upon a direct higher order discretization on the sparse grid, overcomes this dilemma to some extent, and introduces additional flexibility with respect to both the order of the 1D quadrature rule applied (in the sense of Smolyak's tensor product decomposition) and the placement of grid points. The presented algorithm is applied to some test problems and compared with other existing methods.
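To make the Smolyak construction concrete, here is a minimal sparse-grid quadrature via the combination technique (a signed sum of small anisotropic tensor-product rules). Gauss-Legendre is used as the 1D rule purely for illustration; the paper's adaptivity and point-placement choices are not shown:

```python
import numpy as np
from math import comb
from itertools import product

def gauss_1d(level):
    """1D Gauss-Legendre rule with 2**level + 1 points, mapped to [0, 1]."""
    x, w = np.polynomial.legendre.leggauss(2**level + 1)
    return (x + 1) / 2, w / 2

def tensor_quad(f, levels):
    """Full tensor-product rule with per-dimension levels."""
    grids = [gauss_1d(l) for l in levels]
    total = 0.0
    for idx in product(*(range(len(x)) for x, _ in grids)):
        pt = np.array([grids[d][0][i] for d, i in enumerate(idx)])
        wt = np.prod([grids[d][1][i] for d, i in enumerate(idx)])
        total += wt * f(pt)
    return total

def compositions(total, parts):
    """All tuples of nonnegative ints of length `parts` summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def sparse_quad(f, dim, level):
    """Smolyak combination technique: signed sum of anisotropic tensor rules."""
    total = 0.0
    for q in range(dim):
        coeff = (-1) ** q * comb(dim - 1, q)
        for levels in compositions(level - q, dim):
            total += coeff * tensor_quad(f, levels)
    return total

f = lambda p: np.exp(p.sum())
print(sparse_quad(f, dim=2, level=4), (np.e - 1) ** 2)   # close agreement
```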

Proceedings ArticleDOI
24 Nov 2003
TL;DR: A new greedy algorithm for solving the sparse approximation problem over quasi-incoherent dictionaries that provides strong guarantees on the quality of the approximations it produces, unlike most other methods for sparse approximation.
Abstract: This paper discusses a new greedy algorithm for solving the sparse approximation problem over quasi-incoherent dictionaries. These dictionaries consist of waveforms that are uncorrelated "on average," and they provide a natural generalization of incoherent dictionaries. The algorithm provides strong guarantees on the quality of the approximations it produces, unlike most other methods for sparse approximation. Moreover, very efficient implementations are possible via approximate nearest-neighbor data structures.
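The guarantees discussed here concern greedy pursuit. As a concrete reference point, a standard orthogonal matching pursuit (OMP) loop over a dictionary with unit-norm columns (the paper's approximate nearest-neighbor acceleration is not shown):

```python
import numpy as np

def omp(D, s, k):
    """Greedily select k atoms of D (unit-norm columns) to approximate s."""
    residual, support = s.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))    # best-correlated atom
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], s, rcond=None)
        residual = s - D[:, support] @ coef           # orthogonal re-fit
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x, support
```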

Journal ArticleDOI
TL;DR: SparseM provides some basic R functionality for linear algebra with sparse matrices and a family of linear model fitting functions that implement least squares methods for problems with sparse design matrices.
Abstract: SparseM provides some basic R functionality for linear algebra with sparse matrices. Use of the package is illustrated by a family of linear model fitting functions that implement least squares methods for problems with sparse design matrices. Significant performance improvements in memory utilization and computational speed are possible for applications involving large sparse matrices.
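SparseM itself is R code; purely to make the illustrated use case concrete, the analogous workflow in Python is a sparse design matrix plus an iterative least-squares solve:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(3)
X = sp.random(10_000, 200, density=0.01, format="csr", random_state=3)
beta = rng.standard_normal(200)
y = X @ beta + 0.01 * rng.standard_normal(10_000)

beta_hat = lsqr(X, y)[0]        # least squares without ever densifying X
print(np.abs(beta - beta_hat).max())
```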

Book ChapterDOI
02 Jun 2003
TL;DR: This paper discusses the multiresolution formulation of quantum chemistry including application to density functional theory and developments that make practical computation in three and higher dimensions.
Abstract: Multiresolution analysis in multiwavelet bases is being investigated as an alternative computational framework for molecular electronic structure calculations. The features that make it attractive include an orthonormal basis, fast algorithms with guaranteed precision and sparse representations of many operators (e.g., Green functions). In this paper, we discuss the multiresolution formulation of quantum chemistry including application to density functional theory and developments that make practical computation in three and higher dimensions.

Journal ArticleDOI
TL;DR: This study takes advantage of the properties of multiscale transforms, such as wavelet packets, to decompose signals into sets of local features with various degrees of sparsity, and studies how the separation error is affected by the sparsity of decomposition coefficients.
Abstract: We consider the problem of blind separation of unknown source signals or images from a given set of their linear mixtures. It was discovered recently that exploiting the sparsity of sources and their mixtures, once they are projected onto a proper space of sparse representation, improves the quality of separation. In this study we take advantage of the properties of multiscale transforms, such as wavelet packets, to decompose signals into sets of local features with various degrees of sparsity. We then study how the separation error is affected by the sparsity of decomposition coefficients, and by the misfit between the probabilistic model of these coefficients and their actual distribution. Our error estimator, based on the Taylor expansion of the quasi-ML function, is used in selection of the best subsets of coefficients and utilized, in turn, in further separation. The performance of the algorithm is evaluated by using noise-free and noisy data. Experiments with simulated signals, musical sounds and images, demonstrate significant improvement of separation quality over previously reported results.

Patent
17 Dec 2003
TL;DR: A technique is presented for determining when documents stored in digital format in a data processing system are similar; the method compares a sparse representation of two or more documents by breaking the documents into "chunks" of data of predefined sizes.
Abstract: A technique for determining when documents stored in digital format in a data processing system are similar. A method compares a sparse representation of two or more documents by breaking the documents into “chunks” of data of predefined sizes. Selected subsets of the chunks are determined as being representative of data in the documents and coefficients are developed to represent such chunks. Coefficients are then combined into coefficient clusters containing coefficients that are similar according to a predetermined similarity metric. The degree of similarity between documents is then evaluated by counting clusters into which chunks of similar documents fall.
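A loose sketch of the patent's pipeline, with our own (hypothetical) choices for the unspecified hashing and clustering steps: chunk the text, map each chunk to a coefficient, coarsen coefficients into clusters, and count shared clusters:

```python
import hashlib

def chunk_signature(text, chunk_size=32, buckets=1024):
    """Hash fixed-size chunks, then coarsen hashes into cluster ids."""
    chunks = (text[i:i + chunk_size] for i in range(0, len(text), chunk_size))
    return {int(hashlib.md5(c.encode()).hexdigest(), 16) % buckets for c in chunks}

def similarity(doc_a, doc_b):
    """Jaccard overlap of the cluster sets of the two documents."""
    sa, sb = chunk_signature(doc_a), chunk_signature(doc_b)
    return len(sa & sb) / max(1, len(sa | sb))

print(similarity("the quick brown fox " * 50, "the quick brown fox " * 48))
```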

Journal ArticleDOI
TL;DR: Certain sparse signal reconstruction problems have been shown to have unique solutions when the signal is known to have an exact sparse representation, and uniqueness is found to be extremely unstable for a number of common dictionaries.
Abstract: Certain sparse signal reconstruction problems have been shown to have unique solutions when the signal is known to have an exact sparse representation. This result is extended to provide bounds on the reconstruction error when the signal has been corrupted by noise or is not exactly sparse for some other reason. Uniqueness is found to be extremely unstable for a number of common dictionaries.

Journal ArticleDOI
TL;DR: In calculations on linear alkanes, polyglycines, estane polymers, and water clusters the optimal block size is found to be between 40 and 100 basis functions, where about 55–75% of the machine peak performance was achieved on an IBM RS6000 workstation.
Abstract: A sparse matrix multiplication scheme with multiatom blocks is reported, a tool that can be very useful for developing linear-scaling methods with atom-centered basis functions. Compared to conventional element-by-element sparse matrix multiplication schemes, efficiency is gained by the use of the highly optimized basic linear algebra subroutines (BLAS). However, some sparsity is lost in the multiatom blocking scheme because these matrix blocks will in general contain negligible elements. As a result, an optimal block size that minimizes the CPU time by balancing these two effects is recovered. In calculations on linear alkanes, polyglycines, estane polymers, and water clusters the optimal block size is found to be between 40 and 100 basis functions, where about 55-75% of the machine peak performance was achieved on an IBM RS6000 workstation. In these calculations, the blocked sparse matrix multiplications can be 10 times faster than a standard element-by-element sparse matrix package.
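A toy version of the blocked scheme: store only the nonzero dense blocks and multiply block-by-block with BLAS-backed matrix products, skipping zero blocks. The block size b plays the role of the 40-100 basis-function blocks found optimal in the paper:

```python
import numpy as np

def block_sparse_matmul(blocks_a, blocks_b, nb, b):
    """blocks_* map (block_row, block_col) -> dense b x b array; nb block rows/cols."""
    out = {}
    for (i, k), A in blocks_a.items():
        for j in range(nb):
            B = blocks_b.get((k, j))
            if B is not None:                       # skip zero blocks entirely
                out[(i, j)] = out.get((i, j), np.zeros((b, b))) + A @ B
    return out

# Example: two block-diagonal operands, nb = 4 blocks of size b = 64.
b, nb = 64, 4
rng = np.random.default_rng(6)
A = {(i, i): rng.standard_normal((b, b)) for i in range(nb)}
B = {(i, i): rng.standard_normal((b, b)) for i in range(nb)}
print(sorted(block_sparse_matmul(A, B, nb, b)))     # only diagonal blocks appear
```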

Proceedings ArticleDOI
13 Nov 2003
TL;DR: In this paper, a combination of the Basis Pursuit Denoising (BPDN) algorithm and the Total-Variation (TV) regularization scheme is proposed for separating images into texture and piecewise smooth parts.
Abstract: This paper presents a novel method for separating images into texture and piecewise smooth parts. The proposed approach is based on a combination of the Basis Pursuit Denoising (BPDN) algorithm and the Total-Variation (TV) regularization scheme. The basic idea promoted in this paper is the use of two appropriate dictionaries, one for the representation of textures, and the other for the natural scene parts. Each dictionary is designed for sparse representation of a particular type of image-content (either texture or piecewise smooth). The use of BPDN with the two augmented dictionaries leads to the desired separation, along with noise removal as a by-product. As the need to choose a proper dictionary for natural scene is very hard, a TV regularization is employed to better direct the separation process. Experimental results validate the algorithm's performance.

Patent
16 Jul 2003
TL;DR: An efficient method for solving a model predictive control problem is described in this paper, in which a large sparse matrix equation is formed based upon the model predictive control problem, and the square root of H, Hr, is then formed directly, without first forming H.
Abstract: An efficient method for solving a model predictive control problem is described. A large sparse matrix equation is formed based upon the model predictive control problem. The square root of H, Hr, is then formed directly, without first forming H. A square root (LSMroot) of a large sparse matrix of the large sparse matrix equation is then formed using Hr in each of a plurality of iterations of a quadratic programming solver, without first forming the large sparse matrix and without recalculating Hr in each of the plurality of iterations. The solution of the large sparse matrix equation is completed based upon LSMroot.

Proceedings ArticleDOI
14 Nov 2003
TL;DR: A new framework, called piecewise linear separation, for blind source separation of possibly degenerate mixtures, including the extreme case of a single mixture of several sources is proposed.
Abstract: We propose a new framework, called piecewise linear separation, for blind source separation of possibly degenerate mixtures, including the extreme case of a single mixture of several sources. Its basic principle is to: 1/ decompose the observations into "components" using some sparse decomposition/nonlinear approximation technique; 2/ perform separation on each component using a "local" separation matrix. It covers many recently proposed techniques for degenerate BSS, as well as several new algorithms that we propose. We discuss two particular methods of multichannel decomposition based on the Best Basis and Matching Pursuit algorithms, as well as several methods to compute the local separation matrices (assuming the mixing matrix is known). Numerical experiments are used to compare the performance of various combinations of the decomposition and local separation methods. On the dataset used for the experiments, it seems that BB with either cosine packets or wavelet packets (Beylkin, Vaidyanathan, Battle 3, or Battle 5 filters) is the best choice in terms of overall performance because it introduces a relatively low level of artefacts in the estimation of the sources; MP introduces slightly more artefacts, but can improve the rejection of the unwanted sources.

Journal ArticleDOI
TL;DR: Modifications of this method for computing factorized sparse approximate inverses are introduced and tested; the approximate inverses are used for solving linear equations with the CG method for symmetric positive definite A and a given a priori pattern.
Abstract: In recent papers the use of sparse approximate inverses for the preconditioning of linear equations Ax = b is examined. The minimization of ||AM − I|| in the Frobenius norm generates good preconditioners without any a priori knowledge of the pattern of M. For symmetric positive definite A and a given a priori pattern, there exist methods for computing factorized sparse approximate inverses L with LL^T ≈ A^{-1}. Here, we want to modify these algorithms so that they are able to capture automatically a promising pattern for L. We use these approximate inverses for solving linear equations with the CG method. Furthermore, we introduce and test modifications of this method for computing factorized sparse approximate inverses.
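A minimal sketch of a Frobenius-norm sparse approximate inverse with a fixed pattern (here simply the pattern of A), used as a preconditioner for SciPy's CG; the paper's automatic pattern capture and factorized LL^T variant are not reproduced:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def spai(A):
    """Minimise ||A M - I||_F over M restricted to the sparsity pattern of A."""
    A = sp.csc_matrix(A)
    n = A.shape[0]
    Ad = A.toarray()                          # dense copy; fine for a small demo
    rows, cols, vals = [], [], []
    for j in range(n):
        J = A[:, j].nonzero()[0]              # allowed nonzero rows of column j
        e = np.zeros(n); e[j] = 1.0
        m, *_ = np.linalg.lstsq(Ad[:, J], e, rcond=None)
        rows.extend(J); cols.extend([j] * len(J)); vals.extend(m)
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))

n = 200
A = sp.diags([-1.0, 2.2, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
M = spai(A)
x, info = cg(A, np.ones(n), M=M)              # M as preconditioner for CG
print(info)                                   # 0 indicates convergence
```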

Proceedings Article
01 Jan 2003
TL;DR: In this article, a source localization method based on sparse representation of sensor measurements with an overcomplete basis composed of samples from the array manifold is proposed, which enforce sparsity by imposing an /spl lscr/sub 1/norm penalty; this can be viewed as an estimation problem with a Laplacian prior.
Abstract: We present a source localization method based upon a sparse representation of sensor measurements with an overcomplete basis composed of samples from the array manifold. We enforce sparsity by imposing an /spl lscr//sub 1/-norm penalty; this can also be viewed as an estimation problem with a Laplacian prior. Explicitly enforcing the sparsity of the representation is motivated by a desire to obtain a sharp estimate of the spatial spectrum which exhibits superresolution. To summarize multiple time samples we use the singular value decomposition (SVD) of the data matrix. Our formulation leads to an optimization problem, which we solve efficiently in a second-order cone (SOC) programming framework by an interior point implementation. We demonstrate the effectiveness of the method on simulated data by plots of spatial spectra and by comparing the estimator variance to the Cramer-Rao bound (CRB). We observe that our approach has advantages over other source localization techniques including increased resolution; improved robustness to noise, limitations in data quantity, and correlation of the sources; as well as not requiring an accurate initialization.
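A hedged sketch of the core optimisation: an ℓ1-penalised fit of the measurements against a grid of steering vectors, posed as an SOC program. We use cvxpy as a generic solver stand-in; the paper's interior-point implementation and the SVD summarisation of multiple snapshots are omitted:

```python
import numpy as np
import cvxpy as cp

m, grid = 8, 180                                    # sensors, angular grid points
theta = np.deg2rad(np.arange(grid))
A = np.exp(1j * np.pi * np.outer(np.arange(m), np.cos(theta)))  # steering grid

rng = np.random.default_rng(4)
s_true = np.zeros(grid)
s_true[[40, 75]] = 1.0                              # two on-grid sources
y = A @ s_true + 0.05 * (rng.standard_normal(m) + 1j * rng.standard_normal(m))

s = cp.Variable(grid, complex=True)
prob = cp.Problem(cp.Minimize(cp.norm1(s)),
                  [cp.norm(A @ s - y, 2) <= 0.3])   # second-order cone constraint
prob.solve()
print(np.argsort(np.abs(s.value))[-2:])             # peaks near indices 40 and 75
```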

Journal ArticleDOI
TL;DR: A cost comparison between the present sparse MGCG algorithm and a Cholesky factorization based algorithm that uses a reordering scheme to preserve sparsity indicates that the latter method is still competitive for real-time ExAO wave-front reconstruction for systems with up to N ≈ 10^4 degrees of freedom.
Abstract: A scalable sparse minimum-variance open-loop wave-front reconstructor for extreme adaptive optics (ExAO) systems is presented. The reconstructor is based on Ellerbroek's sparse approximation of the wave-front inverse covariance matrix [J. Opt. Soc. Am. A 19, 1803 (2002)]. The baseline of the numerical approach is an iterative conjugate gradient (CG) algorithm for reconstructing a spatially sampled wave front at N grid points on a computational domain of size equal to the telescope's primary mirror's diameter D that uses a multigrid (MG) accelerator to speed up convergence efficiently and enhance its robustness. The combined MGCG scheme is order N and requires only two CG iterations to converge to the asymptotic average Strehl ratio (SR) and root-mean-square reconstruction error. The SR and reconstruction squared error are within one standard deviation of figures obtained from a previously proposed MGCG fast-Fourier-transform based minimum-variance reconstructor that incorporates the exact wave-front inverse covariance matrix on a computational domain of size equal to 2D. A cost comparison between the present sparse MGCG algorithm and a Cholesky factorization based algorithm that uses a reordering scheme to preserve sparsity indicates that the latter method is still competitive for real-time ExAO wave-front reconstruction for systems with up to N ∼ 10^4 degrees of freedom because the update rate of the Cholesky factor is typically several orders of magnitude lower than the temporal sampling rate.

01 Jan 2003
TL;DR: In this paper, the author discusses several problems that arise as a result of the following conundrum: children receive massive amounts of language input and are extraordinarily conservative in their productions.
Abstract: Over the past decade, there has been an increasing awareness of the extent to which the speaker/hearer's language knowledge reflects the fine details of personal experience. This awareness can be seen in theoretical linguistics (e.g., approaches that rely on large-scale electronic corpora). In a curious way this reflects a pendulum swing back to older views that predate the generative era. But it is a pendulum swing with a difference. The emphasis over the past 50 years on the abstract nature of linguistic knowledge, and on ways in which linguistic generalizations often crucially refer to that abstract structure, has upped the ante for usage-based approaches. There is now a richer set of data and a more sophisticated awareness of the kinds of phenomena that want explaining. I am a firm believer in usage-based approaches. But I believe that usage has its place. We do more than simply record our past experiences, say, in some table of frequencies or probabilities. Rather, our experience forms the basis for generalization and abstraction. So induction is the name of the game. But it is also important to recognize that induction is not unbridled or unconstrained. Indeed, decades of work in machine learning make abundantly clear that there is no such thing as a general purpose learning algorithm that works equally well across domains. Induction may be the name of the game, but constraints are the rules that we play by. And, enthusiast that I am, I would also be the first to acknowledge that we are only now scratching the surface in developing our understanding of how induction in language learning works. In this paper, I would like to discuss several problems that arise as a result of the following conundrum. On the one hand, two things are clear. First, children receive massive amounts of language input. This point is made abundantly clear by the research of Huttenlocher and her colleagues, among others. By some estimates, children may hear as many as 30 million words by age 3 (Hart & Risley, 1995). Second, children are extraordinarily conservative in their productions. They rarely venture very far beyond what they have heard others say. (This can give a misleading impression of linguistic precocity, when in fact children are simply skilled mimics.) This is not a new observation; the point was made many years ago, but it seems only recently—perhaps in part as a result of …

Proceedings ArticleDOI
01 Jan 2003
TL;DR: A methodology for estimation in kernel-induced feature spaces is presented, making a link between the primal-dual formulation of least squares support vector machines (LS-SVM) and classical statistical inference techniques in order to perform linear regression in primal space.
Abstract: In this paper a methodology for estimation in kernel-induced feature spaces is presented, making a link between the primal-dual formulation of least squares support vector machines (LS-SVM) and classical statistical inference techniques in order to perform linear regression in primal space. This is done by computing a finite dimensional approximation of the kernel-induced feature space mapping by using the Nystrom technique in primal space. Additionally, the methodology can be applied for a fixed-size formulation using active selection of the support vectors with entropy maximization in order to obtain a sparse approximation. Examples for different cases show good results.
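A small sketch of the fixed-size primal idea: build an approximate feature map with the Nystrom method from a set of support points, then run plain ridge regression in that finite-dimensional primal space. Support points are chosen at random here rather than by entropy maximisation:

```python
import numpy as np

def rbf(X, Y, ell=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def nystrom_features(X, Z):
    """Feature map phi with phi(x).phi(y) ~ k(x, y), from landmarks Z."""
    Kzz = rbf(Z, Z)
    U, s, _ = np.linalg.svd(Kzz)
    Wh = U / np.sqrt(np.maximum(s, 1e-10))        # acts as Kzz^{-1/2}
    return rbf(X, Z) @ Wh

rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, (500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)
Z = X[rng.choice(500, 30, replace=False)]          # fixed-size support set
Phi = nystrom_features(X, Z)
w = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(Phi.shape[1]), Phi.T @ y)
print(np.abs(Phi @ w - y).mean())                  # small training error
```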

Proceedings ArticleDOI
13 Nov 2003
TL;DR: This paper proposes an approach based on building a sparse representation of images in a redundant, geometrically inspired library of functions, followed by suitable coding techniques; it uses a greedy strategy and an enhancement layer that encodes the residual image.
Abstract: Very low bit rate image coding is an important problem regarding applications such as storage on low memory devices or streaming data on the internet. The state of the art in image compression is to use 2-D wavelets. The advantages of wavelet bases lie in their multiscale nature and in their ability to sparsely represent functions that are piecewise smooth. Their main problem, on the other hand, is that in 2-D wavelets are not able to deal with the natural geometry of images, i.e. they cannot sparsely represent objects that are smooth away from regular submanifolds. In this paper we propose an approach based on building a sparse representation of images in a redundant geometrically inspired library of functions, followed by suitable coding techniques. Best N-term nonlinear approximation in general dictionaries is, in most cases, an NP-hard problem, and sub-optimal approaches have to be followed. In this work we use a greedy strategy, also known as Matching Pursuit, to compute the expansion. Finally, the last step in our algorithm is an enhancement layer that encodes the residual image: in our simulation we have used a genuine embedded wavelet codec.
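For reference, the greedy expansion mentioned above is plain matching pursuit: repeatedly subtract the single best-matching atom from the residual (in contrast to the orthogonal variant, which re-fits on the whole support). The geometric dictionary and the embedded wavelet residual coder are not shown:

```python
import numpy as np

def matching_pursuit(D, s, n_atoms):
    """Greedy MP expansion of s over unit-norm dictionary columns D."""
    residual = s.copy()
    x = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        c = D.T @ residual
        j = int(np.argmax(np.abs(c)))     # best-correlated atom
        x[j] += c[j]                      # accumulate its projection coefficient
        residual -= c[j] * D[:, j]        # peel the atom off the residual
    return x, residual                    # residual goes to the enhancement layer
```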

Journal ArticleDOI
TL;DR: A Bayesian classification scheme is applied to the problem of object recognition through probabilistic modeling of local color histograms and a local independent component analysis (ICA) representation of the data is proposed.

Proceedings Article
John Platt1
01 Jan 2003
TL;DR: This paper applies fast sparse multidimensional scaling (MDS) to a large graph of music similarity, with 267K vertices that represent artists, albums, and tracks; and 3.22M edges that represent similarity between those entities.
Abstract: This paper applies fast sparse multidimensional scaling (MDS) to a large graph of music similarity, with 267K vertices that represent artists, albums, and tracks; and 3.22M edges that represent similarity between those entities. Once vertices are assigned locations in a Euclidean space, the locations can be used to browse music and to generate playlists. MDS on very large sparse graphs can be effectively performed by a family of algorithms called Rectangular Dijkstra (RD) MDS algorithms. These RD algorithms operate on a dense rectangular slice of the distance matrix, created by calling Dijkstra a constant number of times. Two RD algorithms are compared: Landmark MDS, which uses the Nystrom approximation to perform MDS; and a new algorithm called Fast Sparse Embedding, which uses FastMap. These algorithms compare favorably to Laplacian Eigenmaps, both in terms of speed and embedding quality.
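A compact sketch of Landmark MDS, the Nystrom-style RD algorithm compared in the paper: embed a few landmark rows of the distance matrix with classical MDS, then place every other point by distance-based triangulation. Function names are ours:

```python
import numpy as np

def landmark_mds(D_ll, D_lx, dim=2):
    """D_ll: landmark-landmark distances; D_lx: landmark-to-all-points distances."""
    n_l = D_ll.shape[0]
    J = np.eye(n_l) - 1.0 / n_l
    B = -0.5 * J @ (D_ll ** 2) @ J                 # classical MDS on landmarks
    w, U = np.linalg.eigh(B)
    w, U = w[-dim:][::-1], U[:, -dim:][:, ::-1]    # top `dim` eigenpairs
    L = U * np.sqrt(np.maximum(w, 1e-12))          # landmark coordinates
    delta_bar = (D_ll ** 2).mean(axis=1, keepdims=True)
    X = 0.5 * np.linalg.pinv(L) @ (delta_bar - D_lx ** 2)
    return X.T                                     # one embedded row per point

rng = np.random.default_rng(7)
P = rng.standard_normal((300, 2))                  # ground-truth 2D layout
lm = rng.choice(300, 20, replace=False)            # landmark indices
dist = np.linalg.norm(P[lm][:, None] - P[None], axis=-1)   # (20, 300)
emb = landmark_mds(dist[:, lm], dist)
print(emb.shape)                                   # (300, 2), up to rotation
```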

Proceedings Article
09 Dec 2003
TL;DR: It is proved that for the estimated overcomplete basis matrix, the sparse solution (coefficient matrix) with minimum ℓ1-norm is unique with probability one, and it can be obtained using a linear programming algorithm.
Abstract: In this paper, sparse representation (factorization) of a data matrix is first discussed. An overcomplete basis matrix is estimated by using the K-means method. We have proved that for the estimated overcomplete basis matrix, the sparse solution (coefficient matrix) with minimum ℓ1-norm is unique with probability one, and it can be obtained using a linear programming algorithm. The comparisons of the ℓ1-norm solution and the ℓ0-norm solution are also presented, which can be used in recoverability analysis of blind source separation (BSS). Next, we apply the sparse matrix factorization approach to BSS in the overcomplete case. Generally, if the sources are not sufficiently sparse, we perform blind separation in the time-frequency domain after preprocessing the observed data using the wavelet packets transformation. Third, an EEG experimental data analysis example is presented to illustrate the usefulness of the proposed approach and demonstrate its performance. Two almost independent components obtained by the sparse representation method are selected for phase synchronization analysis, and their periods of significant phase synchronization are found which are related to tasks. Finally, concluding remarks review the approach and state areas that require further study.