
Showing papers by "Aggelos K. Katsaggelos published in 2015"


Journal ArticleDOI
13 Aug 2015
TL;DR: This review will address issues in AV fusion in the context of AV speech processing, and especially speech recognition, where one of the issues is that the modalities both interact but also sometimes appear to desynchronize from each other.
Abstract: In this paper, we review recent results on audiovisual (AV) fusion. We also discuss some of the challenges and report on approaches to address them. One important issue in AV fusion is how the modalities interact and influence each other. This review will address this question in the context of AV speech processing, and especially speech recognition, where one of the issues is that the modalities both interact but also sometimes appear to desynchronize from each other. An additional issue that sometimes arises is that one of the modalities may be missing at test time, although it is available at training time; for example, it may be possible to collect AV training data while only having access to audio at test time. We will review approaches to address this issue from the area of multiview learning, where the goal is to learn a model or representation for each of the modalities separately while taking advantage of the rich multimodal training data. In addition to multiview learning, we also discuss the recent application of deep learning (DL) toward AV fusion. We finally draw conclusions and offer our assessment of the future in the area of AV fusion.

129 citations


Journal ArticleDOI
TL;DR: A prototype compressive video camera is presented that encodes scene movement using a translated binary photomask in the optical path, and the use of a printed binary mask allows reconstruction at higher spatial resolutions than has been previously demonstrated.
Abstract: We present a prototype compressive video camera that encodes scene movement using a translated binary photomask in the optical path. The encoded recording can then be used to reconstruct multiple output frames from each captured image, effectively synthesizing high speed video. The use of a printed binary mask allows reconstruction at higher spatial resolutions than has been previously demonstrated. In addition, we improve upon previous work by investigating tradeoffs in mask design and reconstruction algorithm selection. We identify a mask design that consistently provides the best performance across multiple reconstruction strategies in simulation, and verify it with our prototype hardware. Finally, we compare reconstruction algorithms and identify the best choice in terms of balancing reconstruction quality and speed.

74 citations


Journal ArticleDOI
TL;DR: This paper provides a review of the recent literature on Bayesian Blind Image Deconvolution methods and focuses on VB inference and the use of Super Gaussian (SG) and Scale Mixture of Gaussians (SMG) models, with coverage of recent advances in sampling methods.

61 citations


Journal ArticleDOI
TL;DR: Qualitative discriminative features extracted from late gadolinium enhanced cardiac magnetic resonance images of post-MI patients are proposed to distinguish between 20 high-risk and 34 low-risk patients; results show that textural features from the scar are important for classification and that localization features provide an additional benefit.

43 citations


Journal ArticleDOI
19 Nov 2015-eLife
TL;DR: It is suggested that dynamic heterogeneity of Yan is a necessary element of the transition process, and cell states are stabilized through noise reduction.
Abstract: Yan is an ETS-domain transcription factor responsible for maintaining Drosophila eye cells in a multipotent state. Yan is at the core of a regulatory network that determines the time and place in which cells transit from multipotency to one of several differentiated lineages. Using a fluorescent reporter for Yan expression, we observed a biphasic distribution of Yan in multipotent cells, with a rapid inductive phase and slow decay phase. Transitions to various differentiated states occurred over the course of this dynamic process, suggesting that Yan expression level does not strongly determine cell potential. Consistent with this conclusion, perturbing Yan expression by varying gene dosage had no effect on cell fate transitions. However, we observed that as cells transited to differentiation, Yan expression became highly heterogeneous and this heterogeneity was transient. Signals received via the EGF Receptor were necessary for the transience in Yan noise since genetic loss caused sustained noise. Since these signals are essential for eye cells to differentiate, we suggest that dynamic heterogeneity of Yan is a necessary element of the transition process, and cell states are stabilized through noise reduction.

40 citations


Journal ArticleDOI
TL;DR: This paper considers an underdetermined linear system with sparse solutions and proposes a preconditioning technique that yields a system matrix having the properties of an incoherent unit norm tight frame, based on recent theoretical results for standard numerical solvers such as BP and OMP.
Abstract: Performance guarantees for the algorithms deployed to solve underdetermined linear systems with sparse solutions are based on the assumption that the involved system matrix has the form of an incoherent unit norm tight frame. Learned dictionaries, which are popular in sparse representations, often do not meet the necessary conditions for signal recovery. In compressed sensing (CS), recovery rates have been improved substantially with optimized projections; however, these techniques do not produce binary matrices, which are more suitable for hardware implementation. In this paper, we consider an underdetermined linear system with sparse solutions and propose a preconditioning technique that yields a system matrix having the properties of an incoherent unit norm tight frame. While existing work in preconditioning concerns greedy algorithms, the proposed technique is based on recent theoretical results for standard numerical solvers such as basis pursuit (BP) and orthogonal matching pursuit (OMP). Our simulations show that the proposed preconditioning improves the recovery rates in both sparse representations and CS; the results for CS are comparable to optimized projections.

25 citations
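The tight-frame half of the preconditioning idea above can be sketched in a few lines of NumPy: multiplying an underdetermined system matrix A by P = (AA^T)^{-1/2} makes the rows of PA orthonormal, so its columns form a tight frame. This is a generic construction for illustration only; the paper's method additionally targets incoherence and unit column norms, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))     # underdetermined system matrix (m < n)

# Precondition with P = (A A^T)^{-1/2}: the rows of P A become orthonormal,
# i.e. the columns of P A form a tight frame, since (P A)(P A)^T = I.
w, U = np.linalg.eigh(A @ A.T)        # A A^T is symmetric positive definite
P = U @ np.diag(1.0 / np.sqrt(w)) @ U.T
A_tilde = P @ A
assert np.allclose(A_tilde @ A_tilde.T, np.eye(20), atol=1e-8)
```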


Journal ArticleDOI
TL;DR: Making use of variational Dirichlet approximation, this paper provides a blur posterior approximation that considers the uncertainty of the estimate and removes noise in the estimated kernel, and is very competitive with state-of-the-art blind image restoration methods.
Abstract: Blind image deconvolution involves two key objectives: 1) latent image and 2) blur estimation. For latent image estimation, we propose a fast deconvolution algorithm, which uses an image prior of nondimensional Gaussianity measure to enforce sparsity and an undetermined boundary condition methodology to reduce boundary artifacts. For blur estimation, a linear inverse problem with normalization and nonnegative constraints must be solved. However, the normalization constraint is ignored in many blind image deblurring methods, mainly because it makes the problem less tractable. In this paper, we show that the normalization constraint can be very naturally incorporated into the estimation process by using a Dirichlet distribution to approximate the posterior distribution of the blur. Making use of variational Dirichlet approximation, we provide a blur posterior approximation that considers the uncertainty of the estimate and removes noise in the estimated kernel. Experiments with synthetic and real data demonstrate that the proposed method is very competitive with state-of-the-art blind image restoration methods.

24 citations
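The blur constraints named above (nonnegativity and summing to one) can be illustrated with a simple deterministic surrogate: Euclidean projection onto the probability simplex. The paper instead handles these constraints probabilistically through a Dirichlet posterior, so this is only a minimal sketch of the constraint set itself.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

k_raw = np.array([0.4, -0.1, 0.9, 0.05])      # raw, unconstrained blur estimate
k_proj = project_simplex(k_raw)               # valid blur kernel
assert np.all(k_proj >= 0) and np.isclose(k_proj.sum(), 1.0)
```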


Proceedings ArticleDOI
10 Dec 2015
TL;DR: A multiple-frame super-resolution (SR) algorithm based on dictionary learning and motion estimation, which uses multiple bilevel dictionaries and improves over single-frame SR, together with a novel dictionary learning algorithm trained on consecutive video frames rather than still images or individual video frames.
Abstract: In this paper, we propose a multiple-frame super-resolution (SR) algorithm based on dictionary learning and motion estimation. We adopt the use of multiple bilevel dictionaries, which have also been used for single-frame SR. Multiple frames compensated through sub-pixel motion are considered. By simultaneously solving for a batch of patches from multiple frames, the proposed multiple-frame SR algorithm improves over single-frame SR. We also propose a novel dictionary learning algorithm in which dictionaries are trained from consecutive video frames, rather than still images or individual video frames, which further improves the performance of the developed video SR algorithm. Extensive experimental comparisons with state-of-the-art SR algorithms verify the effectiveness of our proposed multiple-frame SR approach.

21 citations


Proceedings ArticleDOI
28 Dec 2015
TL;DR: A new real-time MMW threat detection algorithm based on a tailored de-noising, body and threat segmentation, and threat detection process that outperforms currently existing detection procedures is presented.
Abstract: Millimeter Wave (MMW) imaging systems are currently being used to detect hidden threats. Unfortunately, the current performance of detection algorithms is very poor due to the presence of severe noise, the low resolution of MMW images and, in general, the poor quality of the acquired images. In this paper we present a new real-time MMW threat detection algorithm based on a tailored de-noising, body and threat segmentation, and threat detection process that outperforms currently existing detection procedures. A complete comparison with a state-of-the-art threat detection algorithm is presented in the experimental section.

12 citations


Proceedings ArticleDOI
07 Jun 2015
TL;DR: This work introduces the first all-digital temporal compressive video camera that uses custom subsampling modes to achieve spatio-temporal multiplexing, and requires no additional optical components, enabling it to be implemented in a compact package such as a mobile camera module.
Abstract: The maximum achievable frame-rate for a video camera is limited by the sensor's pixel readout rate. The same sensor may achieve either a slow frame-rate at full resolution (e.g., 60 fps at 4 Mpixel resolution) or a fast frame-rate at low resolution (e.g., 240 fps at 1 Mpixel resolution). Higher frame-rates are achieved using pixel readout modes (e.g., subsampling or binning) that sacrifice spatial for temporal resolution within a fixed bandwidth. A number of compressive video cameras have been introduced to overcome this fixed bandwidth constraint and achieve high frame-rates without sacrificing spatial resolution. These methods use electro-optic components (e.g., LCoS, DLPs, piezo actuators) to introduce high speed spatio-temporal multiplexing in captured images. Full resolution, high speed video is then restored by solving an underdetermined system of equations using a sparse regularization framework. In this work, we introduce the first all-digital temporal compressive video camera that uses custom subsampling modes to achieve spatio-temporal multiplexing. Unlike previous compressive video cameras, ours requires no additional optical components, enabling it to be implemented in a compact package such as a mobile camera module. We demonstrate results using a TrueSense development kit with a 12 Mpixel sensor and programmable FPGA readout circuitry.

12 citations
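The sparse-regularized recovery step common to these compressive cameras can be sketched with a generic iterative shrinkage-thresholding (ISTA) solver on a toy underdetermined system. The measurement matrix, problem sizes, and regularization weight below are illustrative, not the paper's.

```python
import numpy as np

def ista(Phi, y, lam, step, iters):
    """ISTA for min_x 0.5*||Phi x - y||^2 + lam*||x||_1."""
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        g = x - step * Phi.T @ (Phi @ x - y)                      # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(1)
n, m, k = 100, 40, 4                      # signal length, measurements, sparsity
support = rng.choice(n, k, replace=False)
x_true = np.zeros(n)
x_true[support] = np.array([1.0, -0.8, 1.5, -1.2])   # well-separated amplitudes
Phi = rng.standard_normal((m, n)) / np.sqrt(m)       # toy measurement matrix
y = Phi @ x_true                                     # noiseless measurements

step = 1.0 / np.linalg.norm(Phi, 2) ** 2             # 1 / Lipschitz constant
x_hat = ista(Phi, y, lam=0.01, step=step, iters=3000)
assert set(np.argsort(np.abs(x_hat))[-k:]) == set(support)
```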


Journal ArticleDOI
TL;DR: By solving the regularized fitting problem, the problem of recovering jointly sparse vectors from underdetermined measurements that are corrupted by both additive noise and outliers is considered and a general approach is proposed that employs state-of-the-art technologies for signal recovery.
Abstract: In this paper, we consider the problem of recovering jointly sparse vectors from underdetermined measurements that are corrupted by both additive noise and outliers. This can be viewed as the robust extension of the Multiple Measurement Vector (MMV) problem. To solve this problem, we propose two general approaches. As a benchmark, the first approach preprocesses the input for outlier removal and then employs state-of-the-art technologies for signal recovery. The second approach, as the main contribution of this paper, is based on formulation of an innovative regularized fitting problem. By solving the regularized fitting problem, we jointly remove outliers and recover the sparse vectors. Furthermore, by exploiting temporal smoothness among the sparse vectors, we improve noise robustness of the proposed approach and avoid the problem of over-fitting. Extensive numerical results are provided to illustrate the excellent performance of the proposed approach.

Proceedings ArticleDOI
10 Dec 2015
TL;DR: The proposed CS Super Resolution (SR) approach combines existing CS reconstruction algorithms with an LR to HR approach based on the use of a Super Gaussian (SG) regularization term, obtaining excellent SR reconstructions at ratios below one.
Abstract: In this work we propose a novel framework to obtain High Resolution (HR) images from Compressed Sensing (CS) imaging systems capturing multiple Low Resolution (LR) images of the same scene. The proposed CS Super Resolution (SR) approach combines existing CS reconstruction algorithms with an LR-to-HR approach based on the use of a Super Gaussian (SG) regularization term. The reconstruction is formulated as a constrained optimization problem which is solved using the Alternating Direction Method of Multipliers (ADMM). The image estimation subproblem is solved using Majorization-Minimization (MM), while the CS reconstruction becomes an l1-minimization problem subject to a quadratic constraint. The performed experiments show that the proposed method compares favorably to classical SR methods at a compression ratio of one, and obtains excellent SR reconstructions at ratios below one.

Patent
12 Jun 2015
TL;DR: Using a plurality of distinct behavioral tasks conducted in a functional magnetic resonance imaging (fMRI) scanner, fMRI data acquired from one or more subjects performing working memory tasks can be used for diagnosing psychiatric and neurological disorders.
Abstract: Using a plurality of distinct behavioral tasks conducted in a functional magnetic resonance imaging (fMRI) scanner, fMRI data acquired from one or more subjects performing working memory tasks can be used for diagnosing psychiatric and neurological disorders. A classification algorithm can be used to determine a classification model, tune the model, and apply the model. An output indicative of a subject's clinical condition can then be provided and used to diagnose new cases.

Proceedings ArticleDOI
28 Dec 2015
TL;DR: A new Gaussian Process (GP) classification method for multisensory data which combines the information provided by all sensors and approximates the posterior distribution of the GP using variational Bayesian inference.
Abstract: In this paper, we introduce a new Gaussian Process (GP) classification method for multisensory data. The proposed approach can deal with noisy and missing data. It is also capable of estimating the contribution of each sensor towards the classification task. We use Bayesian modeling to build a GP-based classifier which combines the information provided by all sensors and approximates the posterior distribution of the GP using variational Bayesian inference. During its training phase, the algorithm estimates each sensor's weight and then uses this information to assign a label to each new sample. In the experimental section, we evaluate the classification performance of the proposed method on both synthetic and real data and show its applicability to different scenarios.

Proceedings ArticleDOI
10 Dec 2015
TL;DR: This paper proposes an optimization scheme in order to achieve adequate spatio-temporal sampling of subsequent frames under maximal capturing speed, based on the bandwidth constraints of a sensor, and test this strategy on a commercially available camera.
Abstract: In this paper, we consider the problem of on-chip temporal compressive sensing for video reconstruction at high frame-rates without the need of any additional optical components. We devise an optimization scheme in order to achieve adequate spatio-temporal sampling of subsequent frames under maximal capturing speed, based on the bandwidth constraints of a sensor. We test this optimization strategy on a commercially available camera and propose a set of reconstruction steps that can achieve reasonable performance but, at the same time, accommodate high-resolution video reconstruction under realistic time requirements. Our analysis constitutes a set of first steps bringing high-speed compressive video capture within the realm of commercial availability.

Proceedings ArticleDOI
01 Sep 2015
TL;DR: This study promotes the use of photometric stereo to capitalize on the increasing popularity of Reflectance Transformation Imaging (RTI) among conservators in the world's leading museums.
Abstract: Starting in the 1890s the artist Paul Gauguin (1848-1903) created a series of prints and transfer drawings using techniques that are not entirely understood. To better understand the artist's production methods, photometric stereo was used to assess the surface shape of a number of these graphic works that are now in the collection of the Art Institute of Chicago. Photometric stereo uses multiple images of Gauguin's graphic works captured from a fixed camera position, lit from multiple specific angles to create an interactive composite image that reveals textural characteristics. These active images reveal details of sequential media application upon experimental printing matrices that help resolve longstanding art historical questions about the evolution of Gauguin's printing techniques. Our study promotes the use of photometric stereo to capitalize on the increasing popularity of Reflectance Transformation Imaging (RTI) among conservators in the world's leading museums.
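The core computation in photometric stereo is small enough to sketch: under a Lambertian model, a pixel's intensities under known light directions are linear in the albedo-scaled surface normal, which a least-squares solve recovers. The light directions, normal, and albedo below are illustrative values, not measurements from the Gauguin study.

```python
import numpy as np

# Lambertian photometric stereo at a single pixel: intensity under light j is
# i_j = albedo * dot(l_j, n). With >= 3 non-coplanar known lights, the
# albedo-scaled normal g = albedo * n follows by least squares: g = L^+ i.
L = np.array([[0.0, 0.0, 1.0],
              [0.8, 0.0, 0.6],
              [0.0, 0.8, 0.6]])     # unit light directions, one per row
n_true = np.array([0.0, 0.6, 0.8])  # unit surface normal (ground truth)
albedo = 0.7
i = albedo * L @ n_true             # simulated observed intensities

g = np.linalg.lstsq(L, i, rcond=None)[0]
albedo_hat = np.linalg.norm(g)      # recovered albedo
n_hat = g / albedo_hat              # recovered unit normal
assert np.allclose(n_hat, n_true, atol=1e-10)
assert np.isclose(albedo_hat, albedo)
```

Repeating this solve per pixel yields the surface-shape maps that reveal the layered media application described above.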

Proceedings ArticleDOI
28 Dec 2015
TL;DR: The proposed CS Super Resolution (CSS-R) approach combines existing CS reconstruction algorithms with Super Gaussian regularization terms on the image to be reconstructed, smoothness constraints on the registration parameters to be estimated, and the Alternating Direction Method of Multipliers (ADMM) to link the CS and SR problems.
Abstract: In this paper we propose a novel optimization framework to obtain High Resolution (HR) Passive Millimeter Wave (PMMW) images from multiple Low Resolution (LR) observations captured using a simulated Compressed Sensing (CS) imaging system. The proposed CS Super Resolution (CSS-R) approach combines existing CS reconstruction algorithms with Super Gaussian (SG) regularization terms on the image to be reconstructed, smoothness constraints on the registration parameters to be estimated, and the Alternating Direction Method of Multipliers (ADMM) to link the CS and SR problems. The image estimation subproblem is solved using Majorization-Minimization (MM), registration is tackled by minimizing a quadratic function, and CS reconstruction is approached as an l1-minimization problem subject to a quadratic constraint. The performed experiments, on simulated and real PMMW observations, validate the proposed approach.

Patent
27 Jan 2015
TL;DR: A method for automated detection and classification of objects in the fluid of a receptacle, such as a soft-sided flexible container, in which sequential frames of image data are processed to identify moving objects and classify them by their motion parameters.
Abstract: Systems and methods for automated detection and classification of objects in a fluid of a receptacle such as, for example, a soft-sided receptacle such as a flexible container. The automated detection may include initiating movement of the receptacle to move objects in the fluid contained by the receptacle. Sequential frames of image data may be recorded and processed to identify moving objects in the image data. In turn, at least one motion parameter of the objects may be determined and utilized to classify the object into at least one of a predetermined plurality of object classes. For example, the object classes may at least include a predetermined class corresponding to bubbles and a predetermined class corresponding to particles.

01 Jan 2015
TL;DR: Variability of the PSF across the field of view in anisoplanatic imagery can be described using principal component analysis, and a certain number of variable PSFs can be used to create new basis functions, called principal components (PC), which can be considered constant across the FOV and, therefore, potentially be used to perform global deconvolution.
Abstract: The performance of optical systems is highly degraded by atmospheric turbulence when observing either vertically (e.g., astronomy, remote sensing) or horizontally (e.g., long-range surveillance). This problem can be partially alleviated using adaptive optics (AO), but only for small fields of view (FOV), described by the isoplanatic angle, for which the turbulence-induced aberrations can be considered constant. Additionally, this problem can also be tackled using post-processing techniques such as deconvolution algorithms which take into account the variability of the point spread function (PSF) in anisoplanatic conditions. Variability of the PSF across the field of view in anisoplanatic imagery can be described using principal component analysis. Then, a certain number of variable PSFs can be used to create new basis functions, called principal components (PC), which can be considered constant across the FOV and, therefore, potentially be used to perform global deconvolution. Our approach is tested on simulated, single-conjugate AO data.
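The PCA step can be sketched with synthetic PSFs: stack the spatially varying PSFs as rows, center them, and take the SVD; the right singular vectors are the principal components. The rank-2 synthetic data below is purely illustrative of the decomposition, not of real turbulence statistics.

```python
import numpy as np

rng = np.random.default_rng(3)
# Stack of spatially varying PSFs, one per field position, flattened to rows.
# Synthetic data: every PSF is a mixture of two fixed modes, so rank <= 2.
npos, psf_len = 40, 64
modes = rng.standard_normal((2, psf_len))
psfs = rng.standard_normal((npos, 2)) @ modes

mean_psf = psfs.mean(axis=0)
U, s, Vt = np.linalg.svd(psfs - mean_psf, full_matrices=False)
var_explained = np.cumsum(s**2) / np.sum(s**2)
n_pc = int(np.searchsorted(var_explained, 0.99)) + 1   # PCs for 99% variance
pcs = Vt[:n_pc]                                        # principal components

# For this rank-2 data, two PCs reconstruct every PSF (numerically) exactly.
recon = mean_psf + (psfs - mean_psf) @ Vt[:2].T @ Vt[:2]
assert n_pc <= 2
assert np.allclose(recon, psfs, atol=1e-8)
```

A global deconvolution would then operate on the few PC "eigen-PSFs" instead of one PSF per field position.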

Journal ArticleDOI
TL;DR: This paper proposes a new, task-based metric where the performance of an SR algorithm is, instead, directly tied to the probability of successfully detecting critical spatial frequencies within the scene.
Abstract: The image processing technique known as superresolution (SR) has the potential to allow engineers to specify lower resolution and, therefore, less expensive cameras for a given task by enhancing the base camera's resolution. This is especially true in the remote detection and classification of objects in the environment, such as aircraft or human faces. Performing each of these tasks requires a minimum image "sharpness" which is quantified by a maximum resolvable spatial frequency, which is, in turn, a function of the camera optics, pixel sampling density, and signal-to-noise ratio. Much of the existing SR literature focuses on SR performance metrics for candidate algorithms, such as perceived image quality or peak SNR. These metrics can be misleading because they also credit deblurring and/or denoising in addition to true SR. In this paper, we propose a new, task-based metric where the performance of an SR algorithm is, instead, directly tied to the probability of successfully detecting critical spatial frequencies within the scene.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: An algorithm to evaluate the expected receiver distortion on the source side by utilizing encoder information, transmission channel characteristics and error concealment provides a more accurate estimate of the distortion that closely models quality as perceived through the human visual system.
Abstract: Efficient streaming of video over wireless networks requires real-time assessment of distortion due to packet loss, especially because predictive coding at the encoder can cause inter-frame propagation of errors and impact the overall quality of the transmitted video. This paper presents an algorithm to evaluate the expected receiver distortion on the source side by utilizing encoder information, transmission channel characteristics and error concealment. Specifically, distinct video transmission units, Groups of Blocks (GOBs), are iteratively built at the source by taking into account macroblock coding modes and motion-compensated error concealment for three different combinations of packet loss. Distortion of these units is then calculated using the structural similarity (SSIM) metric, and they are stochastically combined to derive the overall expected distortion. The proposed model provides a more accurate estimate of the distortion that closely models quality as perceived through the human visual system. When incorporated into a content-aware utility function, preliminary experimental results show improved packet ordering and scheduling efficiency and overall video quality at the receiver.
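The stochastic combination of per-scenario distortions can be sketched as a probability-weighted sum. The loss probability, scenario set, and SSIM values below are made-up illustrations of the mechanism, not numbers from the paper.

```python
# One GOB's expected receiver distortion: per-scenario distortions (1 - SSIM)
# weighted by scenario probabilities. All numbers here are illustrative.
p = 0.1                                     # packet loss probability
scenarios = {                               # (probability, distortion = 1 - SSIM)
    "received, reference intact": ((1 - p) * (1 - p), 1 - 0.98),
    "received, reference lost":   ((1 - p) * p,       1 - 0.90),
    "lost, error-concealed":      (p,                 1 - 0.80),
}
probs = [pr for pr, _ in scenarios.values()]
assert abs(sum(probs) - 1.0) < 1e-12        # the scenarios are exhaustive
expected_D = sum(pr * d for pr, d in scenarios.values())
```

A scheduler can then rank packets by how much each one reduces `expected_D`.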

Proceedings ArticleDOI
07 Jun 2015
TL;DR: In this article, the performance of non-focusing diffraction gratings was compared with ideal lenses and zone plates of similar structural properties. But the authors focused on the potential benefits of their use in computational imaging systems.
Abstract: In this work, we compare the performance of previously proposed ultra-miniature diffraction gratings with ideal lenses and zone plates of similar structural characteristics. The analysis aims at understanding the differences of designs utilizing non-focusing gratings and the potential benefits of their use in computational imaging systems.


01 Jan 2015
TL;DR: This work describes two major issues in object-based video coding and communications and provides solutions based on the MPEG-4 coding standard, and discusses the resource allocation problem in video communications and demonstrates a number of unequal error protection schemes.
Abstract: We describe two major issues in object-based video coding and communications and provide solutions based on the MPEG-4 coding standard. We first consider the general problem of bit allocation among shape, texture and motion in video coding, and provide optimal solutions based on the MINMAX (minimum maximum) and MINAVE (minimum average) distortion criteria, respectively. Then, we discuss the resource allocation problem in video communications and demonstrate a number of unequal error protection schemes, including separated packetization, joint source-channel coding and data hiding. Experimental results demonstrate significant gains by using these algorithms.
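The MINAVE/MINMAX distinction can be illustrated with a toy exhaustive search over operational rate-distortion points for two objects. The tables and bit budget are invented for the example, and real encoders use Lagrangian or dynamic-programming solvers rather than enumeration.

```python
import itertools

# Toy operational rate-distortion tables for two video objects; each option is
# (bits, distortion). The values and the 40-bit budget are illustrative.
rd = [
    [(10, 5.0), (20, 4.5), (30, 1.0)],   # object 1
    [(10, 8.0), (20, 4.0), (30, 3.4)],   # object 2
]
budget = 40

def allocate(criterion):
    """Exhaustive search for the feasible allocation minimizing `criterion`."""
    feasible = [c for c in itertools.product(*rd)
                if sum(b for b, _ in c) <= budget]
    return min(feasible, key=lambda c: criterion([d for _, d in c]))

minave = allocate(lambda ds: sum(ds) / len(ds))   # minimize average distortion
minmax = allocate(max)                            # minimize worst-case distortion
# The two criteria pick different allocations on this example:
assert [b for b, _ in minave] == [10, 30]
assert [b for b, _ in minmax] == [20, 20]
```

MINAVE spends the budget where it buys the largest total gain, while MINMAX equalizes quality across objects.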