Showing papers by "Aggelos K. Katsaggelos published in 2019"


Journal ArticleDOI
TL;DR: This paper introduces a new generator network optimized for the VSR problem, named VSRResNet, along with a new discriminator architecture to properly guide VSRResNet during the GAN training, and introduces the PercepDist metric, which more accurately evaluates the perceptual quality of SR solutions obtained from neural networks than the commonly used PSNR/SSIM metrics.
Abstract: Video super-resolution (VSR) has become one of the most critical problems in video processing. In the deep learning literature, recent works have shown the benefits of using adversarial-based and perceptual losses to improve the performance on various image restoration tasks; however, these have yet to be applied to video super-resolution. In this paper, we propose a generative adversarial network (GAN)-based formulation for VSR. We introduce a new generator network optimized for the VSR problem, named VSRResNet, along with a new discriminator architecture to properly guide VSRResNet during the GAN training. We further enhance our VSR GAN formulation with two regularizers, a distance loss in feature-space and pixel-space, to obtain our final VSRResFeatGAN model. We show that pre-training our generator with only the mean-squared-error loss already surpasses the current state-of-the-art VSR models quantitatively. We then employ the PercepDist metric to compare the state-of-the-art VSR models, and show that it more accurately evaluates the perceptual quality of SR solutions obtained from neural networks than the commonly used PSNR/SSIM metrics. Finally, we show that our proposed model, the VSRResFeatGAN model, outperforms the current state-of-the-art SR models, both quantitatively and qualitatively.
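As a rough illustration of the loss structure described above, here is a minimal PyTorch-style sketch of a composite generator objective combining a pixel-space distance, a feature-space distance, and an adversarial term. The weights, the feature extractor, and the function names are illustrative assumptions, not the paper's actual settings.

```python
import torch
import torch.nn.functional as F

def generator_loss(sr, hr, discriminator, features,
                   w_pix=1.0, w_feat=0.05, w_adv=0.001):
    """Composite generator objective in the spirit of VSRResFeatGAN:
    pixel-space and feature-space regularizers plus an adversarial
    term. Weights and the feature extractor are assumptions."""
    l_pix = F.mse_loss(sr, hr)                        # pixel-space distance
    l_feat = F.mse_loss(features(sr), features(hr))   # feature-space distance
    logits = discriminator(sr)                        # try to fool D
    l_adv = F.binary_cross_entropy_with_logits(logits,
                                               torch.ones_like(logits))
    return w_pix * l_pix + w_feat * l_feat + w_adv * l_adv
```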

112 citations


Journal ArticleDOI
TL;DR: The Gravity Spy project is extended with similarity indices that empower citizen scientists to create large data sets of unknown transients, which can then be used to facilitate supervised machine-learning characterization; this alleviates a persistent challenge that plagues both citizen-science and instrumental detector work: building large samples of relatively rare events.
Abstract: The observation of gravitational waves from compact binary coalescences by LIGO and Virgo has begun a new era in astronomy. A critical challenge in making detections is determining whether loud transient features in the data are caused by gravitational waves or by instrumental or environmental sources. The citizen-science project Gravity Spy has been demonstrated as an efficient infrastructure for classifying known types of noise transients (glitches) through a combination of data analysis performed by both citizen volunteers and machine learning. We present the next iteration of this project, using similarity indices to empower citizen scientists to create large data sets of unknown transients, which can then be used to facilitate supervised machine-learning characterization. This new evolution aims to alleviate a persistent challenge that plagues both citizen-science and instrumental detector work: the ability to build large samples of relatively rare events. Using two families of transient noise that appeared unexpectedly during LIGO’s second observing run, we demonstrate the impact that the similarity indices could have had on finding these new glitch types in the Gravity Spy program.
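The core retrieval idea, ranking unlabeled glitches by similarity to a query example so volunteers can assemble collections of look-alike transients, can be sketched in a few lines. The cosine-similarity measure and the use of a trained classifier's penultimate-layer features are assumptions here, not details confirmed by the abstract.

```python
import numpy as np

def most_similar(query_feat, bank_feats, k=10):
    """Rank a bank of glitch feature vectors by cosine similarity to
    a query glitch and return the k nearest. Feature vectors are
    assumed to come from a trained classifier's penultimate layer."""
    q = query_feat / np.linalg.norm(query_feat)
    b = bank_feats / np.linalg.norm(bank_feats, axis=1, keepdims=True)
    return np.argsort(-(b @ q))[:k]   # indices of the k most similar
```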

40 citations


Journal ArticleDOI
TL;DR: A new crowdsourcing model and inference procedure are introduced that train a Gaussian Process classifier using the noisy labels provided by the annotators; the model can predict the class of new samples and assess the expertise of the involved annotators.

36 citations


Journal ArticleDOI
05 Dec 2019 - PLOS ONE
TL;DR: This work performs a comparison study in the context of Alzheimer's dementia classification using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset with identical neural network architectures and analyzes the benefits of using both modalities in a fusion setting.
Abstract: Automated methods for Alzheimer’s disease (AD) classification have the potential for great clinical benefits and may provide insight for combating the disease. Machine learning, and more specifically deep neural networks, have been shown to have great efficacy in this domain. These algorithms often use neurological imaging data such as MRI and FDG PET, but a comprehensive and balanced comparison of the MRI and amyloid PET modalities has not been performed. In order to accurately determine the relative strength of each imaging variant, this work performs a comparison study in the context of Alzheimer’s dementia classification using the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset with identical neural network architectures. Furthermore, this work analyzes the benefits of using both modalities in a fusion setting and discusses how these data types may be leveraged in future AD studies using deep learning.

35 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: A differentiable fusion model is introduced to approximate the dual-modal physical sensing process, unifying a variety of TVFS scenarios, e.g., interpolation, prediction and motion deblur; a deep learning strategy is then developed to enhance the results from the first step, which is referred to as a residual "denoising" process.
Abstract: Temporal Video Frame Synthesis (TVFS) aims at synthesizing novel frames at timestamps different from existing frames, which has wide applications in video codec, editing and analysis. In this paper, we propose a high frame-rate TVFS framework which takes hybrid input data from a low-speed frame-based sensor and a high-speed event-based sensor. Compared to frame-based sensors, event-based sensors report brightness changes at very high speed, which may well provide useful spatio-temporal information for high frame-rate TVFS. Therefore, we first introduce a differentiable fusion model to approximate the dual-modal physical sensing process, unifying a variety of TVFS scenarios, e.g., interpolation, prediction and motion deblur. Our differentiable model enables iterative optimization of the latent video tensor via autodifferentiation, which propagates the gradients of a loss function defined on the measured data. Our differentiable model-based reconstruction does not involve training, yet is parallelizable and can be implemented on machine learning platforms (such as TensorFlow). Second, we develop a deep learning strategy to enhance the results from the first step, which we refer to as a residual "denoising" process. Our trained "denoiser" goes beyond Gaussian denoising and shows properties such as contrast enhancement and motion awareness. We show that our framework is capable of handling challenging scenes including both fast motion and strong occlusions.
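Since the abstract notes that the model-based reconstruction is training-free and implementable on platforms such as TensorFlow, a minimal sketch of the idea might look as follows. Here frame_op and event_op stand in for the paper's differentiable dual-modal sensing model and are assumptions, as are the step count and learning rate.

```python
import tensorflow as tf

def reconstruct(frames_meas, events_meas, frame_op, event_op,
                shape, steps=500, lr=0.1):
    """Training-free reconstruction sketch: optimize the latent video
    tensor so that a differentiable model of both sensors matches the
    measurements. frame_op/event_op are caller-supplied stand-ins."""
    video = tf.Variable(tf.zeros(shape))          # latent video tensor
    opt = tf.keras.optimizers.Adam(lr)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            loss = (tf.reduce_mean((frame_op(video) - frames_meas) ** 2)
                    + tf.reduce_mean((event_op(video) - events_meas) ** 2))
        grads = tape.gradient(loss, [video])      # autodifferentiation
        opt.apply_gradients(zip(grads, [video]))
    return video
```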

34 citations


Journal ArticleDOI
TL;DR: This work introduces two scalable and efficient GP-based crowdsourcing methods that allow for processing previously-prohibitive datasets and compares them with state-of-the-art probabilistic approaches in synthetic and real crowdsourcing datasets of different sizes.

17 citations


Journal ArticleDOI
TL;DR: Artificial Neural Networks are used to model the relationship between the firing rates of single neurons in area 2, a largely proprioceptive region of somatosensory cortex, and several types of kinematic variables related to arm movement, providing new insight into the complex representation of limb motion in S1.
Abstract: Somatosensation is composed of two distinct modalities: touch, arising from sensors in the skin, and proprioception, resulting primarily from sensors in the muscles, combined with these same cutaneous sensors. In contrast to the wealth of information about touch, we know surprisingly little about the nature of the signals giving rise to proprioception at the cortical level. Here we investigate the use of Artificial Neural Networks (ANNs) to model the relationship between the firing rates of single neurons in the somatosensory cortex (S1) and several types of kinematic variables related to arm movement. To gain a better understanding of how these kinematic variables interact to create the proprioceptive responses recorded in our datasets, we train ANNs under different conditions, each involving a different set of input and output variables. We find that the addition of information about joint angles and/or muscle lengths significantly improves the prediction of neural firing rates. Our results thus provide new insight regarding the complex representations of the limb motion in S1. In addition, we conduct numerical experiments to determine the sensitivity of ANN models to various choices of training design and hyper-parameters. Our results provide a baseline and new tools for future research that utilizes machine learning to better describe and understand the activity of neurons in S1.
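A minimal sketch of the kind of network described, a small fully-connected regressor from kinematic variables to per-neuron firing rates, is shown below; the layer sizes and the Softplus output (to keep predicted rates non-negative) are illustrative assumptions, not the paper's architecture.

```python
import torch.nn as nn

def firing_rate_ann(n_kinematic_inputs, n_neurons, hidden=64):
    """Fully-connected regressor from kinematic variables (hand
    position/velocity, joint angles, muscle lengths, ...) to
    per-neuron firing rates. Sizes are illustrative assumptions."""
    return nn.Sequential(
        nn.Linear(n_kinematic_inputs, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, n_neurons), nn.Softplus())  # non-negative rates
```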

12 citations


Proceedings ArticleDOI
23 Sep 2019
TL;DR: A novel approach based on a deep neural network is presented for solving the limited angle tomography problem, using deep convolutional generative adversarial networks to fill in the missing information in the sinogram domain by means of a continuity loss and the two-ends method.
Abstract: In this paper, we present a novel approach based on a deep neural network for solving the limited angle tomography problem. The limited angle views in tomography cause severe artifacts in the tomographic reconstruction. We use deep convolutional generative adversarial networks (DCGAN) to fill in the missing information in the sinogram domain. By using the continuity loss and the two-ends method, the image completion in the sinogram domain is done effectively, resulting in high quality reconstructions with fewer artifacts. The sinogram completion method can be applied to different problems such as ring artifact removal and truncated tomography problems.

11 citations


Journal ArticleDOI
TL;DR: This paper proposes the use of the spike-and-slab prior together with an efficient variational Expectation Maximization (EM) inference scheme to estimate the blur in the image and investigates the behavior of the prior in the experimental section.

10 citations


Proceedings ArticleDOI
01 Sep 2019
TL;DR: An adaptive host-chip system for video acquisition, constrained under a given bit rate, is proposed to optimize object tracking performance; the optimal QT decomposition minimizing a weighted rate-distortion equation is found using the Viterbi algorithm.
Abstract: In this paper, we propose an adaptive host-chip system for video acquisition constrained under a given bit rate to optimize object tracking performance. The chip is an imaging instrument with limited computational power consisting of a very high-resolution focal plane array (FPA) that transmits quadtree (QT)-segmented video frames to the host. The host has unlimited computational power for video analysis. We find the optimal QT decomposition to minimize a weighted rate-distortion equation using the Viterbi algorithm. The weights are user-defined based on the class of objects to track. Faster R-CNN and a Kalman filter are used to detect and track the objects of interest, respectively. We evaluate our architecture's performance based on the Multiple Object Tracking Accuracy (MOTA).
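To make the weighted rate-distortion idea concrete, here is a sketch of the quadtree decision it implies: keep a block whole or split it into four children, whichever minimizes J = w*D + lambda*R. The paper optimizes this with the Viterbi algorithm; this simplified version uses plain recursion, and all the callables are caller-supplied stand-ins rather than the system's actual routines.

```python
def qt_cost(block, depth, lam, weight, distortion, rate, split, max_depth=4):
    """Weighted rate-distortion decision for one quadtree block
    (sketch): keep it whole or split it into four children, whichever
    minimizes J = w*D + lam*R. weight/distortion/rate/split are
    caller-supplied stand-ins; the paper solves this with Viterbi."""
    j_leaf = weight(block) * distortion(block) + lam * rate(block)
    if depth == max_depth:
        return j_leaf
    j_split = sum(qt_cost(child, depth + 1, lam, weight,
                          distortion, rate, split, max_depth)
                  for child in split(block))
    return min(j_leaf, j_split)
```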

7 citations


Proceedings ArticleDOI
12 May 2019
TL;DR: It is shown that adding a step to identify the constituent pigments of a given spectrum leads to more accurate unmixing results; a second deep neural network identifies the pigments first, and this information is integrated into different layers of the network used for pigment unmixing.
Abstract: In this paper, the problem of automatic nonlinear unmixing of hyperspectral reflectance data is described, using works of art as test cases. We use a deep neural network to decompose a given spectrum quantitatively into the abundance values of pure pigments. We show that adding another step to identify the constituent pigments of a given spectrum leads to more accurate unmixing results. Towards this, we use another deep neural network to identify pigments first and integrate this information into different layers of the network used for pigment unmixing. As a test set, the hyperspectral images of a set of mock-up paintings consisting of a broad palette of pigment mixtures, and pure pigment exemplars, were measured. The results of the algorithm on the mock-up test set are reported and analyzed.

Proceedings ArticleDOI
01 Sep 2019
TL;DR: This paper proposes a fast, efficient post-processing method based on fine-tuning that enhances the solution originally provided by the neural network, maintaining its restoration quality while reducing the observed artifacts, as measured qualitatively and quantitatively.
Abstract: While Deep Neural Networks trained for solving inverse imaging problems (such as super-resolution, denoising, or inpainting tasks) regularly achieve new state-of-the-art restoration performance, this increase in performance is often accompanied by undesired artifacts generated in their solution. These artifacts are usually specific to the type of neural network architecture, training, or test input image used for the inverse imaging problem at hand. In this paper, we propose a fast, efficient post-processing method for reducing these artifacts. Given a test input image and its known image formation model, we fine-tune the parameters of the trained network and iteratively update them using a data consistency loss. We show that in addition to being efficient and applicable to a large variety of problems, our post-processing through fine-tuning approach enhances the solution originally provided by the neural network, maintaining its restoration quality while reducing the observed artifacts, as measured qualitatively and quantitatively.
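A minimal sketch of the test-time fine-tuning loop described above might look as follows, assuming a differentiable forward_model implementing the known image formation; the optimizer choice, step count, and learning rate are illustrative.

```python
import torch

def finetune_for_artifacts(net, y, forward_model, steps=100, lr=1e-5):
    """Test-time fine-tuning (sketch): iteratively update the trained
    network's weights with a data consistency loss so its restoration
    agrees with the observed image y under the known formation model."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x_hat = net(y)                                      # current restoration
        loss = torch.mean((forward_model(x_hat) - y) ** 2)  # data consistency
        loss.backward()
        opt.step()
    return net(y).detach()
```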

Posted Content
TL;DR: This approach, referred to as scalable variational Gaussian processes for crowdsourcing (SVGPCR), brings GP-based methods back to a state-of-the-art level and excels at uncertainty quantification.
Abstract: In recent years, crowdsourcing has been transforming the way classification training sets are obtained. Instead of relying on a single expert annotator, crowdsourcing shares the labelling effort among a large number of collaborators. For instance, this is being applied to the data acquired by the Laser Interferometer Gravitational-Wave Observatory (LIGO), in order to detect glitches which might hinder the identification of true gravitational waves. The crowdsourcing scenario poses new challenging difficulties, as it deals with different opinions from a heterogeneous group of annotators with unknown degrees of expertise. Probabilistic methods, such as Gaussian Processes (GP), have proven successful in modeling this setting. However, GPs do not scale well to large data sets, which hampers their broad adoption in real practice (in particular at LIGO). This has led to the recent introduction of deep learning based crowdsourcing methods, which have become the state-of-the-art. However, the accurate uncertainty quantification of GPs has been partially sacrificed. This is an important aspect for astrophysicists in LIGO, since a glitch detection system should provide very accurate probability distributions of its predictions. In this work, we leverage the most popular sparse GP approximation to develop a novel GP-based crowdsourcing method that factorizes into mini-batches. This makes it able to cope with previously-prohibitive data sets. The approach, which we refer to as Scalable Variational Gaussian Processes for Crowdsourcing (SVGPCR), brings GP-based methods back to the state-of-the-art, and excels at uncertainty quantification. SVGPCR is shown to outperform deep learning based methods and previous probabilistic approaches when applied to the LIGO data. Moreover, its behavior and main properties are carefully analyzed in a controlled experiment based on the MNIST data set.
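The annotator model underlying this family of methods can be sketched independently of the sparse-GP machinery: each annotator is assigned a confusion matrix, and the noisy labels enter the objective through it. The shapes and the per-sample loop below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def crowd_log_likelihood(class_probs, noisy_labels, confusions):
    """Expected log-likelihood of one sample's crowd labels (sketch).
    class_probs: (C,) class posterior from the GP classifier.
    noisy_labels: list of (annotator_id, label) pairs for this sample.
    confusions[a]: (C, C) matrix with confusions[a][c, y] =
    p(annotator a says y | true class c). Shapes are assumptions."""
    ll = 0.0
    for a, y in noisy_labels:
        ll += np.log(class_probs @ confusions[a][:, y] + 1e-12)
    return ll
```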

Proceedings ArticleDOI
01 May 2019
TL;DR: It is shown that the weights should converge to a class-based PCA, with some weights in every layer dedicated to principal components of each label class, and that the method achieves performance superior or comparable to that of similar architectures trained using SGD.
Abstract: We argue that learning a hierarchy of features in a hierarchical dataset requires lower layers to approach convergence faster than layers above them. We show that, if this assumption holds, we can analytically approximate the outcome of stochastic gradient descent (SGD) for each layer. We find that the weights should converge to a class-based PCA, with some weights in every layer dedicated to principal components of each label class. The class-based PCA allows us to train layers directly, without SGD, often leading to a dramatic decrease in training complexity. We demonstrate the effectiveness of this by using our results to replace one and two convolutional layers in networks trained on MNIST, CIFAR10 and CIFAR100 datasets, showing that our method achieves performance superior or comparable to similar architectures trained using SGD.
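A minimal NumPy sketch of the class-based PCA construction, stacking the top principal components of each label class into a weight matrix, is given below; the interface and the number of components per class are illustrative assumptions.

```python
import numpy as np

def class_pca_weights(X, y, comps_per_class=4):
    """Stack the top principal components of each label class into a
    weight matrix (one row per component), as a direct construction
    of layer weights instead of SGD. Sizes are illustrative."""
    rows = []
    for c in np.unique(y):
        Xc = X[y == c]
        Xc = Xc - Xc.mean(axis=0)                    # center the class
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        rows.append(vt[:comps_per_class])            # top directions
    return np.vstack(rows)
```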

Book ChapterDOI
17 Apr 2019
TL;DR: There is potential in this processing pipeline to automatically detect parts of the arterial wall which are not normal and possibly consist of plaque, especially in detecting the three classes of plaque other than calcified.
Abstract: Intravascular Optical Coherence Tomography (IVOCT) is a modality which gives in vivo insight into coronary artery morphology and thus helps the diagnosis and prevention of atherosclerosis. About 100–300 cross-sectional OCT images are obtained for each artery, so it is important to facilitate and objectify the process of detecting regions of interest, which otherwise demands a lot of time and effort from medical experts. We propose a processing pipeline to automatically detect parts of the arterial wall which are not normal and possibly consist of plaque. The first processing step transforms OCT images to polar coordinates and detects the arterial wall. After binarization of the image and removal of the catheter, the arterial wall is detected in each axial line from the first white pixel to a depth of 80 pixels, which is equal to 1.5 mm. The arterial wall is then split into orthogonal patches which undergo OCT-specific transformations and are labelled as plaque (4 distinct kinds: fibrous, calcified, lipid and mixed) or normal tissue. OCT-specific transformations include enhancing the more reflective parts of the image and rendering patches independent of the arterial wall curvature. The patches are input to AlexNet, which is fine-tuned to classify them. Fine-tuning is performed by retraining an already trained AlexNet with a learning rate which is 20 times larger for the last 3 fully-connected layers than for the initial 5 convolutional layers. 114 cross-sectional images were randomly selected to fine-tune AlexNet while 6 were selected to validate the results. Training accuracy was 100% while validation accuracy was 86%. The drop in validation accuracy is attributed mainly to false negatives, which concern only calcified plaque. Thus, there is potential in this method, especially in detecting the 3 other classes of plaque.
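The fine-tuning recipe stated in the abstract (a learning rate 20 times larger for the fully-connected layers than for the convolutional layers) maps naturally onto optimizer parameter groups. A sketch using torchvision's AlexNet follows; the base learning rate, momentum, and class count are assumptions.

```python
import torch
from torchvision.models import alexnet

def plaque_finetuner(num_classes=5, base_lr=1e-4):
    """Fine-tuning setup matching the abstract's recipe: the three
    fully-connected layers learn with a 20x larger learning rate
    than the five convolutional layers. base_lr, momentum, and the
    class count (4 plaque kinds + normal) are assumptions."""
    net = alexnet(pretrained=True)
    net.classifier[-1] = torch.nn.Linear(4096, num_classes)
    optimizer = torch.optim.SGD([
        {"params": net.features.parameters(),   "lr": base_lr},       # conv
        {"params": net.classifier.parameters(), "lr": 20 * base_lr},  # FC
    ], momentum=0.9)
    return net, optimizer
```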

Proceedings ArticleDOI
01 Sep 2019
TL;DR: This work proposes to pseudo-invert, with regularization, the image formation model using GANs and perceptual losses; two feature losses are additionally introduced to obtain perceptually improved high-resolution images.
Abstract: While high and ultra high definition displays are becoming popular, most of the available content has been acquired at much lower resolutions. In this work we propose to pseudo-invert, with regularization, the image formation model using GANs and perceptual losses. Our model, which does not require the use of motion compensation, explicitly utilizes the low-resolution image formation model and additionally introduces two feature losses which are used to obtain perceptually improved high-resolution images. The experimental validation shows that our approach outperforms current learning-based video super-resolution models.

Posted ContentDOI
23 Dec 2019 - bioRxiv
TL;DR: A deep learning model is developed to address the heterogeneous nature of DAT development; its prediction value is significantly correlated with rates of change in clinical assessment scores, indicating the model is able to predict an individual patient's future cognitive decline.
Abstract: Dementia of Alzheimer Type (DAT) is associated with a devastating and irreversible cognitive decline. As a pharmacological intervention has not yet been developed to reverse disease progression, preventive medicine will play a crucial role in patient care and treatment planning. However, predicting which patients will progress to DAT is difficult as patients with Mild Cognitive Impairment (MCI) could either convert to DAT (MCI-C) or not (MCI-NC). In this paper, we develop a deep learning model to address the heterogeneous nature of DAT development. Structural magnetic resonance imaging was utilized as a single biomarker, and a three-dimensional convolutional neural network (3D-CNN) was developed. The 3D-CNN was trained using transfer learning from the classification of Normal Control and DAT scans at the source task. This was applied to the target task of classifying MCI-C and MCI-NC scans. The model results in 82.4% classification accuracy, which outperforms current models in the field. Furthermore, by implementing an occlusion map approach, we visualize key brain regions that significantly contribute to the prediction of MCI-C and MCI-NC. Results show the hippocampus, amygdala, cerebellum, and pons regions as significant to prediction, which is consistent with the current understanding of the disease. Finally, the prediction value of the model is significantly correlated with rates of change in clinical assessment scores, indicating the model is able to predict the future cognitive decline of an individual patient. This information, in conjunction with the identified anatomical features, will aid in building a personalized therapeutic strategy for individuals with MCI. This model could also be useful for selection of participants for clinical trials.
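The occlusion-map approach mentioned above can be sketched as a sliding-cube sensitivity analysis: zero out a region of the input volume and record how much the prediction drops. The patch size, stride, and model interface below are assumptions.

```python
import numpy as np

def occlusion_map(model, volume, patch=8, stride=8):
    """Occlusion sensitivity for a 3D-CNN classifier (sketch): slide
    a zeroed cube over the MRI volume and record the drop in the
    predicted probability, highlighting regions the model relies on.
    model(volume) is assumed to return that probability."""
    base = model(volume)
    d, h, w = volume.shape
    heat = np.zeros_like(volume)
    for z in range(0, d - patch + 1, stride):
        for y in range(0, h - patch + 1, stride):
            for x in range(0, w - patch + 1, stride):
                v = volume.copy()
                v[z:z+patch, y:y+patch, x:x+patch] = 0.0   # occlude cube
                heat[z:z+patch, y:y+patch, x:x+patch] = base - model(v)
    return heat
```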

Proceedings ArticleDOI
12 May 2019
TL;DR: This paper aims to train a GAN guided by a spatially adaptive loss function and demonstrates that the learned model achieves improved results with sharper images, fewer artifacts and less noise.
Abstract: Deep Learning techniques and more specifically Generative Adversarial Networks (GANs) have recently been used for solving the video super-resolution (VSR) problem. In some of the published works, feature-based perceptual losses have also been used, resulting in promising results. While there has been work in the literature incorporating temporal information into the loss function, studies which make use of the spatial activity to improve GAN models are still lacking. Towards this end, this paper aims to train a GAN guided by a spatially adaptive loss function. Experimental results demonstrate that the learned model achieves improved results with sharper images, fewer artifacts and less noise.

Proceedings ArticleDOI
12 Jul 2019
TL;DR: In this paper, the authors present an improved lower-cost time-domain OCT (TD-OCT) system for deeper, high-resolution 3D imaging of painting layers.
Abstract: Accurate measurements of the geometric shape and the internal structure of cultural artifacts are of great importance for the analysis and understanding of artworks such as paintings. Often their complex layers, delicate materials, high value and uniqueness preclude all but the sparsest sample-based measurements (microtomy or embedding of small chips of paint). In the last decade, optical coherence tomography (OCT) has enabled dense point-wise measurements of layered surfaces to create 3D images with axial resolutions at micron scales. While commercial OCT systems at biologically-useful wavelengths (900 nm to 1.3 μm) can reveal some painting layers, strong scattering and absorption at these wavelengths severely limit the penetration depth. While Fourier-domain methods increase measurement speed and eliminate moving parts, they also reduce signal-to-noise ratios and increase equipment costs. In this paper, we present an improved lower-cost time-domain OCT (TD-OCT) system for deeper, high-resolution 3D imaging of painting layers. Assembled entirely from recently-available commercially-made parts, its 2x2 fused fiber-optic coupler forms an interferometer without a delicate, manually-aligned beam-splitter, its low-cost broadband Q-switched super-continuum laser source supplies 20 kHz 0.4-2.4 μm coherent pulses that penetrate deeply into the sample matrix, and its single low-cost InGaAs amplified photodetector replaces the sensitive spectroscopic camera required by Fourier-domain OCT (FD-OCT) systems. Our fiber and filter choices operate at 2.0±0.2 μm wavelengths, as these may later help us characterize scattering and absorption characteristics, and yield an axial resolution of about 4.85 μm, surprisingly close to the theoretical limit of 4.41 μm. We show that despite the moving parts that make TD-OCT measurements more time-consuming, replacing the spectroscopic camera required by FD-OCT with a single-pixel detector offers strong advantages. This detector measures interference power at all wavelengths simultaneously, but at a single depth, enabling the system to reach its axial resolution limits by simply using more time to acquire more samples per A-scan. We characterize the system performance using material samples that match real works of art. Our system provides an economical and practical way to improve 3D imaging performance for cultural heritage applications in terms of penetration, resolution, and dynamic range.
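The quoted 4.41 μm theoretical limit is consistent with the standard axial-resolution formula for a Gaussian source spectrum, dz = (2 ln 2 / pi) * lambda0^2 / delta_lambda, evaluated at the stated 2.0±0.2 μm band. The Gaussian-spectrum assumption is ours; a quick check:

```python
import numpy as np

# Axial resolution for a Gaussian source spectrum (assumption):
# dz = (2 ln 2 / pi) * lambda0**2 / delta_lambda
lambda0, delta_lambda = 2.0, 0.4   # micrometres (2.0 +/- 0.2 um band)
dz = (2 * np.log(2) / np.pi) * lambda0**2 / delta_lambda
print(f"theoretical axial resolution: {dz:.2f} um")   # -> 4.41 um
```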

Book ChapterDOI
01 Jan 2019
TL;DR: A framework based on Convolutional Neural Networks for classification of regions of intravascular OCT images into 4 categories: fibrous tissue, mixed plaque, lipid plaque and calcified plaque is proposed.
Abstract: Intravascular optical coherence tomography (IVOCT) is a light-based imaging modality of great interest because it can contribute to diagnosing and preventing atherosclerosis due to its ability to provide in vivo insight into coronary artery morphology. The substantial number of slices obtained per artery makes it laborious for medical experts to classify image regions of interest. We propose a framework based on Convolutional Neural Networks (CNN) for classification of regions of intravascular OCT images into 4 categories: fibrous tissue, mixed plaque, lipid plaque and calcified plaque. The framework consists of 2 main parts. In the first part, square patches (8 × 8 pixels) of OCT images are classified as fibrous tissue or plaque using a CNN which was designed for texture classification. In the second part, larger regions consisting of adjacent patches classified as plaque in the first part are classified into 3 categories: lipid, calcium, mixed. Region classification is implemented by an AlexNet version re-trained on images artificially constructed to depict only the core of the plaque region, which is considered its blueprint. Various simple steps like thresholding and morphological operations are used throughout the framework, mainly to exclude background from analysis and to merge patches into regions. The first results are promising since the classification accuracy of the two networks is high (95% and 89%, respectively).

Proceedings ArticleDOI
01 Sep 2019
TL;DR: This work formulates the blind color deconvolution problem within the Bayesian framework, taking into account the similarity to a given reference color-vector matrix and, through a total variation prior, spatial relations among the concentration pixels.
Abstract: In digital brightfield microscopy, tissues are usually stained with two or more dyes. Color deconvolution aims at separating multi-stained images into single-stained images. We formulate the blind color deconvolution problem within the Bayesian framework. Our model takes into account the similarity to a given reference color-vector matrix and, through a total variation prior, spatial relations among the concentration pixels. It utilizes variational inference and an evidence lower bound to estimate all the latent variables. The proposed algorithm is tested on real images and compared with classical and state-of-the-art color deconvolution algorithms.
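For context, the non-blind core of color deconvolution, which the paper extends by estimating the color-vector matrix blindly with a TV prior, can be sketched via the Beer-Lambert optical-density model. The fixed reference matrix and least-squares solve below are a simplification, not the paper's variational algorithm.

```python
import numpy as np

def color_deconvolve(rgb, M_ref):
    """Non-blind stain separation (sketch): convert RGB to optical
    density via Beer-Lambert (OD = -log10(I/I0), here I0 = 1) and
    solve OD = M @ C for per-pixel stain concentrations C using a
    fixed reference color-vector matrix M_ref of shape (3, n_stains).
    rgb is an (H, W, 3) array with values in (0, 1]."""
    od = -np.log10(np.clip(rgb, 1e-6, 1.0))    # optical density
    flat = od.reshape(-1, 3).T                 # 3 x n_pixels
    conc = np.linalg.lstsq(M_ref, flat, rcond=None)[0]
    return conc.reshape(M_ref.shape[1], *rgb.shape[:2])
```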

Journal ArticleDOI
TL;DR: An inexpensive prototype is introduced whose elements are a light table and a consumer-grade photographic camera; it is capable of recovering images of watermarks similar to the ones obtained with standard methods while being non-destructive, rapid, easy to operate, and inexpensive.

Proceedings ArticleDOI
01 Sep 2019
TL;DR: Experimental results show that the proposed semantic prior based Generative Adversarial Network model for video super-resolution is advantageous in sharpening video frames, reducing noise and artifacts, and recovering realistic textures.
Abstract: Semantic information is widely used in the deep learning literature to improve the performance of visual media processing. In this work, we propose a semantic prior based Generative Adversarial Network (GAN) model for video super-resolution. The model fully utilizes various texture styles from different semantic categories of video-frame patches, contributing to more accurate and efficient learning for the generator. Based on the GAN framework, we introduce the semantic prior by making use of the spatial feature transform during the learning process of the generator. The patch-wise semantic prior is extracted on the whole video frame by a semantic segmentation network. A hybrid loss function is designed to guide the learning performance. Experimental results show that our proposed model is advantageous in sharpening video frames, reducing noise and artifacts, and recovering realistic textures.

Proceedings ArticleDOI
01 Sep 2019
TL;DR: The effectiveness of the proposed convolutional neural network (CNN) segmentation network is demonstrated, along with the benefit of a CNN that incorporates physiologically based information, namely the brain symmetry property.
Abstract: Chronic stroke lesion segmentation on magnetic resonance imaging scans plays a critical role in helping physicians to determine stroke patient prognosis. We propose a convolutional neural network (CNN) segmentation network - a 3D Cross-hemisphere Neighborhood Difference ConvNet - which utilizes brain symmetry. The main novelty of this network lies in a 3D cross-hemisphere neighborhood difference layer which introduces robustness to position and scale in brain symmetry. Such robustness is important in helping the CNN distinguish between minute hemispheric differences and the asymmetry caused by a lesion. We compared our model with the state-of-the-art method using a chronic stroke lesion segmentation database. Our results demonstrate the effectiveness of the proposed model and the benefit of a CNN that combines the physiologically based information, that is, the brain symmetry property.
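The symmetry idea behind the cross-hemisphere layer can be sketched as comparing feature maps with their left-right mirrored counterparts so that hemispheric asymmetries (candidate lesions) stand out. The paper's actual layer also compares over local 3D neighborhoods for robustness to position and scale, which this minimal version omits.

```python
import torch

def cross_hemisphere_difference(feat):
    """Compare each feature map with its left-right mirrored
    counterpart so hemispheric asymmetries stand out. feat is assumed
    to be (batch, channels, depth, height, width) with the last axis
    crossing the brain's midline."""
    mirrored = torch.flip(feat, dims=[-1])   # reflect across the midline
    return torch.abs(feat - mirrored)        # asymmetry response
```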

Proceedings ArticleDOI
12 May 2019
TL;DR: This paper analyzes the ToF sensor’s output as a complex value coupling the depth and intensity information in a phasor representation and introduces a novel multi-frame superresolution technique that can simultaneously improve the spatial resolution of both intensity and depth images.
Abstract: Recently, time-of-flight (ToF) sensors have emerged as a promising three-dimensional sensing technology that can be manufactured inexpensively in a compact size. However, current state-of-the-art ToF sensors suffer from low spatial resolution due to physical limitations in the fabrication process. In this paper, we analyze the ToF sensor’s output as a complex value coupling the depth and intensity information in a phasor representation. Based on this analysis, we introduce a novel multi-frame superresolution technique that can simultaneously improve the spatial resolution of both intensity and depth images. We believe our proposed method can benefit numerous applications where high resolution depth sensing is desirable, such as precision automated navigation and collision avoidance.
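The phasor representation referred to above can be sketched as a complex pixel whose magnitude carries intensity and whose phase encodes depth through the round-trip modulation delay; the modulation frequency below is an assumed value.

```python
import numpy as np

def tof_phasor(amplitude, depth, f_mod=20e6, c=3.0e8):
    """Complex (phasor) view of a ToF pixel: magnitude = intensity,
    phase = round-trip modulation delay, phi = 4*pi*f_mod*depth/c.
    The 20 MHz modulation frequency is an assumed value."""
    phase = 4 * np.pi * f_mod * depth / c
    return amplitude * np.exp(1j * phase)
```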

Posted Content
TL;DR: It is demonstrated that it is possible to automatically detect subtle non-apneic/non-hypopneic arousal events from PSG recordings, which contributes to a better retrospective analysis of sleep data, and may also improve the quality of treatment.
Abstract: Objective: The aim of this study is to develop an automated classification algorithm for polysomnography (PSG) recordings to detect non-apneic and non-hypopneic arousals. Our particular focus is on detecting the respiratory effort-related arousals (RERAs) which are very subtle respiratory events that do not meet the criteria for apnea or hypopnea, and are more challenging to detect. Methods: The proposed algorithm is based on a bidirectional long short-term memory (BiLSTM) classifier and 465 multi-domain features, extracted from multimodal clinical time series. The features consist of a set of physiology-inspired features (n = 75), obtained by multiple steps of feature selection and expert analysis, and a set of physiology-agnostic features (n = 390), derived from scattering transform. Results: The proposed algorithm is validated on the 2018 PhysioNet challenge dataset. The overall performance in terms of the area under the precision-recall curve (AUPRC) is 0.50 on the hidden test dataset. This result is tied for the second-best score during the follow-up and official phases of the 2018 PhysioNet challenge. Conclusions: The results demonstrate that it is possible to automatically detect subtle non-apneic/non-hypopneic arousal events from PSG recordings. Significance: Automatic detection of subtle respiratory events such as RERAs together with other non-apneic/non-hypopneic arousals will allow detailed annotations of large PSG databases. This contributes to a better retrospective analysis of sleep data, which may also improve the quality of treatment.
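A minimal sketch of the classifier described, a bidirectional LSTM over per-step multi-domain feature vectors, follows; the hidden size and single-layer configuration are assumptions, while the 465-feature input width comes from the abstract.

```python
import torch.nn as nn

class ArousalBiLSTM(nn.Module):
    """Bidirectional LSTM sequence classifier over multi-domain
    feature vectors (465 per step, per the abstract); hidden size
    and head are illustrative assumptions."""
    def __init__(self, n_features=465, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)   # per-step arousal score

    def forward(self, x):                      # x: (batch, time, features)
        h, _ = self.lstm(x)
        return self.head(h).squeeze(-1)        # logits per time step
```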

01 Jan 2019
TL;DR: This paper proposes an efficient, fully self-supervised approach to remove the observed artifacts, and applies the method to image and video super-resolution neural networks and shows that the proposed framework consistently enhances the solution originally provided by the neural network.
Abstract: While Deep Neural Networks (DNNs) trained for image and video super-resolution regularly achieve new state-of-the-art performance, they also suffer from significant drawbacks. One of their limitations is their tendency to generate strong artifacts in their solution. This may occur when the low-resolution image formation model does not match that seen during training. Artifacts also regularly arise when training Generative Adversarial Networks for inverse imaging problems. In this paper, we propose an efficient, fully self-supervised approach to remove the observed artifacts. More specifically, at test time, given an image and its known image formation model, we fine-tune the parameters of the trained network and iteratively update them using a data consistency loss. We apply our method to image and video super-resolution neural networks and show that our proposed framework consistently enhances the solution originally provided by the neural network.

Book ChapterDOI
26 Sep 2019
TL;DR: An automatic method is presented to estimate flow rate through the orifice in in-vitro 2D color-flow Doppler echocardiographic images, showing promising results.
Abstract: We present an automatic method to estimate flow rate through the orifice in in-vitro 2D color-flow Doppler echocardiographic images. Flow rate properties are important for the assessment of pathologies like mitral regurgitation. We expect this method to be transferable to in-vivo patient data. The method consists of two main parts: (a) detecting a bounding box which encloses the aliasing contours and their surroundings (namely, a region representative of the flow convergence area), and (b) applying Convolutional Neural Networks for regression to estimate the flow convergence area. The best result achieved is a 5% mean error on validation data drawn from experiments other than those used for training. Given the small number of training samples, this method shows promising results.

Journal ArticleDOI
TL;DR: A novel optical flow prediction model using an adaptable deep neural network architecture for blind and non-blind error concealment of videos degraded by transmission loss is presented.
Abstract: A novel optical flow prediction model using an adaptable deep neural network architecture for blind and non-blind error concealment of videos degraded by transmission loss is presented. The two-stream network model is trained by separating the horizontal and vertical motion fields which are passed through two similar parallel pipelines that include traditional convolutional (Conv) and convolutional long short-term memory (ConvLSTM) layers. The ConvLSTM layers extract temporally correlated motion information while the Conv layers correlate motion spatially. The optical flows used as input to the two-pipeline prediction network are obtained through a flow generation network that can be easily interchanged, increasing the adaptability of the overall end-to-end architecture. The performance of the proposed model is evaluated using real-world packet loss scenarios. Standard video quality metrics are used to compare frames reconstructed using predicted optical flows with those reconstructed using “ground-truth” flows obtained directly from the generator.
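One of the two parallel pipelines described above can be sketched as a ConvLSTM stage (temporal motion correlation) followed by convolutional layers (spatial correlation); one such stream each would process the horizontal and the vertical motion field. All layer sizes are illustrative assumptions, not the paper's architecture.

```python
import tensorflow as tf

def flow_prediction_stream(time_steps=4, h=64, w=64):
    """One stream of a two-stream flow predictor (sketch): ConvLSTM
    extracts temporally correlated motion, Conv layers correlate it
    spatially. Input is a short history of one flow component."""
    return tf.keras.Sequential([
        tf.keras.layers.ConvLSTM2D(16, 3, padding="same",
                                   return_sequences=False,
                                   input_shape=(time_steps, h, w, 1)),
        tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(1, 3, padding="same"),  # predicted component
    ])
```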

Posted Content
TL;DR: In this paper, a self-supervised fine-tuning approach is proposed to correct a sub-optimal super-resolution solution by entirely relying on internal learning at test time.
Abstract: While Convolutional Neural Networks (CNNs) trained for image and video super-resolution (SR) regularly achieve new state-of-the-art performance, they also suffer from significant drawbacks. One of their limitations is their lack of robustness to unseen image formation models during training. Other limitations include the generation of artifacts and hallucinated content when training Generative Adversarial Networks (GANs) for SR. While the Deep Learning literature focuses on presenting new training schemes and settings to resolve these various issues, we show that one can avoid training and correct for SR results with a fully self-supervised fine-tuning approach. More specifically, at test time, given an image and its known image formation model, we fine-tune the parameters of the trained network and iteratively update them using a data fidelity loss. We apply our fine-tuning algorithm on multiple image and video SR CNNs and show that it can successfully correct for a sub-optimal SR solution by entirely relying on internal learning at test time. We apply our method on the problem of fine-tuning for unseen image formation models and on removal of artifacts introduced by GANs.