
Showing papers by "Shai Avidan published in 2020"


Journal ArticleDOI
TL;DR: An algorithm based on a non-local prior is proposed that recovers the atmospheric light, the distance map, and the haze-free image; it has linear complexity, requires no training, and performs well on a wide variety of images compared with other state-of-the-art methods.
Abstract: Haze often limits visibility and reduces contrast in outdoor images. The degradation varies spatially since it depends on the objects’ distances from the camera. This dependency is expressed in the transmission coefficients, which control the attenuation. Restoring the scene radiance from a single image is a highly ill-posed problem, and thus requires using an image prior. Contrary to methods that use patch-based image priors, we propose an algorithm based on a non-local prior. The algorithm relies on the assumption that colors of a haze-free image are well approximated by a few hundred distinct colors, which form tight clusters in RGB space. Our key observation is that pixels in a given cluster are often non-local, i.e., spread over the entire image plane and located at different distances from the camera. In the presence of haze these varying distances translate to different transmission coefficients. Therefore, each color cluster in the clear image becomes a line in RGB space, that we term a haze-line. Using these haze-lines, our algorithm recovers the atmospheric light, the distance map and the haze-free image. The algorithm has linear complexity, requires no training, and performs well on a wide variety of images compared to other state-of-the-art methods.
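The haze-line construction lends itself to a compact sketch. The following is a minimal, hypothetical simplification (the crude angular binning stands in for the paper's careful clustering of directions, and the function name is ours): pixels shifted by the atmospheric light are grouped by their direction in RGB space, and each pixel's transmission is estimated as its radius divided by the largest radius found on its haze-line.

```python
import numpy as np

def estimate_transmission_haze_lines(img, A, n_bins=36):
    # Translate pixels so the atmospheric light A sits at the origin.
    I_A = img.reshape(-1, 3) - A
    r = np.linalg.norm(I_A, axis=1)           # radius along each haze-line
    # Crude angular binning; the paper clusters directions more carefully.
    theta = np.arctan2(I_A[:, 1], I_A[:, 0] + 1e-9)
    phi = np.arccos(np.clip(I_A[:, 2] / (r + 1e-9), -1.0, 1.0))
    bins = (np.digitize(theta, np.linspace(-np.pi, np.pi, n_bins)) * n_bins
            + np.digitize(phi, np.linspace(0.0, np.pi, n_bins)))
    t = np.ones_like(r)
    for b in np.unique(bins):
        m = bins == b
        r_max = r[m].max()
        if r_max > 0:
            t[m] = r[m] / r_max               # t = r / r_max per haze-line
    return t.reshape(img.shape[:2])
```

The key property the sketch preserves is non-locality: pixels in one bin may come from anywhere in the image plane, yet they share a single haze-line and radiance estimate.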

130 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: A new method for anomaly detection of human actions that works directly on human pose graphs computed from an input video sequence and performs considerably better than other state-of-the-art methods.
Abstract: We propose a new method for anomaly detection of human actions. Our method works directly on human pose graphs that can be computed from an input video sequence. This makes the analysis independent of nuisance parameters such as viewpoint or illumination. We map these graphs to a latent space and cluster them. Each action is then represented by its soft-assignment to each of the clusters. This gives a kind of "bag of words" representation to the data, where every action is represented by its similarity to a group of base action-words. Then, we use a Dirichlet process based mixture, which is useful for handling proportional data such as our soft-assignment vectors, to determine if an action is normal or not. We evaluate our method on two types of data sets. The first is a fine-grained anomaly detection data set (e.g., ShanghaiTech) where we wish to detect unusual variations of some action. The second is a coarse-grained anomaly detection data set (e.g., a Kinetics-based data set) where few actions are considered normal, and every other action should be considered abnormal. Extensive experiments on the benchmarks show that our method performs considerably better than other state-of-the-art methods.
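The soft-assignment step can be illustrated in a few lines. This is our own minimal sketch, not the paper's implementation: each latent embedding gets a probability vector over cluster centers (the "action words"), and these proportional vectors are what a Dirichlet-process mixture would then score for normality.

```python
import numpy as np

def soft_assignment(embeddings, centers, alpha=5.0):
    # Squared distance from every embedding to every cluster center.
    d2 = ((embeddings[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    logits = -alpha * d2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)       # each row sums to 1
```

Because every row is a point on the probability simplex, a mixture model suited to proportional data (such as a Dirichlet-based one) is a natural fit downstream.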

114 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work introduces a novel differentiable relaxation for point cloud sampling that approximates sampled points as a mixture of points in the primary input cloud and outperforms existing non-learned and learned sampling alternatives.
Abstract: There is a growing number of tasks that work directly on point clouds. As the size of the point cloud grows, so do the computational demands of these tasks. A possible solution is to sample the point cloud first. Classic sampling approaches, such as farthest point sampling (FPS), do not consider the downstream task. A recent work showed that learning a task-specific sampling can improve results significantly. However, the proposed technique did not deal with the non-differentiability of the sampling operation and offered a workaround instead. We introduce a novel differentiable relaxation for point cloud sampling that approximates sampled points as a mixture of points in the primary input cloud. Our approximation scheme leads to consistently good results on classification and geometry reconstruction applications. We also show that the proposed sampling method can be used as a front end to a point cloud registration network. This is a challenging task since sampling must be consistent across two different point clouds for a shared downstream task. In all cases, our approach outperforms existing non-learned and learned sampling alternatives. Our code is publicly available.
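The differentiable relaxation can be sketched as a "soft projection": each sampled point is replaced by a softmax-weighted mixture of its nearest neighbors in the input cloud, so gradients can flow back through the sampling step. A minimal numpy sketch (function and parameter names are ours; the paper's trainable version learns and anneals the temperature):

```python
import numpy as np

def soft_project(query_pts, cloud, k=4, temperature=0.1):
    # Squared distances from each query point to every cloud point.
    d2 = ((query_pts[:, None, :] - cloud[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k]                 # k nearest neighbors
    nd2 = np.take_along_axis(d2, idx, axis=1)
    # Softmax over negative distances; small temperature -> hard selection.
    logits = -(nd2 - nd2.min(axis=1, keepdims=True)) / temperature ** 2
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)                   # convex weights
    return (w[..., None] * cloud[idx]).sum(axis=1)      # mixture of input points
```

As the temperature shrinks, the mixture collapses onto the single nearest input point, recovering ordinary (non-differentiable) sampling in the limit.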

113 citations


Book ChapterDOI
23 Aug 2020
TL;DR: In this paper, a deep image compression neural network is proposed that relies on side information available only to the decoder, based on the assumption that the images available to the encoder and the decoder are correlated; the network learns these correlations in the training phase.
Abstract: We present a Deep Image Compression neural network that relies on side information, which is only available to the decoder. We base our algorithm on the assumption that the image available to the encoder and the image available to the decoder are correlated, and we let the network learn these correlations in the training phase.

16 citations


Posted Content
TL;DR: This work is the first to consider the problem of adversarial examples at a geometric level, and demonstrates the robustness of the attack under defense: remnant characteristics of the target shape are still present at the output after the defense is applied to the adversarial input.
Abstract: Deep neural networks are prone to adversarial examples that maliciously alter the network's outcome. Due to the increasing popularity of 3D sensors in safety-critical systems and the vast deployment of deep learning models for 3D point sets, there is a growing interest in adversarial attacks and defenses for such models. So far, the research has focused on the semantic level, namely, deep point cloud classifiers. However, point clouds are also widely used in a geometric-related form that includes encoding and reconstructing the geometry. In this work, we are the first to consider the problem of adversarial examples at a geometric level. In this setting, the question is how to craft a small change to a clean source point cloud that leads, after passing through an autoencoder model, to the reconstruction of a different target shape. Our attack is in sharp contrast to existing semantic attacks on 3D point clouds. While such works aim to modify the predicted label by a classifier, we alter the entire reconstructed geometry. Additionally, we demonstrate the robustness of our attack in the case of defense, where we show that remnant characteristics of the target shape are still present at the output after applying the defense to the adversarial input. Our code is publicly available at this https URL.
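The attack objective can be illustrated on a toy autoencoder. This is a deliberately simplified sketch (names are ours; finite-difference gradients stand in for backpropagation, and a fixed linear autoencoder stands in for a trained point cloud model): we search for a small perturbation so that the reconstruction of the perturbed input approaches a different target shape.

```python
import numpy as np

def geometric_attack(x, target, encode, decode, steps=200, lr=0.1, lam=0.1):
    delta = np.zeros_like(x)

    def loss(d):
        rec = decode(encode(x + d))                  # reconstruction of x + d
        return ((rec - target) ** 2).sum() + lam * (d ** 2).sum()

    eps = 1e-4
    for _ in range(steps):
        g = np.zeros_like(delta)
        for i in range(delta.size):                  # finite-difference gradient
            e = np.zeros_like(delta)
            e.flat[i] = eps
            g.flat[i] = (loss(delta + e) - loss(delta - e)) / (2 * eps)
        delta -= lr * g
    return x + delta
```

On a linear autoencoder the loss is convex, so plain gradient descent suffices; a real attack on a deep point cloud autoencoder would use autograd and a geometry-aware perturbation penalty rather than a plain L2 term.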

14 citations


Proceedings ArticleDOI
01 Apr 2020
TL;DR: This work focuses on robust estimation of the water properties: in contrast to previous methods that used fixed values, the attenuation is estimated from the color distribution in the image, and the veiling-light color is estimated from objects in the scene rather than from background pixels.
Abstract: The appearance of underwater scenes is highly governed by the optical properties of the water (attenuation and scattering). However, most research effort in physics-based underwater image reconstruction methods is placed on devising image priors for estimating scene transmission, and less on estimating the optical properties. This limits the quality of the results. This work focuses on robust estimation of the water properties. First, as opposed to previous methods that used fixed values for attenuation, we estimate it from the color distribution in the image. Second, we estimate the veiling-light color from objects in the scene, contrary to looking at background pixels. We conduct an extensive qualitative and quantitative evaluation of our method vs. most recent methods on several datasets. As our estimation is more robust our method provides superior results including on challenging scenes.

10 citations


Posted Content
TL;DR: In this paper, a family of novel frequency-domain utilization networks is presented, which utilize the inherent efficiency of the frequency domain by working directly in that domain, represented with the Discrete Cosine Transform.
Abstract: The search for efficient neural network architectures has gained much focus in recent years, where modern architectures focus not only on accuracy but also on inference time and model size. Here, we present FUN, a family of novel Frequency-domain Utilization Networks. These networks utilize the inherent efficiency of the frequency domain by working directly in that domain, represented with the Discrete Cosine Transform. Using modern techniques and building blocks such as compound-scaling and inverted-residual layers we generate a set of such networks allowing one to balance between size, latency and accuracy while outperforming competing RGB-based models. Extensive evaluations verify that our networks present strong alternatives to previous approaches. Moreover, we show that working in the frequency domain allows for dynamic compression of the input at inference time without any explicit change to the architecture.
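The frequency-domain input these networks consume is the DCT of the image. A from-scratch sketch of the orthonormal type-II DCT built directly from its definition (a real pipeline would use an optimized FFT-based routine such as `scipy.fft.dctn`):

```python
import numpy as np

def dct2(block):
    """Orthonormal 2D type-II DCT of a square block, from the definition."""
    n = block.shape[0]
    i = np.arange(n)
    # basis[k, m] = cos(pi * (2m + 1) * k / (2n))
    basis = np.cos(np.pi * (2 * i[None, :] + 1) * i[:, None] / (2 * n))
    scale = np.sqrt(2.0 / n) * np.where(i == 0, np.sqrt(0.5), 1.0)
    D = scale[:, None] * basis                 # orthonormal DCT matrix
    return D @ block @ D.T
```

Because the transform is orthonormal it preserves energy, and for smooth image content most of that energy concentrates in the low-frequency coefficients, which is what makes truncating the input ("dynamic compression") possible without architectural changes.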

5 citations


Posted Content
TL;DR: Several algorithms, collectively named Best Buddy Registration (BBR), are presented; each optimizes, with Adam gradient descent, a loss function inspired by the Best Buddies Similarity measure, which counts the number of mutual nearest neighbors between two point sets.
Abstract: We propose new, and robust, loss functions for the point cloud registration problem. Our loss functions are inspired by the Best Buddies Similarity (BBS) measure that counts the number of mutual nearest neighbors between two point sets. This measure has been shown to be robust to outliers and missing data in the case of template matching for images. We present several algorithms, collectively named Best Buddy Registration (BBR), where each algorithm consists of optimizing one of these loss functions with Adam gradient descent. The loss functions differ in several ways, including the distance function used (point-to-point vs. point-to-plane), and how the BBS measure is combined with the actual distances between pairs of points. Experiments on various data sets, both synthetic and real, demonstrate the effectiveness of the BBR algorithms, showing that they are quite robust to noise, outliers, and distractors, and cope well with extremely sparse point clouds. One variant, BBR-F, achieves state-of-the-art accuracy in the registration of automotive lidar scans taken up to several seconds apart, from the KITTI and Apollo-Southbay datasets.
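The core BBS computation is small enough to sketch directly (the function name is ours): a pair of points are "best buddies" when each is the other's nearest neighbor across the two sets.

```python
import numpy as np

def best_buddy_pairs(P, Q):
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    nn_pq = d.argmin(axis=1)            # nearest neighbor in Q of each p
    nn_qp = d.argmin(axis=0)            # nearest neighbor in P of each q
    i = np.arange(len(P))
    mutual = nn_qp[nn_pq] == i          # keep mutual nearest neighbors only
    return np.stack([i[mutual], nn_pq[mutual]], axis=1)
```

The BBS score is simply the number of such pairs; the BBR loss functions combine this count with the actual point-to-point or point-to-plane distances and minimize the result over a rigid transform.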

4 citations


Proceedings Article
01 Jan 2020
TL;DR: In this article, the authors investigated the classification performance of K-NN and deep neural networks (DNNs) in the presence of label noise and derived a realizable analytic expression that approximates the multi-class K-NN classification error.
Abstract: We investigate the classification performance of K-nearest neighbors (K-NN) and deep neural networks (DNNs) in the presence of label noise. We first show empirically that a DNN's prediction for a given test example depends on the labels of the training examples in its local neighborhood. This motivates us to derive a realizable analytic expression that approximates the multi-class K-NN classification error in the presence of label noise, which is of independent importance. We then suggest that the expression for K-NN may serve as a first-order approximation for the DNN error. Finally, we demonstrate empirically the proximity of the developed expression to the observed performance of K-NN and DNN classifiers. Our result may explain the previously observed, surprising resistance of DNNs to some types of label noise. It also characterizes an important factor of this resistance, showing that the more concentrated the noise, the greater the degradation in performance.
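The empirical setup is easy to reproduce in miniature. A toy leave-one-out experiment (our own construction; binary labels assumed) that exhibits the degradation the paper models analytically:

```python
import numpy as np

def knn_error_with_noise(X, y, noise, k=5, seed=0):
    rng = np.random.default_rng(seed)
    y_noisy = y.copy()
    flip = rng.random(len(y)) < noise
    y_noisy[flip] = 1 - y_noisy[flip]            # flip a fraction of labels
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # leave-one-out
    idx = np.argsort(d, axis=1)[:, :k]
    votes = y_noisy[idx].mean(axis=1) > 0.5      # majority vote of k neighbors
    return float((votes != y).mean())            # error vs. the clean labels
```

Sweeping `noise` from 0 upward on well-separated clusters shows the K-NN error growing with the noise level, the quantity the paper's analytic expression approximates.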

3 citations


Proceedings ArticleDOI
01 Apr 2020
TL;DR: NLDNet++, a fully convolutional network trained on pairs of hazy images and images dehazed by NLD++, is proposed; it removes the requirement, shared by existing deep learning methods, for hazy/dehazed image pairs that are difficult to obtain.
Abstract: Deep learning methods for image dehazing achieve impressive results. Yet, the task of collecting ground truth hazy/dehazed image pairs to train the network is cumbersome. We propose to use Non-Local Image Dehazing (NLD), an existing physics based technique, to provide the dehazed image required to train a network. Upon close inspection, we find that NLD suffers from several shortcomings and propose novel extensions to improve it. The new method, termed NLD++, consists of 1) denoising the input image as a pre-processing step to avoid noise amplification, and 2) introducing a constrained optimization that respects physical constraints. NLD++ produces superior results to NLD at the expense of increased computational cost. To offset that, we propose NLDNet++, a fully convolutional network that is trained on pairs of hazy images and images dehazed by NLD++. This eliminates the need, common to existing deep learning methods, for hazy/dehazed image pairs that are difficult to obtain. We evaluate the performance of NLDNet++ on standard data sets and find it to compare favorably with existing methods.

2 citations


Posted Content
TL;DR: A Deep Image Compression neural network that relies on side information available only to the decoder; comparison against several image compression algorithms shows that adding decoder-only side information does indeed improve results.
Abstract: We present a Deep Image Compression neural network that relies on side information, which is only available to the decoder. We base our algorithm on the assumption that the image available to the encoder and the image available to the decoder are correlated, and we let the network learn these correlations in the training phase. Then, at run time, the encoder side encodes the input image without knowing anything about the decoder side image and sends it to the decoder. The decoder then uses the encoded input image and the side information image to reconstruct the original image. This problem is known as Distributed Source Coding in Information Theory, and we discuss several use cases for this technology. We compare our algorithm to several image compression algorithms and show that adding decoder-only side information does indeed improve results. Our code is publicly available at this https URL.

Posted Content
TL;DR: This work proposes a fully convolutional generative adversarial network, conditioned locally on co-occurrence statistics, to generate arbitrarily large images while having local, interpretable control over texture appearance.
Abstract: As image generation techniques mature, there is a growing interest in explainable representations that are easy to understand and intuitive to manipulate. In this work, we turn to co-occurrence statistics, which have long been used for texture analysis, to learn a controllable texture synthesis model. We propose a fully convolutional generative adversarial network, conditioned locally on co-occurrence statistics, to generate arbitrarily large images while having local, interpretable control over the texture appearance. To encourage fidelity to the input condition, we introduce a novel differentiable co-occurrence loss that is integrated seamlessly into our framework in an end-to-end fashion. We demonstrate that our solution offers a stable, intuitive and interpretable latent representation for texture synthesis, which can be used to generate a smooth texture morph between different textures. We further show an interactive texture tool that allows a user to adjust local characteristics of the synthesized texture image using the co-occurrence values directly.
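The conditioning signal itself, co-occurrence statistics, is a classic texture descriptor. A minimal sketch for a single horizontal offset (the paper's loss uses a differentiable, soft-binned variant rather than this hard histogram):

```python
import numpy as np

def cooccurrence(img, levels=8):
    """Joint distribution of quantized gray levels over horizontal neighbors."""
    q = np.minimum((img * levels).astype(int), levels - 1)  # quantize to bins
    a, b = q[:, :-1].ravel(), q[:, 1:].ravel()              # neighbor pairs
    C = np.zeros((levels, levels))
    np.add.at(C, (a, b), 1.0)                               # histogram of pairs
    return C / C.sum()
```

Editing entries of this matrix locally is the "interpretable control" the paper exposes: the generator is conditioned on such statistics per region, so changing them changes the local texture appearance.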

Posted Content
TL;DR: A binary embedding framework, called Proximity Preserving Code (PPC), which learns similarity and dissimilarity between data points to create a compact and affinity-preserving binary code; the code enables fast and memory-efficient approximate nearest-neighbor search.
Abstract: We introduce a binary embedding framework, called Proximity Preserving Code (PPC), which learns similarity and dissimilarity between data points to create a compact and affinity-preserving binary code. This code can be used to apply fast and memory-efficient approximation to nearest-neighbor searches. Our framework is flexible, enabling different proximity definitions between data points. In contrast to previous methods that extract binary codes based on unsigned graph partitioning, our system models the attractive and repulsive forces in the data by incorporating positive and negative graph weights. The proposed framework is shown to boil down to finding the minimal cut of a signed graph, a problem known to be NP-hard. We offer an efficient approximation and achieve superior results by constructing the code bit after bit. We show that the proposed approximation is superior to the commonly used spectral methods with respect to both accuracy and complexity. Thus, it is useful for many other problems that can be translated into signed graph cut.
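The signed-graph-cut objective at the heart of PPC can be written down directly. A sketch in our own notation: similar pairs carry positive weights, dissimilar pairs negative ones, and a good bit assignment leaves positive edges uncut while cutting negative ones.

```python
import numpy as np

def signed_cut_value(W, bits):
    """Cut value of one code bit on a symmetric signed affinity matrix W."""
    s = np.where(bits > 0, 1.0, -1.0)           # bit -> +/-1 side of the cut
    # Edge (i, j) is cut iff s_i != s_j; 0.25 corrects for double counting.
    return 0.25 * np.sum(W * (1.0 - np.outer(s, s)))
```

Minimizing this value over all bit assignments is the NP-hard signed min-cut the abstract refers to; PPC builds the code greedily, one bit after another, each bit approximately minimizing it.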

Book ChapterDOI
30 Nov 2020
TL;DR: In this paper, the authors proposed new loss functions for the point cloud registration problem, inspired by the Best Buddies Similarity (BBS) measure that counts the number of mutual nearest neighbors between two point sets.
Abstract: We propose new, and robust, loss functions for the point cloud registration problem. Our loss functions are inspired by the Best Buddies Similarity (BBS) measure that counts the number of mutual nearest neighbors between two point sets. This measure has been shown to be robust to outliers and missing data in the case of template matching for images. We present several algorithms, collectively named Best Buddy Registration (BBR), where each algorithm consists of optimizing one of these loss functions with Adam gradient descent. The loss functions differ in several ways, including the distance function used (point-to-point vs. point-to-plane), and how the BBS measure is combined with the actual distances between pairs of points. Experiments on various data sets, both synthetic and real, demonstrate the effectiveness of the BBR algorithms, showing that they are quite robust to noise, outliers, and distractors, and cope well with extremely sparse point clouds. One variant, BBR-F, achieves state-of-the-art accuracy in the registration of automotive lidar scans taken up to several seconds apart, from the KITTI and Apollo-Southbay datasets.