Author

Roee Litman

Bio: Roee Litman is an academic researcher from Tel Aviv University. The author has contributed to research in topics including shape analysis (digital geometry) and benchmarking (computing). The author has an h-index of 17 and has co-authored 34 publications receiving 1,334 citations. Previous affiliations of Roee Litman include Technion – Israel Institute of Technology and Amazon.com.

Papers
Journal ArticleDOI
TL;DR: A learning scheme for the construction of optimized spectral descriptors is shown and related to Mahalanobis metric learning; the superiority of the proposed approach in generating correspondences is demonstrated on synthetic and scanned human figures.
Abstract: Informative and discriminative feature descriptors play a fundamental role in deformable shape analysis. For example, they have been successfully employed in correspondence, registration, and retrieval tasks. In recent years, significant attention has been devoted to descriptors obtained from the spectral decomposition of the Laplace-Beltrami operator associated with the shape. Notable examples in this family are the heat kernel signature (HKS) and the recently introduced wave kernel signature (WKS). The Laplacian-based descriptors achieve state-of-the-art performance in numerous shape analysis tasks; they are computationally efficient, isometry-invariant by construction, and can gracefully cope with a variety of transformations. In this paper, we formulate a generic family of parametric spectral descriptors. We argue that to be optimized for a specific task, the descriptor should take into account the statistics of the corpus of shapes to which it is applied (the "signal") and those of the class of transformations to which it is made insensitive (the "noise"). While such statistics are hard to model axiomatically, they can be learned from examples. Following the spirit of the Wiener filter in signal processing, we show a learning scheme for the construction of optimized spectral descriptors and relate it to Mahalanobis metric learning. The superiority of the proposed approach in generating correspondences is demonstrated on synthetic and scanned human figures. We also show that the learned descriptors are robust enough to be learned on synthetic data and transferred successfully to scanned shapes.
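To make the construction concrete, here is a minimal NumPy sketch of the pipeline the abstract describes: HKS-style responses are computed from a precomputed Laplace–Beltrami eigendecomposition, and a learned linear map (standing in for the Mahalanobis-style metric learned from examples) projects them into an optimized descriptor. The diffusion-time parametrization and the matrix A are illustrative assumptions, not the paper's exact construction.

```python
# A minimal sketch, assuming a precomputed Laplace-Beltrami
# eigendecomposition (evals, evecs); the basis parametrization and the
# learned projection A are placeholders, not the paper's construction.
import numpy as np

def geometry_vectors(evals, evecs, ts):
    """Stack HKS-style responses exp(-t * lambda_k) * phi_k(x)^2
    for a bank of diffusion times ts; returns (n_points, len(ts))."""
    weights = np.exp(-np.outer(ts, evals))   # (T, K)
    return (evecs ** 2) @ weights.T          # (n_points, T)

def learned_descriptor(G, A):
    """Apply a learned linear map A (e.g., obtained offline by
    Mahalanobis-style metric learning on matching/non-matching pairs)."""
    return G @ A.T

# Toy usage with random spectra standing in for a real shape.
rng = np.random.default_rng(0)
evals = np.sort(rng.uniform(0, 10, 50))
evecs = rng.normal(size=(1000, 50))
G = geometry_vectors(evals, evecs, ts=np.geomspace(0.01, 1.0, 16))
A = rng.normal(size=(8, 16))                 # stands in for the learned map
D = learned_descriptor(G, A)                 # (1000, 8) point descriptors
```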

193 citations

Proceedings ArticleDOI
16 Jun 2012
TL;DR: This work generalizes to surfaces the polar sampling of the image domain used in shape contexts, and leverages recent developments in intrinsic shape analysis to construct ISC based on state-of-the-art dense shape descriptors such as heat kernel signatures.
Abstract: In this work, we present intrinsic shape context (ISC) descriptors for 3D shapes. We generalize to surfaces the polar sampling of the image domain used in shape contexts: for this purpose, we chart the surface by shooting geodesics outward from the point being analyzed; ‘angle’ is treated as tantamount to the geodesic shooting direction, and radius as the geodesic distance. To deal with orientation ambiguity, we exploit properties of the Fourier transform. Our charting method is intrinsic, i.e., invariant to isometric shape transformations. The resulting descriptor is a meta-descriptor that can be applied to any photometric or geometric property field defined on the shape; in particular, we can leverage recent developments in intrinsic shape analysis and construct ISC based on state-of-the-art dense shape descriptors such as heat kernel signatures. Our experiments demonstrate a notable improvement in shape matching on standard benchmarks.
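As a rough illustration of the charting step, the sketch below bins a per-point scalar field (e.g., one HKS channel) into a geodesic polar histogram and takes per-radius FFT magnitudes over the angular axis, so that a rotation of the arbitrary zero shooting direction, which only circularly shifts each row, cancels out. The polar coordinates are assumed to come from a geodesic-shooting routine not shown here.

```python
# A hedged sketch of the ISC binning idea; radii/angles are assumed to be
# geodesic polar coordinates of a point's neighbors, produced elsewhere.
import numpy as np

def intrinsic_shape_context(field, radii, angles, n_r=5, n_theta=12):
    """field, radii, angles: per-neighbor values and geodesic polar coords."""
    hist = np.zeros((n_r, n_theta))
    r_bins = np.clip((radii / radii.max() * n_r).astype(int), 0, n_r - 1)
    t_bins = (angles % (2 * np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
    np.add.at(hist, (r_bins, t_bins), field)   # accumulate field into bins
    # Rotating the angular origin circularly shifts each row; taking the
    # per-row FFT magnitude removes that orientation ambiguity.
    return np.abs(np.fft.fft(hist, axis=1))
```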

183 citations

Proceedings ArticleDOI
10 Apr 2011
TL;DR: A benchmark that simulates the feature detection and description stages of feature-based shape retrieval algorithms under a wide variety of transformations is presented.
Abstract: Feature-based approaches have recently become very popular in computer vision and image analysis applications, and are becoming a promising direction in shape retrieval. The SHREC'11 robust feature detection and description benchmark simulates the feature detection and description stages of feature-based shape retrieval algorithms. The benchmark tests the performance of shape feature detectors and descriptors under a wide variety of transformations, and allows evaluating how well algorithms cope with certain classes of transformations and with what transformation strengths. The present paper is a report of the SHREC'11 robust feature detection and description benchmark results.
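The report itself defines the exact protocol; purely as an illustration, a common way such benchmarks score detectors is repeatability: the fraction of keypoints found on a transformed shape that land within a small geodesic radius of a keypoint on the null (untransformed) shape. The function below is a hypothetical sketch of that metric, not the SHREC'11 code.

```python
# Illustrative only: a generic repeatability score for shape feature
# detectors. The names and the distance callback are assumptions.
def repeatability(kp_null, kp_transformed, geodesic_dist, radius=0.05):
    """kp_*: iterables of detected vertex ids; geodesic_dist(i, j) -> float."""
    hits = sum(
        any(geodesic_dist(i, j) < radius for j in kp_null)
        for i in kp_transformed
    )
    return hits / max(len(kp_transformed), 1)
```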

169 citations

Proceedings ArticleDOI
01 Jul 2017
TL;DR: A recovery technique capable of guaranteeing a bijective correspondence and producing significantly higher accuracy and smoothness is derived from the statistical framework of kernel density estimation, and its performance is demonstrated on several challenging deformable 3D shape matching datasets.
Abstract: Many algorithms for the computation of correspondences between deformable shapes rely on some variant of nearest neighbor matching in a descriptor space. Examples include various point-wise correspondence recovery algorithms used as a post-processing stage in the functional correspondence framework. Such frequently used techniques implicitly make restrictive assumptions (e.g., near-isometry) on the considered shapes, and in practice suffer from a lack of accuracy and poor surjectivity. We propose an alternative recovery technique capable of guaranteeing a bijective correspondence and producing significantly higher accuracy and smoothness. Unlike other methods, our approach does not depend on the assumption that the analyzed shapes are isometric. We derive the proposed method from the statistical framework of kernel density estimation and demonstrate its performance on several challenging deformable 3D shape matching datasets.
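A hedged sketch of the general idea (not the paper's exact algorithm): starting from descriptors already aligned, e.g., by a functional map, one can form Gaussian kernel density weights between the two point sets and extract a guaranteed bijection with the Hungarian method, instead of matching each point independently to its nearest neighbor.

```python
# A rough sketch under stated assumptions: descriptors desc_x, desc_y are
# already aligned across the two shapes; sigma is a bandwidth to tune.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def recover_bijection(desc_x, desc_y, sigma=0.1):
    """desc_x, desc_y: (n, d) descriptor matrices for the two shapes."""
    d2 = cdist(desc_x, desc_y, 'sqeuclidean')
    soft = np.exp(-d2 / (2 * sigma ** 2))       # Gaussian kernel weights
    rows, cols = linear_sum_assignment(-soft)   # maximize total density
    return cols                                 # cols[i] matches point i on X
```

Nearest-neighbor matching can send many points of one shape to the same point of the other; the assignment step is what enforces bijectivity here.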

134 citations

Proceedings ArticleDOI
Ron Litman, Oron Anschel, Shahar Tsiper, Roee Litman, Shai Mazor, R. Manmatha
14 Jun 2020
TL;DR: A novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER), is introduced; it utilizes a stacked block architecture with intermediate supervision during training, paving the way to successfully train a deep BiLSTM encoder and thus improving the encoding of contextual dependencies.
Abstract: Scene Text Recognition (STR), the task of recognizing text against complex image backgrounds, is an active area of research. Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes. In this paper, we introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER). SCATTER utilizes a stacked block architecture with intermediate supervision during training, which paves the way to successfully training a deep BiLSTM encoder, thus improving the encoding of contextual dependencies. Decoding is done using a two-step 1D attention mechanism. The first attention step re-weights visual features from a CNN backbone together with contextual features computed by a BiLSTM layer. The second attention step, similar to previous papers, treats the features as a sequence and attends to the intra-sequence relationships. Experiments show that the proposed approach surpasses SOTA performance on irregular text recognition benchmarks by 3.7% on average.
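The following simplified PyTorch sketch shows the stacked-block idea with intermediate supervision; the dimensions, block count, and per-block classification heads are placeholder assumptions, and the two-step 1D attention decoder described in the abstract is omitted.

```python
# A minimal sketch of stacked BiLSTM blocks with intermediate supervision;
# sizes are illustrative, not SCATTER's actual configuration.
import torch
import torch.nn as nn

class StackedBlockEncoder(nn.Module):
    def __init__(self, feat_dim=256, hidden=256, n_classes=97, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.LSTM(feat_dim, hidden, bidirectional=True, batch_first=True)
            for _ in range(n_blocks)
        )
        self.proj = nn.ModuleList(
            nn.Linear(2 * hidden, feat_dim) for _ in range(n_blocks)
        )
        # One auxiliary head per block supplies the intermediate supervision.
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, n_classes) for _ in range(n_blocks)
        )

    def forward(self, visual_feats):          # (B, T, feat_dim) from a CNN
        x, logits = visual_feats, []
        for lstm, proj, head in zip(self.blocks, self.proj, self.heads):
            out, _ = lstm(x)
            x = proj(out)                     # contextual features
            logits.append(head(x))            # a loss is applied to each head
        return x, logits                      # training sums losses over blocks
```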

111 citations


Cited by
Journal ArticleDOI
TL;DR: Many scientific fields study data with an underlying non-Euclidean structure; such geometric data are often large and complex (in the case of social networks, on the scale of billions) and are natural targets for machine-learning techniques.
Abstract: Many scientific fields study data with an underlying structure that is non-Euclidean. Some examples include social networks in computational social sciences, sensor networks in communications, functional networks in brain imaging, regulatory networks in genetics, and meshed surfaces in computer graphics. In many applications, such geometric data are large and complex (in the case of social networks, on the scale of billions) and are natural targets for machine-learning techniques. In particular, we would like to use deep neural networks, which have recently proven to be powerful tools for a broad range of problems from computer vision, natural-language processing, and audio analysis. However, these tools have been most successful on data with an underlying Euclidean or grid-like structure and in cases where the invariances of these structures are built into networks used to model them.

2,565 citations

Proceedings ArticleDOI
07 Dec 2015
TL;DR: In this article, a CNN architecture is proposed to combine information from multiple views of a 3D shape into a single and compact shape descriptor, which can be applied to accurately recognize human hand-drawn sketches of shapes.
Abstract: A longstanding question in computer vision concerns the representation of 3D shapes for recognition: should 3D shapes be represented with descriptors operating on their native 3D formats, such as voxel grid or polygon mesh, or can they be effectively represented with view-based descriptors? We address this question in the context of learning to recognize 3D shapes from a collection of their rendered views on 2D images. We first present a standard CNN architecture trained to recognize the shapes' rendered views independently of each other, and show that a 3D shape can be recognized even from a single view at an accuracy far higher than using state-of-the-art 3D shape descriptors. Recognition rates further increase when multiple views of the shapes are provided. In addition, we present a novel CNN architecture that combines information from multiple views of a 3D shape into a single and compact shape descriptor offering even better recognition performance. The same architecture can be applied to accurately recognize human hand-drawn sketches of shapes. We conclude that a collection of 2D views can be highly informative for 3D shape recognition and is amenable to emerging CNN architectures and their derivatives.
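The core view-pooling idea can be sketched in a few lines of PyTorch: a shared CNN backbone encodes each rendered view, and an element-wise max over the view axis merges them into one compact descriptor. The ResNet-18 backbone and sizes here are assumptions for illustration, not the paper's exact setup.

```python
# A minimal multi-view CNN sketch; the backbone choice is an assumption.
import torch
import torch.nn as nn
import torchvision.models as models

class MultiViewCNN(nn.Module):
    def __init__(self, n_classes=40):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.classifier = nn.Linear(512, n_classes)

    def forward(self, views):                      # (B, V, 3, H, W)
        b, v = views.shape[:2]
        f = self.features(views.flatten(0, 1))    # shared CNN: (B*V, 512, 1, 1)
        f = f.view(b, v, -1)                       # (B, V, 512)
        pooled = f.max(dim=1).values               # element-wise max over views
        return self.classifier(pooled)
```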

2,195 citations

01 Jan 1979
TL;DR: This special issue aims at gathering recent advances in learning-with-shared-information methods and their applications in computer vision and multimedia analysis, with an emphasis on papers addressing interesting real-world computer vision and multimedia applications.
Abstract: In the real world, a realistic setting for computer vision or multimedia recognition problems is that some classes contain a lot of training data while many classes contain only a small amount. How to use frequent classes to help learn rare classes, for which it is harder to collect training data, is therefore an open question. Learning with shared information is an emerging topic in machine learning, computer vision, and multimedia analysis. Different levels of components can be shared during the concept modeling and machine learning stages, such as generic object parts, attributes, transformations, regularization parameters, and training examples. Regarding specific methods, multi-task learning, transfer learning, and deep learning can be seen as different strategies for sharing information. These learning-with-shared-information methods are very effective in solving real-world large-scale problems. This special issue aims at gathering recent advances in learning-with-shared-information methods and their applications in computer vision and multimedia analysis. Both state-of-the-art works and literature reviews are welcome for submission. Papers addressing interesting real-world computer vision and multimedia applications are especially encouraged. Topics of interest include, but are not limited to:
• Multi-task learning or transfer learning for large-scale computer vision and multimedia analysis
• Deep learning for large-scale computer vision and multimedia analysis
• Multi-modal approaches for large-scale computer vision and multimedia analysis
• Different sharing strategies, e.g., sharing generic object parts, attributes, transformations, regularization parameters, and training examples
• Real-world computer vision and multimedia applications based on learning with shared information, e.g., event detection, object recognition, object detection, action recognition, human head pose estimation, object tracking, location-based services, and semantic indexing
• New datasets and metrics to evaluate the benefit of the proposed sharing ability for a specific computer vision or multimedia problem
• Survey papers on the topic of learning with shared information
Authors who are unsure whether their planned submission is in scope may contact the guest editors prior to the submission deadline with an abstract, in order to receive feedback.

1,758 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: In this article, a unified framework is proposed that generalizes CNN architectures to non-Euclidean domains (graphs and manifolds) and learns local, stationary, and compositional task-specific features.
Abstract: Deep learning has achieved a remarkable performance breakthrough in several fields, most notably in speech recognition, natural language processing, and computer vision. In particular, convolutional neural network (CNN) architectures currently produce state-of-the-art performance on a variety of image analysis tasks such as object detection and recognition. Most deep learning research has so far focused on dealing with 1D, 2D, or 3D Euclidean-structured data such as acoustic signals, images, or videos. Recently, there has been an increasing interest in geometric deep learning, attempting to generalize deep learning methods to non-Euclidean structured data such as graphs and manifolds, with a variety of applications in network analysis, computational social science, and computer graphics. In this paper, we propose a unified framework that allows generalizing CNN architectures to non-Euclidean domains (graphs and manifolds) and learning local, stationary, and compositional task-specific features. We show that various non-Euclidean CNN methods previously proposed in the literature can be considered as particular instances of our framework. We test the proposed method on standard tasks from the realms of image, graph, and 3D shape analysis and show that it consistently outperforms previous approaches.
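As a toy illustration of the patch-operator idea behind such frameworks, the sketch below aggregates neighbor features with learnable Gaussian weights over pseudo-coordinates (e.g., relative positions on a graph or mesh). This is a hypothetical dense version for readability; real implementations operate on sparse neighborhoods.

```python
# A hedged sketch of a Gaussian-mixture patch operator; not the paper's code.
import torch
import torch.nn as nn

class GaussianMixtureConv(nn.Module):
    def __init__(self, in_dim, out_dim, n_kernels=8, coord_dim=2):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(n_kernels, coord_dim))
        self.log_sigma = nn.Parameter(torch.zeros(n_kernels, coord_dim))
        self.lin = nn.Linear(in_dim * n_kernels, out_dim)

    def forward(self, x, coords, adj):
        # x: (N, in_dim); coords: (N, N, coord_dim) pseudo-coordinates;
        # adj: (N, N) {0, 1} adjacency mask.
        diff = coords.unsqueeze(2) - self.mu             # (N, N, K, C)
        w = torch.exp(-0.5 * (diff / self.log_sigma.exp()).pow(2).sum(-1))
        w = w * adj.unsqueeze(-1)                        # mask non-neighbors
        agg = torch.einsum('ijk,jd->ikd', w, x)          # (N, K, in_dim)
        return self.lin(agg.flatten(1))                  # (N, out_dim)
```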

1,594 citations

Posted Content
TL;DR: This work presents a standard CNN architecture trained to recognize the shapes' rendered views independently of each other, and shows that a 3D shape can be recognized even from a single view at an accuracy far higher than using state-of-the-art 3D shape descriptors.
Abstract: A longstanding question in computer vision concerns the representation of 3D shapes for recognition: should 3D shapes be represented with descriptors operating on their native 3D formats, such as voxel grid or polygon mesh, or can they be effectively represented with view-based descriptors? We address this question in the context of learning to recognize 3D shapes from a collection of their rendered views on 2D images. We first present a standard CNN architecture trained to recognize the shapes' rendered views independently of each other, and show that a 3D shape can be recognized even from a single view at an accuracy far higher than using state-of-the-art 3D shape descriptors. Recognition rates further increase when multiple views of the shapes are provided. In addition, we present a novel CNN architecture that combines information from multiple views of a 3D shape into a single and compact shape descriptor offering even better recognition performance. The same architecture can be applied to accurately recognize human hand-drawn sketches of shapes. We conclude that a collection of 2D views can be highly informative for 3D shape recognition and is amenable to emerging CNN architectures and their derivatives.

1,508 citations