Author

Atilla Baskurt

Bio: Atilla Baskurt is an academic researcher from the University of Lyon. The author has contributed to research in the topics of digital watermarking and image segmentation. The author has an h-index of 27 and has co-authored 178 publications receiving 3,543 citations. Previous affiliations of Atilla Baskurt include the Institut National des Sciences Appliquées de Lyon and the French Institute of Health and Medical Research.


Papers
Book ChapterDOI
16 Nov 2011
TL;DR: A fully automated deep model that learns to classify human actions without using any prior knowledge is proposed; it outperforms existing deep models and gives results comparable with the best related works.
Abstract: We propose in this paper a fully automated deep model, which learns to classify human actions without using any prior knowledge. The first step of our scheme, based on the extension of Convolutional Neural Networks to 3D, automatically learns spatio-temporal features. A Recurrent Neural Network is then trained to classify each sequence, considering the temporal evolution of the learned features at each timestep. Experimental results on the KTH dataset show that the proposed approach outperforms existing deep models and gives results comparable with the best related works.

788 citations
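
The pipeline above (a 3D ConvNet that learns spatio-temporal features, followed by an RNN classifying the sequence from the per-timestep feature evolution) can be sketched in a few lines. The following is only a minimal illustration, not the paper's implementation: the use of PyTorch, the layer sizes, and the KTH class count of 6 are our assumptions.

```python
import torch
import torch.nn as nn

class Conv3DRNN(nn.Module):
    def __init__(self, n_classes=6, hidden=128):
        super().__init__()
        # Step 1: 3D convolutions learn spatio-temporal features over (T, H, W)
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(16, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        # Step 2: an RNN classifies the temporal evolution of those features
        self.rnn = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, clip):                     # clip: (B, 1, T, H, W) grayscale video
        f = self.features(clip)                  # (B, C, T, H', W')
        f = f.mean(dim=(3, 4)).permute(0, 2, 1)  # spatial pooling -> (B, T, C)
        out, _ = self.rnn(f)
        return self.classifier(out[:, -1])       # decision from the last timestep

logits = Conv3DRNN()(torch.randn(2, 1, 16, 64, 64))  # toy batch of two 16-frame clips
```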

Journal ArticleDOI
TL;DR: A new and efficient algorithm for the decomposition of 3D arbitrary triangle meshes, and particularly of optimized triangulated CAD meshes, is presented; based on curvature tensor field analysis, it decomposes the object into near-constant-curvature patches and rectifies boundaries by suppressing their artefacts or discontinuities.
Abstract: This paper presents a new and efficient algorithm for the decomposition of 3D arbitrary triangle meshes, and particularly of optimized triangulated CAD meshes. The algorithm is based on curvature tensor field analysis and comprises two distinct complementary steps: a region-based segmentation, which improves on that presented by Lavoue et al. [Lavoue G, Dupont F, Baskurt A. Constant curvature region decomposition of 3D-meshes by a mixed approach vertex-triangle. J WSCG 2004;12(2):245-52] and decomposes the object into near-constant-curvature patches; and a boundary rectification based on curvature tensor directions, which corrects boundaries by suppressing their artefacts or discontinuities. Experiments conducted on various models, including both CAD and natural objects, show satisfactory results. The resulting segmented patches, by virtue of their properties (homogeneous curvature, clean boundaries), are particularly well adapted to computer graphics tasks such as parametric or subdivision surface fitting with an adaptive compression objective.

219 citations
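
To make the "near constant curvature patches" idea concrete, here is a minimal sketch of curvature-driven region growing over a vertex adjacency graph. The real method analyses the full curvature tensor field and adds a boundary rectification step; the scalar per-vertex curvature, the tolerance tau, and all names below are illustrative assumptions.

```python
from collections import deque

def segment_by_curvature(adjacency, curvature, tau=0.05):
    """adjacency: dict vertex -> iterable of neighbouring vertices
    curvature: dict vertex -> scalar curvature estimate
    Returns a dict vertex -> region label."""
    labels, next_label = {}, 0
    for seed in adjacency:
        if seed in labels:
            continue
        labels[seed] = next_label
        region_mean, region_size = curvature[seed], 1
        queue = deque([seed])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                # grow while the neighbour keeps the region's curvature near constant
                if w not in labels and abs(curvature[w] - region_mean) < tau:
                    labels[w] = next_label
                    region_size += 1
                    region_mean += (curvature[w] - region_mean) / region_size
                    queue.append(w)
        next_label += 1
    return labels
```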

Journal ArticleDOI
TL;DR: This paper gives a comprehensive survey on 3-D mesh watermarking, which is considered an effective solution to the emerging problems of intellectual property protection and authentication of 3-D meshes.
Abstract: Three-dimensional (3-D) meshes have been used more and more in industrial, medical and entertainment applications during the last decade. Many researchers, from both the academic and the industrial sectors, have become aware of the intellectual property protection and authentication problems arising with their increasing use. This paper gives a comprehensive survey on 3-D mesh watermarking, which is considered an effective solution to these two emerging problems. Our survey covers an introduction to the relevant state of the art, an attack-centric investigation, and a list of existing problems and potential solutions. First, the particular difficulties encountered while applying watermarking to 3-D meshes are discussed. Then we present and analyse the existing algorithms, distinguishing between fragile techniques and robust techniques. Since attacks play an important role in the design of 3-D mesh watermarking algorithms, we also provide an attack-centric viewpoint of this state of the art. Finally, some future working directions are pointed out, especially concerning ways of devising robust and blind algorithms and some promising new watermarking feature spaces.

163 citations
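
As a toy illustration of the kind of blind scheme the survey discusses, the sketch below embeds bits by quantisation index modulation (QIM) of vertex-to-centroid distances: each marked radius is snapped to an even (bit 0) or odd (bit 1) multiple of a step q. This is not an algorithm from the survey, and it is deliberately naive; for instance, embedding shifts the centroid slightly, which already illustrates the robustness problems that robust-and-blind designs must solve.

```python
import numpy as np

def embed(vertices, bits, q=0.01):
    """vertices: (N, 3) float array; bits: sequence of 0/1 with len(bits) <= N."""
    c = vertices.mean(axis=0)
    d = vertices - c
    r = np.linalg.norm(d, axis=1)        # assumes no vertex sits exactly at the centroid
    r_new = r.copy()
    for i, b in enumerate(bits):
        k = int(np.round(r[i] / q))      # snap the radius to an even (bit 0)
        if k % 2 != b:                   # or odd (bit 1) multiple of the step q
            k += 1
        r_new[i] = k * q
    return c + d * (r_new / r)[:, None]  # move each marked vertex radially

def extract(marked, n_bits, q=0.01):
    r = np.linalg.norm(marked - marked.mean(axis=0), axis=1)
    return [int(np.round(r[i] / q)) % 2 for i in range(n_bits)]
```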

Proceedings ArticleDOI
TL;DR: An objective structural distortion measure which reflects the visual similarity between 3D meshes, and thus can be used for quality assessment, is presented; it shows strong correlation with subjective ratings.
Abstract: This paper presents an objective structural distortion measure which reflects the visual similarity between 3D meshes and thus can be used for quality assessment. The proposed tool is not linked to any specific application and thus can be used to evaluate any kind of 3D mesh processing algorithm (simplification, compression, watermarking, etc.). This measure follows the concept of structural similarity recently introduced for 2D image quality assessment by Wang et al. [1] and is based on curvature analysis (mean, standard deviation, covariance) on local windows of the meshes. Evaluation and comparison with geometric metrics are done through a subjective experiment based on human evaluation of a set of distorted objects. A quantitative perceptual metric is also derived from the proposed structural distortion measure for the specific case of watermarking quality assessment, and is compared with recent state-of-the-art algorithms. Both visual and quantitative results demonstrate the robustness of our approach and its strong correlation with subjective ratings.

161 citations
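
The core of the measure, comparing curvature statistics (mean, standard deviation, covariance) over corresponding local windows and pooling the per-window differences, can be condensed as below. The combination exponents and equal weights are simplifying assumptions; the paper's exact formulation and window construction are not reproduced here.

```python
import numpy as np

def window_distortion(c1, c2, eps=1e-9):
    """c1, c2: curvature samples over two corresponding local mesh windows."""
    m1, m2 = c1.mean(), c2.mean()
    s1, s2 = c1.std(), c2.std()
    cov = np.mean((c1 - m1) * (c2 - m2))
    L = abs(m1 - m2) / (max(m1, m2) + eps)    # difference in mean curvature
    C = abs(s1 - s2) / (max(s1, s2) + eps)    # difference in curvature spread
    S = abs(s1 * s2 - cov) / (s1 * s2 + eps)  # difference in structure
    return ((L**3 + C**3 + S**3) / 3.0) ** (1.0 / 3.0)

def global_distortion(windows1, windows2):
    # Minkowski pooling of per-window distortions over the whole mesh
    d = [window_distortion(a, b) for a, b in zip(windows1, windows2)]
    return float(np.mean(np.power(d, 3)) ** (1.0 / 3.0))
```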

Journal ArticleDOI
TL;DR: An enhancement controlling the adaptive properties of the segmentation process is proposed; it takes the form of a weighting function, accounting for both local and global statistics, introduced into the minimisation.

159 citations
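
Since only the TL;DR is shown, the following is no more than a guess at what such a weighting function could look like: a per-pixel weight blending a local-window statistic with a global image statistic before it enters the energy being minimised. Every name and the blend rule are hypothetical.

```python
import numpy as np

def adaptive_weight(image, i, j, half=2, alpha=0.5):
    """Hypothetical weight mixing local and global statistics at pixel (i, j)."""
    window = image[max(i - half, 0):i + half + 1, max(j - half, 0):j + half + 1]
    local_stat = window.std()    # local statistics from a small neighbourhood
    global_stat = image.std()    # global statistics from the whole image
    return alpha * local_stat + (1.0 - alpha) * global_stat
```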


Cited by
Proceedings ArticleDOI
23 Jun 2014
TL;DR: This work studies multiple approaches for extending the connectivity of a CNN in the time domain to take advantage of local spatio-temporal information, and suggests a multiresolution, foveated architecture as a promising way of speeding up training.
Abstract: Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image recognition problems. Encouraged by these results, we provide an extensive empirical evaluation of CNNs on large-scale video classification using a new dataset of 1 million YouTube videos belonging to 487 classes. We study multiple approaches for extending the connectivity of a CNN in time domain to take advantage of local spatio-temporal information and suggest a multiresolution, foveated architecture as a promising way of speeding up the training. Our best spatio-temporal networks display significant performance improvements compared to strong feature-based baselines (55.3% to 63.9%), but only a surprisingly modest improvement compared to single-frame models (59.3% to 60.9%). We further study the generalization performance of our best model by retraining the top layers on the UCF-101 Action Recognition dataset and observe significant performance improvements compared to the UCF-101 baseline model (63.3% up from 43.9%).

4,876 citations
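
One of the connectivity-in-time variants the paper compares is late fusion, where frames taken some distance apart share a single-frame tower and are merged only at the classifier. The sketch below shows that structure in PyTorch; the tiny layer sizes and two-frame input are illustrative assumptions, far smaller than the networks evaluated in the paper.

```python
import torch
import torch.nn as nn

class LateFusionNet(nn.Module):
    def __init__(self, n_classes=487):
        super().__init__()
        self.tower = nn.Sequential(               # single-frame tower, shared in time
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(2 * 32, n_classes)  # fusion happens only here

    def forward(self, frame_a, frame_b):          # two frames, (B, 3, H, W) each
        fused = torch.cat([self.tower(frame_a), self.tower(frame_b)], dim=1)
        return self.head(fused)

net = LateFusionNet()
logits = net(torch.randn(2, 3, 170, 170), torch.randn(2, 3, 170, 170))
```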

Proceedings ArticleDOI
07 Jun 2015
TL;DR: A novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and shows such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.
Abstract: Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or “temporally deep”, are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are “doubly deep” in that they can be compositional in spatial and temporal “layers”. Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they directly can map variable-length inputs (e.g., video frames) to variable length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.

4,206 citations
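
The "doubly deep" recipe (a convnet encodes each frame, a recurrent layer models the dynamics of the resulting feature sequence) can be sketched as follows. This is a minimal stand-in, not the paper's architecture: the toy CNN, the layer sizes, and the UCF-101 class count are assumptions, and the paper builds on much larger convnets.

```python
import torch
import torch.nn as nn

class LRCNSketch(nn.Module):
    def __init__(self, n_classes=101, feat=64, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                 # shared per-frame encoder
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat),
        )
        self.lstm = nn.LSTM(feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, video):                     # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        f = self.cnn(video.flatten(0, 1))         # run the CNN on every frame
        out, _ = self.lstm(f.view(b, t, -1))      # temporal dynamics over features
        return self.head(out[:, -1])              # variable-length input, one label

logits = LRCNSketch()(torch.randn(2, 8, 3, 64, 64))  # batch of two 8-frame clips
```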

Book ChapterDOI
08 Oct 2016
TL;DR: This paper proposes a new supervision signal, called center loss, for the face recognition task, which simultaneously learns a center for the deep features of each class and penalizes the distances between the deep features and their corresponding class centers.
Abstract: Convolutional neural networks (CNNs) have been widely used in the computer vision community, significantly improving the state of the art. In most of the available CNNs, the softmax loss function is used as the supervision signal to train the deep model. In order to enhance the discriminative power of the deeply learned features, this paper proposes a new supervision signal, called center loss, for the face recognition task. Specifically, the center loss simultaneously learns a center for the deep features of each class and penalizes the distances between the deep features and their corresponding class centers. More importantly, we prove that the proposed center loss function is trainable and easy to optimize in CNNs. With the joint supervision of softmax loss and center loss, we can train robust CNNs to obtain deep features with the two key learning objectives, inter-class dispersion and intra-class compactness, which are essential to face recognition. It is encouraging to see that our CNNs (with such joint supervision) achieve state-of-the-art accuracy on several important face recognition benchmarks: Labeled Faces in the Wild (LFW), YouTube Faces (YTF), and the MegaFace Challenge. In particular, our new approach achieves the best results on MegaFace (the largest public-domain face benchmark) under the protocol of the small training set (under 500,000 images and under 20,000 persons), significantly improving on the previous results and setting a new state of the art for both face recognition and face verification tasks.

3,464 citations
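
The joint supervision described above is straightforward to write down: keep one learnable center per class and penalise the squared distance of each deep feature to the centre of its own class, added to the softmax loss with a weight lambda. The sketch below assumes PyTorch; the feature size, class count, and lambda value are arbitrary.

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    def __init__(self, n_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_classes, feat_dim))

    def forward(self, features, labels):
        # squared distance of each deep feature to its own class centre
        return ((features - self.centers[labels]) ** 2).sum(dim=1).mean() / 2

# Joint supervision: total loss = softmax loss + lambda * center loss
features = torch.randn(8, 64, requires_grad=True)  # deep features from some CNN
logits = torch.randn(8, 10, requires_grad=True)    # that CNN's class scores
labels = torch.randint(0, 10, (8,))
loss = nn.CrossEntropyLoss()(logits, labels) + 0.003 * CenterLoss(10, 64)(features, labels)
loss.backward()
```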