Author
Danda Pani Paudel
Other affiliations: Centre national de la recherche scientifique, University of Burgundy
Bio: Danda Pani Paudel is an academic researcher from ETH Zurich. The author has contributed to research in topics including Computer science & RANSAC. The author has an h-index of 11 and has co-authored 72 publications receiving 533 citations. Previous affiliations of Danda Pani Paudel include Centre national de la recherche scientifique & University of Burgundy.
Papers
01 Jun 2018
TL;DR: In this paper, a manifold network structure is used for covariance pooling to improve facial expression recognition, achieving a recognition accuracy of 58.14% on the validation set of Static Facial Expressions in the Wild (SFEW 2.0) and 87.0% on the validation set of the Real-World Affective Faces (RAF) database.
Abstract: Classifying facial expressions into different categories requires capturing regional distortions of facial landmarks. We believe that second-order statistics such as covariance is better able to capture such distortions in regional facial features. In this work, we explore the benefits of using a manifold network structure for covariance pooling to improve facial expression recognition. In particular, we first employ such kind of manifold networks in conjunction with traditional convolutional networks for spatial pooling within individual image feature maps in an end-to-end deep learning manner. By doing so, we are able to achieve a recognition accuracy of 58.14% on the validation set of Static Facial Expressions in the Wild (SFEW2.0) and 87.0% on the validation set of Real-World Affective Faces (RAF) Database. Both of these results are the best results we are aware of. Besides, we leverage covariance pooling to capture the temporal evolution of per-frame features for video-based facial expression recognition. Our reported results demonstrate the advantage of pooling image-set features temporally by stacking the designed manifold network of covariance pooling on top of convolutional network layers.
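The core idea above, covariance (second-order) pooling of convolutional feature maps followed by a log-Euclidean mapping, can be illustrated with a minimal NumPy sketch. All names, dimensions and the toy input below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def covariance_pool(feature_map, eps=1e-4):
    """Second-order (covariance) pooling of a CNN feature map.

    feature_map: array of shape (H, W, C) -- spatial grid of C-dim features.
    Returns a (C, C) SPD matrix summarizing regional feature co-variations.
    """
    h, w, c = feature_map.shape
    x = feature_map.reshape(h * w, c)          # treat each location as a sample
    x = x - x.mean(axis=0, keepdims=True)      # center the features
    cov = (x.T @ x) / (h * w - 1)              # sample covariance, (C, C)
    cov += eps * np.eye(c)                     # regularize to keep it SPD
    return cov

def log_eig(spd):
    """Matrix logarithm via eigendecomposition -- a common way to flatten an
    SPD matrix into Euclidean space before a classifier, loosely mirroring
    what SPD 'manifold network' layers do (illustrative, not the paper's layer)."""
    vals, vecs = np.linalg.eigh(spd)
    return vecs @ np.diag(np.log(vals)) @ vecs.T

# toy usage: a random 7x7 feature map with 64 channels
feats = np.random.randn(7, 7, 64).astype(np.float32)
descriptor = log_eig(covariance_pool(feats)).flatten()  # fed to a classifier
print(descriptor.shape)  # (4096,)
```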
157 citations
15 Jun 2019
TL;DR: The sliced Wasserstein distance (SWD) factorizes high-dimensional distributions into their one-dimensional marginal distributions and is thus easier to approximate than the Wasserstein distance. Instead of using a large number of random projections, as is done by conventional SWD approximation methods, this paper proposes to approximate SWDs with a small number of parameterized orthogonal projections in an end-to-end deep learning fashion.
Abstract: In generative modeling, the Wasserstein distance (WD) has emerged as a useful metric to measure the discrepancy between generated and real data distributions. Unfortunately, it is challenging to approximate the WD of high-dimensional distributions. In contrast, the sliced Wasserstein distance (SWD) factorizes high-dimensional distributions into their multiple one-dimensional marginal distributions and is thus easier to approximate. In this paper, we introduce novel approximations of the primal and dual SWD. Instead of using a large number of random projections, as it is done by conventional SWD approximation methods, we propose to approximate SWDs with a small number of parameterized orthogonal projections in an end-to-end deep learning fashion. As concrete applications of our SWD approximations, we design two types of differentiable SWD blocks to equip modern generative frameworks---Auto-Encoders (AE) and Generative Adversarial Networks (GAN). In the experiments, we not only show the superiority of the proposed generative models on standard image synthesis benchmarks, but also demonstrate the state-of-the-art performance on challenging high resolution image and video generation in an unsupervised manner.
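A minimal sketch of the sliced Wasserstein distance with random projections may help make the abstract concrete. Note that this is the conventional Monte-Carlo approximation the paper contrasts against; the paper's contribution is to replace the random directions with a small number of learned, parameterized orthogonal projections. Sample sizes and dimensions below are illustrative:

```python
import numpy as np

def sliced_wasserstein(x, y, num_proj=64, p=2, rng=None):
    """Monte-Carlo approximation of the sliced Wasserstein distance.

    x, y: arrays of shape (n, d) -- samples from two d-dimensional distributions.
    Each random unit direction reduces the problem to a 1-D optimal transport,
    which is solved exactly by sorting the projected samples.
    """
    rng = np.random.default_rng(rng)
    d = x.shape[1]
    dirs = rng.normal(size=(num_proj, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # unit-norm projections
    xp = np.sort(x @ dirs.T, axis=0)                      # (n, num_proj)
    yp = np.sort(y @ dirs.T, axis=0)
    return np.mean(np.abs(xp - yp) ** p) ** (1.0 / p)

# toy usage: two Gaussians with different means
a = np.random.randn(512, 128)
b = np.random.randn(512, 128) + 0.5
print(sliced_wasserstein(a, b))
```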
95 citations
University of Ljubljana, Czech Technical University in Prague, Huawei, ETH Zurich, Austrian Institute of Technology, University of Birmingham, Linköping University, University of British Columbia, Sher-e-Kashmir University of Agricultural Sciences and Technology of Kashmir, Eskişehir Osmangazi University, Dalian University of Technology, Kyushu University, Nanjing University, Indian Institutes of Technology, University of Udine, University of Surrey, Chinese Academy of Sciences, Nanyang Technological University, Jiangnan University, Fuzhou University, Zayed University, Hong Kong Baptist University, Kyungpook National University, The Chinese University of Hong Kong, RWTH Aachen University, University of Isfahan, Microsoft, Anadolu University, Ain Shams University, University of Würzburg, University of Oxford, Tamkang University, Katholieke Universiteit Leuven, University of Science and Technology of China, Huazhong University of Science and Technology, Sun Yat-sen University, Southeast University, Tianjin University of Technology, Nanjing University of Information Science and Technology, ShanghaiTech University
60 citations
04 Oct 2018
TL;DR: This work exploits the idea of progressive growing of Generative Adversarial Networks (GANs) for higher-resolution video generation, and introduces a sliced version of the Wasserstein GAN (SWGAN) loss to improve distribution learning on video data of high dimension and mixed spatiotemporal distribution.
Abstract: The extension of image generation to video generation turns out to be a very difficult task, since the temporal dimension of videos introduces an extra challenge during the generation process. Besides, due to the limitation of memory and training stability, the generation becomes increasingly challenging with the increase of the resolution/duration of videos. In this work, we exploit the idea of progressive growing of Generative Adversarial Networks (GANs) for higher resolution video generation. In particular, we begin to produce video samples of low-resolution and short-duration, and then progressively increase both resolution and duration alone (or jointly) by adding new spatiotemporal convolutional layers to the current networks. Starting from the learning on a very raw-level spatial appearance and temporal movement of the video distribution, the proposed progressive method learns spatiotemporal information incrementally to generate higher resolution videos. Furthermore, we introduce a sliced version of Wasserstein GAN (SWGAN) loss to improve the distribution learning on the video data of high-dimension and mixed-spatiotemporal distribution. SWGAN loss replaces the distance between joint distributions by that of one-dimensional marginal distributions, making the loss easier to compute. We evaluate the proposed model on our collected face video dataset of 10,900 videos to generate photorealistic face videos of 256x256x32 resolution. In addition, our model also reaches a record inception score of 14.57 in unsupervised action recognition dataset UCF-101.
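The progressive-growing fade-in mechanism referenced above can be sketched generically as a blend between an upsampled low-resolution branch and a newly added high-resolution branch. This is a minimal spatial-only illustration in NumPy (the paper grows spatiotemporal video layers and adds an SWGAN loss); the shapes and blending schedule are assumptions:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x spatial upsampling, (H, W, C) -> (2H, 2W, C)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def fade_in(low_res_branch, high_res_branch, alpha):
    """Progressive-growing style fade-in: blend the output of the old
    (upsampled) low-resolution branch with the newly added high-resolution
    branch, ramping alpha from 0 to 1 over the course of training."""
    return (1.0 - alpha) * upsample2x(low_res_branch) + alpha * high_res_branch

# toy usage: an 8x8 output being grown to 16x16
old = np.random.randn(8, 8, 3)
new = np.random.randn(16, 16, 3)
blended = fade_in(old, new, alpha=0.3)
print(blended.shape)  # (16, 16, 3)
```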
56 citations
01 Jan 2021
TL;DR: In this article, a learned association network is introduced to propagate the identities of all target candidates from frame to frame, making it possible to track distractor objects alongside the target.
Abstract: The presence of objects that are confusingly similar to the tracked target poses a fundamental challenge in appearance-based visual tracking. Such distractor objects are easily misclassified as the target itself, leading to eventual tracking failure. While most methods strive to suppress distractors through more powerful appearance models, we take an alternative approach.
We propose to keep track of distractor objects in order to continue tracking the target. To this end, we introduce a learned association network, allowing us to propagate the identities of all target candidates from frame-to-frame. To tackle the problem of lacking ground-truth correspondences between distractor objects in visual tracking, we propose a training strategy that combines partial annotations with self-supervision. We conduct comprehensive experimental validation and analysis of our approach on several challenging datasets. Our tracker sets a new state-of-the-art on six benchmarks, achieving an AUC score of 67.2% on LaSOT and a +6.1% absolute gain on the OxUvA long-term dataset.
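A rough sketch of frame-to-frame identity propagation for target and distractor candidates is given below, using a simple greedy cosine-similarity assignment. The paper instead learns the association network end-to-end with partial annotations and self-supervision; the similarity threshold and embeddings here are purely illustrative:

```python
import numpy as np

def propagate_identities(prev_embs, prev_ids, curr_embs, sim_thresh=0.5):
    """Greedy frame-to-frame identity propagation for target + distractor
    candidates based on cosine similarity of appearance embeddings.
    (Illustrative only -- the paper uses a learned association network.)"""
    prev_n = prev_embs / np.linalg.norm(prev_embs, axis=1, keepdims=True)
    curr_n = curr_embs / np.linalg.norm(curr_embs, axis=1, keepdims=True)
    sim = curr_n @ prev_n.T                      # (num_curr, num_prev)
    curr_ids, next_id, taken = [], max(prev_ids) + 1, set()
    for i in range(sim.shape[0]):
        j = int(np.argmax(sim[i]))
        if sim[i, j] > sim_thresh and j not in taken:
            curr_ids.append(prev_ids[j])         # keep the old identity
            taken.add(j)
        else:
            curr_ids.append(next_id)             # a new candidate appears
            next_id += 1
    return curr_ids

# toy usage: three candidates in the previous frame, reshuffled in the current one
prev = np.random.randn(3, 16)
curr = prev[[1, 0, 2]] + 0.01 * np.random.randn(3, 16)
print(propagate_identities(prev, [0, 1, 2], curr))  # e.g. [1, 0, 2]
```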
47 citations
Cited by
3,940 citations
01 Jan 2016
The Senses Considered as Perceptual Systems
854 citations
TL;DR: This article presents a comprehensive survey on deep facial expression recognition (FER), covering datasets and algorithms that provide insights into the intrinsic problems of deep FER: overfitting caused by a lack of sufficient training data, and expression-unrelated variations such as illumination, head pose and identity bias.
Abstract: With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions and the recent success of deep learning techniques in various fields, deep neural networks have increasingly been leveraged to learn discriminative representations for automatic FER. Recent deep FER systems generally focus on two important issues: overfitting caused by a lack of sufficient training data and expression-unrelated variations, such as illumination, head pose and identity bias. In this paper, we provide a comprehensive survey on deep FER, including datasets and algorithms that provide insights into these intrinsic problems. First, we describe the standard pipeline of a deep FER system with the related background knowledge and suggestions of applicable implementations for each stage. We then introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets. For the state of the art in deep FER, we review existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences, and discuss their advantages and limitations. Competitive performances on widely used benchmarks are also summarized in this section. We then extend our survey to additional related issues and application scenarios. Finally, we review the remaining challenges and corresponding opportunities in this field as well as future directions for the design of robust deep FER systems.
712 citations
TL;DR: SeqSLAM is a new approach to visual navigation under changing conditions that removes the need for global matching performance by the vision front-end; instead, the front-end must only pick the best match within any short sequence of images.
Abstract: Learning and then recognizing a route, whether travelled during the day or at night, in clear or inclement weather, and in summer or winter is a challenging task for state of the art algorithms in computer vision and robotics. In this paper, we present a new approach to visual navigation under changing conditions dubbed SeqSLAM. Instead of calculating the single location most likely given a current image, our approach calculates the best candidate matching location within every local navigation sequence. Localization is then achieved by recognizing coherent sequences of these “local best matches”. This approach removes the need for global matching performance by the vision front-end - instead it must only pick the best match within any short sequence of images. The approach is applicable over environment changes that render traditional feature-based techniques ineffective. Using two car-mounted camera datasets we demonstrate the effectiveness of the algorithm and compare it to one of the most successful feature-based SLAM algorithms, FAB-MAP. The perceptual change in the datasets is extreme; repeated traverses through environments during the day and then in the middle of the night, at times separated by months or years and in opposite seasons, and in clear weather and extremely heavy rain. While the feature-based method fails, the sequence-based algorithm is able to match trajectory segments at 100% precision with recall rates of up to 60%.
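The sequence-matching idea at the heart of SeqSLAM can be sketched as follows: given a matrix of image difference scores between query and database frames, sum the scores along constant-velocity trajectories and pick the best-scoring database start index. This is a minimal single-velocity NumPy sketch, not the published implementation (which searches over a range of velocities and applies local contrast normalization):

```python
import numpy as np

def seq_match(diff_matrix, seq_len=10):
    """Minimal sequence-based place matching in the spirit of SeqSLAM.

    diff_matrix: (num_query, num_db) image difference scores (lower = more similar).
    For each candidate database start index, sum the differences along a
    constant-velocity trajectory of length seq_len and return the best start.
    """
    nq, ndb = diff_matrix.shape
    assert nq >= seq_len
    rows = np.arange(seq_len)
    best_start, best_score = -1, np.inf
    for start in range(ndb - seq_len + 1):
        score = diff_matrix[rows, start + rows].sum()   # velocity = 1 db frame per query frame
        if score < best_score:
            best_start, best_score = start, score
    return best_start, best_score

# toy usage: the query sequence matches database positions 20..29
D = np.random.rand(10, 100)
D[np.arange(10), 20 + np.arange(10)] = 0.0
print(seq_match(D, seq_len=10))  # best start index 20, score 0.0
```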
686 citations
01 Jun 2006
TL;DR: An apposite and eminently readable reference for all behavioral science research and development.
649 citations