SciSpace (formerly Typeset)
Author

Danda Pani Paudel

Bio: Danda Pani Paudel is an academic researcher at ETH Zurich. He has contributed to research on topics including computer science and RANSAC, has an h-index of 11, and has co-authored 72 publications receiving 533 citations. His previous affiliations include the Centre national de la recherche scientifique and the University of Burgundy.


Papers
Proceedings ArticleDOI
01 Jun 2018
TL;DR: In this paper, a manifold network structure was used for covariance pooling to improve facial expression recognition, achieving recognition accuracies of 58.14% on the validation set of Static Facial Expressions in the Wild (SFEW 2.0) and 87.0% on the validation set of the Real-World Affective Faces (RAF) database.
Abstract: Classifying facial expressions into different categories requires capturing regional distortions of facial landmarks. We believe that second-order statistics such as covariance are better able to capture such distortions in regional facial features. In this work, we explore the benefits of using a manifold network structure for covariance pooling to improve facial expression recognition. In particular, we first employ such manifold networks in conjunction with traditional convolutional networks for spatial pooling within individual image feature maps in an end-to-end deep learning manner. By doing so, we are able to achieve a recognition accuracy of 58.14% on the validation set of Static Facial Expressions in the Wild (SFEW 2.0) and 87.0% on the validation set of the Real-World Affective Faces (RAF) Database. Both of these results are the best we are aware of. In addition, we leverage covariance pooling to capture the temporal evolution of per-frame features for video-based facial expression recognition. Our reported results demonstrate the advantage of pooling image-set features temporally by stacking the designed manifold network for covariance pooling on top of convolutional network layers.

157 citations
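A minimal sketch of the covariance pooling step described above, assuming a PyTorch CNN backbone. The function name, the eps regularizer, and the log-Euclidean mapping shown here are illustrative; the authors' full model stacks additional learnable SPD manifold layers that are not reproduced.

import torch

def covariance_pooling(features, eps=1e-4):
    # Second-order (covariance) pooling over the spatial positions of a
    # convolutional feature map, followed by a matrix logarithm that maps
    # the SPD covariance into a flat (log-Euclidean) space.
    # features: (B, C, H, W) tensor from a CNN backbone.
    b, c, h, w = features.shape
    x = features.reshape(b, c, h * w)                # C-dim samples at H*W positions
    x = x - x.mean(dim=2, keepdim=True)              # center each channel
    cov = x @ x.transpose(1, 2) / (h * w - 1)        # (B, C, C) covariance matrices
    cov = cov + eps * torch.eye(c, device=features.device)  # regularize to keep SPD
    evals, evecs = torch.linalg.eigh(cov)            # symmetric eigendecomposition
    log_cov = evecs @ torch.diag_embed(evals.clamp_min(eps).log()) @ evecs.transpose(1, 2)
    return log_cov                                   # (B, C, C) log-covariance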

Proceedings ArticleDOI
Jiqing Wu, Zhiwu Huang, Dinesh Acharya, Wen Li, Janine Thoma, Danda Pani Paudel, Luc Van Gool
15 Jun 2019
TL;DR: In this article, the sliced Wasserstein distance (SWD) factorizes high-dimensional distributions into their multiple one-dimensional marginal distributions and is thus easier to approximate; instead of using a large number of random projections, as conventional SWD approximation methods do, the authors propose to approximate SWDs with a small number of parameterized orthogonal projections in an end-to-end deep learning fashion.
Abstract: In generative modeling, the Wasserstein distance (WD) has emerged as a useful metric to measure the discrepancy between generated and real data distributions. Unfortunately, it is challenging to approximate the WD of high-dimensional distributions. In contrast, the sliced Wasserstein distance (SWD) factorizes high-dimensional distributions into their multiple one-dimensional marginal distributions and is thus easier to approximate. In this paper, we introduce novel approximations of the primal and dual SWD. Instead of using a large number of random projections, as is done by conventional SWD approximation methods, we propose to approximate SWDs with a small number of parameterized orthogonal projections in an end-to-end deep learning fashion. As concrete applications of our SWD approximations, we design two types of differentiable SWD blocks to equip modern generative frameworks: Auto-Encoders (AE) and Generative Adversarial Networks (GAN). In the experiments, we not only show the superiority of the proposed generative models on standard image synthesis benchmarks, but also demonstrate state-of-the-art performance on challenging high-resolution image and video generation in an unsupervised manner.

95 citations
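The core idea, a small set of learned orthogonal projections in place of many random ones, can be sketched as follows (PyTorch, assuming equal-sized sample batches; the module name is ours, and the adversarial scheme the paper uses to train the projections end-to-end is omitted).

import torch
import torch.nn as nn

class LearnedSWD(nn.Module):
    # Sliced Wasserstein distance using a few learned projections,
    # orthogonalized by QR, instead of many random directions.
    def __init__(self, dim, n_proj=16):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(dim, n_proj))

    def forward(self, x, y):
        # x, y: (N, dim) samples from the two distributions being compared
        q, _ = torch.linalg.qr(self.weight)   # (dim, n_proj) orthonormal directions
        px, _ = torch.sort(x @ q, dim=0)      # sorted 1-D marginals of x
        py, _ = torch.sort(y @ q, dim=0)      # sorted 1-D marginals of y
        # 1-D Wasserstein-2 between marginals reduces to comparing sorted samples
        return ((px - py) ** 2).mean()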

Proceedings ArticleDOI
Matej Kristan, Jiri Matas, Ales Leonardis, Michael Felsberg, Roman Pflugfelder, Joni-Kristian Kamarainen, Hyung Jin Chang, Martin Danelljan, Luka Čehovin Zajc, Alan Lukezic, Ondrej Drbohlav, Jani Käpylä, Gustav Häger, Song Yan, Jinyu Yang, Zhongqun Zhang, Gustavo Fernandez, Mohamed H. Abdelpakey, Goutam Bhat, Llukman Cerkezi, Hakan Cevikalp, Shengyong Chen, Xin Chen, Miao Cheng, Ziyi Cheng, Yu-Chen Chiu, Ozgun Cirakman, Yutao Cui, Kenan Dai, Mohana Murali Dasari, Qili Deng, Xingping Dong, Daniel K. Du, Matteo Dunnhofer, Zhen-Hua Feng, Zhiyong Feng, Zhihong Fu, Shiming Ge, Rama Krishna Sai Subrahmanyam Gorthi, Yuzhang Gu, Bilge Gunsel, Qing Guo, Filiz Gurkan, Wencheng Han, Yanyan Huang, Felix Järemo Lawin, Shang-Jhih Jhang, Rongrong Ji, Cheng Jiang, Yingjie Jiang, Felix Juefei-Xu, Yin Jun, Xiao Ke, Fahad Shahbaz Khan, Byeong Hak Kim, Josef Kittler, Xiangyuan Lan, Jun Ha Lee, Bastian Leibe, Hui Li, Jianhua Li, Xianxian Li, Yuezhou Li, Bo Liu, Chang Liu, Jingen Liu, Li Liu, Qingjie Liu, Huchuan Lu, Wei Lu, Jonathon Luiten, Jie Ma, Ziang Ma, Niki Martinel, Christoph Mayer, Alireza Memarmoghadam, Christian Micheloni, Yuzhen Niu, Danda Pani Paudel, Houwen Peng, Shoumeng Qiu, Aravindh Rajiv, Muhammad Rana, Andreas Robinson, Hasan Saribas, Ling Shao, Mohamed Shehata, Furao Shen, Jianbing Shen, Kristian Simonato, Xiaoning Song, Zhangyong Tang, Radu Timofte, Philip H. S. Torr, Chi-Yi Tsai, Bedirhan Uzun, Luc Van Gool, Paul Voigtlaender, Dong Wang, Guangting Wang, Liangliang Wang, Lijun Wang, Limin Wang, Linyuan Wang, Yong Wang, Yunhong Wang, Chenyan Wu, Gangshan Wu, Xiaojun Wu, Fei Xie, Tianyang Xu, Xiang Xu, Wanli Xue, Bin Yan, Wankou Yang, Xiaoyun Yang, Yu Ye, Jun Yin, Chengwei Zhang, Chunhui Zhang, Haitao Zhang, Kaihua Zhang, Kangkai Zhang, Xiaohan Zhang, Xiaolin Zhang, Xinyu Zhang, Zhibin Zhang, Shaochuan Zhao, Ming Zhen, Bineng Zhong, Jiawen Zhu, Xue-Feng Zhu
01 Oct 2021

60 citations

Proceedings Article
04 Oct 2018
TL;DR: This work exploits the idea of progressively growing Generative Adversarial Networks (GANs) for higher-resolution video generation, and introduces a sliced version of the Wasserstein GAN (SWGAN) loss to improve distribution learning on high-dimensional video data with a mixed spatiotemporal distribution.
Abstract: The extension of image generation to video generation is a very difficult task, since the temporal dimension of videos introduces an extra challenge during the generation process. Moreover, due to limitations of memory and training stability, generation becomes increasingly challenging as the resolution and duration of videos increase. In this work, we exploit the idea of progressively growing Generative Adversarial Networks (GANs) for higher-resolution video generation. In particular, we begin by producing low-resolution, short-duration video samples, and then progressively increase resolution and duration, separately or jointly, by adding new spatiotemporal convolutional layers to the current networks. Starting from learning the coarse spatial appearance and temporal movement of the video distribution, the proposed progressive method learns spatiotemporal information incrementally to generate higher-resolution videos. Furthermore, we introduce a sliced version of the Wasserstein GAN (SWGAN) loss to improve distribution learning on high-dimensional video data with a mixed spatiotemporal distribution. The SWGAN loss replaces the distance between joint distributions with distances between one-dimensional marginal distributions, making the loss easier to compute. We evaluate the proposed model on our collected face video dataset of 10,900 videos to generate photorealistic face videos at 256x256x32 resolution. In addition, our model reaches a record inception score of 14.57 on the unsupervised action recognition dataset UCF-101.

56 citations
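For reference, the conventional random-projection estimator that the SWGAN loss builds on replaces the distance between joint distributions with an average over one-dimensional marginals, each computable by sorting. A minimal sketch (the function name and batch conventions are ours; the progressive spatiotemporal architecture is not shown):

import torch

def sliced_wasserstein(real, fake, n_proj=128):
    # real, fake: (N, D) equal-sized batches of flattened samples
    d = real.shape[1]
    theta = torch.randn(d, n_proj, device=real.device)
    theta = theta / theta.norm(dim=0, keepdim=True)   # unit projection directions
    pr, _ = torch.sort(real @ theta, dim=0)           # sorted 1-D marginals
    pf, _ = torch.sort(fake @ theta, dim=0)
    return ((pr - pf) ** 2).mean()                    # mean 1-D Wasserstein-2 cost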

Proceedings Article
01 Jan 2021
TL;DR: In this article, a learned association network is introduced to propagate the identities of all target candidates from frame to frame, making it possible to keep track of distractor objects while tracking the target.
Abstract: The presence of objects that are confusingly similar to the tracked target poses a fundamental challenge in appearance-based visual tracking. Such distractor objects are easily misclassified as the target itself, leading to eventual tracking failure. While most methods strive to suppress distractors through more powerful appearance models, we take an alternative approach. We propose to keep track of distractor objects in order to continue tracking the target. To this end, we introduce a learned association network, allowing us to propagate the identities of all target candidates from frame to frame. To tackle the problem of lacking ground-truth correspondences between distractor objects in visual tracking, we propose a training strategy that combines partial annotations with self-supervision. We conduct comprehensive experimental validation and analysis of our approach on several challenging datasets. Our tracker sets a new state-of-the-art on six benchmarks, achieving an AUC score of 67.2% on LaSOT and a +6.1% absolute gain on the OxUvA long-term dataset.

47 citations
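A toy sketch of the frame-to-frame association step (PyTorch): identities are propagated by a soft assignment over candidate-embedding similarities. The actual learned association network, its candidate extraction, and the partially supervised training strategy are not reproduced here; the function name and temperature are illustrative.

import torch
import torch.nn.functional as F

def associate_candidates(prev_emb, curr_emb, temperature=0.1):
    # prev_emb: (M, D) embeddings of the previous frame's target candidates
    # curr_emb: (N, D) embeddings of the current frame's candidates
    # returns an (M, N) row-stochastic association matrix
    prev_emb = F.normalize(prev_emb, dim=1)
    curr_emb = F.normalize(curr_emb, dim=1)
    sim = prev_emb @ curr_emb.t() / temperature   # scaled cosine similarities
    return sim.softmax(dim=1)                     # each old identity distributes over new candidates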


Cited by
More filters
01 Jan 2016
The Senses Considered as Perceptual Systems

854 citations

Journal ArticleDOI
TL;DR: This article provides a comprehensive survey of deep facial expression recognition (FER), including datasets and algorithms that provide insight into its intrinsic problems: overfitting caused by a lack of sufficient training data, and expression-unrelated variations such as illumination, head pose, and identity bias.
Abstract: With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions and the recent success of deep learning techniques in various fields, deep neural networks have increasingly been leveraged to learn discriminative representations for automatic FER. Recent deep FER systems generally focus on two important issues: overfitting caused by a lack of sufficient training data and expression-unrelated variations, such as illumination, head pose and identity bias. In this paper, we provide a comprehensive survey on deep FER, including datasets and algorithms that provide insights into these intrinsic problems. First, we describe the standard pipeline of a deep FER system with the related background knowledge and suggestions of applicable implementations for each stage. We then introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets. For the state of the art in deep FER, we review existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences, and discuss their advantages and limitations. Competitive performances on widely used benchmarks are also summarized in this section. We then extend our survey to additional related issues and application scenarios. Finally, we review the remaining challenges and corresponding opportunities in this field as well as future directions for the design of robust deep FER systems.

712 citations
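The standard deep-FER pipeline the survey outlines (preprocess an aligned face crop, extract deep features with a CNN, classify into discrete expressions) can be sketched as follows. This is an assumption-laden illustration: the ResNet-18 backbone, the seven-class head, and the input size are placeholders, and a real system would fine-tune on a FER dataset and handle face detection and alignment upstream.

import torch
import torchvision.transforms as T
from torchvision.models import resnet18

EXPRESSIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

model = resnet18(num_classes=len(EXPRESSIONS))  # placeholder backbone, untrained here

def classify_expression(face_image):
    # face_image: a PIL image of an already-detected, aligned face crop
    x = preprocess(face_image).unsqueeze(0)
    with torch.no_grad():
        logits = model(x)
    return EXPRESSIONS[logits.argmax(dim=1).item()]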

Journal Article
TL;DR: A new approach to visual navigation under changing conditions, dubbed SeqSLAM, removes the need for global matching performance from the vision front-end; instead, it must only pick the best match within any short sequence of images.
Abstract: Learning and then recognizing a route, whether travelled during the day or at night, in clear or inclement weather, and in summer or winter, is a challenging task for state-of-the-art algorithms in computer vision and robotics. In this paper, we present a new approach to visual navigation under changing conditions dubbed SeqSLAM. Instead of calculating the single location most likely given a current image, our approach calculates the best candidate matching location within every local navigation sequence. Localization is then achieved by recognizing coherent sequences of these "local best matches". This approach removes the need for global matching performance from the vision front-end; instead it must only pick the best match within any short sequence of images. The approach is applicable over environment changes that render traditional feature-based techniques ineffective. Using two car-mounted camera datasets we demonstrate the effectiveness of the algorithm and compare it to one of the most successful feature-based SLAM algorithms, FAB-MAP. The perceptual change in the datasets is extreme: repeated traverses through environments during the day and then in the middle of the night, at times separated by months or years and in opposite seasons, and in clear weather and extremely heavy rain. While the feature-based method fails, the sequence-based algorithm is able to match trajectory segments at 100% precision with recall rates of up to 60%.

686 citations
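The sequence-matching idea is easy to state in code: rather than taking the single best frame match, score coherent trajectories through the pairwise image-difference matrix. A fixed-velocity toy version follows (NumPy; real SeqSLAM also sweeps a range of trajectory slopes and locally contrast-normalizes the difference matrix, which is omitted here).

import numpy as np

def seqslam_match(diff_matrix, seq_len=10):
    # diff_matrix: (n_db, n_query) pairwise image difference scores
    # returns the database index best matching the most recent query frame
    n_db, n_query = diff_matrix.shape
    q = np.arange(n_query - seq_len, n_query)        # the recent query window
    best_idx, best_cost = -1, np.inf
    for start in range(n_db - seq_len + 1):
        db = np.arange(start, start + seq_len)       # constant-velocity trajectory
        cost = diff_matrix[db, q].sum()              # summed difference along it
        if cost < best_cost:
            best_idx, best_cost = start + seq_len - 1, cost
    return best_idx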

Journal ArticleDOI
01 Jun 2006
Abstract: An apposite and eminently readable reference for all behavioral science research and development.

649 citations