Home
/
Authors
/
Xavier Baró

Author

Xavier Baró

Other affiliations: Autonomous University of Barcelona, University of Barcelona, Radboud University Nijmegen

Bio: Xavier Baró is an academic researcher from Open University of Catalonia. The author has contributed to research in topics: Gesture recognition & Gesture. The author has an hindex of 27, co-authored 105 publications receiving 2474 citations. Previous affiliations of Xavier Baró include Autonomous University of Barcelona & University of Barcelona.

Papers published on a yearly basis

2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005

Papers

PDF

Open Access

More filters

Book Chapter•DOI•

Chalearn looking at people challenge 2014: Dataset and results

[...]

Sergio Escalera¹, Xavier Baró², Jordi Gonzàlez, Miguel Ángel Bautista¹, Meysam Madadi¹, Miguel Reyes¹, Víctor Ponce-López², Víctor Ponce-López¹, Hugo Jair Escalante, Jamie Shotton³, Isabelle Guyon - Show less +7 more•Institutions (3)

University of Barcelona¹, Open University of Catalonia², Microsoft³

06 Sep 2014

TL;DR: In this edition of the ChaLearn challenge, two large novel data sets were made publicly available and the Microsoft Codalab platform were used to manage the competition.

...read moreread less

Abstract: This paper summarizes the ChaLearn Looking at People 2014 challenge data and the results obtained by the participants. The competition was split into three independent tracks: human pose recovery from RGB data, action and interaction recognition from RGB data sequences, and multi-modal gesture recognition from RGB-Depth sequences. For all the tracks, the goal was to perform user-independent recognition in sequences of continuous images using the overlapping Jaccard index as the evaluation measure. In this edition of the ChaLearn challenge, two large novel data sets were made publicly available and the Microsoft Codalab platform were used to manage the competition. Outstanding results were achieved in the three challenge tracks, with accuracy results of 0.20, 0.50, and 0.85 for pose recovery, action/interaction recognition, and multi-modal gesture recognition, respectively.

...read moreread less

221 citations

Journal Article•DOI•

Traffic Sign Recognition Using Evolutionary Adaboost Detection and Forest-ECOC Classification

[...]

Xavier Baró, Sergio Escalera, Jordi Vitrià, Oriol Pujol, Petia Radeva - Show less +1 more

01 Mar 2009-IEEE Transactions on Intelligent Transportation Systems

TL;DR: A novel approach for the detection and classification of traffic signs that offers high performance and better accuracy than the state-of-the-art strategies and is potentially better in terms of noise, affine deformation, partial occlusions, and reduced illumination.

...read moreread less

Abstract: The high variability of sign appearance in uncontrolled environments has made the detection and classification of road signs a challenging problem in computer vision. In this paper, we introduce a novel approach for the detection and classification of traffic signs. Detection is based on a boosted detectors cascade, trained with a novel evolutionary version of Adaboost, which allows the use of large feature spaces. Classification is defined as a multiclass categorization problem. A battery of classifiers is trained to split classes in an Error-Correcting Output Code (ECOC) framework. We propose an ECOC design through a forest of optimal tree structures that are embedded in the ECOC matrix. The novel system offers high performance and better accuracy than the state-of-the-art strategies and is potentially better in terms of noise, affine deformation, partial occlusions, and reduced illumination.

...read moreread less

195 citations

Proceedings Article•DOI•

Multi-modal gesture recognition challenge 2013: dataset and results

[...]

Sergio Escalera¹, Jordi Gonzàlez², Xavier Baró³, Miguel Reyes¹, Oscar Lopes, Isabelle Guyon, Vassilis Athitsos⁴, Hugo Jair Escalante - Show less +4 more•Institutions (4)

University of Barcelona¹, Autonomous University of Barcelona², Open University of Catalonia³, University of Texas at Austin⁴

09 Dec 2013

TL;DR: A challenge on multi-modal gesture recognition with 54 international teams, providing the audio, skeletal model, user mask, RGB and depth images, and outstanding results were obtained by the first ranked participants.

...read moreread less

Abstract: The recognition of continuous natural gestures is a complex and challenging problem due to the multi-modal nature of involved visual cues (e.g. fingers and lips movements, subtle facial expressions, body pose, etc.), as well as technical limitations such as spatial and temporal resolution and unreliable depth cues. In order to promote the research advance on this field, we organized a challenge on multi-modal gesture recognition. We made available a large video database of 13,858 gestures from a lexicon of 20 Italian gesture categories recorded with a Kinect™ camera, providing the audio, skeletal model, user mask, RGB and depth images. The focus of the challenge was on user independent multiple gesture learning. There are no resting positions and the gestures are performed in continuous sequences lasting 1-2 minutes, containing between 8 and 20 gesture instances in each sequence. As a result, the dataset contains around 1.720.800 frames. In addition to the 20 main gesture categories, "distracter" gestures are included, meaning that additional audio and gestures out of the vocabulary are included. The final evaluation of the challenge was defined in terms of the Levenshtein edit distance, where the goal was to indicate the real order of gestures within the sequence. 54 international teams participated in the challenge, and outstanding results were obtained by the first ranked participants.

...read moreread less

188 citations

Book Chapter•DOI•

ChaLearn LAP 2016: First Round Challenge on First Impressions - Dataset and Results

[...]

Víctor Ponce-López¹, Víctor Ponce-López², Baiyu Chen³, Marc Oliu¹, Ciprian A. Corneanu², Albert Clapés², Isabelle Guyon⁴, Xavier Baró¹, Hugo Jair Escalante, Sergio Escalera² - Show less +6 more•Institutions (4)

Open University of Catalonia¹, University of Barcelona², University of California, Berkeley³, University of Paris⁴

08 Oct 2016

TL;DR: This paper summarizes the ChaLearn Looking at People 2016 First Impressions challenge data and results obtained by the teams in the first round of the competition, to automatically evaluate five “apparent” personality traits from videos of subjects speaking in front of a camera, by using human judgment.

...read moreread less

Abstract: This paper summarizes the ChaLearn Looking at People 2016 First Impressions challenge data and results obtained by the teams in the first round of the competition. The goal of the competition was to automatically evaluate five “apparent” personality traits (the so-called “Big Five”) from videos of subjects speaking in front of a camera, by using human judgment. In this edition of the ChaLearn challenge, a novel data set consisting of 10,000 shorts clips from YouTube videos has been made publicly available. The ground truth for personality traits was obtained from workers of Amazon Mechanical Turk (AMT). To alleviate calibration problems between workers, we used pairwise comparisons between videos, and variable levels were reconstructed by fitting a Bradley-Terry-Luce model with maximum likelihood. The CodaLab open source platform was used for submission of predictions and scoring. The competition attracted, over a period of 2 months, 84 participants who are grouped in several teams. Nine teams entered the final phase. Despite the difficulty of the task, the teams made great advances in this round of the challenge.

...read moreread less

174 citations

Proceedings Article•DOI•

A Survey on Deep Learning Based Approaches for Action and Gesture Recognition in Image Sequences

[...]

Maryam Asadi-Aghbolaghi, Albert Clapés¹, Marco Bellantonio², Hugo Jair Escalante³, Víctor Ponce-López¹, Xavier Baró⁴, Isabelle Guyon, Shohreh Kasaei, Sergio Escalera¹ - Show less +5 more•Institutions (4)

University of Barcelona¹, Polytechnic University of Catalonia², National Institute of Astrophysics, Optics and Electronics³, Open University of Catalonia⁴

01 May 2017

TL;DR: A taxonomy that summarizes important aspects of deep learning for approaching both action and gesture recognition in image sequences is introduced, and the main works proposed so far are summarized.

...read moreread less

Abstract: The interest in action and gesture recognition has grown considerably in the last years. In this paper, we present a survey on current deep learning methodologies for action and gesture recognition in image sequences. We introduce a taxonomy that summarizes important aspects of deep learning for approaching both tasks. We review the details of the proposed architectures, fusion strategies, main datasets, and competitions. We summarize and discuss the main works proposed so far with particular interest on how they treat the temporal dimension of data, discussing their main features and identify opportunities and challenges for future research.

...read moreread less

171 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

Collapse

Cited by

PDF

Open Access

More filters

Reference Entry•DOI•

IEEE Transactions on Pattern Analysis and Machine Intelligence

[...]

King-Sun Fu

15 Oct 2004

2,118 citations

IEEE transactions on pattern analysis and machine intelligence

[...]

Ieee Xplore

01 Jan 1979

TL;DR: This special issue aims at gathering the recent advances in learning with shared information methods and their applications in computer vision and multimedia analysis and addressing interesting real-world computer Vision and multimedia applications.

...read moreread less

Abstract: In the real world, a realistic setting for computer vision or multimedia recognition problems is that we have some classes containing lots of training data and many classes contain a small amount of training data. Therefore, how to use frequent classes to help learning rare classes for which it is harder to collect the training data is an open question. Learning with Shared Information is an emerging topic in machine learning, computer vision and multimedia analysis. There are different level of components that can be shared during concept modeling and machine learning stages, such as sharing generic object parts, sharing attributes, sharing transformations, sharing regularization parameters and sharing training examples, etc. Regarding the specific methods, multi-task learning, transfer learning and deep learning can be seen as using different strategies to share information. These learning with shared information methods are very effective in solving real-world large-scale problems. This special issue aims at gathering the recent advances in learning with shared information methods and their applications in computer vision and multimedia analysis. Both state-of-the-art works, as well as literature reviews, are welcome for submission. Papers addressing interesting real-world computer vision and multimedia applications are especially encouraged. Topics of interest include, but are not limited to: • Multi-task learning or transfer learning for large-scale computer vision and multimedia analysis • Deep learning for large-scale computer vision and multimedia analysis • Multi-modal approach for large-scale computer vision and multimedia analysis • Different sharing strategies, e.g., sharing generic object parts, sharing attributes, sharing transformations, sharing regularization parameters and sharing training examples, • Real-world computer vision and multimedia applications based on learning with shared information, e.g., event detection, object recognition, object detection, action recognition, human head pose estimation, object tracking, location-based services, semantic indexing. • New datasets and metrics to evaluate the benefit of the proposed sharing ability for the specific computer vision or multimedia problem. • Survey papers regarding the topic of learning with shared information. Authors who are unsure whether their planned submission is in scope may contact the guest editors prior to the submission deadline with an abstract, in order to receive feedback.

...read moreread less

1,758 citations

Journal Article•DOI•

Data-Driven Intelligent Transportation Systems: A Survey

[...]

Junping Zhang¹, Fei-Yue Wang, Kunfeng Wang, Wei-Hua Lin², Xin Xu³, Cheng Chen - Show less +2 more•Institutions (3)

Fudan University¹, University of Arizona², National University of Defense Technology³

01 Dec 2011-IEEE Transactions on Intelligent Transportation Systems

TL;DR: A survey on the development of D2ITS is provided, discussing the functionality of its key components and some deployment issues associated with D2 ITS Future research directions for the developed system are presented.

...read moreread less

Abstract: For the last two decades, intelligent transportation systems (ITS) have emerged as an efficient way of improving the performance of transportation systems, enhancing travel security, and providing more choices to travelers. A significant change in ITS in recent years is that much more data are collected from a variety of sources and can be processed into various forms for different stakeholders. The availability of a large amount of data can potentially lead to a revolution in ITS development, changing an ITS from a conventional technology-driven system into a more powerful multifunctional data-driven intelligent transportation system (D2ITS) : a system that is vision, multisource, and learning algorithm driven to optimize its performance. Furthermore, D2ITS is trending to become a privacy-aware people-centric more intelligent system. In this paper, we provide a survey on the development of D2ITS, discussing the functionality of its key components and some deployment issues associated with D2ITS Future research directions for the development of D2ITS is also presented.

...read moreread less

1,336 citations

Journal Article•DOI•

Deep Expectation of Real and Apparent Age from a Single Image Without Facial Landmarks

[...]

Rasmus Rothe¹, Radu Timofte¹, Luc Van Gool²•Institutions (2)

ETH Zurich¹, Katholieke Universiteit Leuven²

01 Apr 2018-International Journal of Computer Vision

TL;DR: A deep learning solution to age estimation from a single face image without the use of facial landmarks is proposed and the IMDB-WIKI dataset is introduced, the largest public dataset of face images with age and gender labels.

...read moreread less

Abstract: In this paper we propose a deep learning solution to age estimation from a single face image without the use of facial landmarks and introduce the IMDB-WIKI dataset, the largest public dataset of face images with age and gender labels. If the real age estimation research spans over decades, the study of apparent age estimation or the age as perceived by other humans from a face image is a recent endeavor. We tackle both tasks with our convolutional neural networks (CNNs) of VGG-16 architecture which are pre-trained on ImageNet for image classification. We pose the age estimation problem as a deep classification problem followed by a softmax expected value refinement. The key factors of our solution are: deep learned models from large data, robust face alignment, and expected value formulation for age regression. We validate our methods on standard benchmarks and achieve state-of-the-art results for both real and apparent age estimation.

...read moreread less

755 citations

Journal Article•DOI•

Deep Facial Expression Recognition: A Survey

[...]

Shan Li¹, Weihong Deng¹•Institutions (1)

Beijing University of Posts and Telecommunications¹

23 Apr 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: A comprehensive survey on deep facial expression recognition (FER) can be found in this article, including datasets and algorithms that provide insights into the intrinsic problems of deep FER, including overfitting caused by lack of sufficient training data and expression-unrelated variations, such as illumination, head pose and identity bias.

...read moreread less

Abstract: With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions and the recent success of deep learning techniques in various fields, deep neural networks have increasingly been leveraged to learn discriminative representations for automatic FER. Recent deep FER systems generally focus on two important issues: overfitting caused by a lack of sufficient training data and expression-unrelated variations, such as illumination, head pose and identity bias. In this paper, we provide a comprehensive survey on deep FER, including datasets and algorithms that provide insights into these intrinsic problems. First, we describe the standard pipeline of a deep FER system with the related background knowledge and suggestions of applicable implementations for each stage. We then introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets. For the state of the art in deep FER, we review existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences, and discuss their advantages and limitations. Competitive performances on widely used benchmarks are also summarized in this section. We then extend our survey to additional related issues and application scenarios. Finally, we review the remaining challenges and corresponding opportunities in this field as well as future directions for the design of robust deep FER systems.

...read moreread less

712 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse