Home
/
Authors
/
Estibaliz Garrote

Author

Estibaliz Garrote

Bio: Estibaliz Garrote is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research in topics: Hyperspectral imaging & Medicine. The author has an hindex of 7, co-authored 19 publications receiving 3040 citations.

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

HMDB: A large video database for human motion recognition

[...]

Hilde Kuehne, Hueihan Jhuang¹, Estibaliz Garrote¹, Tomaso Poggio¹, Thomas Serre² - Show less +1 more•Institutions (2)

Massachusetts Institute of Technology¹, Brown University²

06 Nov 2011

TL;DR: This paper uses the largest action video database to-date with 51 action categories, which in total contain around 7,000 manually annotated clips extracted from a variety of sources ranging from digitized movies to YouTube, to evaluate the performance of two representative computer vision systems for action recognition and explore the robustness of these methods under various conditions.

...read moreread less

Abstract: With nearly one billion online videos viewed everyday, an emerging new frontier in computer vision research is recognition and search in video. While much effort has been devoted to the collection and annotation of large scalable static image datasets containing thousands of image categories, human action datasets lag far behind. Current action recognition databases contain on the order of ten different action categories collected under fairly controlled conditions. State-of-the-art performance on these datasets is now near ceiling and thus there is a need for the design and creation of new benchmarks. To address this issue we collected the largest action video database to-date with 51 action categories, which in total contain around 7,000 manually annotated clips extracted from a variety of sources ranging from digitized movies to YouTube. We use this database to evaluate the performance of two representative computer vision systems for action recognition and explore the robustness of these methods under various conditions such as camera motion, viewpoint, video quality and occlusion.

...read moreread less

3,533 citations

Journal Article•DOI•

Automated home-cage behavioural phenotyping of mice

[...]

Hueihan Jhuang¹, Estibaliz Garrote¹, Xinlin Yu², Vinita Khilnani², Tomaso Poggio¹, Andrew D. Steele², Thomas Serre¹, Thomas Serre³ - Show less +4 more•Institutions (3)

Massachusetts Institute of Technology¹, California Institute of Technology², Brown University³

07 Sep 2010-Nature Communications

TL;DR: A trainable computer vision system enabling the automated analysis of complex mouse behaviours that performs on par with human scoring, as measured from ground-truth manual annotations of thousands of clips of freely behaving mice.

...read moreread less

Abstract: Neurobehavioural analysis of mouse phenotypes requires the monitoring of mouse behaviour over long periods of time. In this study, we describe a trainable computer vision system enabling the automated analysis of complex mouse behaviours. We provide software and an extensive manually annotated video database used for training and testing the system. Our system performs on par with human scoring, as measured from ground-truth manual annotations of thousands of clips of freely behaving mice. As a validation of the system, we characterized the home-cage behaviours of two standard inbred and two non-standard mouse strains. From these data, we were able to predict in a blind test the strain identity of individual animals with high accuracy. Our video-based software will complement existing sensor-based automated approaches and enable an adaptable, comprehensive, high-throughput, fine-grained, automated analysis of mouse behaviour.

...read moreread less

253 citations

Posted Content•

Data-Driven Color Augmentation Techniques for Deep Skin Image Analysis.

[...]

Adrian Galdran, Aitor Alvarez-Gila, Maria Ines Meyer, Cristina L. Saratxaga, Teresa Araújo, Estibaliz Garrote, Guilherme Aresta, Pedro Costa, Ana Maria Mendonça, Aurélio Campilho - Show less +6 more

10 Mar 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work applies the emph{shades of gray} color constancy technique to color-normalize the entire training set of images, while retaining the estimated illuminants, for training two deep convolutional neural networks for the tasks of skin lesion segmentation and skin lesions classification.

...read moreread less

Abstract: Dermoscopic skin images are often obtained with different imaging devices, under varying acquisition conditions. In this work, instead of attempting to perform intensity and color normalization, we propose to leverage computational color constancy techniques to build an artificial data augmentation technique suitable for this kind of images. Specifically, we apply the \emph{shades of gray} color constancy technique to color-normalize the entire training set of images, while retaining the estimated illuminants. We then draw one sample from the distribution of training set illuminants and apply it on the normalized image. We employ this technique for training two deep convolutional neural networks for the tasks of skin lesion segmentation and skin lesion classification, in the context of the ISIC 2017 challenge and without using any external dermatologic image set. Our results on the validation set are promising, and will be supplemented with extended results on the hidden test set when available.

...read moreread less

60 citations

Journal Article•DOI•

Mixed convolutional and long short-term memory network for the detection of lethal ventricular arrhythmia

[...]

Artzai Picon, Unai Irusta¹, Aitor Alvarez-Gila, Elisabete Aramendi¹, Felipe Alonso-Atienza², Felipe Alonso-Atienza³, Carlos Figuera³, Carlos Figuera², Unai Ayala, Estibaliz Garrote, Lars Wik⁴, Jo Kramer-Johansen⁴, Trygve Eftestøl⁵ - Show less +9 more•Institutions (5)

University of the Basque Country¹, King Juan Carlos University², BBVA Compass³, Oslo University Hospital⁴, University of Stavanger⁵

20 May 2019-PLOS ONE

TL;DR: A deep learning architecture based on 1D-CNN layers and a Long Short-Term Memory (LSTM) network for the detection of VF is introduced, believed to be the most accurate VF detection algorithm to date, especially on OHCA data, and it would enable an accurate shock no shock diagnosis in a very short time.

...read moreread less

Abstract: Early defibrillation by an automated external defibrillator (AED) is key for the survival of out-of-hospital cardiac arrest (OHCA) patients. ECG feature extraction and machine learning have been successfully used to detect ventricular fibrillation (VF) in AED shock decision algorithms. Recently, deep learning architectures based on 1D Convolutional Neural Networks (CNN) have been proposed for this task. This study introduces a deep learning architecture based on 1D-CNN layers and a Long Short-Term Memory (LSTM) network for the detection of VF. Two datasets were used, one from public repositories of Holter recordings captured at the onset of the arrhythmia, and a second from OHCA patients obtained minutes after the onset of the arrest. Data was partitioned patient-wise into training (80%) to design the classifiers, and test (20%) to report the results. The proposed architecture was compared to 1D-CNN only deep learners, and to a classical approach based on VF-detection features and a support vector machine (SVM) classifier. The algorithms were evaluated in terms of balanced accuracy (BAC), the unweighted mean of the sensitivity (Se) and specificity (Sp). The BAC, Se, and Sp of the architecture for 4-s ECG segments was 99.3%, 99.7%, and 98.9% for the public data, and 98.0%, 99.2%, and 96.7% for OHCA data. The proposed architecture outperformed all other classifiers by at least 0.3-points in BAC in the public data, and by 2.2-points in the OHCA data. The architecture met the 95% Sp and 90% Se requirements of the American Heart Association in both datasets for segment lengths as short as 3-s. This is, to the best of our knowledge, the most accurate VF detection algorithm to date, especially on OHCA data, and it would enable an accurate shock no shock diagnosis in a very short time.

...read moreread less

54 citations

Proceedings Article•DOI•

Adversarial Networks for Spatial Context-Aware Spectral Image Reconstruction from RGB

[...]

Aitor Alvarez-Gila, Joost van de Weijer, Estibaliz Garrote

01 Sep 2017

TL;DR: In this article, a conditional generative adversarial framework is used to capture spatial semantics for hyperspectral image reconstruction, achieving a Root Mean Squared Error (RMSE) drop of 44.7% and a Relative RMSE drop of 47.0%.

...read moreread less

Abstract: Hyperspectral signal reconstruction aims at recovering the original spectral input that produced a certain trichromatic (RGB) response from a capturing device or observer. Given the heavily underconstrained, non-linear nature of the problem, traditional techniques leverage different statistical properties of the spectral signal in order to build informative priors from real world object reflectances for constructing such RGB to spectral signal mapping. However, most of them treat each sample independently, and thus do not benefit from the contextual information that the spatial dimensions can provide. We pose hyperspectral natural image reconstruction as an image to image mapping learning problem, and apply a conditional generative adversarial framework to help capture spatial semantics. This is the first time Convolutional Neural Networks -and, particularly, Generative Adversarial Networks- are used to solve this task. Quantitative evaluation shows a Root Mean Squared Error (RMSE) drop of 44.7% and a Relative RMSE drop of 47.0% on the ICVL natural hyperspectral image dataset.

...read moreread less

53 citations

1
2
3
4
…
5

Cited by

PDF

Open Access

More filters

Proceedings Article•DOI•

Going deeper with convolutions

[...]

Christian Szegedy¹, Wei Liu², Yangqing Jia¹, Pierre Sermanet¹, Scott Reed³, Dragomir Anguelov¹, Dumitru Erhan¹, Vincent Vanhoucke¹, Andrew Rabinovich - Show less +5 more•Institutions (3)

Google¹, University of North Carolina at Chapel Hill², University of Michigan³

07 Jun 2015

TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

...read moreread less

Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

...read moreread less

40,257 citations

Proceedings Article•

Two-Stream Convolutional Networks for Action Recognition in Videos

[...]

Karen Simonyan¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

08 Dec 2014

TL;DR: This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.

...read moreread less

Abstract: We investigate architectures of discriminatively trained deep Convolutional Networks (ConvNets) for action recognition in video. The challenge is to capture the complementary information on appearance from still frames and motion between frames. We also aim to generalise the best performing hand-crafted features within a data-driven learning framework. Our contribution is three-fold. First, we propose a two-stream ConvNet architecture which incorporates spatial and temporal networks. Second, we demonstrate that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data. Finally, we show that multitask learning, applied to two different action classification datasets, can be used to increase the amount of training data and improve the performance on both. Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of the art. It also exceeds by a large margin previous attempts to use deep nets for video classification.

...read moreread less

6,397 citations

Proceedings Article•DOI•

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

[...]

Joao Carreira, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

21 Jul 2017

TL;DR: In this article, a Two-Stream Inflated 3D ConvNet (I3D) is proposed to learn seamless spatio-temporal feature extractors from video while leveraging successful ImageNet architecture designs and their parameters.

...read moreread less

Abstract: The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks. This paper re-evaluates state-of-the-art architectures in light of the new Kinetics Human Action Video dataset. Kinetics has two orders of magnitude more data, with 400 human action classes and over 400 clips per class, and is collected from realistic, challenging YouTube videos. We provide an analysis on how current architectures fare on the task of action classification on this dataset and how much performance improves on the smaller benchmark datasets after pre-training on Kinetics. We also introduce a new Two-Stream Inflated 3D ConvNet (I3D) that is based on 2D ConvNet inflation: filters and pooling kernels of very deep image classification ConvNets are expanded into 3D, making it possible to learn seamless spatio-temporal feature extractors from video while leveraging successful ImageNet architecture designs and even their parameters. We show that, after pre-training on Kinetics, I3D models considerably improve upon the state-of-the-art in action classification, reaching 80.2% on HMDB-51 and 97.9% on UCF-101.

...read moreread less

5,073 citations

Posted Content•

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

[...]

Khurram Soomro, Amir Roshan Zamir, Mubarak Shah

03 Dec 2012-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work introduces UCF101 which is currently the largest dataset of human actions and provides baseline action recognition results on this new dataset using standard bag of words approach with overall performance of 44.5%.

...read moreread less

Abstract: We introduce UCF101 which is currently the largest dataset of human actions. It consists of 101 action classes, over 13k clips and 27 hours of video data. The database consists of realistic user uploaded videos containing camera motion and cluttered background. Additionally, we provide baseline action recognition results on this new dataset using standard bag of words approach with overall performance of 44.5%. To the best of our knowledge, UCF101 is currently the most challenging dataset of actions due to its large number of classes, large number of clips and also unconstrained nature of such clips.

...read moreread less

4,784 citations

Posted Content•

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

[...]

Jeff Donahue¹, Lisa Anne Hendricks¹, Marcus Rohrbach¹, Subhashini Venugopalan², Sergio Guadarrama¹, Kate Saenko³, Trevor Darrell¹ - Show less +3 more•Institutions (3)

University of California, Berkeley¹, University of Texas at Austin², University of Massachusetts Lowell³

17 Nov 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: A novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and shows such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.

...read moreread less

Abstract: Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are "doubly deep"' in that they can be compositional in spatial and temporal "layers". Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they directly can map variable-length inputs (e.g., video frames) to variable length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.

...read moreread less

3,935 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse