
Self-Adaptive Matrix Completion for Heart Rate Estimation from Face Videos under Realistic Conditions

TLDR
This work introduces a strategy to dynamically select face regions useful for robust HR estimation, inspired by recent advances on matrix completion theory, which significantly outperforms state-of-the-art HR estimation methods in naturalistic conditions.
Abstract
Recent studies in computer vision have shown that, while practically invisible to a human observer, skin color changes due to blood flow can be captured on face videos and, surprisingly, be used to estimate the heart rate (HR). While considerable progress has been made in the last few years, many issues still remain open. In particular, state-of-the-art approaches are not robust enough to operate in natural conditions (e.g. in case of spontaneous movements, facial expressions, or illumination changes). In contrast to previous approaches that estimate the HR by processing all the skin pixels inside a fixed region of interest, we introduce a strategy to dynamically select face regions useful for robust HR estimation. Our approach, inspired by recent advances in matrix completion theory, allows us to predict the HR while simultaneously discovering the best regions of the face to be used for estimation. Thorough experimental evaluation conducted on public benchmarks suggests that the proposed approach significantly outperforms state-of-the-art HR estimation methods in naturalistic conditions.


Sergey Tulyakov (1), Xavier Alameda-Pineda (1), Elisa Ricci (2,3), Lijun Yin (4), Jeffrey F. Cohn (5,6), Nicu Sebe (1)
(1) University of Trento, Via Sommarive 9, 38123 Trento, Italy
(2) Fondazione Bruno Kessler, Via Sommarive 18, 38123 Trento, Italy
(3) University of Perugia, Via Duranti 93, 06123, Perugia, Italy
(4) State University of New York at Binghamton, Binghamton, NY 13902, USA
(5) Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
(6) Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260, USA
{sergey.tulyakov,xavier.alamedapineda,niculae.sebe}@unitn.it, eliricci@fbk.eu, lijun@cs.binghamton.edu, jeffcohn@pitt.edu
1. Introduction
After it was shown in [23, 18] that changes invisible to the naked eye can be used to estimate the heart rate from a video of human skin, this topic has attracted a lot of attention in the computer vision community. These subtle changes encompass both color [27] and motion [4], and they are induced by the internal functioning of the heart. Since faces appear frequently in videos, and due to recent and significant improvements in face tracking and alignment methods [3, 21, 13, 14, 29], facial-based remote heart rate estimation has recently become very popular [17, 30, 10, 25].

Figure 1. Motivation: Given a video sequence, automatic HR estimation from facial features is challenging due to target motion and facial expressions. Facial features extracted over time in different parts of the face (purple rectangles) show different temporal dynamics and are subject to noise, as they are heavily affected by movements and illumination changes. In this paper, we propose a novel approach to simultaneously estimate the HR signal and select the reliable face regions at each time for robust HR prediction.
Classical approaches successfully addressed this problem under laboratory-controlled conditions, i.e. imposing constraints on the subject's movements and requiring the absence of facial expressions and mimics [18, 27, 4]. Therefore, such methods may not be suitable for real-world applications, such as monitoring drivers inside a vehicle or people exercising. Long-time analysis constitutes a further limitation of existing works [17, 18, 19]. Indeed, instead of estimating the instantaneous heart rate, they provide the average HR measurement over a long video sequence. The main disadvantage of using a long analysis window is the inability to capture interesting short-time phenomena, such as a sudden HR increase/decrease due to specific emotions [22].
In practice, another problem faced by researchers developing automatic HR measurement approaches is the lack of publicly available datasets recorded under realistic conditions. A notable exception is the MAHNOB-HCI dataset [20], a multimodal dataset for research on emotion recognition and implicit tagging, which also contains HR annotations. Importantly, an extensive evaluation of existing HR measurement methods on MAHNOB-HCI has been performed by Li et al. [17]. However, the MAHNOB-HCI dataset suffers from some limitations, since the recording conditions are quite controlled: most of the video sequences do not contain spontaneous facial expressions, illumination changes or large target movements [17].
In this work, we tackle the aforementioned problems by introducing a novel approach for HR estimation from face videos and providing an extensive evaluation on two datasets: the MAHNOB-HCI, previously used for HR recognition research [17], and a spontaneous dataset with heart rate data and RGB videos (named MMSE-HR), which is a subset of the larger multimodal spontaneous emotion corpus (MMSE) [31] specifically targeted to challenge HR estimation methods.
Inspired by previous methods, we track the face in a given video sequence, so as to follow rigid head movements [17], and extract chrominance features [10] to compensate for illumination variations. Importantly, most previous approaches preselect a face region of interest (ROI) that is kept constant through the entire HR estimation. However, the region containing useful features for HR estimation is a priori different for every frame, since major appearance changes are spatially and temporally localized (Fig. 1). Therefore, we propose a principled data-driven approach to automatically detect the face parts useful for HR measurement, that is, to estimate the time-varying mask of useful observations, selecting at each frame the relevant face regions from the chrominance features themselves.
Recent advances in matrix completion (MC) theory [11] have shown the ability to recover missing entries of a matrix that is partially observed, i.e. masked. To the best of the authors' knowledge, we propose the first matrix completion-based learning algorithm able to self-adapt, that is, to automatically select the useful observations, and call it self-adaptive matrix completion (SAMC). Intuitively, while learning the mask allows us to discard those face regions strongly affected by facial expressions or large movements, completing the matrix smooths out the smaller noise associated with the chrominance feature extraction procedure. The experiments we conducted on the MAHNOB-HCI dataset clearly show that our method outperforms the state-of-the-art approaches for HR prediction. To further demonstrate the ability of our method to operate in challenging scenarios, we report a series of tests on the MMSE-HR dataset, where subjects show significant movements and facial expressions.
Thus, the contribution of this paper is three-fold:
• We present a novel approach to address the problem of HR estimation from face videos in realistic conditions. To cope with large facial variations due to spontaneous facial expressions and movements, we propose a principled framework to automatically discard the face regions corresponding to noisy features and only use the reliable ones for HR prediction. The region selection is addressed within a novel matrix completion-based optimization framework, called self-adaptive matrix completion, for which an efficient solver is proposed.
• Our approach is demonstrated to be more accurate than previous methods for average HR estimation on publicly available benchmarks. In addition, we report short-term analysis results to show the ability of our method to detect instantaneous heart rate.
• We perform extensive evaluation on the commonly used MAHNOB-HCI dataset and a spontaneous MMSE-HR dataset including 102 sequences of 40 subjects, moving and performing spontaneous facial expressions. As we show, this dataset is valuable for instantaneous HR estimation.
2. Related Work
In this section, we briefly review previous works on remote heart rate measurement and on matrix completion.
2.1. HR Estimation from Face Videos
Cardiac activity measurement is an essential tool to monitor the subjects' health and is actively used by medical practitioners. Conventional contact methods offer high accuracy in measuring the cardiac cycle. However, they require specific sensors to be attached to the human skin, be it a set of electrocardiogram (ECG) leads, a pulse oximeter, or the more recent fitness tracker. To avoid the use of invasive sensors, non-contact remote HR measurement from visual data has been proposed recently by computer vision researchers.
Verkruysse et al. [23] showed that ambient light and a consumer camera can be used to reveal the cardio-vascular pulse wave and to remotely analyze the vital signs of a person. Poh et al. [18] proposed to use blind source separation on color changes caused by heart activity to extract the HR signal from a face video. In [27] an Eulerian magnification method is used to amplify subtle changes in a video stream and to visualize temporal dynamics of the blood flow. Balakrishnan et al. [4] showed that subtle head motions are affected by cardiac activity, and these motions can be used to extract HR measurements from a video stream.
However, all these methods failed to address the problems of HR estimation in the presence of facial expressions and subject's movements, despite their frequent presence in real-world applications. This limits the use of these approaches to laboratory settings. In [10, 25] a chrominance-based method to relax motion constraints was introduced. However, this approach was tested on a few sequences that are not publicly available, making comparison difficult.
Li et al. [17] proposed an approach based on adaptive filtering to handle illumination and motion issues and they evaluated it on the publicly available MAHNOB-HCI dataset [20]. However, although this work represents a valuable step towards remote HR measurement from visual data, it also shares several major limitations with the previous methods. The output of the method is the average HR, whereas to capture short-term phenomena (e.g. HR variations due to instantaneous emotions) the processing of smaller time intervals is required. A further limitation of [17] is the MAHNOB-HCI dataset itself, since it is collected in a laboratory setting and the subjects are required to wear an invasive EEG measuring device on their head. Additionally, subjects perform neither large movements nor many spontaneous facial expressions.
In this work, we address the aforementioned limitations by proposing a novel method capable of predicting HR with higher accuracy than the state-of-the-art approaches and of robustly operating on short time sequences in order to detect the instantaneous HR. To our knowledge, while previous works [17, 25] have acknowledged the importance of selecting parts of the signal to cope with noise and provide robust HR estimates, this paper is the first to tackle this problem within a principled optimization framework.
2.2. Matrix completion
Matrix completion [11] approaches develop from the idea that an unknown low-rank matrix can be recovered from a small set of entries. This is done by solving an optimization problem, namely, a rank minimization problem subject to some data constraints arising from the small set of entries. Matrix completion has proved successful for many computer vision tasks, when data and labels are noisy or in the case of missing data, such as multi-label image classification [6], image retrieval and tagging [28, 9], manifold correspondence finding [16], head/body pose estimation [1] and emotion recognition from abstract paintings [2]. Most of these works extended the original MC framework by imposing task-specific constraints. For instance, in [9] an MC problem is formulated adding a specific regularizer to address the ambiguous labeling problem. Very importantly, even if most computer-vision papers based on matrix completion address classification tasks, therefore splitting the matrix to be completed between features and labels, MC techniques can be used in general, without any structural splitting. Indeed, in [15] matrix completion is adopted to address the movie recommendation problem, where each column (row) represents a user (movie), and therefore each entry of the matrix shows the suitableness of a video for a user. In [16, 15], the MC problem is extended to take into account an underlying graph structure inducing a weighted relationship between the columns/rows of the matrix. In this paper, we were inspired by [16, 15, 1] in modeling the temporal smoothness of the HR signal. However, our method is essentially novel, since we are able to simultaneously recover the unknown low-rank matrix and the underlying data mask, corresponding to the most reliable observations.
3. HR Estimation using SAMC
In this section we describe the proposed approach for HR estimation from face videos, which has four main phases, as shown in Figure 2. Phase 1 is devoted to processing face images so as to extract face regions, which are used in phase 2 to compute chrominance features. Phase 3 consists of the joint estimation of the underlying low-rank feature matrix and the mask using SAMC. Finally, phase 4 computes the heart rate from the signal estimate provided by SAMC.
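As a rough illustration of phase 4, the sketch below estimates an HR value from a one-dimensional signal with a plain NumPy version of Welch's method [26], the estimator named in Figure 2. It is not the authors' implementation: the function names, the Hann window, the 50% overlap, the segment length and the 0.7 to 4 Hz search band are all assumptions made for the example.

```python
import numpy as np

def welch_psd(x, fs, nperseg=256, overlap=0.5):
    """Welch estimate: average the periodograms of overlapping Hann-windowed segments."""
    x = np.asarray(x, dtype=float)
    step = max(1, int(nperseg * (1.0 - overlap)))
    window = np.hanning(nperseg)
    scale = fs * np.sum(window ** 2)
    segments = []
    for start in range(0, len(x) - nperseg + 1, step):
        seg = x[start:start + nperseg]
        seg = (seg - seg.mean()) * window          # remove the local mean, then window
        segments.append(np.abs(np.fft.rfft(seg)) ** 2 / scale)
    if not segments:
        raise ValueError("signal is shorter than nperseg")
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, np.mean(segments, axis=0)

def estimate_hr_bpm(signal, fs, f_lo=0.7, f_hi=4.0, nperseg=256):
    """Return the HR (beats per minute) at the PSD peak inside [f_lo, f_hi] Hz (assumed band)."""
    freqs, psd = welch_psd(signal, fs, nperseg=nperseg)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    peak_freq = freqs[band][np.argmax(psd[band])]
    return 60.0 * peak_freq
```

The frequency resolution is `fs / nperseg`, so short analysis windows trade resolution for the ability to follow instantaneous HR changes.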
3.1. Phases 1 & 2: From Face Videos to Chrominance Features
Inspired by previous methods on remote HR estimation, we use Intraface¹ to localize and track 66 facial landmarks. Many approaches have been employed for face frontalisation [24, 12]. However, in order to preserve the underlying blood flow signal, in the current study we define the facial region of interest (see Fig. 2, Phase 1), from which the HR will be estimated. The potential ROI is then warped to a rectangle using a piece-wise linear warping procedure, before dividing the potential ROI into a grid containing R regions.
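The grid division can be sketched as follows. This is only an illustrative helper, assuming the warped ROI is already available as an array of per-pixel values; the name `region_means` and the grid dimensions are hypothetical and are not taken from the paper.

```python
import numpy as np

def region_means(roi, grid_rows, grid_cols):
    """Average per-pixel values inside each cell of a grid laid over a warped ROI.

    roi: array of shape (H, W, K) with K values per pixel.
    Returns an array of shape (grid_rows * grid_cols, K), one row per region.
    """
    roi = np.asarray(roi, dtype=float)
    height, width, channels = roi.shape
    row_edges = np.linspace(0, height, grid_rows + 1).astype(int)
    col_edges = np.linspace(0, width, grid_cols + 1).astype(int)
    means = np.empty((grid_rows * grid_cols, channels))
    for i in range(grid_rows):
        for j in range(grid_cols):
            cell = roi[row_edges[i]:row_edges[i + 1], col_edges[j]:col_edges[j + 1]]
            means[i * grid_cols + j] = cell.reshape(-1, channels).mean(axis=0)
    return means
```

Stacking the per-frame output of such a helper over time gives one temporal trace per region, which is the shape the later matrix formulation expects.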
¹ http://www.humansensing.cs.cmu.edu/intraface

[Figure 2 panels: 1. Face Region Extraction (ROI extraction, ROI warping); 2. Feature Extraction (Region 1, Region 2, ..., Region R); 3. Self-Adaptive Matrix Completion (observation matrix, prior mask, low-rank matrix, estimated mask); 4. Heart Rate Estimation (power spectral density estimation of the signal estimated using SAMC; magnitude versus frequency in Hz, with the HR frequency marked)]
Figure 2. Overview of the proposed approach for HR estimation. During the first phase, we automatically detect a set of facial keypoints and use them to define a ROI. This region is then warped to a rectangular area and divided into a grid. For each small sub-region, chrominance features are computed (Phase 2). We then apply SAMC on the matrix of all feature observations to recover a smooth signal, while selecting from which sub-regions the signal is recovered (Phase 3). Welch's method [26] is used to estimate the power spectral density and thus the HR frequency (Phase 4).

The overall performance of the HR estimation method will strongly depend on the features extracted on each of the $R$ sub-regions of the facial ROI. Ideally, we would select features that are robust to facial movements and expressions, while being discriminant enough to account for the subtle changes in skin color. Currently, the best features for HR estimation are the chrominance features, defined in [10]. The chrominance features for HR estimation are derived from the RGB channels, as follows. For each pixel the chrominance signal $C$ is computed as the linear combination of two signals $X_f$ and $Y_f$, i.e. $C = X_f - \alpha Y_f$, where $\alpha = \sigma(X_f)/\sigma(Y_f)$ and $\sigma(X_f)$, $\sigma(Y_f)$ denote the standard deviations of $X_f$, $Y_f$. The signals $X_f$, $Y_f$ are band-pass filtered signals obtained respectively from the signals $X$ and $Y$, where $X = 3R_n - 2G_n$, $Y = 1.5R_n + G_n - 1.5B_n$ and $R_n$, $G_n$ and $B_n$ are the normalized values of the individual color channels. The color combination coefficients to derive $X$ and $Y$ are computed using a skin-tone standardization approach (see [10] for details). For each region $r = 1, \ldots, R$, the final chrominance features are computed averaging the values of the chrominance signals over all the pixels.
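The chrominance computation can be illustrated with a short NumPy sketch operating on a single RGB trace. It follows the formulas above, but several details are assumptions rather than statements from the paper: channels are normalized by their temporal mean, the band-pass filter is an ideal FFT mask, and the 0.7 to 4 Hz pass band and the function names are chosen only for the example.

```python
import numpy as np

def bandpass_fft(x, fs, f_lo, f_hi):
    """Ideal band-pass filter: zero every FFT bin outside [f_lo, f_hi] Hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def chrominance_signal(rgb, fs, f_lo=0.7, f_hi=4.0):
    """rgb: array of shape (T, 3) with R, G, B traces; returns C = X_f - alpha * Y_f."""
    rgb = np.asarray(rgb, dtype=float)
    normalized = rgb / rgb.mean(axis=0)      # assumption: normalize by the temporal mean
    r_n, g_n, b_n = normalized.T
    x = 3.0 * r_n - 2.0 * g_n                # X = 3 R_n - 2 G_n
    y = 1.5 * r_n + g_n - 1.5 * b_n          # Y = 1.5 R_n + G_n - 1.5 B_n
    x_f = bandpass_fft(x, fs, f_lo, f_hi)
    y_f = bandpass_fft(y, fs, f_lo, f_hi)
    alpha = np.std(x_f) / np.std(y_f)        # alpha = sigma(X_f) / sigma(Y_f)
    return x_f - alpha * y_f
```

The sketch does not guard against a constant `y_f` (zero standard deviation), which a practical implementation would need to handle.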
3.2. Phase 3: Self-Adaptive Matrix Completion
The estimation of HR from the chrominance features is challenging mainly for two reasons. Firstly, the chrominance features associated with different facial regions are not fully synchronized. In other words, even if the output signals of many regions are synchronized between them (mainstream underlying heart signal), the signal of many other regions may not be in phase with the mainstream. Secondly, face movements and facial expressions induce strong perturbations in the chrominance features. These perturbations are typically local in space and time while large in intensity (Fig. 1). Therefore, we need to localize where these perturbations take place so as not to use them in the HR estimation.
These two main difficulties are intuitively overcome by deriving a matrix completion technique embedding a self-adaptation strategy. On the one hand, since matrix completion problems are usually approached by reducing the matrix rank, the low-rank estimated matrix naturally groups the rows by their linear dependency. In our particular case, two rows are (near) linearly dependent if and only if the output signals they represent are synchronized. Therefore, the underlying HR signal is hypothesized to be in the vector subspace spanned by the largest group of linearly dependent rows of the estimated low-rank matrix.
On the other hand, the estimated low-rank matrix is enforced to resemble the observations. In previous MC approaches [6, 9, 1, 16], the non-observed part of the matrix consisted of the labels of the test set. Thus, the set of unknown matrix entries was fixed and known in advance. The HR estimation problem is slightly different since there are no missing observations, i.e. the matrix is fully observed. However, many of these observations are highly noisy, thus corrupting the estimation of the HR. Importantly, we do not know in advance which are the corrupted observations. This is why we believe that this problem naturally requires some form of adaptation, implying that the method selects the samples with which the learning is performed. Consequently, we name the proposed learning method self-adaptive matrix completion (SAMC).
In order to formalize the self-adaptive matrix completion problem, let us assume the existence of $R$ regions where chrominance features are computed during $T$ video frames. This provides a chrominance observations matrix $C \in \mathbb{R}^{R \times T}$. Ideally, in a scenario where we could trust all region features continuously, we would simply estimate the low-rank matrix that best approximates the matrix of observations $C$, by solving:
$$\min_{E} \; \nu \, \mathrm{rank}(E) + \|E - C\|_F^2,$$
where $\nu$ is a regularization parameter. Unfortunately, minimizing the rank is an NP-hard problem, and traditionally a convex surrogate of the rank, the nuclear norm, is used [8]:
$$\min_{E} \; \nu \|E\|_* + \|E - C\|_F^2. \qquad (1)$$
Another intrinsic property of the chrominance features is that, since the underlying reason of their oscillation is the internal functioning of the heart, we should enforce the estimated chrominance features (those of the low-rank estimated matrix) to be within the heart rate's frequency range. Inspired by [15, 16, 1] we add a temporal smoothing term by means of a Laplacian matrix $L$:
$$\min_{E} \; \nu \|E\|_* + \|E - C\|_F^2 + \gamma \, \mathrm{Tr}(E L E^\top), \qquad (2)$$
where $\gamma$ measures the weight of the temporal smoothing within the learning process. $L$ should encode the relational information between the observations acquired at different instants, thus acting like a relaxed band-pass filter. Indeed, imposing that $e_r$ is band-pass filtered is equivalent to reducing $\|e_r - e_r T\|^2 = \|e_r \tilde{T}\|^2$, where each column of $T$ is a shifted replica of the band-pass normalized filter tap values, so that the product $e_r T$ boils down to a convolution, and $\tilde{T}$ is a copy of $T$ with zeros in the diagonal, since the band-pass filter is normalized. Imposing this for all $R$ regions at once writes: $\mathrm{Tr}(E \tilde{T} \tilde{T}^\top E^\top)$, and therefore $L = \tilde{T} \tilde{T}^\top$.
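A small numerical sketch may help to see how $L$ arises from the filter matrix. The code builds $T$ column by column from a set of filter taps, zeroes its diagonal to obtain $\tilde{T}$, and forms $L = \tilde{T}\tilde{T}^\top$. The tap values in the usage note, the odd tap count, the zero-padding at the borders and the normalization that sets the central tap to 1 are assumptions made for illustration; the paper does not specify the filter.

```python
import numpy as np

def filter_matrix(taps, n_frames):
    """Build T so that each column is a shifted replica of the (normalized) filter taps.

    Assumes an odd number of taps with a non-zero central tap, which is scaled to 1.
    """
    taps = np.asarray(taps, dtype=float)
    half = len(taps) // 2
    taps = taps / taps[half]
    T = np.zeros((n_frames, n_frames))
    for col in range(n_frames):
        for k, tap in enumerate(taps):
            row = col + k - half
            if 0 <= row < n_frames:        # taps falling outside the window are dropped
                T[row, col] = tap
    return T

def temporal_laplacian(taps, n_frames):
    """Return L = T_tilde @ T_tilde.T, where T_tilde is T with a zeroed diagonal."""
    T = filter_matrix(taps, n_frames)
    T_tilde = T - np.diag(np.diag(T))
    return T_tilde @ T_tilde.T
```

With this construction, `np.trace(E @ L @ E.T)` equals the squared Frobenius norm of `E @ T_tilde`, i.e. the sum over regions of $\|e_r \tilde{T}\|^2$, which is the quantity penalized in (2).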
As previously discussed, the estimated matrix should not take into account the observed entries associated to large movements or spontaneous facial expressions. We model this by including a masking binary matrix $M \in \{0, 1\}^{R \times T}$ in the previous equation as [6]:
$$\min_{E} \; \nu \|E\|_* + \|M \circ (E - C)\|_F^2 + \gamma \, \mathrm{Tr}(E L E^\top), \qquad (3)$$
where $\circ$ stands for the element-wise (Hadamard) product and the entries of the matrix $M$ are 1 if the corresponding entry in $C$ has to be taken into account for the HR estimation and 0 otherwise.
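To make the role of the mask concrete, the following sketch evaluates the objective in (3) for given matrices. It is only an evaluation helper written for this illustration (the name `masked_objective` is hypothetical), not a solver.

```python
import numpy as np

def masked_objective(E, C, M, L, nu, gamma):
    """Value of nu*||E||_* + ||M o (E - C)||_F^2 + gamma*Tr(E L E^T)."""
    nuclear = np.linalg.norm(E, ord="nuc")               # sum of singular values
    data = np.linalg.norm(M * (E - C), ord="fro") ** 2   # masked data-fit term
    smooth = np.trace(E @ L @ E.T)                       # temporal smoothing term
    return nu * nuclear + data + gamma * smooth
```

Setting an entry of `M` to 0 removes the corresponding observation from the data-fit term, so a heavily corrupted entry of `C` no longer influences the estimate of `E`.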
Importantly, while in the previous studies $M$ was known in advance, in the present study we have to estimate it. We naturally interpret this as a form of adaptation, since $M$ is an observation-selection variable indicating from which observations the method should learn at each iteration. The masking matrix $M$ should select the largest possible amount of samples that provide useful information for the estimation of the HR. Moreover, when available, it would be desirable to use a prior for the mask $M$, taking real values between 0 and 1, $\tilde{M} \in [0, 1]^{R \times T}$. The complete SAMC optimization problem writes:
$$\min_{E, M} \; \nu \|E\|_* + \|M \circ (E - C)\|_F^2 + \gamma \, \mathrm{Tr}(E L E^\top) - \beta \|M\|_1 + \mu \|M - \tilde{M}\|_F^2. \qquad (4)$$
The parameters $\beta$ and $\mu$ regulate respectively the number of selected observations and the importance of prior information. In this paper the prior mask $\tilde{M}$ is defined as the negative exponential of the local standard deviation of the signal. Our intuition is that, if the signal has small local standard deviation, the chrominance variation within the region is due to the heart rate and not to head movements or facial expressions, and therefore that matrix entry should be used to estimate the HR.
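One possible reading of this prior is sketched below: for every region and frame, take the standard deviation of the chrominance signal in a sliding temporal window and map it through a negative exponential. The window length, the centered window, the absence of any scaling inside the exponential and the name `prior_mask` are assumptions of this example; the text above only fixes the general form.

```python
import numpy as np

def prior_mask(C, window=15):
    """Prior mask exp(-local_std) for a chrominance matrix C of shape (R, T).

    The local standard deviation is taken over a centered temporal window,
    truncated at the borders (an assumption of this sketch).
    """
    C = np.asarray(C, dtype=float)
    n_regions, n_frames = C.shape
    half = window // 2
    local_std = np.empty_like(C)
    for t in range(n_frames):
        lo, hi = max(0, t - half), min(n_frames, t + half + 1)
        local_std[:, t] = C[:, lo:hi].std(axis=1)
    return np.exp(-local_std)
```

Since the standard deviation is non-negative, every entry lies in $(0, 1]$: quiet stretches of a region map to values near 1, strongly perturbed stretches to values near 0.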
3.2.1 Solving SAMC
The SAMC optimization problem in (4) is not jointly convex in $E$ and $M$. Moreover, even if the masking matrix $M$ were fixed, (4) would contain non-differentiable and differentiable terms, and a direct optimization would be challenging. Instead, alternating methods have proven to be successful in solving (i) convex problems with non-differentiable terms and (ii) marginally convex problems that are not jointly convex. More precisely, we derive an optimisation solver based on the alternating direction method of multipliers (ADMM) [5]. In order to derive the associated ADMM method, we first define the augmented Lagrangian problem associated to (4):
$$\min_{E, F, M, Z} \; \nu \|E\|_* + \|M \circ (F - C)\|_F^2 + \gamma \, \mathrm{Tr}(F L F^\top) - \beta \|M\|_1 + \mu \|M - \tilde{M}\|_F^2 + \langle Z, E - F \rangle + \frac{\rho}{2} \|E - F\|_F^2, \qquad (5)$$
where $F$ is defined to split the terms of (4) that depend on $E$ into those that are differentiable and those that are not. The variable $Z$ represents the Lagrange multipliers constraining $E$ to be equal to $F$, further regularized by the term $\|E - F\|_F^2$. The ADMM solves the optimisation problem by alternating the direction of the optimisation while keeping the other directions fixed. Specifically, solving (5) requires alternating the following three steps until convergence:
E/M-step. With fixed $F$ and $Z$, the optimal value of $E$ is obtained by solving:
$$\min_{E} \; \nu \|E\|_* + \frac{\rho}{2} \|E - F + \rho^{-1} Z\|_F^2. \qquad (6)$$
The solution of such problem is given by the shrinkage operator applied to $F - \rho^{-1} Z$, see [7]. Formally, if we write the singular value decomposition of $F - \rho^{-1} Z = U D V^\top$, the optimal value for $E$ is:
$$E^* = U \, S_{\nu/\rho}(D) \, V^\top, \qquad (7)$$
where $S_\lambda(x) = \max(0, x - \lambda)$ is the soft-thresholding operator, applied element-wise to $D$ in (7).
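The update in (7) is the standard singular value thresholding operation and can be sketched in a few lines of NumPy. The function name `e_step` is chosen for this illustration and the code is not taken from the authors.

```python
import numpy as np

def e_step(F, Z, nu, rho):
    """E update of (7): soft-threshold the singular values of F - Z/rho by nu/rho."""
    U, d, Vt = np.linalg.svd(F - Z / rho, full_matrices=False)
    d_shrunk = np.maximum(0.0, d - nu / rho)   # S_{nu/rho} applied to each singular value
    return (U * d_shrunk) @ Vt                 # same as U @ diag(d_shrunk) @ Vt
```

Singular values below $\nu/\rho$ are set to zero, which is how the nuclear norm term lowers the rank of the estimate.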
The optimal value for $M$ is obtained from the following optimisation problem:
$$\min_{M} \; \|M \circ (F - C)\|_F^2 - \beta \|M\|_1 + \mu \|M - \tilde{M}\|_F^2, \qquad (8)$$
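For a binary mask, the objective in (8) is a sum of terms that each involve a single entry of $M$, so the minimizer can be found entry by entry by comparing the cost of keeping an observation ($m = 1$) with the cost of dropping it ($m = 0$). The sketch below implements this comparison; it is a direct consequence of (8) under the binary constraint, written for this illustration with a hypothetical function name, and ties are resolved towards dropping the entry by choice.

```python
import numpy as np

def m_step(F, C, M_prior, beta, mu):
    """Entry-wise minimizer of (8) over binary masks.

    Keeping an entry costs (f - c)^2 + mu * (1 - prior)^2 - beta,
    dropping it costs mu * prior^2.
    """
    residual_sq = (F - C) ** 2
    cost_keep = residual_sq + mu * (1.0 - M_prior) ** 2 - beta
    cost_drop = mu * M_prior ** 2
    return (cost_keep < cost_drop).astype(float)
```

An observation is therefore kept when its squared residual is smaller than $\beta - \mu(1 - 2\tilde{m})$, where $\tilde{m}$ is the corresponding prior entry: a larger $\beta$ or a larger prior value makes selection more likely.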