Quantifying Bias in a Face Verification System

doi:10.3390/cmsf2022003006

Open Access Proceedings Article

Megan Frisella, +4 more

TLDR

The authors present the intuition that FV systems underperform on protected demographic groups because they are less sensitive to differences between features within those groups, as evidenced by clustered embeddings.

Abstract:

Machine learning models perform face verification (FV) for a variety of highly consequential applications, such as biometric authentication, face identification, and surveillance. Many state-of-the-art FV systems suffer from unequal performance across demographic groups, which is commonly overlooked by evaluation measures that do not assess population-specific performance. Deployed systems with bias may result in serious harm against individuals or groups who experience underperformance. We explore several fairness definitions and metrics, attempting to quantify bias in Google’s FaceNet model. In addition to statistical fairness metrics, we analyze clustered face embeddings produced by the FV model. We link well-clustered embeddings (well-defined, dense clusters) for a demographic group to biased model performance against that group. We present the intuition that FV systems underperform on protected demographic groups because they are less sensitive to differences between features within those groups, as evidenced by clustered embeddings. We show how this performance discrepancy results from a combination of representation and aggregation bias.

[...] death times for White face embeddings to later than other race groups (p < 0.05 for W × A, W × I, and W × B t-tests), indicating that White embeddings are more [...] in the embedding space. The other race groups have peak death times that are taller and earlier than that of the White race group. The shorter and wider peak for the White subgroup means that there is more variety (higher variance) in H0 death times, rather than the consistent peak around 0.8 with less variance for other race groups. This shows that the distribution of White faces in the embedding space has more variance than that of other race groups, a trend that was not present in the centroid distance distribution for race groups, which showed four bell-shaped density plots.

Thus, our analysis of the H0 death times supports previous findings that the White race group is clustered differently from other race groups. We note that there is less inequality in H0 death times for female vs. male faces, despite our p-value indicating that this discrepancy may be significant (p < 0.05).
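To make the H0 death-time comparison concrete, the following is a minimal sketch and not the authors' pipeline. It relies on the standard fact that the finite H0 death times of a Vietoris-Rips filtration, indexed by pairwise distance, equal the edge lengths of a minimum spanning tree of the point cloud. The two groups are synthetic stand-ins rather than FaceNet embeddings, and the embedding size, noise levels, helper names, and the choice of Welch's variant of the t-test are all assumptions made for illustration.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform
from scipy.stats import ttest_ind


def h0_death_times(points):
    """Return the sorted finite H0 death times of a point cloud.

    Computed as the edge lengths of a Euclidean minimum spanning tree.
    Assumes no duplicate points, because SciPy treats zero entries of a
    dense matrix as missing edges.
    """
    dist = squareform(pdist(points))    # dense n x n distance matrix
    mst = minimum_spanning_tree(dist)   # sparse matrix holding n - 1 edges
    return np.sort(mst.data)


def unit_rows(x):
    """Scale each row to unit length (embeddings on the unit hypersphere)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


rng = np.random.default_rng(0)
dim = 16  # hypothetical embedding size, kept small for speed

# Synthetic stand-ins: one tightly clustered group, one more spread out.
center = rng.normal(size=dim)
group_tight = unit_rows(center + 0.2 * rng.normal(size=(200, dim)))
group_spread = unit_rows(center + 1.0 * rng.normal(size=(200, dim)))

deaths_tight = h0_death_times(group_tight)
deaths_spread = h0_death_times(group_spread)

# Welch's t-test on the two death-time samples (an assumed variant).
stat, p_value = ttest_ind(deaths_spread, deaths_tight, equal_var=False)

print("mean death time (tight): ", deaths_tight.mean())
print("mean death time (spread):", deaths_spread.mean())
print("t statistic:", stat, "p-value:", p_value)
```

In this toy setting, later death times for a group mean that its points merge into a single connected component at larger distance scales, which is the sense in which a group's embeddings are less tightly clustered. Some persistent homology libraries index the filtration by radius rather than distance, which would halve these values without changing the comparison.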
