Comparison with representative methods under different task settings. FaceXformer can perform various facial analysis tasks in a single model. FP - Face Parsing, LD - Landmark Detection, HPE - Head Pose Estimation, Attr - Attributes Recognition, Age - Age Estimation, Gen - Gender Estimation, Race - Race Estimation, Vis - Landmark Visibility Prediction, MD - Multi-dataset Training.
Overview of the FaceXformer framework. It employs an encoder-decoder architecture that extracts multi-scale features from the input face image I and fuses them into a unified representation F via MLP-Fusion. Task tokens T are processed alongside the face representation F in the decoder, producing refined task-specific tokens T̂. These refined tokens are then passed through the unified head to produce task-specific predictions.
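The data flow in the overview caption (multi-scale features, then MLP-Fusion into F, then task tokens T attending to F to give T̂, then a unified head) can be illustrated with a toy, dependency-free sketch. This is not the authors' implementation: the concatenate-then-project fusion, the single-head attention without learned Q/K/V projections, the residual update, the shared linear head, and all names, sizes, and random weights are assumptions made only to show how the tensors move.

```python
import math
import random


def matmul(a, b):
    # Plain-Python matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def mlp_fusion(scales, proj_weights, fuse_weight):
    # Assumption: every scale is already resized to the same N spatial positions.
    # Each scale (N x C_i) is projected to D channels, the projections are
    # concatenated along channels (N x S*D), and a final linear layer maps back to D.
    projected = [matmul(f, w) for f, w in zip(scales, proj_weights)]
    concatenated = [sum(rows, []) for rows in zip(*projected)]
    return matmul(concatenated, fuse_weight)


def cross_attention(tokens, memory):
    # Simplified single-head scaled dot-product attention with no learned
    # Q/K/V projections: task tokens (T x D) query the face representation F (N x D).
    d = len(tokens[0])
    scores = matmul(tokens, transpose(memory))
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    attended = matmul(attn, memory)
    # Residual update yields the refined tokens T_hat (T x D).
    return [[t + a for t, a in zip(tr, ar)] for tr, ar in zip(tokens, attended)]


def unified_head(refined_tokens, head_weight):
    # Hypothetical shared linear head applied to every refined task token.
    return matmul(refined_tokens, head_weight)


rng = random.Random(0)
N, D, num_tasks, out_dim = 16, 8, 3, 4
channels = [4, 8, 16]  # hypothetical channel widths of three encoder stages
scales = [rand_matrix(N, c, rng) for c in channels]
proj_weights = [rand_matrix(c, D, rng) for c in channels]
fuse_weight = rand_matrix(len(channels) * D, D, rng)

F = mlp_fusion(scales, proj_weights, fuse_weight)   # unified face representation
T = rand_matrix(num_tasks, D, rng)                 # one token per task
T_hat = cross_attention(T, F)                      # refined task tokens
preds = unified_head(T_hat, rand_matrix(D, out_dim, rng))
print(len(F), len(F[0]), len(T_hat), len(preds[0]))  # prints "16 8 3 4"
```

The key design point the sketch tries to convey is that every task shares one fused representation F, and tasks differ only in which token queries it. In a trained model the token values, attention projections, and head weights are all learned, and the decoder would typically stack several such layers.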
Comparison with specialized models and existing multi-task networks.
Qualitative comparison of FaceXformer against other multi-task models
Visualization of "in-the-wild" images queried with multiple task tokens. Attributes refers to the 40 binary attributes defined in the CelebA dataset, where each value indicates the presence (1) or absence (0) of a specific facial attribute.
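To make the 0/1 attribute readout concrete, the sketch below maps 40 per-attribute scores to CelebA attribute names. The assumptions are that the model emits one logit per attribute in the CelebA annotation order and that a sigmoid probability of at least 0.5 counts as "present"; the function names and the threshold are illustrative, not taken from FaceXformer.

```python
import math

# The 40 CelebA attribute names, in the order of the CelebA attribute annotations.
CELEBA_ATTRIBUTES = (
    "5_o_Clock_Shadow Arched_Eyebrows Attractive Bags_Under_Eyes Bald Bangs "
    "Big_Lips Big_Nose Black_Hair Blond_Hair Blurry Brown_Hair Bushy_Eyebrows "
    "Chubby Double_Chin Eyeglasses Goatee Gray_Hair Heavy_Makeup High_Cheekbones "
    "Male Mouth_Slightly_Open Mustache Narrow_Eyes No_Beard Oval_Face Pale_Skin "
    "Pointy_Nose Receding_Hairline Rosy_Cheeks Sideburns Smiling Straight_Hair "
    "Wavy_Hair Wearing_Earrings Wearing_Hat Wearing_Lipstick Wearing_Necklace "
    "Wearing_Necktie Young"
).split()


def sigmoid(x):
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_attributes(logits, threshold=0.5):
    # Hypothetical readout: 1 = attribute present, 0 = absent.
    if len(logits) != len(CELEBA_ATTRIBUTES):
        raise ValueError("expected one logit per CelebA attribute")
    return {name: int(sigmoid(z) >= threshold)
            for name, z in zip(CELEBA_ATTRIBUTES, logits)}


labels = decode_attributes([3.0] + [-3.0] * 39)
print(labels["5_o_Clock_Shadow"], labels["Young"])  # prints "1 0"
```

A per-attribute sigmoid (rather than a softmax over all 40) fits this task because the attributes are not mutually exclusive: a face can be both Smiling and Wearing_Hat.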
@article{narayan2024facexformer,
title={FaceXFormer: A Unified Transformer for Facial Analysis},
author={Narayan, Kartik and VS, Vibashan and Chellappa, Rama and Patel, Vishal M},
journal={arXiv preprint arXiv:2403.12960},
year={2024}
}