Comparison with representative methods under different task settings. FaceXFormer can perform various facial analysis tasks in a single model. FP - Face Parsing, LD - Landmarks Detection, HPE - Head Pose Estimation, Attr - Attributes Recognition, Age - Age Estimation, Gen - Gender Estimation, Race - Race Estimation, Exp - Facial Expression Recognition, and Vis - Face Visibility.
Overview of the FaceXFormer framework. It employs an encoder-decoder architecture: the encoder extracts multi-scale features from the input face image I, and MLP-Fusion fuses them into a unified representation F. The decoder processes the task tokens T alongside the face representation F, producing refined task-specific tokens T̂. These refined tokens are then passed through the unified head to produce the task-specific predictions.
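The data flow in this caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the fusion recipe (per-scale linear projection, nearest-neighbour upsampling, concatenation, one mixing layer), the single-head cross-attention standing in for the decoder, and every name, shape, and size below (`mlp_fusion`, `cross_attend`, `D`, four scales, nine tokens) are assumptions chosen for illustration. The unified head is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def upsample_nearest(feat, size):
    """Nearest-neighbour upsample of an (H, W, C) map to (size, size, C)."""
    h, w, _ = feat.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return feat[rows][:, cols]

def mlp_fusion(features, proj_weights, fuse_weight):
    """Project each scale to a common width, align resolutions, concatenate, mix."""
    target = features[0].shape[0]  # finest resolution (assumed to come first)
    aligned = [upsample_nearest(f @ w, target)
               for f, w in zip(features, proj_weights)]
    fused = np.concatenate(aligned, axis=-1)  # (H, W, num_scales * D)
    return fused @ fuse_weight                # (H, W, D)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(tokens, face):
    """Single-head cross-attention: task tokens query the face representation."""
    d = tokens.shape[-1]
    attn = softmax(tokens @ face.T / np.sqrt(d))  # (num_tokens, H*W)
    return tokens + attn @ face, attn             # residual update gives T_hat

# Hypothetical sizes: four encoder scales, common width D, nine task tokens.
D = 16
channels = [8, 16, 32, 64]
sizes = [32, 16, 8, 4]
features = [rng.standard_normal((s, s, c)) for s, c in zip(sizes, channels)]
proj_weights = [rng.standard_normal((c, D)) * 0.1 for c in channels]
fuse_weight = rng.standard_normal((len(channels) * D, D)) * 0.1

F = mlp_fusion(features, proj_weights, fuse_weight)  # unified representation F
T = rng.standard_normal((9, D))                      # task tokens T
T_hat, attn = cross_attend(T, F.reshape(-1, D))      # refined tokens T_hat

print(F.shape, T_hat.shape)
```

In this sketch each refined token is a weighted summary of the fused face features, which is what lets a shared head read off a different prediction per token.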
Comparison with specialized models and existing multi-task networks on Face Parsing.
Comparison with specialized models and existing multi-task networks on Head Pose Estimation, Landmarks Detection, and Attributes Prediction.
Comparison with specialized models and existing multi-task networks on Facial Expression Recognition, Face Visibility, and Age Estimation.
Qualitative results of FaceXFormer.
@article{narayan2024facexformer,
title={FaceXFormer: A Unified Transformer for Facial Analysis},
author={Narayan, Kartik and VS, Vibashan and Chellappa, Rama and Patel, Vishal M},
journal={arXiv preprint arXiv:2403.12960},
year={2024}
}