SegFace: Face Segmentation of Long-Tail Classes

AAAI 2025

Johns Hopkins University

Contributions

The key contributions of our work are as follows:

  • We introduce a lightweight transformer decoder with learnable class-specific tokens, ensuring that each token is dedicated to a single class and thereby enabling independent modeling of classes. This design directly addresses the poor segmentation of long-tail classes that is prevalent in existing methods.
  • Our multi-scale feature extraction and MLP fusion strategy, combined with a transformer decoder that leverages learnable class-specific tokens, mitigates the dominance of head classes during training and enhances the feature representation of long-tail classes (a minimal sketch of the fusion idea follows this list).
  • SegFace establishes a new state of the art on the LaPa dataset (93.03 mean F1) and the CelebAMask-HQ dataset (88.96 mean F1). Moreover, the model can be adapted for fast inference by simply swapping in a MobileNetV3 backbone; this mobile version achieves a mean F1 of 87.91 on CelebAMask-HQ at 95.96 FPS.
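
The sketch below illustrates the multi-scale fusion idea from the second bullet in PyTorch: features from several encoder stages are projected to a common width, resized to one resolution, concatenated, and fused by an MLP into "face tokens". Channel counts, the number of stages, and the specific layers are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLPFusionSketch(nn.Module):
    """Hypothetical multi-scale MLP fusion; dimensions are assumptions."""

    def __init__(self, in_channels=(96, 192, 384, 768), dim=256):
        super().__init__()
        # One linear projection per encoder stage.
        self.proj = nn.ModuleList([nn.Linear(c, dim) for c in in_channels])
        self.fuse = nn.Sequential(
            nn.Linear(dim * len(in_channels), dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) multi-scale feature maps.
        target = feats[0].shape[-2:]  # fuse at the highest spatial resolution
        tokens = []
        for f, proj in zip(feats, self.proj):
            f = F.interpolate(f, size=target, mode="bilinear", align_corners=False)
            tokens.append(proj(f.flatten(2).transpose(1, 2)))  # (B, H*W, dim)
        return self.fuse(torch.cat(tokens, dim=-1))            # face tokens


if __name__ == "__main__":
    B = 2
    feats = [torch.randn(B, c, s, s) for c, s in zip((96, 192, 384, 768), (32, 16, 8, 4))]
    print(MLPFusionSketch()(feats).shape)  # torch.Size([2, 1024, 256])
```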

SegFace Framework

Figure 1. The proposed architecture, SegFace, addresses face segmentation by enhancing the performance on long-tail classes through a transformer-based approach. Specifically, multi-scale features are first extracted from an image encoder and then fused using an MLP fusion module to form face tokens. These tokens, along with class-specific tokens, undergo self-attention, face-to-token, and token-to-face cross-attention operations, refining both class and face tokens to enhance class-specific features. Finally, the upscaled face tokens and learned class tokens are combined to produce segmentation maps for each facial region.
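
To make the decoder description above concrete, here is a minimal, simplified PyTorch sketch (not the official implementation) of the class-token idea: learnable per-class tokens attend to themselves, query the fused face tokens, refine them, and the final per-class logits come from a dot product between class tokens and upscaled face tokens. Layer sizes, the single attention block, and the upscaling factor are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ClassTokenDecoderSketch(nn.Module):
    """Hypothetical single-block decoder with learnable class-specific tokens."""

    def __init__(self, num_classes=19, dim=256, num_heads=8):
        super().__init__()
        # One learnable token per facial class (e.g., hair, earring, necklace).
        self.class_tokens = nn.Parameter(torch.randn(num_classes, dim))
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.token_to_face = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.face_to_token = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, face_tokens):
        # face_tokens: (B, H*W, dim) fused multi-scale features flattened over space.
        B = face_tokens.size(0)
        cls = self.class_tokens.unsqueeze(0).expand(B, -1, -1)  # (B, C, dim)

        # Self-attention among class tokens lets them exchange context
        # while each token remains dedicated to one class.
        cls = cls + self.self_attn(cls, cls, cls)[0]
        # Token-to-face cross-attention: class tokens query spatial face features.
        cls = cls + self.token_to_face(cls, face_tokens, face_tokens)[0]
        # Face-to-token cross-attention: face tokens are refined by class tokens.
        face_tokens = face_tokens + self.face_to_token(face_tokens, cls, cls)[0]
        cls = cls + self.mlp(cls)

        # Per-class logits: dot product between upscaled face tokens and class tokens.
        h = w = int(face_tokens.size(1) ** 0.5)
        feat = face_tokens.transpose(1, 2).reshape(B, -1, h, w)        # (B, dim, h, w)
        feat = F.interpolate(feat, scale_factor=4, mode="bilinear",
                             align_corners=False)                      # upscale
        logits = torch.einsum("bcd,bdhw->bchw", cls, feat)             # (B, C, 4h, 4w)
        return logits


if __name__ == "__main__":
    decoder = ClassTokenDecoderSketch()
    fused = torch.randn(2, 32 * 32, 256)  # stand-in for MLP-fused backbone features
    print(decoder(fused).shape)           # torch.Size([2, 19, 128, 128])
```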

Quantitative Results

Table 1. Quantitative results on (a) the LaPa dataset and (b) the CelebAMask-HQ dataset.

Table 2. Ablation study for different backbones and varying image resolution.

Qualitative Results

Figure 2. The qualitative comparison highlights the superior performance of our method, SegFace, compared to DML-CSR. In (a), SegFace effectively segments both long-tail classes, such as earrings and necklaces, and head classes, such as hair and neck. In (b), it also excels in challenging scenarios involving multiple faces, human-resembling features, poor lighting, and occlusion, where DML-CSR struggles.

Figure 3. Additional qualitative comparison of our proposed method, SegFace, with DML-CSR on the (a) CelebAMask-HQ and (b) LaPa datasets.

BibTeX

@article{narayan2024segface,
  title={SegFace: Face Segmentation of Long-Tail Classes},
  author={Narayan, Kartik and VS, Vibashan and Patel, Vishal M},
  journal={arXiv preprint arXiv:2412.08647},
  year={2024}
}
Acknowledgement: The website template is taken from Nerfies.