The key contributions of our work are as follows:
Figure 1. The proposed architecture, SegFace, addresses face segmentation by enhancing the performance on long-tail classes through a transformer-based approach. Specifically, multi-scale features are first extracted from an image encoder and then fused using an MLP fusion module to form face tokens. These tokens, along with class-specific tokens, undergo self-attention, face-to-token, and token-to-face cross-attention operations, refining both class and face tokens to enhance class-specific features. Finally, the upscaled face tokens and learned class tokens are combined to produce segmentation maps for each facial region.
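To make the token-interaction pipeline in Figure 1 concrete, below is a minimal PyTorch sketch of a decoder following the caption's description: multi-scale encoder features are fused by an MLP into face tokens, learnable class tokens interact with them through self-attention and bidirectional cross-attention, and per-class segmentation maps are produced by taking the dot product between each refined class token and the upscaled face tokens. All names (`SegFaceDecoderSketch`, `fuse_mlp`), dimensions, and head counts are hypothetical illustrations, not the authors' actual implementation.

```python
# Minimal sketch of the Figure 1 token pipeline; module/variable names and
# hyperparameters are hypothetical, not the official SegFace code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegFaceDecoderSketch(nn.Module):
    def __init__(self, embed_dim=256, num_classes=19, num_heads=8, num_scales=4):
        super().__init__()
        # MLP fusion: project concatenated multi-scale features into face tokens.
        self.fuse_mlp = nn.Sequential(
            nn.Linear(embed_dim * num_scales, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )
        # One learnable token per facial region (class).
        self.class_tokens = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.self_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.token_to_face = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.face_to_token = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, ms_feats, out_size):
        # ms_feats: list of encoder features, each (B, H*W, embed_dim),
        # assumed already resampled to a common spatial size H x W.
        B, hw, _ = ms_feats[0].shape
        face = self.fuse_mlp(torch.cat(ms_feats, dim=-1))        # (B, HW, C) face tokens
        cls = self.class_tokens.unsqueeze(0).expand(B, -1, -1)   # (B, K, C) class tokens

        # Self-attention over class tokens.
        cls = cls + self.self_attn(cls, cls, cls)[0]
        # Token-to-face cross-attention: class tokens gather region evidence.
        cls = cls + self.token_to_face(cls, face, face)[0]
        # Face-to-token cross-attention: face tokens absorb class context.
        face = face + self.face_to_token(face, cls, cls)[0]

        # Reshape face tokens to a feature map and upscale to output resolution.
        h = w = int(hw ** 0.5)
        fmap = face.transpose(1, 2).reshape(B, -1, h, w)
        fmap = F.interpolate(fmap, size=out_size, mode="bilinear", align_corners=False)

        # Per-class segmentation logits: dot product of each class token with
        # every upscaled face token.
        return torch.einsum("bkc,bchw->bkhw", cls, fmap)

# Usage sketch: four encoder scales at 32x32 resolution, 256-dim features.
decoder = SegFaceDecoderSketch()
feats = [torch.randn(2, 32 * 32, 256) for _ in range(4)]
masks = decoder(feats, out_size=(512, 512))  # (2, 19, 512, 512) per-class logits
```

The dot-product readout at the end is what lets each class token, including those for rare long-tail classes, carve out its own segmentation map from the shared face-token features.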
Table 1. Quantitative results on (a) the LaPa dataset and (b) the CelebAMask-HQ dataset.
Table 2. Ablation study across different backbones and input image resolutions.
Figure 2. The qualitative comparison highlights the superior performance of our method, SegFace, over DML-CSR. In (a), SegFace effectively segments both long-tail classes, such as earrings and necklaces, and head classes, such as hair and neck. In (b), it also excels in challenging scenarios involving multiple faces, human-resembling features, poor lighting, and occlusion, where DML-CSR struggles.
Figure 3. Additional qualitative comparison of our proposed method, SegFace, with DML-CSR on the (a) CelebAMask-HQ and (b) LaPa datasets.
@article{narayan2024segface,
  title={SegFace: Face Segmentation of Long-Tail Classes},
  author={Narayan, Kartik and VS, Vibashan and Patel, Vishal M},
  journal={arXiv preprint arXiv:2412.08647},
  year={2024}
}