Low-resolution face recognition remains challenging due to severe degradations in probe images, domain differences between high-resolution gallery and low-resolution probe data, and catastrophic forgetting during low-resolution adaptation. FaceMoE introduces a transformer with Mixture-of-Experts feed-forward blocks and a top-k router that dynamically activates specialized experts for different semantic facial regions. This resolution-aware sparse routing improves feature extraction under degradation while preserving pretrained knowledge and scaling capacity efficiently.
Across eleven datasets (high-quality, mixed-quality, and low-resolution benchmarks), FaceMoE outperforms prior state-of-the-art methods, including strong gains on BRIAR Protocol 3.1, IJB-S, and TinyFace.
The figure highlights the central low-resolution face recognition challenges that motivate FaceMoE: (1) severe degradation in probe frames causes weak and unstable identity cues, making reliable feature aggregation difficult; (2) a strong domain gap exists between high-resolution gallery images and low-resolution probe images, where models rely on different facial cues across resolutions; and (3) naive fine-tuning on low-resolution data can trigger catastrophic forgetting and reduce pretrained performance on high-quality data.
FaceMoE replaces the standard transformer feed-forward network (FFN) with multiple expert MLPs and a learnable top-k router. Each token is routed to a sparse subset of experts, so different tokens receive different feature transformations that capture distinct facial regions and frequency characteristics. In practice, the model achieves an effective tradeoff with N = 3 experts, of which k = 2 are active per token.
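The routing step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the FaceMoE implementation: the class name `TopKMoEFFN`, the GELU activation, the weight initialization, the layer sizes, and the choice to normalize gates with a softmax over the selected logits are all hypothetical. Only N = 3 and k = 2 come from the text.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gelu(x):
    # Tanh approximation of GELU (the activation is an assumption).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class TopKMoEFFN:
    """Hypothetical sparse FFN: each token is processed by its k best experts."""

    def __init__(self, dim, hidden, num_experts=3, k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.num_experts = num_experts
        self.k = k
        self.w_router = rng.normal(0.0, 0.02, (dim, num_experts))
        self.w1 = rng.normal(0.0, 0.02, (num_experts, dim, hidden))
        self.w2 = rng.normal(0.0, 0.02, (num_experts, hidden, dim))

    def __call__(self, tokens):
        # tokens: (T, dim). Router scores every expert for every token.
        logits = tokens @ self.w_router                       # (T, E)
        topk_idx = np.argsort(-logits, axis=-1)[:, :self.k]   # (T, k)
        topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)
        gates = softmax(topk_logits, axis=-1)                 # (T, k), rows sum to 1

        out = np.zeros_like(tokens)
        for e in range(self.num_experts):
            # Tokens that selected expert e, and the slot it occupies in their top-k.
            token_ids, slot = np.nonzero(topk_idx == e)
            if token_ids.size == 0:
                continue
            hidden = gelu(tokens[token_ids] @ self.w1[e])
            out[token_ids] += gates[token_ids, slot][:, None] * (hidden @ self.w2[e])
        return out, logits, topk_idx

# Usage: 5 tokens of width 8, routed through 3 experts with 2 active per token.
layer = TopKMoEFFN(dim=8, hidden=16, num_experts=3, k=2)
x = np.random.default_rng(1).normal(size=(5, 8))
y, router_logits, chosen = layer(x)
```

Only k of the N expert MLPs run for any given token, which is why capacity can grow with N while per-token compute stays close to that of k dense FFNs.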
Training uses a composite objective: a face recognition loss (CosFace) plus two auxiliary routing terms, a router z-loss and a load-balancing loss, which stabilize routing and prevent expert collapse.
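As a rough sketch of how such a composite objective might be assembled, the NumPy code below combines the three terms. The text does not give the exact formulations or weights, so everything here is an assumption: the z-loss is written as the mean squared log-sum-exp of the router logits, the load-balancing term as the product of dispatch fraction and mean router probability summed over experts, and the CosFace scale `s`, margin `m`, and the coefficients `z_weight` and `balance_weight` are placeholder values.

```python
import numpy as np

def logsumexp(x, axis=-1):
    # Numerically stable log-sum-exp along one axis.
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def cosface_loss(embeddings, class_weights, labels, s=64.0, m=0.35):
    # Cosine logits with an additive margin m subtracted from the target class.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = emb @ w.T                                   # (B, C)
    rows = np.arange(len(labels))
    margin = np.zeros_like(cos)
    margin[rows, labels] = m
    logits = s * (cos - margin)
    log_prob = logits - logsumexp(logits)[:, None]
    return -log_prob[rows, labels].mean()

def router_z_loss(router_logits):
    # Penalizes large router logits (assumed form: mean squared log-sum-exp).
    return np.mean(logsumexp(router_logits) ** 2)

def load_balancing_loss(router_logits, topk_idx):
    # Assumed form: E * sum_e (fraction of slots sent to e) * (mean prob of e).
    num_experts = router_logits.shape[-1]
    probs = np.exp(router_logits - logsumexp(router_logits)[:, None])
    counts = np.bincount(topk_idx.ravel(), minlength=num_experts)
    frac = counts / topk_idx.size
    return num_experts * np.sum(frac * probs.mean(axis=0))

def total_loss(embeddings, class_weights, labels, router_logits, topk_idx,
               z_weight=1e-3, balance_weight=1e-2):
    # The two weights are hypothetical placeholders, not values from the text.
    return (cosface_loss(embeddings, class_weights, labels)
            + z_weight * router_z_loss(router_logits)
            + balance_weight * load_balancing_loss(router_logits, topk_idx))

# Usage with toy data: 4 faces, 10 identities, 3 tokens routed over 3 experts.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 16))
cls_w = rng.normal(size=(10, 16))
labels = np.array([0, 3, 3, 7])
uniform_logits = np.zeros((3, 3))
balanced_idx = np.array([[0, 1], [1, 2], [2, 0]])
loss = total_loss(emb, cls_w, labels, uniform_logits, balanced_idx)
```

Under this formulation the load-balancing term equals 1 when routing is perfectly uniform and grows as tokens concentrate on fewer experts, which is the behavior needed to discourage expert collapse.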
Comparison against prior methods on BRIAR Protocol 3.1 and IJB-S demonstrates strong low-resolution recognition performance and robust surveillance-domain generalization.
Expert activation maps show semantically meaningful specialization and improved routing behavior after low-resolution fine-tuning, supporting robust identity extraction from degraded faces.