Figure 1. (a) Stylized abstraction vs. traditional style transfer. (Top) Stylized abstraction techniques capture core identifying attributes while allowing stylistic distortion to preserve the intended visual style. (Bottom) Traditional style transfer preserves geometry and appearance but applies texture-based styles, often failing to generalize beyond appearance-level edits. (b) Comparison across existing style-transfer and personalized-generation methods using a single image of a non-celebrity subject. Most methods struggle to retain semantic identity for everyday individuals, while our training-free method preserves key identity cues across diverse styles.
Stylized abstraction exaggerates or simplifies the features of a subject to create a stylized representation. Rather than aiming for photorealism, it emphasizes recognizable traits that evoke the subject's concept or identity (illustrated in Figure 1(a)). The goal is to capture the essence of a subject through visual abstraction, focusing less on exact likeness and more on the retention of key, recognizable features. For instance, a knitted doll or a LEGO figure of Einstein may omit intricate facial geometry or biometric precision, yet remain immediately identifiable thanks to consistent visual traits such as his distinctive hair, mustache, or attire. These features serve as semantic anchors, allowing viewers to recognize the subject even in highly abstracted or playful forms. This form of representation is widespread in media, animation, and merchandising, where retaining a character's identity in a simplified, reproducible form is essential; terms like personified toy representation or iconic stylization are often used to describe such instances. Unlike traditional image-to-image translation, which typically enforces structural consistency, stylized abstraction embraces simplification, distortion, and even exaggeration to evoke familiarity and conceptual identity.
The key contributions of our work are as follows:
Our proposed framework achieves strong generalization across unseen identities and styles. Through extensive experiments, combining quantitative metrics (KID and CLIPScore) with human studies, we show that it generalizes robustly to a wide variety of abstract styles (e.g., LEGO, knitted dolls, South Park) and to out-of-distribution, everyday subjects, all in a fully open-source setup.
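As a minimal sketch of how the two automatic metrics could be computed, the snippet below uses the torchmetrics implementations of KID and CLIPScore; the `evaluate` helper, tensor shapes, and `subset_size` are illustrative assumptions rather than our exact evaluation protocol.

```python
import torch
from torchmetrics.image.kid import KernelInceptionDistance
from torchmetrics.multimodal.clip_score import CLIPScore

# KID compares Inception-feature statistics of real vs. generated sets;
# subset_size must not exceed the number of images per set.
kid = KernelInceptionDistance(subset_size=50)
# CLIPScore measures alignment between a generated image and a text prompt.
clip_metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

def evaluate(real_imgs: torch.Tensor, fake_imgs: torch.Tensor, prompt: str):
    """real_imgs / fake_imgs: uint8 tensors of shape (N, 3, H, W), values 0-255."""
    kid.update(real_imgs, real=True)
    kid.update(fake_imgs, real=False)
    kid_mean, kid_std = kid.compute()
    clip = clip_metric(fake_imgs, [prompt] * fake_imgs.shape[0])
    return kid_mean.item(), kid_std.item(), clip.item()
```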
Our work consists of three main parts: (1) identity distillation via inference-time VLLM scaling, (2) cross-domain latent reversal with rectified flow, and (3) StyleBench, a human-aligned style evaluation framework.
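The latent-reversal component can be pictured as integrating a rectified-flow ODE in both directions. Below is a minimal sketch assuming a pretrained velocity network `velocity(z, t, cond)` (a hypothetical stand-in for the actual flow model) with data at t = 0 and noise at t = 1; it illustrates the general technique, not our exact procedure.

```python
import torch

@torch.no_grad()
def invert_latent(z0, velocity, cond_src, steps=50):
    """Euler-integrate the flow ODE dz/dt = v(z, t, cond) from the clean
    latent (data side, t = 0) to the noise side (t = 1)."""
    z = z0.clone()
    for i in range(steps):
        t = torch.full((z.shape[0],), i / steps, device=z.device)
        z = z + velocity(z, t, cond_src) / steps  # forward Euler step
    return z  # approximate noise latent encoding the source image

@torch.no_grad()
def regenerate(z1, velocity, cond_style, steps=50):
    """Integrate the same ODE back from t = 1 to t = 0 under a new style
    condition, so the stylized output shares the source's latent path."""
    z = z1.clone()
    for i in range(steps, 0, -1):
        t = torch.full((z.shape[0],), i / steps, device=z.device)
        z = z - velocity(z, t, cond_style) / steps  # reverse Euler step
    return z
```

The intuition behind this kind of reversal is that inverting under the source condition and re-integrating under a style condition lets the noise latent carry identity cues across domains.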
Comparison of stylization methods across KID, CLIP, StyleBench, and human evaluation scores. Methods are grouped into fine-tuned, encoder-based, and training-free categories.
Coming Soon ... !!!