Training-Free Stylized Abstraction

Johns Hopkins University

What is Stylized Abstraction?

Figure 1. (a) Stylized abstraction vs. traditional style transfer. (Top) Stylized abstraction captures core identifying attributes while allowing stylistic distortion to preserve the intended visual style. (Bottom) Traditional style transfer preserves geometry and appearance but applies texture-based styles, often failing to generalize beyond appearance-level edits. (b) Comparison across existing style transfer and personalized generation methods using a single image of a non-celebrity subject. Most methods struggle to retain semantic identity for everyday individuals, while our training-free method preserves key identity cues across diverse styles.

Stylized abstraction involves exaggerating or simplifying the features of a subject to create a stylized representation. Rather than aiming for photorealism, it emphasizes recognizable traits that evoke the subject's concept or identity (illustrated in Figure 1(a)). Stylized abstraction aims to capture the essence of a subject through visual abstraction, focusing less on exact likeness and more on the retention of key, recognizable features. For instance, a knitted doll or a LEGO figure of Einstein may omit intricate facial geometry or biometric precision, yet still be immediately identifiable due to consistent visual traits such as his distinctive hair, mustache, or attire. These features serve as semantic anchors, allowing viewers to recognize the subject even in highly abstracted or playful forms. This form of representation is widespread in media, animation, and merchandising, where retaining a character's identity in a simplified, reproducible form is essential. Terms like personified toy representation or iconic stylization are often used to describe such instances. Unlike traditional image-to-image translation, which typically enforces structural consistency, stylized abstraction embraces simplification, distortion, and even exaggeration to evoke familiarity and conceptual identity.

Contributions

The key contributions of our work are as follows:

  • Training-Free Stylized Abstraction Framework: We introduce a fully training-free pipeline that generates stylized abstractions from a single input image by leveraging inference-time scaling in vision-language models (VLLMs) to extract and preserve identity-relevant features.
  • Cross-Domain Rectified Flow Inversion: We propose a novel cross-domain latent reversal strategy using rectified flows, which reconstructs subject structure in the abstracted style domain based on style-dependent priors, rather than relying on photo-realistic inversions.
  • Style-Aware Temporal Scheduling: We introduce a VLLM-driven, style-aware temporal controller that dynamically modulates structural restoration, enabling high-fidelity reconstructions that balance semantic identity preservation with stylistic exaggeration.
  • Multi-Round Abstraction-Aware Generation Loop: We employ an iterative VLLM-in-the-loop process in which missing or misaligned identity cues are identified and reintegrated across multiple rounds, achieving convergence without any model fine-tuning (a conceptual sketch follows this list).
  • StyleBench: Human-Aligned Evaluation Protocol: We propose StyleBench, a GPT-based, human-aligned benchmarking metric designed explicitly for abstraction styles, where traditional pixel-level metrics fail; it assesses fidelity along the axes of style adherence, identity preservation, and fusion quality.
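To make the multi-round generation loop concrete, here is a minimal conceptual sketch in Python. Every helper on the vllm and generator objects (describe_identity_cues, find_missing_cues, generate) is a hypothetical placeholder standing in for VLLM and diffusion-model calls, not the actual implementation:

    def stylized_abstraction_loop(input_image, style_prompt, vllm, generator, max_rounds=3):
        # Distill identity-relevant cues from the single input image (round 0).
        identity_cues = vllm.describe_identity_cues(input_image)
        result = generator.generate(style_prompt, identity_cues)
        for _ in range(max_rounds):
            # Ask the VLLM which identity cues are missing or misaligned
            # in the current stylized output.
            missing = vllm.find_missing_cues(result, input_image, identity_cues)
            if not missing:
                break  # converged: all key identity cues are present
            # Reintegrate the missing cues and regenerate; no weights are updated.
            identity_cues = identity_cues + missing
            result = generator.generate(style_prompt, identity_cues)
        return result

The loop only edits prompts and conditioning between rounds, which is what keeps the pipeline entirely training-free.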

Our proposed framework achieves strong generalization across unseen identities and styles. Extensive experiments (quantitative metrics such as KID and CLIPScore, plus human studies) show that it generalizes robustly to a wide variety of abstract styles (e.g., LEGO, knitted dolls, South Park) and to out-of-distribution, everyday subjects, all in a fully open-source setup.
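For reference, CLIPScore-style image-text alignment can be approximated with an off-the-shelf CLIP model. The sketch below uses the Hugging Face openai/clip-vit-base-patch32 checkpoint as an illustrative choice; the exact model and scaling in our evaluation code may differ:

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def clip_score(image: Image.Image, text: str) -> float:
        # Embed image and text, then score their cosine similarity.
        inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)
        with torch.no_grad():
            out = model(**inputs)
        img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
        txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
        return (img @ txt.T).item()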

Method

Our method consists of three main parts: (1) identity distillation via inference-time VLLM scaling, (2) cross-domain latent reversal with rectified flows, and (3) the human-aligned StyleBench evaluation framework.
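As a rough illustration of part (2), the sketch below inverts an image latent along a rectified flow with a simple Euler integrator, then re-integrates it under style conditioning. This is a generic sketch, not the paper's exact procedure: the velocity model v, the conditionings cond_src/cond_style, and the hand-over point switch_t (which a style-aware temporal scheduler would choose per style) are all illustrative stand-ins, and time conventions vary between models:

    import torch

    @torch.no_grad()
    def cross_domain_latent_reversal(z_image, v, cond_src, cond_style, steps=50, switch_t=0.6):
        # Convention here: t runs from 0 (data) to 1 (noise), and
        # v(z, t, cond) predicts the rectified-flow velocity dz/dt.
        ts = torch.linspace(0.0, 1.0, steps + 1)

        # Inversion: integrate the real-image latent toward noise.
        z = z_image
        for i in range(steps):
            z = z + (ts[i + 1] - ts[i]) * v(z, ts[i], cond_src)

        # Generation: integrate back toward data, handing conditioning over
        # from the source (structural restoration) to the style prior at switch_t.
        for i in reversed(range(steps)):
            t = ts[i + 1]
            cond = cond_src if t > switch_t else cond_style
            z = z - (ts[i + 1] - ts[i]) * v(z, t, cond)
        return z

Because the reverse integration is conditioned on the style prior after switch_t, subject structure is reconstructed in the abstracted style domain rather than recovered photorealistically.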

Qualitative Results

Quantitative Results

Comparison of stylization methods across KID, CLIPScore, StyleBench, and human evaluation scores. Methods are grouped into fine-tuned, encoder-based, and training-free categories.
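For the KID column, a minimal reference computation can be done with torchmetrics (an illustrative tooling choice, not necessarily our evaluation code); images are uint8 tensors of shape (N, 3, H, W):

    import torch
    from torchmetrics.image.kid import KernelInceptionDistance

    # subset_size must not exceed the number of images in either set.
    kid = KernelInceptionDistance(subset_size=50)

    real = torch.randint(0, 256, (100, 3, 299, 299), dtype=torch.uint8)  # reference images
    fake = torch.randint(0, 256, (100, 3, 299, 299), dtype=torch.uint8)  # stylized outputs

    kid.update(real, real=True)
    kid.update(fake, real=False)
    kid_mean, kid_std = kid.compute()  # KID is conventionally reported as mean ± std
    print(f"KID: {kid_mean:.4f} ± {kid_std:.4f}")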

BibTeX

Coming Soon ... !!!
            
Acknowledgement: The website template is taken from Nerfies.