Qwen3-4B-Thinking-2507 text encoder - v1.0 - qwen34BThinking2507_v10.safetensors

A fully ready Qwen3-4B-Thinking-2507 build. Compared to vanilla Qwen3-4B, it delivers noticeably better prompt adherence with Z-Image models and avoids common wording misinterpretations. Highly recommended for both inference and Z-Image LoRA training.
Path: ComfyUI_windows_portable\ComfyUI\models\text_encoders\
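
If you want to double-check that the file landed in the right folder, a minimal sketch like the one below can help. It assumes the portable Windows layout from the path above; the helper names `encoder_path` and `encoder_installed` are made up for illustration and are not part of ComfyUI.

```python
from pathlib import Path

# File name as given in the title of this page.
ENCODER_FILE = "qwen34BThinking2507_v10.safetensors"


def encoder_path(comfy_root):
    """Build the expected location of the encoder under a ComfyUI portable root.

    comfy_root is assumed to be the ComfyUI_windows_portable folder.
    """
    return Path(comfy_root) / "ComfyUI" / "models" / "text_encoders" / ENCODER_FILE


def encoder_installed(comfy_root):
    """Return True if the encoder file exists at the expected location."""
    return encoder_path(comfy_root).is_file()
```

Calling `encoder_installed(r"C:\ComfyUI_windows_portable")` (adjust the root to your own install) returns `True` only when the file sits in `models\text_encoders`.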

Qwen3-4B-Thinking-2507 USAGE:

[ANCHOR: who / what exists]
[ROLE or STATE: what defines them conceptually]
[ACTION or POSTURE: what they are doing or how they are positioned]
[RELATIONSHIP: how they relate to space, objects, or viewer]
[ENVIRONMENT: where this takes place, minimally]
[INTENT: what the image is meant to communicate]
[LIGHTING: chosen to support the intent]
[CAMERA / FRAMING: how the viewer perceives it]
[STYLE RESTRAINTS: what it should resemble, softly]
[CONSTRAINTS: what must be avoided]

example:
a single adult man,
calm and self-contained rather than expressive,
standing upright with relaxed posture,
positioned slightly off-center to create quiet tension,
inside a simple, uncluttered interior space,
the focus is on presence and character rather than action,
soft indirect light so that facial features remain natural,
eye-level camera, medium framing from the chest up,
realistic but understated photographic style,
no exaggerated emotion, no stylization, no dramatic effects

Alternative layout:
[SUBJECT / ANCHOR],  
[TRAIT / MOOD / PERSONALITY],  
[ACTION / POSTURE / STATE],  
[POSITION / RELATION TO SPACE / COMPOSITION],  
[ENVIRONMENT / SETTING],  
[INTENT / WHAT THE IMAGE SHOULD CONVEY],  
[LIGHTING / ATMOSPHERE],  
[CAMERA / FRAMING / PERSPECTIVE],  
[STYLE / ARTISTIC DIRECTION],  
[FORM CLARITY / SHAPE / TEXTURE / COLOR DIRECTIONS]

example:
a single adult man,  
calm and self-contained,  
standing upright with relaxed posture,  
positioned slightly off-center to create quiet tension,  
inside a simple, uncluttered interior space,  
showing presence and character through posture and expression,  
soft indirect light to enhance facial features naturally,  
eye-level camera, medium framing from the chest up,  
photographic style with subtle tones and understated textures,  
featuring clear forms, natural proportions, and readable visual composition

Use only the example part in your positive prompt. The bracketed layout is there to help you understand the structure; it is not meant to be fed to the text encoder.
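
If you build prompts in a script, the first layout can be sketched as an ordered list of slots that get joined into the final positive prompt. This is only an illustration: the slot names in `FIELD_ORDER` and the `build_prompt` helper are hypothetical, and only the joined text (never the bracketed labels) goes to the text encoder.

```python
# Slot names mirror the bracketed layout above; they are labels for you,
# not text that gets sent to the encoder.
FIELD_ORDER = [
    "anchor", "role", "action", "relationship", "environment",
    "intent", "lighting", "camera", "style", "constraints",
]


def build_prompt(fields):
    """Join the slot values in layout order, one comma-separated line per slot."""
    missing = [name for name in FIELD_ORDER if not fields.get(name)]
    if missing:
        raise ValueError("missing slots: " + ", ".join(missing))
    # Strip stray whitespace and trailing commas so the join stays clean.
    parts = [fields[name].strip().rstrip(",") for name in FIELD_ORDER]
    return ",\n".join(parts)


# Values taken from the first example above.
example = {
    "anchor": "a single adult man",
    "role": "calm and self-contained rather than expressive",
    "action": "standing upright with relaxed posture",
    "relationship": "positioned slightly off-center to create quiet tension",
    "environment": "inside a simple, uncluttered interior space",
    "intent": "the focus is on presence and character rather than action",
    "lighting": "soft indirect light so that facial features remain natural",
    "camera": "eye-level camera, medium framing from the chest up",
    "style": "realistic but understated photographic style",
    "constraints": "no exaggerated emotion, no stylization, no dramatic effects",
}

prompt = build_prompt(example)
print(prompt)
```

Keeping the slots in a fixed order makes it easy to swap one part (for example the lighting) without disturbing the rest of the prompt.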

Please note: Qwen3-4B-Thinking-2507 is still experimental with this model, but with the right tweaks it can produce great outputs. Any LoRA trained on the vanilla qwen3_4b will not function properly under this encoder, so you will need to retrain it using this text encoder.

The full training thread can be found here.
The training config for AiToolKit with the Qwen3-4B-Thinking-2507 text encoder can be found here.