In this work, we introduce Micro-World, an action-controlled interactive world model designed to generate high-quality, open-domain scenes. Building on the Wan2.1 family of models, we train both image-to-world (I2W) and text-to-world (T2W) variants to support a wide range of use cases. To foster open research and practical adoption in the community, we release the model weights, the full training and inference code, and a curated dataset tailored for controllable world modeling.
For action injection, we favor adaLN for its lightweight parameter footprint and ControlNet for its strong empirical stability during training.
Note that the released I2W model is trained with the adaLN architecture.
For more information, please refer to the GitHub repository.
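As a rough illustration of adaLN-style action injection, the sketch below normalizes one token and then scales and shifts it with values projected from an action vector. It is written in plain Python for readability and is not taken from the released code: the key vocabulary `KEYS`, the multi-hot-plus-mouse-delta encoding in `encode_action`, and all function names are assumptions made for this example.

```python
import math

# Hypothetical action vocabulary; the released code may encode actions differently.
KEYS = ["W", "S", "A", "D", "Ctrl", "Shift"]


def encode_action(combo, mouse_dx=0.0, mouse_dy=0.0):
    """Multi-hot key vector followed by mouse deltas (an assumed encoding)."""
    pressed = set(combo.split("+")) if combo else set()
    unknown = pressed - set(KEYS)
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    return [1.0 if k in pressed else 0.0 for k in KEYS] + [mouse_dx, mouse_dy]


def layer_norm(x, eps=1e-6):
    """Normalize one token to zero mean and unit variance, with no learned affine."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weight, bias, vec):
    """Compute weight @ vec + bias for a row-major list-of-lists weight."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def adaln(x, action, w_scale, b_scale, w_shift, b_shift):
    """adaLN modulation: y = LN(x) * (1 + scale(action)) + shift(action)."""
    scale = linear(w_scale, b_scale, action)
    shift = linear(w_shift, b_shift, action)
    return [n * (1.0 + s) + t for n, s, t in zip(layer_norm(x), scale, shift)]


# Usage: condition a 4-dimensional token on the "W+Ctrl" action.
hidden = 4
token = [1.0, 2.0, 3.0, 4.0]
action = encode_action("W+Ctrl")
zero_w = [[0.0] * len(action) for _ in range(hidden)]
zero_b = [0.0] * hidden
out = adaln(token, action, zero_w, zero_b, zero_w, zero_b)
```

With zero-initialized projections the output equals the plain layer-normalized token, so the action pathway starts as a no-op and only adds two small linear projections per modulated layer, which is consistent with the lightweight footprint noted above.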
Model Architecture
Video Results
T2W Model
In Domain
| W | S | A |
| D | W+Ctrl | W+Shift |
| Multiple control | Mouse down and up | Mouse right and left |
Open Domain
I2W Model
