NanoWM-B/1 - Web-DINO - DINO-WM / pusht

Encoder-backbone comparison checkpoint for the pusht environment from the DINO-WM suite. This run replaces the SD-VAE latent target with frozen Web-DINO patch features and trains NanoWM-B/1 for 100,000 steps.

This repository contains only the NanoWM transformer weights and training configuration. It does not include Weights & Biases logs or the Web-DINO encoder weights.

Run identity

collection: https://huggingface.co/collections/knightnemo/nano-world-model
reference baseline: knightnemo/nanowm-b2-dino-wm-pusht-100k

Training setup

| Key | Value |
|---|---|
| Architecture | NanoWM-B/1 (~160M params) |
| Latent codec | Web-DINO, 224 input, 14px patches, 16x16x1024 features |
| Dataset | DINO-WM `pusht` |
| Frames | 4 |
| Context frames | 1 |
| Action injection | additive |
| Steps | 100,000 |
| Effective batch | 64 |
| Optimizer | AdamW, lr 1e-4, wd 0.01 |
| Precision | bf16-mixed, `torch.compile` on |
| Seed | 3407 |
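
The latent shape in the table follows from the encoder geometry: a 224-pixel input split into 14-pixel patches gives a 16x16 grid of 1024-dimensional features per frame. A minimal sketch of that arithmetic is below. The helper name `latent_shape` is illustrative and not part of the repository, and the token count assumes the `/1` suffix means a latent patch size of 1 (DiT-style naming), which this card does not state.

```python
def latent_shape(frames=4, image_size=224, patch_size=14, feature_dim=1024):
    """Return (frames, grid, grid, feature_dim) for a ViT-style patch encoder."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    grid = image_size // patch_size  # patches along each spatial axis
    return (frames, grid, grid, feature_dim)


shape = latent_shape()
tokens_per_frame = shape[1] * shape[2]
# Assumption: with latent patch size 1, every grid cell becomes one transformer token.
tokens_per_clip = shape[0] * tokens_per_frame

print(shape)             # (4, 16, 16, 1024)
print(tokens_per_frame)  # 256
print(tokens_per_clip)   # 1024
```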

Diffusion setup

| Key | Value |
|---|---|
| pred_name | v |
| noise_schedule | `squaredcos_cap_v2` |
| zero_terminal_snr | true |
| timestep_sampling | logit_normal |
| snr_gamma | 5.0 |
| diffusion_steps | 1000 train, 250 DDIM sample |
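
The settings above can be read together as one training recipe. The sketch below is one plausible interpretation using only the standard library, not the repository's implementation. It assumes the squared-cosine schedule as defined for `squaredcos_cap_v2` in diffusers, the usual shift-and-scale of sqrt(alpha_bar) for zero terminal SNR, a logit-normal sampler with mean 0 and standard deviation 1, and the v-prediction form of min-SNR weighting, min(SNR, gamma) / (SNR + 1). All function names are illustrative.

```python
import math
import random

T = 1000         # training diffusion steps (from the table)
SNR_GAMMA = 5.0  # snr_gamma (from the table)


def cosine_alpha_bar(num_steps, max_beta=0.999):
    """Cumulative alpha products for the squared-cosine schedule with capped betas."""
    def f(t):
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    alpha_bar, prod = [], 1.0
    for i in range(num_steps):
        beta = min(1.0 - f((i + 1) / num_steps) / f(i / num_steps), max_beta)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def rescale_zero_terminal_snr(alpha_bar):
    """Shift and scale sqrt(alpha_bar) so the last step has exactly zero SNR."""
    root = [math.sqrt(a) for a in alpha_bar]
    first, last = root[0], root[-1]
    return [((r - last) * first / (first - last)) ** 2 for r in root]


def sample_timestep(rng, num_steps):
    """Logit-normal sampling: sigmoid of a standard normal, mapped to a step index."""
    u = rng.gauss(0.0, 1.0)
    t = 1.0 / (1.0 + math.exp(-u))
    return min(int(t * num_steps), num_steps - 1)


def v_target(x0, eps, a_bar):
    """v-prediction target: sqrt(a_bar) * eps - sqrt(1 - a_bar) * x0."""
    return math.sqrt(a_bar) * eps - math.sqrt(1.0 - a_bar) * x0


def min_snr_weight(a_bar, gamma=SNR_GAMMA):
    """Min-SNR loss weight in its v-prediction form; finite even when SNR is 0."""
    snr = a_bar / (1.0 - a_bar)
    return min(snr, gamma) / (snr + 1.0)


alpha_bar = rescale_zero_terminal_snr(cosine_alpha_bar(T))
rng = random.Random(3407)
t = sample_timestep(rng, T)
weight = min_snr_weight(alpha_bar[t])
```

With zero terminal SNR the final step carries no signal, so the v target there reduces to `-x0` and the min-SNR weight is 0. This is also why v-prediction is the usual pairing for this setting: an epsilon target would be uninformative at that step.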

Loading

git clone git@github.com:knightnemo/nano-world-model.git
cd nano-world-model
huggingface-cli download knightnemo/nanowm-b1-webdino-dino-wm-pusht-100k --local-dir ./ckpt

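import sys
from omegaconf import OmegaConf
from safetensors.torch import load_file

# Make the repository's `src` directory importable (run this from the repo root).
sys.path.insert(0, "src")
from models import get_models

cfg = OmegaConf.load("ckpt/config.yaml")
cfg.experiment.infra.compile = False  # skip torch.compile when loading for inference
model = get_models(cfg).eval()

# Load the transformer weights; strict=True fails loudly on any key mismatch.
state_dict = load_file("ckpt/model.safetensors")
model.load_state_dict(state_dict, strict=True)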
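
If the strict load ever reports missing or unexpected keys, a small key diff makes the mismatch easier to read. The sketch below is a hypothetical troubleshooting aid, not part of the repository. It also covers one known cause: a module wrapped by `torch.compile` exposes its parameters under an `_orig_mod.` prefix. This card does not say whether the checkpoint is affected, and the strict load above suggests it is not.

```python
def strip_compile_prefix(state_dict, prefix="_orig_mod."):
    """Return a copy of state_dict with a leading torch.compile prefix removed from keys."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


def diff_keys(expected, actual):
    """Return (missing, unexpected) key lists, each sorted for stable output."""
    expected, actual = set(expected), set(actual)
    return sorted(expected - actual), sorted(actual - expected)


# Toy dictionaries standing in for real state dicts.
checkpoint = {"_orig_mod.blocks.0.weight": 1, "head.bias": 2}
cleaned = strip_compile_prefix(checkpoint)
missing, unexpected = diff_keys(["blocks.0.weight", "head.bias"], cleaned)
print(missing, unexpected)  # [] []
```

With the real objects, the call would look like `diff_keys(model.state_dict().keys(), state_dict.keys())`.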

The config expects a Web-DINO encoder compatible with `facebook/webssl-dino300m-full2b-224` and is set up for encoder-only latent metrics. Because this latent codec has no decoder, pixel-space video sampling and pixel metrics are not available from this checkpoint alone.
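
Since the encoder weights are not bundled, latent targets have to be produced separately. A minimal sketch of extracting patch features with Hugging Face `transformers` is below. It assumes the encoder loads as a `Dinov2Model`, that the processor resizes frames to 224x224, and that the final hidden state minus the CLS token is the feature of interest. The repository's own encoder wrapper may use different preprocessing or a different layer, and `encode_frames` is an illustrative name.

```python
DEFAULT_ENCODER = "facebook/webssl-dino300m-full2b-224"


def encode_frames(frames, model_id=DEFAULT_ENCODER):
    """Encode a list of PIL images into a (N, grid, grid, dim) patch-feature tensor."""
    # Heavy dependencies are imported lazily so this file imports without them.
    import torch
    from transformers import AutoImageProcessor, Dinov2Model

    processor = AutoImageProcessor.from_pretrained(model_id)
    encoder = Dinov2Model.from_pretrained(model_id).eval()

    inputs = processor(images=frames, return_tensors="pt")
    with torch.no_grad():  # the encoder stays frozen
        tokens = encoder(**inputs).last_hidden_state  # (N, 1 + patches, dim)

    patches = tokens[:, 1:]  # drop the CLS token, keep patch tokens
    grid = int(round(patches.shape[1] ** 0.5))  # assumes a square patch grid
    return patches.reshape(len(frames), grid, grid, -1)
```

For a 224x224 input with 14-pixel patches this should yield a 16x16 grid with 1024 channels per frame, matching the latent codec row in the training table.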

Safetensors: model size 0.2B params, tensor type F32
