ChuniWorld: Interactive Long-Horizon World Modeling Toward Generative Reality

An interactive autoregressive world model with real-time camera control, prompt switching, and long-horizon memory consistency.

Resolution: 720p
Real-time Generation: 24 FPS
Long-horizon: 60s+
Parameters: 15B

Method Overview

ChuniWorld teaser: diverse scenes, camera control and interactive magic

ChuniWorld is built around four core properties — agency, persistence, durability, and responsiveness.

Agency

Two control channels: a rendered 3D cache with lightweight AdaLN camera modulation for grounded, trajectory-aware navigation, and chunk-level prompt switching to introduce new events mid-generation.
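As a rough illustration of the camera channel, the sketch below applies AdaLN-style modulation: features are layer-normalized, then scaled and shifted by values predicted from a camera embedding. The function names, the plain linear projections, and the `(1 + scale)` convention are assumptions made for this example, not ChuniWorld's actual layers.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(weights, vec):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def adaln_modulate(features, camera_embedding, w_scale, w_shift):
    """Hypothetical AdaLN step: scale and shift come from the camera embedding."""
    scale = linear(w_scale, camera_embedding)   # one scale per feature channel
    shift = linear(w_shift, camera_embedding)   # one shift per feature channel
    normed = layer_norm(features)
    return [(1.0 + s) * n + b for n, s, b in zip(normed, scale, shift)]
```

With all-zero projection weights the layer reduces to plain layer normalization, a common initialization choice for AdaLN-style conditioning (assumed here, not stated by the source).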

Persistence

World-consistent memory — an explicit 3D cache reprojected to the queried view for spatial recall, plus a compressed frame-history embedding for temporal continuity, so revisited places stay recognizable.
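To make the spatial-recall idea concrete, here is a minimal sketch of reprojecting a cache of colored 3D points into a queried camera view with a pinhole model and a z-buffer. The point-list cache, the `project_cache` name, and the single `focal` parameter are illustrative assumptions; the source does not specify how the cache is stored or rendered.

```python
import math

def project_cache(points, rotation, translation, focal, width, height):
    """Render cached 3D points into a queried view (hypothetical helper).

    points: list of ((x, y, z), color) in world coordinates.
    rotation: 3x3 world-to-camera matrix as a list of rows.
    translation: 3-vector added after rotation.
    Returns a dict mapping (u, v) pixels to the color of the nearest point.
    """
    depth, image = {}, {}
    cx, cy = width / 2.0, height / 2.0
    for position, color in points:
        # World -> camera coordinates.
        xc, yc, zc = (
            sum(r * p for r, p in zip(row, position)) + t
            for row, t in zip(rotation, translation)
        )
        if zc <= 0:
            continue  # behind the camera
        # Pinhole projection onto the image plane.
        u = math.floor(focal * xc / zc + cx)
        v = math.floor(focal * yc / zc + cy)
        inside = 0 <= u < width and 0 <= v < height
        if inside and zc < depth.get((u, v), float("inf")):
            depth[(u, v)] = zc      # keep only the closest point per pixel
            image[(u, v)] = color
    return image
```

Pixels that no cached point lands on stay empty in this sketch; in a world model, those are the regions the generator would have to fill in.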

Durability

Long-horizon stability from training on drifted histories and an error bank that re-injects accumulated artifacts into both memory and target, preventing errors from compounding over minute-long rollouts.
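One plausible reading of the error bank, sketched below: residuals between rolled-out predictions and reference frames are stored in a bounded buffer, then sampled during training and added to frames. The `ErrorBank` class, its capacity, and the `inject` helper are hypothetical; the source only states that accumulated artifacts are re-injected into both memory and target.

```python
import random
from collections import deque

class ErrorBank:
    """Bounded store of rollout residuals (illustrative, not the authors' code)."""

    def __init__(self, capacity, seed=0):
        self._items = deque(maxlen=capacity)  # oldest residuals are evicted first
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self._items)

    def add(self, predicted, reference):
        """Record the per-element error of a drifted prediction."""
        self._items.append([p - r for p, r in zip(predicted, reference)])

    def sample(self):
        """Return a stored residual, or None if the bank is empty."""
        if not self._items:
            return None
        return self._rng.choice(list(self._items))

def inject(frame, residual, strength=1.0):
    """Add a sampled residual to a frame (applied to memory or target frames)."""
    return [f + strength * e for f, e in zip(frame, residual)]
```

Training on inputs corrupted this way exposes the model to the kind of drift it produces at inference time, which is the stated goal of the Durability component.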

Responsiveness

Real-time interaction via few-step DMD distillation and short temporal chunks, with prompt switching at chunk boundaries to minimize both visual and semantic latency.
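The effect of chunk-boundary switching on latency can be sketched as a small scheduler: a prompt submitted mid-chunk takes effect at the next chunk start, so the wait is always shorter than one chunk. The function names and the chunk size used in the usage note are assumptions for illustration; the source does not state the chunk length.

```python
def prompts_per_chunk(total_frames, chunk_size, initial_prompt, requests):
    """Plan which prompt conditions each chunk (hypothetical helper).

    requests: list of (frame_index, prompt) user submissions.
    A request applies to the first chunk starting at or after its frame.
    Returns a list of (chunk_start_frame, prompt).
    """
    pending = sorted(requests)
    active = initial_prompt
    plan = []
    i = 0
    for start in range(0, total_frames, chunk_size):
        # Apply every request that arrived before this chunk begins.
        while i < len(pending) and pending[i][0] <= start:
            active = pending[i][1]
            i += 1
        plan.append((start, active))
    return plan

def switch_latency_frames(request_frame, chunk_size):
    """Frames between a request and the next chunk boundary."""
    return (-request_frame) % chunk_size
```

For example, with an assumed chunk of 4 frames, a prompt submitted at frame 5 waits 3 frames and takes effect at frame 8; at 24 FPS that is 0.125 s before the new prompt starts conditioning generation.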

Demo Results

Diverse Style Video Generation

Same scene, seven styles — drag the handles to compare, switch the scene below.

Realistic, Oil Painting, Ink Wash, Cyberpunk, Zelda, Minecraft, Pixel

Camera-Controllable Video Generation

Real-time 6-DoF camera control with an on-screen joystick — use the arrows to browse 16 clips.

Game-Style Video Generation

Worlds generated in real-game and synthesized game styles — use the arrows to browse 16 clips.

Prompt-Driven Interaction

Drop a prompt mid-scene to trigger an event — same world, different spells. Use the arrows to browse.

World-Consistent Video Generation

Turn left and right, then look back — does the world stay consistent? Same trajectory, four methods.

Long-Horizon Video Generation

Over a minute of continuous, drift-free generation per clip — all shown at 2× speed.

Try Your Own Magic!

Walk up to the pyramid, then cast a spell — pick one and watch it unfold.


Team

Within each role category, authors are listed in alphabetical order by their first names.

Core Lead: Kaipeng Zhang
Lead: Chuanhao Li
Core Contributor: Chuanhao Li, Kaipeng Zhang, Yifan Zhan, Yongtao Ge, Yuanyang Yin
Contributor: Jiaming Tan, Kang He, Liaoyuan Fan, Ruicong Liu, Xiaojie Xu, Xuangeng Chu, Zhen Li, Zhengyuan Lin, Zhixiang Wang, Zian Meng, Zihui Gao