No added camera-control module
Camera-induced warps are packed as visual history, so control enters through the pathway the backbone already uses to continue video.
Video History is More Than Context.
1 Shanghai Jiao Tong University 2 Shanghai AI Laboratory 3 Shanghai Innovation Institute
Warp-as-History enables interactive camera trajectory following and viewpoint manipulation, similar to HappyOyster and Genie 3, using only a single camera-annotated training example.
Warp-as-History asks whether a pretrained history-conditioned video generator can follow a target camera trajectory without a new camera encoder, control branch, or test-time optimization. The answer: warp the available observation under the target cameras, pack the result as pseudo-history, align those history tokens with the target frames being denoised, and drop tokens that lack valid source observations.
The same interface reveals non-trivial zero-shot camera-following behavior in the frozen model. A lightweight offline LoRA update on one separate camera-annotated video then stabilizes this behavior across unseen scenes and trajectories, improving camera adherence, visual quality, and motion dynamics without target-video adaptation.
The offline LoRA update calibrates when to follow visible warp evidence and when to rely on the generator for completion and motion.
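A schematic view of what such an offline LoRA update looks like at the layer level (the rank, scaling, and the choice of adapted layer are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

class LoRALinear:
    """Frozen backbone weight W plus a trainable low-rank residual
    scale * B @ A.  Only A and B change during the offline calibration
    pass on the single camera-annotated video; W stays frozen."""

    def __init__(self, W, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                 # frozen
        self.A = rng.normal(0.0, 0.02, (rank, d_in))
        self.B = np.zeros((d_out, rank))           # zero-init residual
        self.scale = alpha / rank

    def __call__(self, x):
        # Effective weight = frozen W + low-rank update.
        return x @ (self.W + self.scale * self.B @ self.A).T
```

Because `B` is zero-initialized, the adapted layer initially matches the frozen backbone exactly, so the update starts from the model's zero-shot camera-following behavior and only calibrates it.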
Each demo is driven by a target camera trajectory and rendered with the same direct sampler used by the pretrained backbone. The main wall shows one representative per scene; repeated first-frame groups are collected below to make the trajectory effect easier to compare.
Expanded playback
Continue from the first frame. The vast square of the Forbidden City at dawn remains quiet and solemn, with ancient red walls, golden roofs, white stone terraces, and soft morning haze. A majestic Chinese dragon rests across the palace courtyard, its long serpentine body coiled over the stone ground, with shimmering scales, flowing whiskers, and an ancient ceremonial presence. The dragon should feel noble, powerful, and mythical. Dust drifts across the square, scales glint in the soft light, banners and fabric edges move gently in the wind, and the dragon’s body shows faint breathing or small graceful movements, as if it is slowly awakening. The atmosphere should feel solemn, mythical, and emotionally striking, as if legend has appeared in the heart of the imperial palace. Keep the world immersive and explorable, with camera movement. Cinematic, highly detailed, soft morning light, mysterious but believable, no text, no watermark.
These groups keep the source frame and prompt fixed while changing the target camera path. The goal is to show that the history cue steers where the viewpoint moves, while the pretrained generator still completes disocclusions and independent scene motion.
Trajectory display
Warp-as-History does not treat the warp as a hard render target. It presents reliable geometric evidence through the model's visual-history pathway, aligns that evidence to the current denoising window, and lets the pretrained prior handle disocclusion, refinement, and dynamic foregrounds.
Render the available observation under the target camera trajectory and pack it as ordinary visual history.
Keep the warp in the history stream, but assign its temporal positions to the matching target frames.
Drop invalid warp tokens so newly visible or unreliable regions are completed by the generator.
Use a lightweight offline LoRA update to stabilize the exposed camera-follow behavior across unseen videos.
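As a rough illustration of the first three steps, here is a minimal numpy sketch, not the authors' implementation: the flow field standing in for the camera-induced warp, the nearest-neighbor sampling, the patch size, and the all-pixels-valid rule are all placeholder assumptions.

```python
import numpy as np

def warp_to_pseudo_history(src, flow, patch=8):
    """Warp a source frame under a camera-induced flow, then pack the
    valid patches as pseudo-history tokens.

    src  : (H, W, C) source observation.
    flow : (H, W, 2) per-pixel source coordinates (x, y) under the
           target camera.  Pixels whose source coordinates fall outside
           the frame have no valid observation and are masked out, so
           the generator completes them."""
    H, W, _ = src.shape
    xs = np.round(flow[..., 0]).astype(int)
    ys = np.round(flow[..., 1]).astype(int)
    valid = (xs >= 0) & (xs < W) & (ys >= 0) & (ys < H)

    warped = np.zeros_like(src)
    warped[valid] = src[ys[valid], xs[valid]]  # nearest-neighbor warp

    # Patchify; a token is kept only if every pixel in it is valid.
    ph, pw = H // patch, W // patch
    tok = warped[:ph * patch, :pw * patch]
    tok = tok.reshape(ph, patch, pw, patch, -1)
    tok = tok.transpose(0, 2, 1, 3, 4).reshape(ph * pw, -1)
    vmask = valid[:ph * patch, :pw * patch].reshape(ph, patch, pw, patch)
    vmask = vmask.transpose(0, 2, 1, 3).reshape(ph * pw, -1).all(axis=1)
    return tok[vmask], vmask  # dropped tokens are left to the generator
```

In the full method the kept tokens would additionally carry the temporal positions of the target frames they align with (step 2), rather than being indexed as past timesteps.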
@misc{wang2026warpashistory,
title = {Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video},
author = {Wang, Yifan and He, Tong},
year = {2026}
}