No added camera-control module
Camera-induced warps are packed as visual history, so control enters through the pathway the backbone already uses to continue video.
Video History is More Than Context.
1 Shanghai Jiao Tong University 2 Shanghai AI Laboratory 3 Shanghai Innovation Institute
Warp-as-History enables interactive camera trajectory following and viewpoint manipulation, similar to HappyOyster and Genie 3, using only a single camera-annotated training example.
Warp-as-History asks whether a pretrained history-conditioned video generator can follow a target camera trajectory without a new camera encoder, control branch, or test-time optimization. The answer: warp the available observation under the target cameras, pack the result as pseudo-history, align those history tokens with the target frames being denoised, and drop tokens that lack valid source observations.
The same interface reveals non-trivial zero-shot camera-following behavior in the frozen model. A lightweight offline LoRA update on one separate camera-annotated video then stabilizes this behavior across unseen scenes and trajectories, improving camera adherence, visual quality, and motion dynamics without target-video adaptation.
The offline LoRA update calibrates when to follow visible warp evidence and when to rely on the generator for completion and motion.
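A schematic view of what such an offline LoRA update looks like at the layer level (the rank, scaling, and the choice of adapted layer are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

class LoRALinear:
    """Frozen backbone weight W plus a trainable low-rank residual
    scale * B @ A.  Only A and B change during the offline calibration
    pass on the single camera-annotated video; W stays frozen."""

    def __init__(self, W, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                 # frozen
        self.A = rng.normal(0.0, 0.02, (rank, d_in))
        self.B = np.zeros((d_out, rank))           # zero-init residual
        self.scale = alpha / rank

    def __call__(self, x):
        # Effective weight = frozen W + low-rank update.
        return x @ (self.W + self.scale * self.B @ self.A).T
```

Because `B` is zero-initialized, the adapted layer initially matches the frozen backbone exactly, so the update starts from the model's zero-shot camera-following behavior and only calibrates it.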
Each demo is driven by a target camera trajectory and rendered with the same direct sampler used by the pretrained backbone. The main wall shows one representative per scene; repeated first-frame groups are collected below to make the trajectory effect easier to compare.
Expanded playback
Continue from the first frame. The vast square of the Forbidden City at dawn remains quiet and solemn, with ancient red walls, golden roofs, white stone terraces, and soft morning haze. A majestic Chinese dragon rests across the palace courtyard, its long serpentine body coiled over the stone ground, with shimmering scales, flowing whiskers, and an ancient ceremonial presence. The dragon should feel noble, powerful, and mythical. Dust drifts across the square, scales glint in the soft light, banners and fabric edges move gently in the wind, and the dragon’s body shows faint breathing or small graceful movements, as if it is slowly awakening. The atmosphere should feel solemn, mythical, and emotionally striking, as if legend has appeared in the heart of the imperial palace. Keep the world immersive and explorable, with camera movement. Cinematic, highly detailed, soft morning light, mysterious but believable, no text, no watermark.
These groups keep the source frame and prompt fixed while changing the target camera path. The goal is to show that the history cue steers where the viewpoint moves, while the pretrained generator still completes disocclusions and independent scene motion.
Trajectory display
Warp-as-History does not treat the warp as a hard render target. It presents reliable geometric evidence through the model's visual-history pathway, aligns that evidence to the current denoising window, and lets the pretrained prior handle disocclusion, refinement, and dynamic foregrounds.
Render the available observation under the target camera trajectory and pack it as ordinary visual history.
Keep the warp in the history stream, but assign its temporal positions to the matching target frames.
Drop invalid warp tokens so newly visible or unreliable regions are completed by the generator.
Use a lightweight offline LoRA update to stabilize the exposed camera-follow behavior across unseen videos.
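As a rough illustration of the first three steps, here is a minimal numpy sketch, not the authors' implementation: the flow field standing in for the camera-induced warp, the nearest-neighbor sampling, the patch size, and the all-pixels-valid rule are all placeholder assumptions.

```python
import numpy as np

def warp_to_pseudo_history(src, flow, patch=8):
    """Warp a source frame under a camera-induced flow, then pack the
    valid patches as pseudo-history tokens.

    src  : (H, W, C) source observation.
    flow : (H, W, 2) per-pixel source coordinates (x, y) under the
           target camera.  Pixels whose source coordinates fall outside
           the frame have no valid observation and are masked out, so
           the generator completes them."""
    H, W, _ = src.shape
    xs = np.round(flow[..., 0]).astype(int)
    ys = np.round(flow[..., 1]).astype(int)
    valid = (xs >= 0) & (xs < W) & (ys >= 0) & (ys < H)

    warped = np.zeros_like(src)
    warped[valid] = src[ys[valid], xs[valid]]  # nearest-neighbor warp

    # Patchify; a token is kept only if every pixel in it is valid.
    ph, pw = H // patch, W // patch
    tok = warped[:ph * patch, :pw * patch]
    tok = tok.reshape(ph, patch, pw, patch, -1)
    tok = tok.transpose(0, 2, 1, 3, 4).reshape(ph * pw, -1)
    vmask = valid[:ph * patch, :pw * patch].reshape(ph, patch, pw, patch)
    vmask = vmask.transpose(0, 2, 1, 3).reshape(ph * pw, -1).all(axis=1)
    return tok[vmask], vmask  # dropped tokens are left to the generator
```

In the full method the kept tokens would additionally carry the temporal positions of the target frames they align with (step 2), rather than being indexed as past timesteps.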
@misc{wang2026warpashistory,
title = {Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video},
author = {Wang, Yifan and He, Tong},
year = {2026}
}