In recent years, there have been remarkable breakthroughs in image-to-video generation. However, the 3D consistency and camera controllability of generated frames have remained unsolved. Recent studies have attempted to incorporate camera control into the generation process, but their results are often limited to simple trajectories or lack the ability to generate consistent videos from multiple distinct camera paths for the same scene. To address these limitations, we introduce Cavia, a novel framework for camera-controllable, multi-view video generation, capable of converting an input image into multiple spatiotemporally consistent videos. Our framework extends the spatial and temporal attention modules into view-integrated attention modules, improving both viewpoint and temporal consistency. This flexible design allows for joint training with diverse curated data sources, including scene-level static videos, object-level synthetic multi-view dynamic videos, and real-world monocular dynamic videos. To the best of our knowledge, Cavia is the first framework that enables users to generate multiple videos of the same scene with precise control over camera motion, while simultaneously preserving object motion. Extensive experiments demonstrate that Cavia surpasses state-of-the-art methods in terms of geometric consistency and perceptual quality.
- ** Work done while at Apple
- † University of Texas at Austin