
InfiniteTalk is a sparse-frame video dubbing framework for audio-driven video generation. Given an input video and an audio track, it synthesizes a new video with accurate lip synchronization while naturally aligning head movements, body posture, and facial expressions with the speech. Unlike traditional dubbing methods that focus only on lip movements, InfiniteTalk supports infinite-length video generation with stable identity preservation. It also operates in an image-and-audio-to-video mode, generating realistic talking videos from a single image and an audio input.