This recipe combines a photo (real or AI avatar) with audio you record, and returns an MP4 with the face talking and lips synced. Ideal for ads, brand voiceovers or testimonials.

Outcome

  • One vertical 9:16 video in which your avatar (or any photo of a person) says exactly what you recorded, with synced lips.

Time and credits

  • Total time: ~5-10 minutes (depending on audio length).
  • Credits: 4 cr per real second of speech. A 15 s voiceover = 60 cr.
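The credit rule above is plain arithmetic, so you can sanity-check a budget before recording. A minimal Python sketch, assuming that a partial second is billed as a full second (the rounding rule is not stated here). The names `estimate_credits` and `CREDITS_PER_SECOND` are hypothetical helpers for illustration, not part of any Zevor API.

```python
import math

CREDITS_PER_SECOND = 4  # rate quoted above: 4 cr per real second of speech


def estimate_credits(speech_seconds: float) -> int:
    """Estimate the credit cost of a voiceover of the given length.

    Assumption: partial seconds are rounded up to a full second.
    """
    if speech_seconds < 0:
        raise ValueError("speech_seconds must be non-negative")
    return math.ceil(speech_seconds) * CREDITS_PER_SECOND


print(estimate_credits(15))  # 60, matching the 15 s example above
```

If the actual billing rounds differently, only the `math.ceil` call needs to change.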

Steps

1

Record the audio

Record on your phone or with a clean mic. What matters: clarity, no echo, no noise. Save as MP3 or WAV. Use natural delivery and short sentences. No audio? Skip this step and use text-to-speech (TTS) in step 3.
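Because credits are charged per second of speech, it can help to measure a recording before uploading it. A minimal sketch using Python's standard `wave` module; it only reads uncompressed PCM WAV files (MP3 needs a third-party library), and `wav_duration_seconds` is a hypothetical helper name.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the playing time of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()   # total number of audio frames
        rate = wav.getframerate()   # frames (samples per channel) per second
    if rate == 0:
        raise ValueError("WAV file reports a sample rate of 0")
    return frames / rate


# Usage (the file name is a placeholder):
# print(f"{wav_duration_seconds('voiceover.wav'):.1f} s")
```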
2

Prepare the photo

A vertical image of the person or avatar. Front or 3/4, face well lit. It can be:
  • One of the variations from your AI influencer.
  • A real photo (yours or a client’s, with permission).
  • An image generated with Image.
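If your source photo is not vertical, you can compute a crop before uploading. The sketch below assumes you want the photo to match the 9:16 output; this recipe only asks for a vertical image, so treat the ratio as an assumption. `center_crop_box` is a hypothetical helper that does the arithmetic only. It returns a `(left, top, right, bottom)` box, the same order an image editor or Pillow's `Image.crop` expects.

```python
def center_crop_box(width: int, height: int,
                    ratio_w: int = 9, ratio_h: int = 16) -> tuple[int, int, int, int]:
    """Return the largest centered ratio_w:ratio_h crop as (left, top, right, bottom)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Integer comparison of width/height against ratio_w/ratio_h.
    if width * ratio_h > height * ratio_w:
        # Image is too wide: keep the full height and trim the sides.
        crop_h = height
        crop_w = height * ratio_w // ratio_h
    else:
        # Image is too tall (or already matches): keep the full width.
        crop_w = width
        crop_h = width * ratio_h // ratio_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


print(center_crop_box(1080, 1920))  # (0, 0, 1080, 1920): already 9:16
```

A centered crop can cut off a face that sits near the edge, so check the result visually and shift the box if needed.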
3

Take everything to Talking photo

At zevor.ai/talking-photo:
  • Upload the photo.
  • Audio tab: upload your MP3/WAV (or paste a link if it’s online).
  • Alternative: in the Text tab, type what you want it to say and pick a TTS voice.
4

Generate and download

Generation takes 1-3 minutes. Download the MP4: the voice is synced to the lips, so the result is ready to publish.

Best practices

  • Clean audio: noisy recordings can cause lip sync to drift at specific spots. Recording somewhere quiet or with a decent mic noticeably improves the result.
  • Natural 5-10 s phrases. 30 s blocks work but feel more artificial.
  • Same photo = same identity. If you make several voiceovers with the same face, the audience starts associating them. Same principle as an AI influencer (see that recipe).
  • Profile shots lose detail: if the face is turned heavily to the side, lip sync loses fidelity.

Common mistakes

  • Very low-quality audio: sync depends on what the model hears. Re-record before spending credits.
  • Photo with multiple faces: the AI picks one. Crop so only the person who will speak is in frame.

More on limits and formats at Talking photo.