Talking photo animates a still image so the subject says whatever you want, with lips synced to the audio. It works with real photos and with AI-generated avatars.

What it’s for

  • Make your AI avatar talk (same character, different voiceovers).
  • Reels where you want to show someone speaking without filming a real person every time.
  • Short ads with a testimonial or punchline.
  • Turn a portrait into a presenter video.

What you need

A photo

Well-lit portrait, face visible and sharp. JPG/PNG.

Audio or text

Upload an MP3 or WAV file with the voice, or type the text and pick an AI voice (text-to-speech, TTS).

How it works

1. Upload the photo

A single image. Vertical or square photos work well for 9:16 (vertical) videos.
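You can check a file locally against the photo requirements (JPG/PNG, vertical or square) before uploading it. The sketch below is a minimal Python example that uses only the standard library. It reads the pixel size from the PNG `IHDR` header or from the JPEG frame (SOF) header, then classifies the orientation. The function names are hypothetical. This is not the service's own validation, which this guide does not document.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def image_format_and_size(data: bytes) -> tuple[str, int, int]:
    """Return (format, width, height) for PNG or JPEG bytes; raise ValueError otherwise."""
    if data.startswith(PNG_SIGNATURE):
        if len(data) < 24:
            raise ValueError("truncated PNG header")
        # IHDR is always the first chunk: width and height are
        # big-endian 32-bit integers at byte offsets 16 and 20.
        width, height = struct.unpack(">II", data[16:24])
        return "png", width, height
    if data.startswith(b"\xff\xd8"):  # JPEG start-of-image marker
        i = 2
        while i + 4 <= len(data):
            if data[i] != 0xFF:
                raise ValueError("malformed JPEG marker")
            marker = data[i + 1]
            if marker == 0xFF:  # fill byte before a marker
                i += 1
                continue
            if marker == 0x01 or 0xD0 <= marker <= 0xD9:
                i += 2  # standalone marker with no length field
                continue
            if marker == 0xDA:  # start of scan reached without a frame header
                break
            seg_len = struct.unpack(">H", data[i + 2:i + 4])[0]
            # SOF0..SOF15 (except DHT 0xC4, JPG 0xC8, DAC 0xCC) carry the frame size:
            # length(2) precision(1) height(2) width(2)
            if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
                if i + 9 > len(data):
                    raise ValueError("truncated JPEG frame header")
                height, width = struct.unpack(">HH", data[i + 5:i + 9])
                return "jpeg", width, height
            i += 2 + seg_len  # skip marker bytes plus the segment body
        raise ValueError("JPEG frame header not found")
    raise ValueError("not a PNG or JPEG file")


def orientation(width: int, height: int) -> str:
    """Classify the image; vertical or square suits 9:16 output per this guide."""
    if height > width:
        return "vertical"
    if height == width:
        return "square"
    return "horizontal"
```

To use it, read the file's bytes (for example, from a hypothetical `portrait.jpg` opened in `"rb"` mode) and pass them to `image_format_and_size`, then pass the width and height to `orientation`. Phone JPEGs often store rotation in EXIF metadata, which this sketch ignores. The reported width and height can therefore be swapped relative to how the photo displays.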

2. Provide the voice

Two paths:
  • Your own audio: record or upload an MP3 or WAV file. The video uses your own voice, which is ideal when you have a specific voiceover.
  • TTS: type the text and pick one of the AI voices. Faster when you don’t have audio ready.
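If you upload your own audio, its length drives both the generation time and the cost. Here is a minimal sketch, assuming a WAV file and the 4 cr/s rate from the Cost section. Python's standard `wave` module reads the frame count and sample rate from the file header, and the duration is frames divided by the sample rate. MP3 files need a third-party library and are not covered here. The function names are hypothetical.

```python
import io
import wave
from typing import BinaryIO, Union

CREDITS_PER_SECOND = 4  # rate from the Cost section


def wav_duration_seconds(source: Union[str, BinaryIO]) -> float:
    """Length of a WAV file in seconds: frame count divided by frames per second."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimated_credits(source: Union[str, BinaryIO]) -> float:
    """Rough cost estimate; counts the whole file, including any silence."""
    return wav_duration_seconds(source) * CREDITS_PER_SECOND


if __name__ == "__main__":
    # Demo: a 5-second silent mono WAV built in memory (16-bit samples, 16 kHz).
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 16000 * 5)
    buf.seek(0)
    print(estimated_credits(buf))  # prints 20.0 (5 s at 4 cr/s)
```

The estimate counts the whole file, including leading or trailing silence. Billing is per second of speech, so the actual charge may differ.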

3. Generate and download

Generation takes 1 to 3 minutes depending on audio length. The result is an MP4 of the face talking in sync with the audio.

Cost

Speech duration    Approx. cost
5 s                20 cr
10 s               40 cr
20 s               80 cr
30 s               120 cr

Pricing is per actual second of speech (4 cr/s). With TTS, the system measures the duration of the generated voice and bills exactly that. If more was charged up front, the difference is refunded after generation.
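The pricing rule is simple arithmetic: credits = 4 × seconds of speech. That formula reproduces every row of the table. The Python sketch below expresses the rule and adds a hypothetical model of the TTS refund: an amount is charged up front, then the final charge follows the measured duration and any excess is refunded. Two points are assumptions, because the guide does not cover them: rounding partial seconds up to a whole credit, and the behaviour when the measured voice costs more than the up-front amount.

```python
import math

CREDITS_PER_SECOND = 4  # rate stated in the Cost section


def credits_for(seconds: float) -> int:
    """Credits for a clip with the given seconds of speech.

    Assumption: a fractional result is rounded up to a whole credit;
    the guide does not say how partial seconds are billed.
    """
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    return math.ceil(seconds * CREDITS_PER_SECOND)


def settle_tts(reserved_credits: int, measured_seconds: float) -> tuple[int, int]:
    """Hypothetical TTS settlement: returns (final_charge, refund).

    The final charge follows the measured duration of the generated voice;
    whatever was charged up front above that amount is refunded. The guide
    does not describe the case where the measured voice costs more than the
    up-front amount, so this sketch simply reports a refund of 0 there.
    """
    final_charge = credits_for(measured_seconds)
    refund = max(reserved_credits - final_charge, 0)
    return final_charge, refund


if __name__ == "__main__":
    for s in (5, 10, 20, 30):
        print(f"{s} s -> {credits_for(s)} cr")  # matches the table rows
    print(settle_tts(80, 17.5))  # (70, 10): 17.5 s costs 70 cr, 10 cr refunded
```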

Best practices

  • Frontal or 3/4 shot: lip sync works best when the face looks toward the camera. A pure profile shot loses detail.
  • Clean audio: background noise in the voice track can make the lip sync drift. Record somewhere quiet or use TTS.
  • Short, natural text with TTS: one long sentence sounds more artificial than two shorter ones in a row.
  • Pair with Characters to reuse the same AI face across all your talking videos.

Limits

  • One person per photo. If several people appear, the AI picks one of them.
  • Languages: any language covered by standard TTS works; exotic languages outside that set are not supported.