# Talking photo

> Drop in a photo and your audio (or text), and the face talks with synced lips. Built for reels, ads, and voiceovers with a fixed character or an AI avatar.

Talking photo animates a still image so the subject **says whatever you
want**, with lips synced to the audio. It works with real photos and
with AI-generated avatars.

## What it's for

* Make your AI avatar talk (same character, different voiceovers).
* Reels where you want to show someone speaking without filming a
  real person every time.
* Short ads with a testimonial or punchline.
* Turn a portrait into a presenter video.

## What you need

<CardGroup cols={2}>
  <Card title="A photo" icon="image">
    Well-lit portrait, face visible and sharp. JPG/PNG.
  </Card>

  <Card title="Audio or text" icon="microphone">
    Upload an MP3/WAV with the voice, or type the text and pick an AI
    voice (TTS).
  </Card>
</CardGroup>
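
If you prepare files in bulk, a quick local check before uploading can save a failed attempt. The sketch below only mirrors the requirements listed above (JPG/PNG photo, MP3/WAV audio or text). The function name `check_inputs` is hypothetical, it is not part of any Zevor API, and the rule that you provide exactly one of audio or text is an assumption based on the two paths described here.

```python
from pathlib import Path

# Formats taken from the requirements above.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
AUDIO_EXTENSIONS = {".mp3", ".wav"}


def check_inputs(photo, audio=None, text=None):
    """Return a list of problems; an empty list means the inputs look acceptable.

    Hypothetical helper: it checks file extensions only, not image quality
    or audio content.
    """
    problems = []

    if Path(photo).suffix.lower() not in IMAGE_EXTENSIONS:
        problems.append("photo must be JPG or PNG")

    has_audio = audio is not None
    has_text = bool(text and text.strip())

    # Assumption: exactly one voice source (your own audio OR text for TTS).
    if has_audio == has_text:
        problems.append("provide either an audio file or text, not both or neither")
    elif has_audio and Path(audio).suffix.lower() not in AUDIO_EXTENSIONS:
        problems.append("audio must be MP3 or WAV")

    return problems


if __name__ == "__main__":
    print(check_inputs("portrait.png", audio="voiceover.mp3"))
    print(check_inputs("portrait.gif", text="Hello there!"))
```

An extension check says nothing about lighting or sharpness, so it complements the photo guidance rather than replacing it.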

## How it works

<Steps>
  <Step title="Upload the photo">
    A single image. Vertical or square works well for 9:16.
  </Step>

  <Step title="Provide the voice">
    Two paths:

    * **Your own audio**: record or upload MP3/WAV. The voice is
      yours, ideal when you have a specific voiceover.
    * **TTS**: type the text and pick one of the AI voices. Faster
      when you don't have audio ready.
  </Step>

  <Step title="Generate and download">
    Generation takes 1-3 minutes, depending on audio length. The result
    is an MP4 with the face talking and the audio synced.
  </Step>
</Steps>

## Cost

| Speech duration | Approx. cost |
| --------------- | ------------ |
| 5 s             | 20 cr        |
| 10 s            | 40 cr        |
| 20 s            | 80 cr        |
| 30 s            | 120 cr       |

Pricing is per **actual** second of speech (4 cr/s). With TTS, the
system measures the generated voice and bills exactly that duration;
any overcharge is refunded after generation.
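
To budget credits ahead of time, the 4 cr/s rate can be turned into a small calculator. This is a minimal sketch, not Zevor's billing code: the names `estimate_credits` and `tts_refund` are hypothetical, and rounding fractional credits up is an assumption, since the page does not say how partial seconds are billed.

```python
import math

CREDITS_PER_SECOND = 4  # rate stated in the Cost section


def estimate_credits(speech_seconds: float) -> int:
    """Estimate the credit cost for a given speech duration in seconds.

    Assumption: fractional credits are rounded up.
    """
    if speech_seconds < 0:
        raise ValueError("speech_seconds must be non-negative")
    return math.ceil(speech_seconds * CREDITS_PER_SECOND)


def tts_refund(charged_credits: int, actual_seconds: float) -> int:
    """Credits returned when the measured TTS audio costs less than was charged.

    Hypothetical illustration of the refund rule; never returns a negative value.
    """
    return max(0, charged_credits - estimate_credits(actual_seconds))


if __name__ == "__main__":
    for seconds in (5, 10, 20, 30):
        print(f"{seconds} s -> {estimate_credits(seconds)} cr")
    # Charged for 10 s, but the generated voice lasted 8 s.
    print("refund:", tts_refund(estimate_credits(10), 8), "cr")
```

The loop reproduces the four rows of the table, and the last line shows a case where 40 cr were charged for an estimated 10 s but the generated voice ran 8 s, so 8 cr come back.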

## Best practices

* **Front or 3/4 shot**: lip sync works best when the face looks at
  the camera. A full profile loses detail.
* **Clean audio**: noisy voice tracks can make lips drift. Record
  somewhere quiet or use TTS.
* **Short, natural text** with TTS. One long sentence sounds more
  artificial than two shorter ones in a row.
* Pair with [Characters](/en/modes/characters) to reuse the same AI
  face across all your talking videos.

## Limits

* One person in the photo. If there are several, the AI picks one.
* Languages: any language with standard TTS support works. Less
  common languages are not supported.

<Note>
  Try it: [zevor.ai/talking-photo](https://zevor.ai/talking-photo).
</Note>
