Text-to-Speech Studio

Clone voices and synthesize natural speech with AI


Select one of the preset voices for quick speech generation:

English Voices

Alex

Clear professional male voice with neutral accent

professional

Emma

Warm friendly female voice with British accent

friendly

James

Deep authoritative male voice for narrations

authoritative

Sophia

Elegant sophisticated female voice

elegant

Luna

Energetic youthful voice perfect for modern content

energetic

Russian Voices

Михаил

Clear male voice for Russian speech

clear

Анастасия

Soft female voice with warm intonation

soft

Describe the script, desired pacing, energy, pronunciation notes, and target language.

Maximum prompt length: 20,000 characters.

Upload 1–4 clean voice samples (5–60 seconds) that represent target tone and accent.
Supported formats: WAV, MP3, OGG, FLAC, AAC, M4A, OPUS. Max size 50 MB each.

Maximum 4 references.
Results include generated speech audio.

Description

Text-to-Speech Service Guide

Welcome to the Problembo Text-to-Speech studio. Follow this guide to produce natural AI speech that matches your script and target voice.

1. Overview

Our TTS engine combines textual prompts with ordered voice references to deliver high-fidelity speech. The first reference sets the primary tone; additional references let you cover different emotions, pacing, or phonetics.

Key capabilities:

  • Clone a speaker from short samples (5–60 seconds)
  • Control rate, energy, and emotion via prompt directives
  • Generate aligned metadata for further editing or lip-sync
  • Synthesize multilingual speech with phoneme-level control

2. Writing Effective Prompts

The prompt controls what the speaker says and how they say it. Keep it concise but descriptive.

Prompt checklist:

  • Provide the full script or bullet outline
  • Specify pacing and delivery cues ([pause 1.2s], faster, softly)
  • Clarify emotional state (warm, confident, urgent)
  • Give pronunciation hints for names or acronyms (for example, "H. Q.": pronounce the letters)
  • Add language/locale tags if different from the references

Example:

Deliver a 25s launch announcement in English with upbeat energy.
Keep sentences short, smile on key phrases, pause 1s before the price reveal.
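The checklist above can be turned into a small helper that assembles a prompt and checks it against the 20,000-character counter shown in the studio. This is only a sketch: `PromptSpec` and `build_prompt` are hypothetical names, not part of any studio API, and the "Language:" / "Emotion:" / "Pacing:" labels are an assumed convention for free-text prompts rather than a documented syntax.

```python
from dataclasses import dataclass, field

MAX_PROMPT_CHARS = 20000  # limit shown by the studio's character counter


@dataclass
class PromptSpec:
    """Hypothetical container for the checklist items; not a studio API."""
    script: str
    language: str = ""
    emotion: str = ""
    pacing: list[str] = field(default_factory=list)
    pronunciation: dict[str, str] = field(default_factory=dict)


def build_prompt(spec: PromptSpec) -> str:
    """Join the checklist items into one prompt string, one item per line."""
    lines = []
    if spec.language:
        lines.append(f"Language: {spec.language}.")
    if spec.emotion:
        lines.append(f"Emotion: {spec.emotion}.")
    if spec.pacing:
        lines.append("Pacing: " + ", ".join(spec.pacing) + ".")
    for term, hint in spec.pronunciation.items():
        lines.append(f'Pronounce "{term}" as {hint}.')
    lines.append(spec.script.strip())  # the script itself always comes last
    prompt = "\n".join(lines)
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt is {len(prompt)} characters; limit is {MAX_PROMPT_CHARS}"
        )
    return prompt
```

Calling `build_prompt(PromptSpec(script="Welcome to the launch.", language="English", emotion="upbeat"))` would return a three-line prompt that can be pasted into the text field.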

3. Preparing Voice References

  • Prefer clean studio or podcast quality audio
  • Remove background music and compression artifacts
  • Keep files between 5 and 60 seconds
  • Upload in WAV, MP3, OGG, FLAC, AAC, M4A, or OPUS (≤50 MB each)
  • Order matters: slot #1 is the primary style, next slots add variations
  • Use different samples to cover emotional range or tricky phonemes
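Before uploading, the limits listed above (1 to 4 files, supported formats, 50 MB each, 5 to 60 seconds) can be checked locally. The sketch below is illustrative only: `check_references` is a hypothetical helper, it expects the caller to supply each file's size and duration, and it assumes "50 MB" means 50 × 1024 × 1024 bytes.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".aac", ".m4a", ".opus"}
MAX_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" is interpreted as 50 MiB
MIN_SECONDS, MAX_SECONDS = 5.0, 60.0
MAX_REFERENCES = 4


def check_references(refs):
    """Return a list of problems for (filename, size_bytes, duration_seconds) tuples.

    The list order is the slot order, so the first tuple is slot #1.
    An empty result means every reference fits the documented limits.
    """
    problems = []
    if not 1 <= len(refs) <= MAX_REFERENCES:
        problems.append(f"expected 1-{MAX_REFERENCES} references, got {len(refs)}")
    for slot, (name, size, seconds) in enumerate(refs, start=1):
        if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
            problems.append(f"slot {slot}: unsupported format {name!r}")
        if size > MAX_BYTES:
            problems.append(f"slot {slot}: {name!r} exceeds 50 MB")
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            problems.append(f"slot {slot}: {name!r} is {seconds:.1f}s, outside 5-60s")
    return problems
```

Returning a list of messages instead of raising on the first failure lets all problems be fixed in one pass before re-uploading.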

4. Workflow Tips

  1. Upload at least one reference before submitting the prompt
  2. To reorder references, remove them and re-add them in the desired slot order
  3. Reuse the same references for multiple prompts to stay consistent
  4. Monitor task progress in the panel; results include audio and timing JSON
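The timing JSON mentioned in step 4 can feed subtitle or lip-sync tooling. Its schema is not described in this guide, so the sketch below assumes a hypothetical layout, `{"words": [{"text": ..., "start": ..., "end": ...}]}` with times in seconds; the field names and the `words_to_cues` helper are illustrative and would need adjusting to the file the studio actually returns.

```python
import json


def words_to_cues(timing_json: str, max_gap: float = 0.6):
    """Group timed words into cues, starting a new cue after a long pause.

    Assumes a hypothetical schema: {"words": [{"text", "start", "end"}, ...]}
    with times in seconds. The studio's real timing JSON may differ.
    """
    words = json.loads(timing_json)["words"]
    cues = []
    for word in words:
        if cues and word["start"] - cues[-1]["end"] <= max_gap:
            # Short gap: extend the current cue with this word.
            cues[-1]["text"] += " " + word["text"]
            cues[-1]["end"] = word["end"]
        else:
            # First word, or a pause longer than max_gap: open a new cue.
            cues.append(
                {"start": word["start"], "end": word["end"], "text": word["text"]}
            )
    return cues
```

Each returned cue carries a start time, an end time, and the joined text, which is enough to write out a subtitle track or to mark pauses for editing.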

5. Troubleshooting

  • Robotic results: add more nuanced delivery instructions, such as breaths and pauses
  • Incorrect language: explicitly state the target language and any phonetic hints
  • Noisy output: provide cleaner references or reduce background noise in the existing ones
  • Mispronunciations: add phonetic hints or re-record the reference segment

Happy synthesizing!