Tag Usage Guide

⏸️ Pause Tags System

NEW: Pause tags insert silence of a set length, giving precise control over speech rhythm.

Syntax

You can use pause tags anywhere in the text. Various formats are supported:

Seconds: [pause:1.5], [pause:2s], [pause:3] (a number without a unit is read as seconds)
Milliseconds: [pause:500ms], [pause:1200ms], [pause:800ms]

Usage Examples

Welcome to our show! [pause:1s] Today we will discuss interesting topics.
[Alice] I am so excited! [pause:500ms] This will be great.
[pause:2] Let's move on to the main part.
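
To make the pause formats concrete, here is a minimal Python sketch that splits a line into text and pause segments. It is not the service's own parser: the regular expression and the `split_pauses` helper are assumptions for illustration, based only on the formats listed above.

```python
import re

# Matches [pause:1.5], [pause:2s], [pause:500ms]; a bare number is read as seconds.
PAUSE_RE = re.compile(r"\[pause:(\d+(?:\.\d+)?)(ms|s)?\]")

def split_pauses(text):
    """Split text into ("text", str) and ("pause", seconds) segments (hypothetical helper)."""
    segments = []
    pos = 0
    for m in PAUSE_RE.finditer(text):
        chunk = text[pos:m.start()].strip()
        if chunk:
            segments.append(("text", chunk))
        value = float(m.group(1))
        # Convert milliseconds to seconds; "s" or no unit is already seconds.
        seconds = value / 1000 if m.group(2) == "ms" else value
        segments.append(("pause", seconds))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments

print(split_pauses("Welcome to our show! [pause:1s] Today we will discuss interesting topics."))
```

Normalizing every pause to seconds keeps later processing independent of which format the author typed.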

🎭 Paralinguistic Tags

The service has built-in support for tags that add non-verbal sounds (breathing, laughter, etc.). The model processes these tags directly during speech generation, so the sounds blend naturally into the speech.

Single Tags (Sound Insertion)

These tags insert a sound at the location where they are placed in the text.

Tag	Effect	Example
`<breath>`	Breath	`I'm tired <breath> let's rest`
`<quick_breath>`	Quick breath	`Running <quick_breath> almost there`
`<laughter>`	Laughter	`That's hilarious <laughter>!`
`<cough>`	Cough	`Excuse me <cough> sorry`
`<sigh>`	Sigh	`Fine <sigh> I'll do it`
`<gasp>`	Gasp (fright/surprise)	`Oh no <gasp> what happened?`
`<noise>`	Background noise	`Walking <noise> through the forest`
`<hissing>`	Hissing	`The snake <hissing> slithered away`
`<vocalized-noise>`	Vocalized noise	`Hmm <vocalized-noise> interesting`
`<lipsmack>`	Lip smack	`Tasty <lipsmack> food`
`<mn>`	Humming "mm"	`I think <mn> maybe`
`<clucking>`	Clucking	`Disapproving <clucking>`
`<accent>`	Accent/Emphasis	`Very <accent> important`
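
A misspelled tag is easy to miss, so it can help to check a script before sending it. The sketch below is an assumption-laden example, not part of the service: `check_tags` is a hypothetical helper, and the tag names are copied from the table above. It treats the wrapper tags `laughing` and `strong` as valid too.

```python
import re

# Single tags copied from the table above.
SINGLE_TAGS = {
    "breath", "quick_breath", "laughter", "cough", "sigh", "gasp", "noise",
    "hissing", "vocalized-noise", "lipsmack", "mn", "clucking", "accent",
}
# Wrapper tags are valid too, but they are not sound insertions.
WRAPPER_TAGS = {"laughing", "strong"}

# Opening-style tags only; closing tags such as </strong> do not match.
TAG_RE = re.compile(r"<([a-z_-]+)>")

def check_tags(text):
    """Return (known single tags, unknown tags) in order of appearance."""
    known, unknown = [], []
    for name in TAG_RE.findall(text):
        if name in SINGLE_TAGS:
            known.append(name)
        elif name not in WRAPPER_TAGS:
            unknown.append(name)
    return known, unknown

print(check_tags("Excuse me <cough> sorry <caugh>"))
```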

Wrapper Tags (Emotional Coloring)

Tag	Effect	Example
`<laughing>text</laughing>`	Speak text with laughter	`<laughing>so funny</laughing>!`
`<strong>text</strong>`	Emphasize text	`<strong>very important</strong>`
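
Because wrapper tags come in pairs, an unclosed tag is a likely mistake. The following Python sketch checks that every opening wrapper tag has a matching closing tag. `wrappers_balanced` is a hypothetical helper; whether the service accepts nested wrappers is not stated here, so the sketch only assumes that nested tags must close in reverse order.

```python
import re

# Opening or closing form of the two wrapper tags from the table above.
WRAPPER_RE = re.compile(r"<(/?)(laughing|strong)>")

def wrappers_balanced(text):
    """Return True if every wrapper tag is closed, in the right order."""
    stack = []
    for m in WRAPPER_RE.finditer(text):
        is_closing, name = m.group(1) == "/", m.group(2)
        if not is_closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            # Closing tag with no matching opening tag.
            return False
    # Anything left on the stack was opened but never closed.
    return not stack

print(wrappers_balanced("<laughing>so funny</laughing>!"))
print(wrappers_balanced("<strong>very important"))
```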

Language and Character Switching

Supported languages: EN, ZH, JA, KO, DE, ES, FR, IT, RU. To switch the language, the character, or both, use the standard square-bracket syntax:

[en:Alice] Hello world
[ru:Bob] Привет мир
[zh:] 你好世界
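
The bracket prefix can be read as an optional language code, a colon, and an optional character name. Here is a minimal Python sketch of that reading. `parse_prefix` and its regular expression are assumptions for illustration, not the service's actual parser, and rejecting unknown language codes is a design choice of this sketch.

```python
import re

# Language codes listed above, lowercased.
LANGS = {"en", "zh", "ja", "ko", "de", "es", "fr", "it", "ru"}

# [lang:Name], [lang:] or [Name] at the start of a line, then the spoken text.
PREFIX_RE = re.compile(r"^\[(?:([A-Za-z]{2}):)?([^\]:]*)\]\s*(.*)$")

def parse_prefix(line):
    """Return (language, character, text), or None if the line has no such prefix."""
    m = PREFIX_RE.match(line)
    if m is None:
        return None  # plain text, or another bracket tag such as [pause:2]
    lang = m.group(1).lower() if m.group(1) else None
    if lang is not None and lang not in LANGS:
        raise ValueError(f"unsupported language code: {lang}")
    character = m.group(2) or None
    return lang, character, m.group(3)

print(parse_prefix("[en:Alice] Hello world"))
print(parse_prefix("[zh:] 你好世界"))
```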

Recommendations

Format: Write sound tags in angle brackets, for example <breath>, so that they do not conflict with character names, which use square brackets.
Naturalness: Insert tags where the sound would occur in natural speech (a pause before an answer, a sigh when tired).
Moderation: Use no more than 1-2 tags per sentence.
Limitations: The strength of an effect (laughter volume, sigh duration) is determined by the model and cannot be adjusted through parameters.
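
The moderation advice can be checked automatically. The Python sketch below flags sentences that carry more than two tags. `overtagged_sentences` is a hypothetical helper, and splitting sentences on ., ! or ? followed by whitespace is a simplifying assumption that will not suit every text.

```python
import re

# Single tags and opening wrapper tags; closing tags are not counted.
TAG_RE = re.compile(r"<[a-z_-]+>")

def overtagged_sentences(text, limit=2):
    """Return the sentences that contain more than `limit` tags."""
    # Naive split: end punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(TAG_RE.findall(s)) > limit]

script = "Fine <sigh> I'll do it. Oh <gasp> no <breath> wait <cough> sorry."
print(overtagged_sentences(script))
```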

Text-to-Speech Studio

English Voices

Michael

James

Mark

Sarah

Kate

Emma
