
Convert text to speech
```bash
curl --request POST \
  --url https://supertoneapi.com/v1/text-to-speech/{voice_id} \
  --header 'Content-Type: application/json' \
  --header 'x-sup-api-key: <api-key>' \
  --output speech.wav \
  --data '
{
  "text": "<string>",
  "language": "en",
  "style": "<string>",
  "model": "sona_speech_1",
  "output_format": "wav",
  "voice_settings": {
    "pitch_shift": 0,
    "pitch_variance": 1,
    "speed": 1,
    "duration": 0,
    "similarity": 3,
    "text_guidance": 1,
    "subharmonic_amplitude_control": 1
  },
  "include_phonemes": false,
  "normalized_text": "<string>"
}
'
```


Generates speech from text and returns the audio in the response body. For the conceptual walkthrough, SDK examples, and tips, see Docs: Create speech.

Endpoint

POST https://supertoneapi.com/v1/text-to-speech/{voice_id}

Path parameters

Name	Required	Description
`voice_id`	✅	The ID of the target voice.

Request body

Name	Required	Description
`text`	✅	The text to convert. Max 300 characters. Use an SDK or split client-side for longer input.
`language`	✅	Language code (e.g. `en`, `ko`, `ja`). Must be supported by the voice and the model.
`style`	—	Emotional style (e.g. `neutral`, `happy`). If omitted, the voice’s default style is used.
`model`	—	TTS model. Defaults to `sona_speech_1`.
`output_format`	—	`wav` (default) or `mp3`.
`voice_settings`	—	Advanced voice parameters (see below).
`include_phonemes`	—	If `true`, response switches to JSON with base64 audio plus phoneme timing data. Default: `false`.
`normalized_text`	—	Pronunciation-normalized companion text (used by `sona_speech_2` and `sona_speech_2_flash`, primarily for Japanese).
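Here is a minimal sketch of assembling this request with Python's standard library. It assumes only the URL, headers, and JSON fields documented on this page. The helper name `build_tts_request` and the placeholder IDs are hypothetical. The function builds the request object without sending it. Its length check mirrors the documented 300-character limit, counted here as Python string length; that is an assumption about how the API counts characters.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://supertoneapi.com/v1/text-to-speech/"  # endpoint from this page
MAX_TEXT_CHARS = 300  # documented limit for `text`


def build_tts_request(voice_id, api_key, text, language, **options):
    """Hypothetical helper: build (but do not send) a Create speech request.

    `options` may carry the optional body fields documented above, e.g.
    style, model, output_format, voice_settings, include_phonemes.
    """
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_TEXT_CHARS}")
    body = {"text": text, "language": language, **options}
    return urllib.request.Request(
        # Percent-encode the voice ID so it is safe as a single path segment.
        API_BASE + urllib.parse.quote(voice_id, safe=""),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-sup-api-key": api_key},
        method="POST",
    )


# Placeholder IDs; pass `req` to urllib.request.urlopen() to actually send it.
req = build_tts_request("my-voice-id", "my-api-key", "Hello!", "en", output_format="mp3")
```

With the default `include_phonemes=false`, the body read from `urlopen(req)` is the raw audio.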

Supported languages by model

Model	Languages
`sona_speech_2`, `sona_speech_2_flash`	`en`, `ko`, `ja`, `bg`, `cs`, `da`, `el`, `es`, `et`, `fi`, `hu`, `it`, `nl`, `pl`, `pt`, `ro`, `ar`, `de`, `fr`, `hi`, `id`, `ru`, `vi`
`supertonic_api_3`	`en`, `ko`, `ja`, `ar`, `bg`, `cs`, `da`, `de`, `el`, `es`, `et`, `fi`, `fr`, `hi`, `hr`, `hu`, `id`, `it`, `lt`, `lv`, `nl`, `pl`, `pt`, `ro`, `ru`, `sk`, `sl`, `sv`, `tr`, `uk`, `vi`
`supertonic_api_1`	`en`, `ko`, `ja`, `es`, `pt`
`sona_speech_1`	`en`, `ko`, `ja`
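For a client-side pre-check, the table can be transcribed into a lookup. This is a sketch: the sets below are copied from this page and may drift from the service. The check covers only the model side; it does not verify that the chosen voice supports the language. `is_language_supported` is a hypothetical name.

```python
# Supported languages per model, transcribed from the table above.
_SONA_SPEECH_2_LANGS = frozenset({
    "en", "ko", "ja", "bg", "cs", "da", "el", "es", "et", "fi", "hu", "it",
    "nl", "pl", "pt", "ro", "ar", "de", "fr", "hi", "id", "ru", "vi",
})
SUPPORTED_LANGUAGES = {
    "sona_speech_2": _SONA_SPEECH_2_LANGS,
    "sona_speech_2_flash": _SONA_SPEECH_2_LANGS,
    # supertonic_api_3 lists every sona_speech_2 language plus eight more.
    "supertonic_api_3": _SONA_SPEECH_2_LANGS | {"hr", "lt", "lv", "sk", "sl", "sv", "tr", "uk"},
    "supertonic_api_1": frozenset({"en", "ko", "ja", "es", "pt"}),
    "sona_speech_1": frozenset({"en", "ko", "ja"}),
}


def is_language_supported(model, language):
    """True if the table lists `language` for `model`; unknown models return False."""
    return language in SUPPORTED_LANGUAGES.get(model, frozenset())
```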

Voice settings

Unsupported settings are silently ignored — they don’t error.

Name	Range	Default	Description
`pitch_shift`	-24 → 24	0	Pitch shift in semitones.
`pitch_variance`	0 → 2	1	Degree of pitch variation.
`speed`	0.5 → 2	1	Playback rate multiplier. Applied after `duration`.
`duration`	0 → 60	0	When non-zero, generates audio targeting this length in seconds.
`similarity`	1 → 5	3	How closely the output matches the original character voice.
`text_guidance`	0 → 4	1	How sensitively delivery adapts to the text content.
`subharmonic_amplitude_control`	0 → 2	1	Subharmonic amplitude in the generated speech.
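Here is a sketch of checking values against the ranges above before sending a request. It assumes the ranges are inclusive at both ends. This page does not say whether the server rejects or clamps out-of-range values, so the helper (hypothetical name `validate_voice_settings`) only reports problems and does not modify anything.

```python
# (min, max, default) for each voice setting, transcribed from the table above.
VOICE_SETTING_RANGES = {
    "pitch_shift": (-24, 24, 0),
    "pitch_variance": (0, 2, 1),
    "speed": (0.5, 2, 1),
    "duration": (0, 60, 0),
    "similarity": (1, 5, 3),
    "text_guidance": (0, 4, 1),
    "subharmonic_amplitude_control": (0, 2, 1),
}


def validate_voice_settings(settings):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    for name, value in settings.items():
        if name not in VOICE_SETTING_RANGES:
            problems.append(f"unknown setting: {name}")
            continue
        low, high, _default = VOICE_SETTING_RANGES[name]
        # Assumed inclusive bounds on both ends.
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}..{high}")
    return problems
```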

Voice settings by model

Setting	`sona_speech_2`	`sona_speech_2_flash`	`supertonic_api_3`	`supertonic_api_1`	`sona_speech_1`
`pitch_shift`, `pitch_variance`, `duration`	✅	✅	—	—	✅
`speed`	✅	✅	✅	✅	✅
`similarity`, `text_guidance`	✅	—	—	—	✅
`subharmonic_amplitude_control`	—	—	—	—	✅
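Because unsupported settings are silently ignored, it can help to see on the client which ones a given model will actually apply. This minimal sketch transcribes the table above. `split_settings` is a hypothetical helper, not part of any Supertone SDK.

```python
# Which voice settings each model applies, transcribed from the table above.
_PITCH_AND_DURATION = {"pitch_shift", "pitch_variance", "duration"}
APPLIED_SETTINGS = {
    "sona_speech_2": _PITCH_AND_DURATION | {"speed", "similarity", "text_guidance"},
    "sona_speech_2_flash": _PITCH_AND_DURATION | {"speed"},
    "supertonic_api_3": {"speed"},
    "supertonic_api_1": {"speed"},
    "sona_speech_1": _PITCH_AND_DURATION
    | {"speed", "similarity", "text_guidance", "subharmonic_amplitude_control"},
}


def split_settings(model, settings):
    """Split `settings` into (applied, ignored) dicts for the given model."""
    applied_names = APPLIED_SETTINGS[model]
    applied = {k: v for k, v in settings.items() if k in applied_names}
    ignored = {k: v for k, v in settings.items() if k not in applied_names}
    return applied, ignored
```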

Response

Default (`include_phonemes=false`): binary audio in the body.

`Content-Type`: `audio/wav` or `audio/mpeg` (matches `output_format`).
`X-Audio-Length` header: duration of the generated audio in seconds.
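Here is a sketch of reading the documented response headers to pick a file extension and report the clip length. It assumes `X-Audio-Length` is a plain decimal number of seconds; this page gives only the unit. The helper name `describe_audio_response` is hypothetical. Header names are matched case-insensitively, as HTTP requires.

```python
# Map the documented Content-Type values to file extensions.
EXTENSIONS = {"audio/wav": ".wav", "audio/mpeg": ".mp3"}


def describe_audio_response(headers):
    """Return (file_extension, length_seconds) from response headers.

    `headers` is any mapping of header name to value (for example the
    `.headers` of a urllib response). Length is None if the header is absent.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    # Ignore any parameters such as "; charset=..." after the media type.
    content_type = lowered.get("content-type", "").split(";")[0].strip()
    extension = EXTENSIONS.get(content_type)
    if extension is None:
        raise ValueError(f"unexpected Content-Type: {content_type!r}")
    length = lowered.get("x-audio-length")
    return extension, (float(length) if length is not None else None)
```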

When `include_phonemes=true`: JSON body with base64 audio plus phoneme arrays.

```json
{
  "audio_base64": "UklGRnoGAABXQVZF...",
  "phonemes": {
    "symbols": ["", "h", "ɐ", "ɡ", "ʌ", ""],
    "start_times_seconds": [0, 0.092, 0.197, 0.255, 0.29, 0.58],
    "durations_seconds": [0.092, 0.104, 0.058, 0.034, 0.29, 0.162]
  }
}
```
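With `include_phonemes=true`, the three phoneme arrays line up index by index. This minimal sketch decodes the audio and pairs each symbol with its start and end time (start plus duration). `parse_phoneme_response` is a hypothetical name. The sample audio value is only the first 12 bytes of a WAV header, not a full clip. Empty-string symbols, as in the sample above, are kept as-is; this page does not say what they represent.

```python
import base64
import json


def parse_phoneme_response(body):
    """Decode a JSON body returned with include_phonemes=true.

    Returns (audio_bytes, phonemes), where phonemes is a list of
    (symbol, start_seconds, end_seconds) tuples.
    """
    payload = json.loads(body)
    audio = base64.b64decode(payload["audio_base64"])
    ph = payload["phonemes"]
    symbols = ph["symbols"]
    starts = ph["start_times_seconds"]
    durations = ph["durations_seconds"]
    if not len(symbols) == len(starts) == len(durations):
        raise ValueError("phoneme arrays have different lengths")
    phonemes = [(s, t, t + d) for s, t, d in zip(symbols, starts, durations)]
    return audio, phonemes


# Sample shaped like the response above; the audio is just a WAV header prefix.
sample = json.dumps({
    "audio_base64": "UklGRnoGAABXQVZF",
    "phonemes": {
        "symbols": ["", "h", "ɐ", "ɡ", "ʌ", ""],
        "start_times_seconds": [0, 0.092, 0.197, 0.255, 0.29, 0.58],
        "durations_seconds": [0.092, 0.104, 0.058, 0.034, 0.29, 0.162],
    },
})
audio, phonemes = parse_phoneme_response(sample)
```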

Notes

`text` over 300 characters returns 400. Use the Python or TypeScript SDK for automatic chunking, or split manually; see Long text.
`speed` applies after `duration`. Setting `duration=5` with `speed=2` produces ~2.5 seconds of audio.
When `style` is omitted, the first value in the voice’s `styles` array is used. Different voices can have different defaults, so call Get voice to check.
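For splitting manually, here is a minimal client-side splitter that keeps each chunk within the 300-character limit. It is not the SDKs' chunking algorithm, which this page does not describe. It assumes sentences end with `.`, `!`, or `?` followed by whitespace. It also assumes the limit counts Unicode code points, as Python's `len` does. Whitespace at split points is collapsed to a single space. Text without spaces between sentences (for example Japanese) falls back to hard splits every 300 characters. The names `split_text` and `_pieces` are hypothetical.

```python
import re

MAX_TEXT_CHARS = 300  # documented limit for `text`


def _pieces(sentence, limit):
    """Yield parts of one sentence, each at most `limit` characters."""
    if len(sentence) <= limit:
        yield sentence
        return
    # Too long: fall back to words, hard-splitting any word longer than limit.
    for word in sentence.split():
        for i in range(0, len(word), limit):
            yield word[i:i + limit]


def split_text(text, limit=MAX_TEXT_CHARS):
    """Greedily pack sentences (or words) into chunks of at most `limit` chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for piece in (p for s in sentences for p in _pieces(s, limit)):
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this piece would overflow; close the current chunk.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then goes out as its own Create speech request.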

See also

Docs: Create speech

Walkthrough with SDK examples.

Stream speech

Stream audio chunks instead of waiting for the full clip.

Authorizations

Name	Type	In	Required
`x-sup-api-key`	string	header	✅

Path Parameters

Name	Type	Required
`voice_id`	string	✅

Body

application/json

Name	Type	Required	Default	Description
`text`	string	✅	—	The text to convert to speech. Maximum string length: 300.
`language`	enum<string>	✅	—	The language code of the text. Available options: `en`, `ko`, `ja`, `bg`, `cs`, `da`, `el`, `es`, `et`, `fi`, `hu`, `it`, `nl`, `pl`, `pt`, `ro`, `ar`, `de`, `fr`, `hi`, `id`, `ru`, `vi`, `hr`, `lt`, `lv`, `sk`, `sl`, `sv`, `tr`, `uk`.
`style`	string	—	—	The style of character to use for the text-to-speech conversion.
`model`	enum<string>	—	`sona_speech_1`	The model type to use for the text-to-speech conversion. Available options: `sona_speech_1`, `sona_speech_2`, `sona_speech_2_flash`, `supertonic_api_1`, `supertonic_api_3`.
`output_format`	enum<string>	—	`wav`	The desired output format of the audio file. Available options: `wav`, `mp3`.
`voice_settings`	object	—	—	Advanced voice parameters (see Voice settings).
`include_phonemes`	boolean	—	`false`	Return phoneme timing data with the audio.
`normalized_text`	string	—	—	Pre-normalized text for TTS. Only used with the `sona_speech_2` and `sona_speech_2_flash` models.

Response

Returns either binary audio or JSON with phoneme data, depending on the `include_phonemes` parameter.

Binary audio file (when `include_phonemes` is `false` or omitted).


URL: https://docs.supertoneapi.com/en/api-reference/endpoints/text-to-speech

Create speech - Supertone API Documentation

