Adding defined period pauses to the input text file

#61

by vijay120 - opened 19 days ago

19 days ago

Is there a way to add pauses to the text file so that TTS pauses for "x" seconds before continuing with the next sentence? Currently I need to create separate text files and manually merge the TTS output with an "x"-second pause in between.

alpacaxz

18 days ago

I second this. It would be awesome to:

have the ability to add pauses in a format like [BREAK=3] (in seconds) or something like that
control the delay between sentences, which is too fast right now
add other tags for filler words, etc.

cgarijo-Ci21

17 days ago

I had the same issue and my approach was to handle it manually. This works with either torch tensors or numpy arrays, though the syntax differs slightly:

I generate the speech segment. Kokoro returns segments as numpy arrays; I convert them into torch tensors, but that conversion isn't necessary.
I manually create a silence. Since Kokoro's audio has a sample rate of 24000 Hz, a 3-second silence can be created with silence = torch.zeros(1, 3*24000). With numpy arrays it is the same, but with np.zeros instead of torch.zeros.
I then keep generating the other segments I need, and all of these segments, both speech and silence, are appended to a Python list.
When I finish generating, I concatenate all the segments with torch.cat() or np.concatenate().
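The steps above could be sketched roughly like this with numpy. This is only a minimal illustration: the helper names make_silence and join_with_pauses are made up for the example, the speech segments are placeholder arrays rather than real Kokoro output, and it assumes 1-D float32 audio at the 24000 Hz sample rate mentioned above.

```python
import numpy as np

SAMPLE_RATE = 24000  # Kokoro's sample rate as stated in this thread


def make_silence(seconds, sample_rate=SAMPLE_RATE):
    """Return a 1-D array of zeros lasting `seconds` (hypothetical helper)."""
    return np.zeros(int(round(seconds * sample_rate)), dtype=np.float32)


def join_with_pauses(segments, pause_seconds, sample_rate=SAMPLE_RATE):
    """Concatenate 1-D audio segments with a fixed silence between them."""
    pieces = []
    for i, segment in enumerate(segments):
        if i > 0:
            # Insert the pause before every segment except the first.
            pieces.append(make_silence(pause_seconds, sample_rate))
        pieces.append(np.asarray(segment, dtype=np.float32))
    if not pieces:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(pieces)


# Placeholder "speech" (assumption: stands in for arrays returned by the TTS).
speech_a = np.full(24000, 0.1, dtype=np.float32)  # 1.0 s
speech_b = np.full(12000, 0.2, dtype=np.float32)  # 0.5 s

audio = join_with_pauses([speech_a, speech_b], pause_seconds=3)
duration = len(audio) / SAMPLE_RATE  # 1.0 + 3.0 + 0.5 = 4.5 seconds
```

If the segments are 2-D (for example shape (1, N), as in the torch.zeros(1, 3*24000) example), the silence would need the same shape and the concatenation would have to run along the last axis.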

However, it would be great to have a list of special tokens to do this kind of thing with the model itself: not only pauses but also laughs, emotion, etc.

Anyway, I hope this is useful for what you are trying to do :)

Philp

2 days ago

I found that using a series of punctuation marks inserts a pause. However, if you use too many, there is a weird noise like a breath, which sounded very unnatural for the voice I was using. Insert the following in your text:
, . , . , . , .

vijay120

about 15 hours ago

I implemented defined period pause functionality here: https://github.com/vijay120/kokoro-tts?tab=readme-ov-file#input-file-formatting

You just have to provide an input file like this:

Welcome to the presentation
PAUSE_2.5
This text comes after a 2.5 second pause
PAUSE_10
And this comes after a 10 second pause

It will output an audio file with pauses of the defined number of seconds between the spoken segments. Let me know if this works for y'all.
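For anyone curious how a file in this format could be read, here is a minimal sketch of a parser. It is not the code from the linked repository; the function name parse_script and the exact matching rule (a line consisting only of PAUSE_ followed by a number) are assumptions made for illustration.

```python
import re

# Assumed rule: a pause line is exactly "PAUSE_<number>", e.g. PAUSE_2.5
PAUSE_LINE = re.compile(r"^PAUSE_(\d+(?:\.\d+)?)$")


def parse_script(text):
    """Split input text into ("text", str) and ("pause", seconds) items."""
    items = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        match = PAUSE_LINE.match(line)
        if match:
            items.append(("pause", float(match.group(1))))
        else:
            items.append(("text", line))
    return items


example = """Welcome to the presentation
PAUSE_2.5
This text comes after a 2.5 second pause
PAUSE_10
And this comes after a 10 second pause"""

items = parse_script(example)
```

Each "text" item would then be sent to the TTS model, and each "pause" item turned into that many seconds of silence (at 24000 Hz, seconds * 24000 zero samples) before concatenating everything.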
