Adding defined period pauses to the input text file
Is there a way to add pauses to the text file so that the TTS pauses for "x" seconds before continuing with the next sentence? Currently I need to create separate text files and manually merge the TTS output with an "x"-second pause in between.
I second this. It would be awesome:
- to be able to add pauses in a format like [BREAK=3] (in seconds) or something like that
- to control the delay between sentences; right now it's too fast
- to add other tags for filler words, etc.
I had the same issue and my approach was to handle it manually. This works with either torch tensors or numpy arrays, though the syntax differs slightly:
- I generate the speech segment. Kokoro returns segments as numpy arrays, but I convert them into torch tensors. This conversion isn't necessary.
- I manually create a silence. Since Kokoro's audio sample rate is 24000 Hz, a 3-second silence can be generated with
  silence = torch.zeros(1, 3*24000)
  With numpy arrays it's the same, but with np.zeros instead of torch.zeros.
- Then I continue generating all the segments I need, and all these segments, both speech and silence, are appended to a Python list.
- When I finish with the generation I concatenate all segments with torch.cat() or np.concatenate().
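The steps above can be sketched with numpy roughly as follows. This is only a minimal illustration, not Kokoro's API: the helper names make_silence and join_segments are made up for this example, the speech segments are random-noise stand-ins for real TTS output, and it assumes 1-D mono arrays at the 24000 Hz rate mentioned above (the torch one-liner above uses a (1, N) shape instead).

```python
import numpy as np

SAMPLE_RATE = 24000  # Kokoro's output sample rate, as stated in the post above


def make_silence(seconds, sample_rate=SAMPLE_RATE):
    """Return a 1-D float32 array of zeros lasting `seconds` seconds (hypothetical helper)."""
    return np.zeros(int(round(seconds * sample_rate)), dtype=np.float32)


def join_segments(segments):
    """Concatenate 1-D audio arrays (speech and silence) in order (hypothetical helper)."""
    return np.concatenate([np.asarray(s, dtype=np.float32) for s in segments])


# Stand-ins for speech segments; in practice these would come from the TTS model.
speech_a = np.random.uniform(-0.1, 0.1, SAMPLE_RATE).astype(np.float32)      # 1 second
speech_b = np.random.uniform(-0.1, 0.1, 2 * SAMPLE_RATE).astype(np.float32)  # 2 seconds

# Speech, then 3 seconds of silence, then more speech.
audio = join_segments([speech_a, make_silence(3), speech_b])
```

The resulting array can then be written to a file with whatever audio library you already use, at the same 24000 Hz sample rate.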
However, it would be amazing to have a list of special tokens to do this kind of thing with the model itself. Not only pauses but also laughs, emotion, etc.
Anyway, I hope this is useful for the task you are describing :)
I found that a series of punctuation marks inserts a pause. However, if you use too many, there is a weird noise, like a breath, that sounded very unnatural for the voice I was using. Insert the following in your text:
, . , . , . , .
I implemented defined-period pause functionality here: https://github.com/vijay120/kokoro-tts?tab=readme-ov-file#input-file-formatting
You just have to provide an input file like this:
Welcome to the presentation
PAUSE_2.5
This text comes after a 2.5 second pause
PAUSE_10
And this comes after a 10 second pause
It will output an audio file with pauses of the defined number of seconds between the spoken segments. Let me know if this works for y'all.
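For anyone curious how an input file in this format could be read, here is a rough sketch. It is not taken from the linked repo; parse_script and PAUSE_RE are hypothetical names, and it assumes each pause marker sits alone on its own line in the form PAUSE_<seconds>, as in the example above.

```python
import re

# Matches a line consisting only of PAUSE_<number>, e.g. PAUSE_2.5 or PAUSE_10.
PAUSE_RE = re.compile(r"^PAUSE_(\d+(?:\.\d+)?)$")


def parse_script(text):
    """Split an input script into ("text", str) and ("pause", seconds) items."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = PAUSE_RE.match(line)
        if match:
            items.append(("pause", float(match.group(1))))
        else:
            items.append(("text", line))
    return items


example = """Welcome to the presentation
PAUSE_2.5
This text comes after a 2.5 second pause
PAUSE_10
And this comes after a 10 second pause"""

items = parse_script(example)
```

Each "text" item would then be sent to the TTS model and each "pause" item turned into that many seconds of silence before everything is concatenated.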