vardonir t1_it7wwgq wrote

I'm trying to find models for voice cloning, but all I seem to find are text-to-speech models. There has to be something that converts a voice audio-to-audio, right? The intonation and emotion of the speaker in the source audio would be lost if it went through TTS, so I don't want it to go through that.

(It doesn't have to be real-time.)
