XTTS v2, the new version of Coqui's open-source text-to-speech model.
Thanks for sharing! I have a question about training the vocoder with GPT outputs.
How did you generate the GPT outputs used for vocoder training?
The GPT model takes input of the form <condition> <text token> <mel token>, and the final-layer output is used for vocoder training, but how is the condition selected?
In the XTTS v1 technical report, the conditioning mel was shuffled when training the GPT; how is the conditioning mel processed when training the vocoder?
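To make the question concrete, here is a minimal sketch of the input layout I am assuming. All dimensions and the stand-in transformer layer are hypothetical, purely for illustration; the real model is the XTTS GPT with its own sizes and conditioning encoder:

```python
import torch

# Hypothetical dimensions for illustration only; not XTTS's real sizes.
d_model = 64
n_cond, n_text, n_mel = 3, 10, 20

# Assumed input layout from the question: <condition> <text token> <mel token>.
cond = torch.randn(1, n_cond, d_model)  # conditioning latents from a reference mel
text = torch.randn(1, n_text, d_model)  # embedded text tokens
mel = torch.randn(1, n_mel, d_model)    # embedded mel (audio) tokens

# Concatenate along the sequence axis to form the GPT input.
gpt_input = torch.cat([cond, text, mel], dim=1)

# Stand-in transformer layer; the real model is the XTTS GPT stack.
layer = torch.nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
hidden = layer(gpt_input)

# Presumably only the final-layer hidden states over the mel positions
# would feed the vocoder.
mel_hidden = hidden[:, n_cond + n_text:, :]
print(mel_hidden.shape)
```

My question is about the `cond` part above: which reference mel is encoded into those conditioning latents at vocoder-training time, given that it was shuffled during GPT training.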