Thanks for sharing! I have a question about training the vocoder on GPT outputs.

How did you generate the GPT outputs used to train the vocoder?

The GPT model takes an input like <condition> <text tokens> <mel tokens>, and the final-layer output is used for vocoder training, but how is the conditioning selected?

In the XTTS v1 technical report, the conditioning mel was shuffled when training the GPT; how is the conditioning mel processed when training the vocoder?
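
To make the question concrete, here is a rough sketch of how I currently picture the pipeline; all function and method names below are hypothetical placeholders, not taken from the actual XTTS code:

```python
import torch


def gpt_latents_for_vocoder(gpt, cond_mel, text_tokens, mel_tokens):
    """Sketch: produce GPT latents as vocoder training input (assumed flow)."""
    # Conditioning mel -> conditioning latents (assumption: same conditioning
    # encoder as used during GPT training)
    cond_latents = gpt.get_conditioning(cond_mel)

    # Input sequence: <condition> <text tokens> <mel tokens>;
    # take the final-layer hidden states over the mel positions
    latents = gpt(cond_latents, text_tokens, mel_tokens)

    # These latents would then be fed to the vocoder (e.g. HiFi-GAN) as features
    return latents


# Open question: when producing latents for vocoder training, is cond_mel a
# shuffled/random chunk of the same utterance (as in GPT training per the
# XTTS v1 report), or the full, unshuffled reference mel?
```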