vrishbhanu28
- 1 min read

One-Shot Multi-Speaker Text-to-Speech Transformer Using Pretrained Scaled Speaker Embeddings

In this project, we developed a one-shot multi-speaker text-to-speech system built on a novel transformer architecture that incorporates scaled speaker embeddings at different stages of the transformer. This lets us synthesize speech in the voice of any target speaker from only a 5-second clip of their voice. You can watch the presentation video below, and if you are interested in the code, the Colab notebook is available [here].
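To make the idea of "scaled speaker embeddings at different stages" a bit more concrete, here is a minimal sketch in plain Python. It is not the code from the notebook: this post does not say how the embeddings are scaled or where exactly they are injected, so the additive injection, the per-stage scalar weights, and the names `add_scaled_speaker_embedding` and `inject_at_stages` are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def add_scaled_speaker_embedding(
    hidden_states: List[Vector], speaker_embedding: Vector, scale: float
) -> List[Vector]:
    """Add scale * speaker_embedding to every time step (assumed additive injection)."""
    if any(len(step) != len(speaker_embedding) for step in hidden_states):
        raise ValueError("hidden size and speaker embedding size must match")
    return [
        [h + scale * s for h, s in zip(step, speaker_embedding)]
        for step in hidden_states
    ]


def inject_at_stages(
    hidden_states: List[Vector], speaker_embedding: Vector, stage_scales: List[float]
) -> List[Vector]:
    """Inject the speaker embedding once per stage, each with its own scale."""
    out = hidden_states
    for scale in stage_scales:
        # Hypothetical placement: a real model would run an attention or
        # feed-forward block here before the next injection.
        out = add_scaled_speaker_embedding(out, speaker_embedding, scale)
    return out


# Two time steps with hidden size 2, and a toy 2-dimensional speaker embedding.
hidden = [[0.0, 0.0], [1.0, 1.0]]
speaker = [1.0, 2.0]
conditioned = inject_at_stages(hidden, speaker, stage_scales=[0.5, 0.25])
```

In a real system the speaker embedding would come from a pretrained speaker encoder run on the 5-second reference clip, and the scales would typically be learned during training rather than fixed by hand.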
