vrishbhanu28

One-Shot Multi-Speaker Text-to-Speech Transformer Using Pretrained Scaled Speaker Embeddings

In this project, we developed a one-shot multi-speaker text-to-speech system built on a novel transformer architecture that incorporates scaled speaker embeddings at different stages of the transformer. This lets us synthesize speech in the voice of any target speaker, given only a 5-second clip of their voice. You can watch the presentation video below and access the Colab notebook [here] if you are interested in the code.
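To make the idea of injecting a scaled speaker embedding more concrete, here is a minimal NumPy sketch of one possible injection step. It is not the project's actual code: the function name `inject_speaker`, the projection matrix, the L2 normalization, the additive combination, and the scale value are all assumptions chosen for illustration. In a real model the projection and scale would presumably be learned, and a step like this would be applied at several points in the transformer.

```python
import numpy as np

def inject_speaker(hidden, speaker_emb, proj, scale):
    """Add a scaled, projected speaker embedding to every time step.

    hidden:      (T, d_model) hidden states at some stage of the transformer
    speaker_emb: (d_spk,) embedding from a pretrained speaker encoder
    proj:        (d_spk, d_model) projection matrix (hypothetical)
    scale:       scalar weight for this injection point (hypothetical)
    """
    # Normalize so that `scale` alone controls the size of the contribution.
    unit = speaker_emb / np.linalg.norm(speaker_emb)
    # The projected vector has shape (d_model,) and broadcasts over all T steps.
    return hidden + scale * (unit @ proj)

# Toy dimensions and random values, purely for illustration.
rng = np.random.default_rng(0)
T, d_model, d_spk = 4, 8, 16
hidden = rng.normal(size=(T, d_model))
speaker_emb = rng.normal(size=d_spk)  # stands in for a pretrained embedding
proj = rng.normal(size=(d_spk, d_model)) * 0.1

out = inject_speaker(hidden, speaker_emb, proj, scale=0.5)
print(out.shape)
```

Because the same speaker vector is added at every time step, the text content of the sequence is left to the hidden states while the speaker identity acts as a constant offset whose strength is set by the scale.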
