ⓍTTS is a voice generation model that clones a voice into other languages from a single 3-second audio clip. It is built on Tortoise, with model changes that simplify cross-language voice cloning and multilingual speech generation. It does not require many hours of training data.

Features

  • Supports 13 languages.
  • Voice cloning with just a 3-second audio clip.
  • Emotion and style transfer by cloning.
  • Cross-language voice cloning.
  • Pre-trained speakers.
  • Multilingual speech generation.
  • 24 kHz sampling rate.

As of now, XTTS-v2 supports 13 languages:

  • English (en)
  • Spanish (es)
  • French (fr)
  • German (de)
  • Italian (it)
  • Portuguese (pt)
  • Polish (pl)
  • Turkish (tr)
  • Russian (ru)
  • Dutch (nl)
  • Czech (cs)
  • Arabic (ar)
  • Chinese (zh-cn)
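
The language codes above are what you pass when requesting speech in a given language. The snippet below is a minimal sketch, not an official example: `SUPPORTED_LANGUAGES`, `normalize_language`, and `clone_voice` are hypothetical helper names introduced here for illustration. It assumes the Coqui `TTS` Python package is installed and that the model is available under the name `tts_models/multilingual/multi-dataset/xtts_v2`; adjust the name if your installation differs.

```python
# Language codes copied from the list above.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "it": "Italian",
    "pt": "Portuguese",
    "pl": "Polish",
    "tr": "Turkish",
    "ru": "Russian",
    "nl": "Dutch",
    "cs": "Czech",
    "ar": "Arabic",
    "zh-cn": "Chinese",
}


def normalize_language(code: str) -> str:
    """Return a supported language code, or raise ValueError (hypothetical helper)."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return normalized


def clone_voice(
    text: str,
    speaker_wav: str,
    language: str,
    out_path: str = "output.wav",
    # Assumed model name; change it if your installation uses a different one.
    model_name: str = "tts_models/multilingual/multi-dataset/xtts_v2",
) -> str:
    """Speak `text` in `language` using the voice from `speaker_wav` (hypothetical helper)."""
    language = normalize_language(language)

    # Imported lazily so the language check works without the TTS package installed.
    from TTS.api import TTS

    tts = TTS(model_name)
    tts.tts_to_file(
        text=text,
        speaker_wav=speaker_wav,  # short reference clip of the voice to clone
        language=language,
        file_path=out_path,
    )
    return out_path
```

For cross-language cloning, the reference clip and the target language do not have to match. For example, `clone_voice("Bonjour tout le monde.", "reference_en.wav", "fr")` would, under the assumptions above, speak French text in the voice of an English reference clip (`reference_en.wav` is a placeholder file name).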

Original Repository: coqui-ai/TTS