🐸TTS is a library for advanced Text-to-Speech generation. It is built on the latest research and was designed to achieve the best trade-off among ease of training, speed, and quality. 🐸TTS comes with pretrained models and tools for measuring dataset quality, and it is already used in 20+ languages for products and research projects.


📰 Subscribe to 🐸Coqui.ai Newsletter

📢 English Voice Samples and SoundCloud playlist

📄 Text-to-Speech paper collection

💬 Where to ask questions

Please use our dedicated channels for questions and discussion. Help is much more valuable if it’s shared publicly so that more people can benefit from it.



  • 🚨 Bug Reports: GitHub Issue Tracker

  • 🎁 Feature Requests & Ideas: GitHub Issue Tracker

  • 👩‍💻 Usage Questions: GitHub Discussions

  • 🗯 General Discussion: GitHub Discussions or Gitter Room

🥇 TTS Performance

Underlined “TTS*” and “Judy*” are 🐸TTS models.

Features

  • High-performance Deep Learning models for Text2Speech tasks.

    • Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech).

    • Speaker Encoder to compute speaker embeddings efficiently.

    • Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN).

  • Fast and efficient model training.

  • Detailed training logs on the terminal and Tensorboard.

  • Support for Multi-speaker TTS.

  • Efficient, flexible, lightweight but feature complete Trainer API.

  • Released and ready-to-use models.

  • Tools to curate Text2Speech datasets under dataset_analysis.

  • Utilities to use and test your models.

  • Modular (but not too much) code base enabling easy implementation of new ideas.

Implemented Models


End-to-End Models

Attention Methods

  • Guided Attention: paper

  • Forward Backward Decoding: paper

  • Graves Attention: paper

  • Double Decoder Consistency: blog

  • Dynamic Convolutional Attention: paper

  • Alignment Network: paper

Speaker Encoder


You can also help us implement more models.

Install TTS

🐸TTS is tested on Ubuntu 18.04 with Python >= 3.7, < 3.11.
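
Before installing, it can help to confirm that the running interpreter falls inside the tested range. The following is a minimal sketch using only the standard library. The function name `python_version_supported` is illustrative and is not part of 🐸TTS.

```python
import sys

# Tested range quoted in this section: Python >= 3.7 and < 3.11.
MIN_VERSION = (3, 7)
MAX_VERSION_EXCLUSIVE = (3, 11)


def python_version_supported(version=None):
    """Return True if (major, minor) lies inside the tested range.

    `version` defaults to the running interpreter's version.
    """
    if version is None:
        version = sys.version_info[:2]
    return MIN_VERSION <= tuple(version[:2]) < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "inside" if python_version_supported() else "outside"
    print(f"Python {major}.{minor} is {status} the tested range.")
```

A version outside the range does not necessarily mean installation will fail, only that the combination is not the one the project reports testing.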

If you are only interested in synthesizing speech with the released 🐸TTS models, installing from PyPI is the easiest option.

pip install TTS

If you plan to code or train models, clone 🐸TTS and install it locally.

git clone https://github.com/coqui-ai/TTS
cd TTS
pip install -e .[all,dev,notebooks]  # Select the relevant extras

If you are on Ubuntu (Debian), you can also run the following commands for installation.

$ make system-deps  # intended to be used on Ubuntu (Debian). Let us know if you have a different OS.
$ make install

If you are on Windows, 👑@GuyPaddock wrote installation instructions here.

Use TTS

Single Speaker Models

  • List provided models:

    $ tts --list_models
  • Get model info (for both tts_models and vocoder_models):

    • Query by type/name: model_info_by_name takes the model name exactly as it appears in the --list_models output.

      $ tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"

      For example:

      $ tts --model_info_by_name tts_models/tr/common-voice/glow-tts
      $ tts --model_info_by_name vocoder_models/en/ljspeech/hifigan_v2
    • Query by type/idx: The model_query_idx is the corresponding idx from the --list_models output.

      $ tts --model_info_by_idx "<model_type>/<model_query_idx>"

      For example:

      $ tts --model_info_by_idx tts_models/3 
  • Run TTS with default models:

    $ tts --text "Text for TTS" --out_path output/path/speech.wav
  • Run a TTS model with its default vocoder model:

    $ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav

    For example:

    $ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --out_path output/path/speech.wav
  • Run with specific TTS and vocoder models from the list:

    $ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --vocoder_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav

    For example:

    $ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --vocoder_name "vocoder_models/en/ljspeech/univnet" --out_path output/path/speech.wav
  • Run your own TTS model (Using Griffin-Lim Vocoder):

    $ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav
  • Run your own TTS and Vocoder models:

    $ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav \
        --vocoder_path path/to/vocoder.pth --vocoder_config_path path/to/vocoder_config.json
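
When calling the CLI from a script, the commands above can be assembled as an argument list instead of a hand-written shell string, which avoids quoting mistakes in the text to synthesize. The sketch below only builds and prints the command, so it runs without 🐸TTS installed. The helpers `build_tts_command` and `format_command` are hypothetical names introduced for illustration. The flags are the ones shown in the list above.

```python
import shlex


def build_tts_command(text, out_path, model_name=None, vocoder_name=None):
    """Assemble a `tts` CLI invocation as an argument list.

    Nothing is executed here. Omitting `model_name` and `vocoder_name`
    corresponds to the "default models" form shown above.
    """
    cmd = ["tts", "--text", text]
    if model_name is not None:
        cmd += ["--model_name", model_name]
    if vocoder_name is not None:
        cmd += ["--vocoder_name", vocoder_name]
    cmd += ["--out_path", out_path]
    return cmd


def format_command(cmd):
    """Quote each argument so the line can be pasted into a POSIX shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    cmd = build_tts_command(
        "Text for TTS",
        "output/path/speech.wav",
        model_name="tts_models/en/ljspeech/glow-tts",
        vocoder_name="vocoder_models/en/ljspeech/univnet",
    )
    print(format_command(cmd))
    # To actually run it (requires 🐸TTS to be installed):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` without `shell=True` hands each argument to the program unchanged, so text containing quotes or spaces needs no extra escaping.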

Multi-speaker Models

  • List the available speakers and choose a <speaker_id> from among them:

    $ tts --model_name "<model_type>/<language>/<dataset>/<model_name>"  --list_speaker_idxs
  • Run the multi-speaker TTS model with the target speaker ID:

    $ tts --text "Text for TTS." --out_path output/path/speech.wav --model_name "<model_type>/<language>/<dataset>/<model_name>"  --speaker_idx <speaker_id>
  • Run your own multi-speaker TTS model:

    $ tts --text "Text for TTS" --out_path output/path/speech.wav --model_path path/to/model.pth --config_path path/to/config.json --speakers_file_path path/to/speaker.json --speaker_idx <speaker_id>
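
The value passed to --model_name follows the four-part pattern shown under Single Speaker Models, `<model_type>/<language>/<dataset>/<model_name>`. As a rough sketch, a script can validate an identifier and split it into its parts before calling the CLI. The names `ModelId` and `parse_model_id` are hypothetical and are not part of 🐸TTS.

```python
from typing import NamedTuple


class ModelId(NamedTuple):
    """The four slash-separated parts of a model identifier."""

    model_type: str
    language: str
    dataset: str
    model_name: str


def parse_model_id(identifier):
    """Split "<model_type>/<language>/<dataset>/<model_name>" into a ModelId.

    Raises ValueError if the identifier does not have exactly four
    non-empty parts.
    """
    parts = identifier.split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(
            "expected <model_type>/<language>/<dataset>/<model_name>, "
            f"got {identifier!r}"
        )
    return ModelId(*parts)


if __name__ == "__main__":
    model = parse_model_id("tts_models/en/ljspeech/glow-tts")
    print(model.model_type, model.language, model.dataset, model.model_name)
```

This check is purely syntactic. Whether a given identifier actually exists is determined by the output of `tts --list_models`.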

Directory Structure

|- notebooks/       (Jupyter Notebooks for model evaluation, parameter selection and data analysis.)
|- utils/           (common utilities.)
|- TTS
    |- bin/             (folder for all the executables.)
      |- train*.py                  (train your target model.)
      |- distribute.py              (train your TTS model using Multiple GPUs.)
      |- compute_statistics.py      (compute dataset statistics for normalization.)
      |- ...
    |- tts/             (text to speech models)
        |- layers/          (model layer definitions)
        |- models/          (model definitions)
        |- utils/           (model specific utilities.)
    |- speaker_encoder/ (Speaker Encoder models.)
        |- (same)
    |- vocoder/         (Vocoder models.)
        |- (same)

Documentation Content