Model API#

The Model API provides a set of functions that make your model compatible with the Trainer, Synthesizer, and ModelZoo.

Base TTS Model#

class TTS.tts.models.base_tts.BaseTTS(config, ap, tokenizer, speaker_manager=None, language_manager=None)[source]#

Base TTS class. Every new TTS model must inherit from it.

It defines common TTS-specific functions on top of the Model implementation.


Generic batch formatting for TTSDataset.

You must override this if you use a custom dataset.


batch (Dict) – [description]



Return type



Prepare and return aux_input used by forward()

init_multispeaker(config, data=None)[source]#

Initialize a speaker embedding layer if needed, and define the expected embedding channel size, which determines the in_channels size of the connected layers.

This implementation yields 3 possible outcomes:

  1. If config.use_speaker_embedding and config.use_d_vector_file are both False, do nothing.

  2. If config.use_d_vector_file is True, set the expected embedding channel size to config.d_vector_dim or 512.

  3. If config.use_speaker_embedding is True, initialize a speaker embedding layer with a channel size of config.d_vector_dim or 512.

You can override this function for new models.
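The three outcomes above can be sketched as a small decision function. This is only an illustration of the documented logic, not the library's implementation: `SketchConfig` and `resolve_speaker_embedding` are hypothetical names, the `None` return for the "do nothing" case is an assumption, and so is checking `use_d_vector_file` first when both flags are set.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SketchConfig:
    """Hypothetical stand-in exposing only the fields named in the docstring."""
    use_speaker_embedding: bool = False
    use_d_vector_file: bool = False
    d_vector_dim: Optional[int] = None


def resolve_speaker_embedding(config: SketchConfig) -> Tuple[Optional[int], bool]:
    """Return (embedding_channel_size, needs_embedding_layer).

    Assumption: the d-vector flag is checked first, following the order
    of the documented list; the source does not say what happens when
    both flags are True.
    """
    if config.use_d_vector_file:
        # Outcome 2: only the expected channel size is set.
        return (config.d_vector_dim or 512, False)
    if config.use_speaker_embedding:
        # Outcome 3: an embedding layer of this size would be created.
        return (config.d_vector_dim or 512, True)
    # Outcome 1: nothing to do (None is this sketch's own convention).
    return (None, False)
```

A model that overrides `init_multispeaker` could keep the same branching and only change what happens inside each branch, for example the default size or the layer type.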


config (Coqpit) – Model configuration.


Save the speaker.pth and language_ids.json at the beginning of the training. Also update both paths.


Generic test run for TTS models, used by the Trainer.

You can override this for a different behaviour.


assets (dict) – A dict of training assets. For TTS models, it must include {'audio_processor': ap}.


Test figures and audio samples to be logged to TensorBoard.

Return type

Tuple[Dict, Dict]

Base vocoder Model#

class TTS.vocoder.models.base_vocoder.BaseVocoder(config)[source]#

Base vocoder class. Every new vocoder model must inherit from it.

It defines vocoder-specific functions on top of Model.

Notes on input/output tensor shapes:

Any input or output tensor of the model must follow one of these layouts:

  • 3D tensors batch x time x channels

  • 2D tensors batch x channels

  • 1D tensors batch x 1
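As a rough illustration of the layout convention above, the following helper labels the axes of a shape tuple. `label_dims` is a hypothetical name and not part of the library. A "batch x 1" tensor and a "batch x channels" tensor with a single channel have the same shape, so the sketch assumes a trailing dimension of 1 means "batch x 1".

```python
from typing import Sequence, Tuple


def label_dims(shape: Sequence[int]) -> Tuple[str, ...]:
    """Map a tensor shape to the axis names used in the notes above."""
    if len(shape) == 3:
        # 3D tensors: batch x time x channels
        return ("batch", "time", "channels")
    if len(shape) == 2 and shape[1] == 1:
        # "1D" tensors are stored as batch x 1
        return ("batch", "1")
    if len(shape) == 2:
        # 2D tensors: batch x channels
        return ("batch", "channels")
    raise ValueError(f"shape {tuple(shape)} does not match any documented layout")
```

For example, a batch of 8 spectrograms with 120 frames and 80 channels would have shape `(8, 120, 80)`, which the helper labels as batch, time, channels.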