Model API#
Model API provides a set of functions that make your model compatible with the Trainer, Synthesizer and ModelZoo.
Base TTS Model#
Base tts Model#
- class TTS.tts.models.base_tts.BaseTTS(config, ap, tokenizer, speaker_manager=None, language_manager=None)
Base TTS class. Every new TTS model must inherit from this.
It defines common TTS-specific functions on top of the Model implementation.
- format_batch(batch)
Generic batch formatting for TTSDataset.
You must override this if you use a custom dataset.
- Parameters
batch (Dict) – [description]
- Returns
[description]
- Return type
Dict
- init_multispeaker(config, data=None)
Initialize a speaker embedding layer if needed, and define the expected embedding channel size used to set the in_channels size of the connected layers.
This implementation yields 3 possible outcomes:
1. If config.use_speaker_embedding and config.use_d_vector_file are False, do nothing.
2. If config.use_d_vector_file is True, set expected embedding channel size to config.d_vector_dim or 512.
3. If config.use_speaker_embedding, initialize a speaker embedding layer with channel size of config.d_vector_dim or 512.
You can override this function for new models.
- Parameters
config (Coqpit) – Model configuration.
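The three outcomes listed above can be sketched as plain decision logic. This is a minimal illustration based only on the description here, not the library's implementation: `SpeakerConfig` and `resolve_speaker_setup` are hypothetical stand-ins, the sketch returns a tuple instead of creating a layer on the model, and giving the d-vector file precedence when both flags are set is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SpeakerConfig:
    """Hypothetical stand-in for the speaker-related fields of a Coqpit config."""

    use_speaker_embedding: bool = False
    use_d_vector_file: bool = False
    d_vector_dim: Optional[int] = None


def resolve_speaker_setup(config: SpeakerConfig) -> Tuple[Optional[int], bool]:
    """Return (embedding_dim, create_embedding_layer) for the three documented cases."""
    # Case 1: neither flag is set, so nothing is configured.
    if not config.use_speaker_embedding and not config.use_d_vector_file:
        return None, False
    # Cases 2 and 3 share the channel size: config.d_vector_dim, or 512 if unset.
    dim = config.d_vector_dim if config.d_vector_dim is not None else 512
    # Case 2: d-vectors are read from a file, so no trainable layer is created.
    # Assumption: this case wins when both flags are True.
    if config.use_d_vector_file:
        return dim, False
    # Case 3: a trainable speaker embedding layer with `dim` channels is needed.
    return dim, True
```

A model that overrides `init_multispeaker` would typically keep the same three branches and change only what happens inside each one.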
- on_init_start(trainer)
Save the speaker.pth and language_ids.json at the beginning of the training. Also update both paths.
- test_run(assets)
Generic test run for tts models used by Trainer.
You can override this for a different behaviour.
- Parameters
assets (dict) – A dict of training assets. For tts models, it must include {'audio_processor': ap}.
- Returns
Test figures and audios to be projected to Tensorboard.
- Return type
Tuple[Dict, Dict]
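The input and output contract of `test_run` can be illustrated with a small stand-in. This is a sketch under stated assumptions, not library code: `DummyAudioProcessor`, `SketchModel`, and the `"0-audio"` key are hypothetical, and a real override would live on a class inheriting from BaseTTS and synthesize actual test sentences.

```python
from typing import Any, Dict, Tuple


class DummyAudioProcessor:
    """Hypothetical stand-in for the audio processor passed in `assets`."""

    sample_rate = 22050


class SketchModel:
    """Hypothetical model; a real one would inherit from BaseTTS."""

    def test_run(self, assets: Dict[str, Any]) -> Tuple[Dict, Dict]:
        # The assets dict must provide the audio processor under this key.
        ap = assets["audio_processor"]
        figures: Dict[str, Any] = {}
        audios: Dict[str, Any] = {}
        # Placeholder output: one second of silence instead of synthesized speech.
        audios["0-audio"] = [0.0] * ap.sample_rate
        return figures, audios


figures, audios = SketchModel().test_run({"audio_processor": DummyAudioProcessor()})
```

The two returned dicts correspond to the figures and audios that the Trainer projects to Tensorboard.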
Base vocoder Model#
- class TTS.vocoder.models.base_vocoder.BaseVocoder(config)
Base vocoder class. Every new vocoder model must inherit from this.
It defines vocoder-specific functions on top of Model.
- Notes on input/output tensor shapes:
Any input or output tensor of the model must be shaped as
3D tensors batch x time x channels
2D tensors batch x channels
1D tensors batch x 1
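As a rough illustration of the shape convention above, the snippet below writes the three layouts as plain tuples and checks that they share the leading batch axis. The sizes and the helper `leading_batch_size` are hypothetical and chosen only for this example.

```python
from typing import Tuple

# Example sizes, chosen arbitrarily for illustration.
batch, time, channels = 4, 100, 80

shape_3d: Tuple[int, ...] = (batch, time, channels)  # batch x time x channels
shape_2d: Tuple[int, ...] = (batch, channels)        # batch x channels
shape_1d: Tuple[int, ...] = (batch, 1)               # batch x 1


def leading_batch_size(*shapes: Tuple[int, ...]) -> int:
    """Return the shared batch size, assuming batch is the first axis of every shape."""
    sizes = {shape[0] for shape in shapes}
    if len(sizes) != 1:
        raise ValueError(f"Inconsistent batch sizes: {sorted(sizes)}")
    return sizes.pop()
```

In every layout the batch axis comes first, so a check like this can catch tensors whose axes were accidentally transposed before they reach the model.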