Speaker Manager API¶

The TTS.tts.utils.speakers.SpeakerManager organizes speaker-related data and information for 🐸TTS models. It is especially useful for multi-speaker models.

Speaker Manager¶

class TTS.tts.utils.speakers.SpeakerManager(data_items=None, d_vectors_file_path='', speaker_id_file_path='', encoder_model_path='', encoder_config_path='', use_cuda=False)[source]¶

Manage the speakers for multi-speaker 🐸TTS models. Load a datafile and parse the information in a way that can be queried by speaker or clip.

There are 3 different scenarios considered:

  1. Models using speaker embedding layers. The datafile only maps speaker names to ids used by the embedding layer.

  2. Models using d-vectors. The datafile includes a dictionary in the following format.

{
    'clip_name.wav':{
        'name': 'speakerA',
        'embedding': [<d_vector_values>]
    },
    ...
}

  3. Models computing d-vectors on the fly with a speaker encoder. The manager loads the speaker encoder model and computes a d-vector for a given clip or speaker.
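
For illustration, these three scenarios map onto the constructor arguments as follows (a minimal sketch; all file paths here are placeholders, not shipped files):

>>> from TTS.tts.utils.speakers import SpeakerManager
>>> # 1. speaker embedding layer: map speaker names to embedding-layer ids
>>> manager = SpeakerManager(speaker_id_file_path="speakers.json")
>>> # 2. pre-computed d-vectors loaded from a metafile in the format above
>>> manager = SpeakerManager(d_vectors_file_path="d_vectors.json")
>>> # 3. on-the-fly d-vectors computed by a speaker encoder
>>> manager = SpeakerManager(encoder_model_path="encoder.pth.tar", encoder_config_path="encoder_config.json")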

Parameters
  • d_vectors_file_path (str, optional) – Path to the metafile including the d-vectors. Defaults to “”.

  • speaker_id_file_path (str, optional) – Path to the metafile that maps speaker names to ids used by TTS models. Defaults to “”.

  • encoder_model_path (str, optional) – Path to the speaker encoder model file. Defaults to “”.

  • encoder_config_path (str, optional) – Path to the speaker encoder config file. Defaults to “”.

Examples

>>> # load audio processor and speaker encoder
>>> ap = AudioProcessor(**config.audio)
>>> manager = SpeakerManager(encoder_model_path=encoder_model_path, encoder_config_path=encoder_config_path)
>>> # load a sample audio and compute embedding
>>> waveform = ap.load_wav(sample_wav_path)
>>> mel = ap.melspectrogram(waveform)
>>> d_vector = manager.compute_d_vector(mel.T)

compute_d_vector(feats)[source]¶

Compute d_vector from features.

Parameters

feats (Union[torch.Tensor, np.ndarray]) – Input features.

Returns

Computed d_vector.

Return type

List

compute_d_vector_from_clip(wav_file)[source]¶

Compute a d_vector from a given audio file.

Parameters

wav_file (Union[str, list]) – Target file path.

Returns

Computed d_vector.

Return type

list
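
A minimal usage sketch, assuming a speaker encoder has been loaded and sample_wav_path points to an existing audio file:

>>> d_vector = manager.compute_d_vector_from_clip(sample_wav_path)
>>> # d_vector is a plain Python list of floats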

property d_vector_dim¶

Dimensionality of d_vectors. If d_vectors are not loaded, returns zero.

get_d_vector_by_clip(clip_idx)[source]¶

Get d_vector by clip ID.

Parameters

clip_idx (str) – Target clip ID.

Returns

d_vector as a list.

Return type

List

get_d_vectors_by_speaker(speaker_idx)[source]¶

Get all d_vectors of a speaker.

Parameters

speaker_idx (str) – Target speaker ID.

Returns

all the d_vectors of the given speaker.

Return type

List[List]
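
A query sketch, assuming the manager was built with a d_vectors_file_path and that the clip and speaker names below exist in that file (both are placeholders):

>>> manager = SpeakerManager(d_vectors_file_path="d_vectors.json")
>>> clip_d_vector = manager.get_d_vector_by_clip("clip_name.wav")
>>> speaker_d_vectors = manager.get_d_vectors_by_speaker("speakerA")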

get_mean_d_vector(speaker_idx, num_samples=None, randomize=False)[source]¶

Get mean d_vector of a speaker ID.

Parameters
  • speaker_idx (str) – Target speaker ID.

  • num_samples (int, optional) – Number of samples to be averaged. Defaults to None.

  • randomize (bool, optional) – Pick random num_samples of d_vectors. Defaults to False.

Returns

Mean d_vector.

Return type

np.ndarray
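
For instance, to average five randomly picked d-vectors of one speaker (the speaker name is a placeholder):

>>> mean_d_vector = manager.get_mean_d_vector("speakerA", num_samples=5, randomize=True)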

init_speaker_encoder(model_path, config_path)[source]¶

Initialize a speaker encoder model.

Parameters
  • model_path (str) – Model file path.

  • config_path (str) – Model config file path.
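
A sketch of loading the encoder after construction instead of through the constructor (paths are placeholders):

>>> manager = SpeakerManager()
>>> manager.init_speaker_encoder("encoder_model.pth.tar", "encoder_config.json")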

static parse_speakers_from_data(items)[source]¶

Parse speaker IDs from data samples returned by load_meta_data().

Parameters

items (list) – Data samples returned by load_meta_data().

Returns

speaker IDs and number of speakers.

Return type

Tuple[Dict, int]
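
A call sketch, assuming items is the sample list produced by load_meta_data():

>>> speaker_ids, num_speakers = SpeakerManager.parse_speakers_from_data(items)
>>> # speaker_ids maps each speaker name to an embedding-layer id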

save_d_vectors_to_file(file_path)[source]¶

Save d_vectors to a json file.

Parameters

file_path (str) – Path to the output file.

save_speaker_ids_to_file(file_path)[source]¶

Save speaker IDs to a json file.

Parameters

file_path (str) – Path to the output file.

set_d_vectors_from_file(file_path)[source]¶

Load d_vectors from a json file.

Parameters

file_path (str) – Path to the target json file.
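
A round-trip sketch: persist the currently loaded d-vectors and read them back later (the file name is a placeholder):

>>> manager.save_d_vectors_to_file("d_vectors.json")
>>> manager.set_d_vectors_from_file("d_vectors.json")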

set_speaker_ids_from_data(items)[source]¶

Set speaker IDs from data samples.

Parameters

items (List) – Data samples returned by load_meta_data().

set_speaker_ids_from_file(file_path)[source]¶

Set speaker IDs from a file.

Parameters

file_path (str) – Path to the file.
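
A speaker-ID workflow sketch, assuming items comes from load_meta_data() (the output file name is a placeholder):

>>> manager = SpeakerManager()
>>> manager.set_speaker_ids_from_data(items)
>>> manager.save_speaker_ids_to_file("speakers.json")
>>> # later, e.g. at inference time
>>> manager.set_speaker_ids_from_file("speakers.json")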

TTS.tts.utils.speakers.get_speaker_manager(c, data=None, restore_path=None, out_path=None)[source]¶

Initialize a SpeakerManager instance from the provided config.

Parameters
  • c (Coqpit) – Model configuration.

  • restore_path (str) – Path to a previous training folder.

  • data (List) – Data samples used in training to infer speakers from. It must be provided if speaker embedding layers are used. Defaults to None.

  • out_path (str, optional) – Save the generated speaker IDs to an output path. Defaults to None.

Returns

Initialized and ready-to-use instance.

Return type

SpeakerManager
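
A construction sketch, assuming config is the model Coqpit configuration and train_samples are the samples returned by load_meta_data() (variable names and the output path are placeholders):

>>> from TTS.tts.utils.speakers import get_speaker_manager
>>> speaker_manager = get_speaker_manager(config, data=train_samples, out_path="output/")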

TTS.tts.utils.speakers.load_speaker_mapping(out_path)[source]¶

Loads speaker mapping if already present.

TTS.tts.utils.speakers.save_speaker_mapping(out_path, speaker_mapping)[source]¶

Saves speaker mapping if not yet present.