Datasets

TTS Dataset

class TTS.tts.datasets.TTSDataset(outputs_per_step, text_cleaner, compute_linear_spec, ap, meta_data, compute_f0=False, f0_cache_path=None, characters=None, custom_symbols=None, add_blank=False, return_wav=False, batch_group_size=0, min_seq_len=0, max_seq_len=inf, use_phonemes=False, phoneme_cache_path=None, phoneme_language='en-us', enable_eos_bos=False, speaker_id_mapping=None, d_vector_mapping=None, use_noise_augment=False, verbose=False)[source]
collate_fn(batch)[source]

Perform preprocessing and create a final data batch:

  1. Sort batch instances by text length.
  2. Convert audio signals to features.
  3. Pad sequences with respect to the reduction factor r (outputs_per_step).
  4. Convert everything to Torch tensors.
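A minimal sketch of wiring the dataset into a PyTorch DataLoader; the AudioProcessor settings and the meta_data item layout ([text, wav_path, speaker_name]) are assumptions here, not the library's guaranteed format:

    from torch.utils.data import DataLoader
    from TTS.tts.datasets import TTSDataset
    from TTS.utils.audio import AudioProcessor

    ap = AudioProcessor(sample_rate=22050)  # assumed audio settings; match your model config
    meta_data_train = [["Hello world.", "wavs/sample.wav", "speaker0"]]  # assumed item layout
    dataset = TTSDataset(
        outputs_per_step=2,         # r: decoder frames produced per step
        text_cleaner="english_cleaners",
        compute_linear_spec=False,  # True for models that also need linear spectrograms
        ap=ap,
        meta_data=meta_data_train,
    )
    # collate_fn sorts by text length, extracts features, pads wrt r,
    # and returns Torch tensors.
    loader = DataLoader(dataset, batch_size=32, collate_fn=dataset.collate_fn)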

compute_input_seq(num_workers=0)[source]

Compute the input sequences with multi-processing. Call it before passing dataset to the data loader to cache the input sequences for faster data loading.
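For example, caching the input sequences before building the loader (the worker count is illustrative):

    # Pre-compute and cache input sequences with 4 worker processes.
    dataset.compute_input_seq(num_workers=4)
    loader = DataLoader(dataset, batch_size=32, collate_fn=dataset.collate_fn)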

sort_and_filter_items(by_audio_len=False)[source]

Sort items by text length or audio length in ascending order. Filter out samples that fall outside the length range.

Parameters

by_audio_len (bool) – if True, sort by audio length else by text length.
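Continuing the sketch above, this drops everything outside the min_seq_len/max_seq_len range passed to the constructor:

    # Sort by audio duration (ascending) and filter out items whose
    # length falls outside [min_seq_len, max_seq_len].
    dataset.sort_and_filter_items(by_audio_len=True)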

Vocoder Dataset

class TTS.vocoder.datasets.gan_dataset.GANDataset(ap, items, seq_len, hop_len, pad_short, conv_pad=2, return_pairs=False, is_training=True, return_segments=True, use_noise_augment=False, use_cache=False, verbose=False)[source]

The GAN dataset searches for all the wav files under the root path, converts them to acoustic features on the fly, and returns random segments as (audio, feature) pairs.

load_item(idx)[source]

Load an (audio, feature) pair for the given index.
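A minimal sketch of a GANDataset; the segment/hop lengths and the items format (a list of wav paths, as produced by the library's loaders) are assumptions:

    from torch.utils.data import DataLoader
    from TTS.vocoder.datasets.gan_dataset import GANDataset

    wav_paths = ["wavs/sample.wav"]  # assumed: wav file paths found under the dataset root
    dataset = GANDataset(
        ap=ap,                 # a configured AudioProcessor, as above
        items=wav_paths,
        seq_len=8192,          # segment length in samples
        hop_len=256,           # must match the feature extraction hop length
        pad_short=2000,        # zero-pad clips shorter than seq_len
        return_segments=True,  # draw random segments instead of full clips
    )
    # Each item is a random-segment (audio, feature) pair.
    loader = DataLoader(dataset, batch_size=16, shuffle=True)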

class TTS.vocoder.datasets.wavegrad_dataset.WaveGradDataset(ap, items, seq_len, hop_len, pad_short, conv_pad=2, is_training=True, return_segments=True, use_noise_augment=False, use_cache=False, verbose=False)[source]

The WaveGrad dataset searches for all the wav files under the root path, converts them to acoustic features on the fly, and returns random segments as (audio, feature) pairs.

static collate_full_clips(batch)[source]

Used in tune_wavegrad.py. Pads each sequence in the batch to the maximum length within the batch.
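A sketch of batching full clips with this collate function, as tune_wavegrad.py does; the full-clip configuration (seq_len=-1, return_segments=False) is an assumption here:

    from torch.utils.data import DataLoader
    from TTS.vocoder.datasets.wavegrad_dataset import WaveGradDataset

    dataset = WaveGradDataset(ap=ap, items=wav_paths, seq_len=-1, hop_len=256,
                              pad_short=2000, return_segments=False)
    # Pads each variable-length clip to the longest one in the batch.
    loader = DataLoader(dataset, batch_size=4, shuffle=False,
                        collate_fn=WaveGradDataset.collate_full_clips)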

load_item(idx)[source]

Load an (audio, feature) pair for the given index.

load_test_samples(num_samples)[source]

Return test samples.

Parameters

num_samples (int) – Number of samples to return.

Returns

Melspectrogram and audio pairs.

Return type

List[Tuple]

Shapes

  • melspectrogram (Tensor): \([C, T]\)

  • audio (Tensor): \([T_{audio}]\)
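For instance (the sample count is illustrative):

    # Each entry is a (melspectrogram [C, T], audio [T_audio]) tuple.
    for mel, audio in dataset.load_test_samples(num_samples=4):
        print(mel.shape, audio.shape)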

class TTS.vocoder.datasets.wavernn_dataset.WaveRNNDataset(ap, items, seq_len, hop_len, pad, mode, mulaw, is_training=True, verbose=False, return_segments=True)[source]

The WaveRNN dataset searches for all the wav files under the root path and converts them to acoustic features on the fly.

load_item(index)[source]

Load the (audio, feature) pair at the given index; if feature_path is set, load the precomputed features, otherwise compute them on the fly.
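A minimal sketch of a WaveRNNDataset; the mode, mulaw, and item-format choices below are assumptions, not defaults:

    from TTS.vocoder.datasets.wavernn_dataset import WaveRNNDataset

    dataset = WaveRNNDataset(
        ap=ap,            # a configured AudioProcessor, as above
        items=wav_paths,  # assumed: wav file paths from the library's loaders
        seq_len=1280,     # training segment length in samples
        hop_len=256,
        pad=2,            # context frames padded on each side of a segment
        mode=10,          # assumed: 10-bit quantized output; "mold"/"gauss" are alternatives
        mulaw=True,       # mu-law companding of the quantized signal
    )
    # Features come from feature_path when set; otherwise computed on the fly.
    audio, feat = dataset.load_item(0)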