hf_text2speech
- class baf.nlp.text2speech.hf_text2speech.HFText2Speech(agent, model_name, language=None)
Bases: Text2Speech
A Hugging Face Text2Speech.
It loads a Hugging Face Text2Speech model to perform the Text2Speech task.
- Parameters:
- _tts
The Transformers text-to-speech pipeline
- _tokenizer
The VITS tokenizer. Also supports MMS-TTS.
- _model
The complete VITS model
- _abc_impl = <_abc._abc_data object>
- text2speech(text, return_tensor='pt')
Synthesize a text into its corresponding audio speech signal.
- Parameters:
text (str) – the text to synthesize
return_tensor (str, optional) – Property for the HFText2Speech agent component. If set, tensors are returned instead of lists of Python integers. Acceptable values are:
'tf' – Return TensorFlow tf.constant objects.
'pt' – Return PyTorch torch.Tensor objects.
'np' – Return NumPy np.ndarray objects.
name – nlp.text2speech.hf.rttype
type – str
value (default) – pt
- Returns:
- the synthesized speech as a dictionary containing 2 keys:
- audio (np.ndarray): the generated audio waveform as a NumPy array with dimensions (nb_channels, audio_length), where nb_channels is the number of audio channels (usually 1 for mono) and audio_length is the number of samples in the audio
- sampling_rate (int): the sampling rate, i.e. how many samples correspond to one second of audio
- Return type:
dict
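
Example (illustrative sketch, not part of the library): one way to consume the dictionary described under Returns is to encode it as a WAV file. The helper name speech_to_wav_bytes is hypothetical, and the sketch assumes the waveform holds float samples in the range [-1.0, 1.0], which this page does not state. To stay runnable without downloading a model, it uses a stand-in dictionary built from nested lists; a real result from text2speech holds an np.ndarray under the audio key, which the helper converts with tolist().

```python
import io
import math
import struct
import wave


def speech_to_wav_bytes(speech):
    """Encode a {'audio', 'sampling_rate'} dict as 16-bit PCM WAV bytes.

    Hypothetical helper: assumes float samples in [-1.0, 1.0] laid out
    as (nb_channels, audio_length).
    """
    audio = speech["audio"]
    # A NumPy array exposes tolist(); plain nested lists pass through unchanged.
    channels = audio.tolist() if hasattr(audio, "tolist") else audio
    nb_channels = len(channels)
    audio_length = len(channels[0])

    # WAV stores samples interleaved: one sample per channel for each frame.
    frames = bytearray()
    for i in range(audio_length):
        for channel in channels:
            # Clip to [-1.0, 1.0], then scale to the signed 16-bit range.
            sample = max(-1.0, min(1.0, float(channel[i])))
            frames += struct.pack("<h", int(sample * 32767))

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(nb_channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(speech["sampling_rate"])
        wav_file.writeframes(bytes(frames))
    return buffer.getvalue()


# Stand-in for a text2speech() result: one second of a 440 Hz tone, mono.
# The 16000 Hz rate is an arbitrary choice for this sketch.
sampling_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sampling_rate) for n in range(sampling_rate)]
speech = {"audio": [tone], "sampling_rate": sampling_rate}

wav_bytes = speech_to_wav_bytes(speech)

# Duration in seconds follows directly from the two documented keys.
duration = len(speech["audio"][0]) / speech["sampling_rate"]
```

The bytes can be written to disk with open("speech.wav", "wb").write(wav_bytes). In practice, a library such as soundfile or scipy.io.wavfile may be more convenient for NumPy arrays; the standard-library wave module is used here only to keep the sketch dependency-free.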