openai_text2speech

class baf.nlp.text2speech.openai_text2speech.OpenAIText2Speech(agent, model_name, voice='alloy', language=None)

Bases: Text2Speech

A Text2Speech component backed by OpenAI.

It synthesizes audio through the OpenAI Create Speech API.

Parameters:

nlp_engine (NLPEngine) – the NLPEngine that handles the NLP processes of the agent

_model_name

The name of the OpenAI model used for speech synthesis

Type:

str

_voice

The voice to use when generating the audio

Type:

str
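
For context, the model name and voice correspond to the `model` and `voice` fields of an OpenAI Create Speech request. The following is a minimal sketch of such a request using only the standard library; it is not the class's actual implementation. The helper names `build_speech_request` and `send_speech_request` are hypothetical, and the model name `tts-1` and the `wav` response format are illustrative choices.

```python
import json
import urllib.request

# Public endpoint of the OpenAI Create Speech API.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, model_name, voice="alloy", response_format="wav"):
    """Build the JSON payload for a Create Speech request (hypothetical helper)."""
    if not text:
        raise ValueError("text must be a non-empty string")
    return {
        "model": model_name,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }


def send_speech_request(payload, api_key):
    """POST the payload and return the raw audio bytes (requires a valid API key)."""
    request = urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


# Building the payload needs no network access; sending it does.
payload = build_speech_request("Hello from the agent", model_name="tts-1")
```

Keeping payload construction separate from the network call makes the request easy to inspect before any API quota is spent.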

text2speech(text)

Synthesize a text into its corresponding audio speech signal.

Parameters:

text (str) – the text to synthesize

Returns:

the synthesized speech, as a dictionary containing two keys:

audio (np.ndarray): the generated audio waveform as a NumPy array with dimensions (nb_channels, audio_length), where nb_channels is the number of audio channels (usually 1 for mono) and audio_length is the number of samples in the audio

sampling_rate (int): the sampling rate, i.e. how many samples correspond to one second of audio

Return type:

dict
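
The returned dictionary can be consumed as in the sketch below, which computes the clip duration and encodes the waveform as 16-bit PCM WAV bytes. The `speech` dictionary here is a synthetic stand-in for a real `text2speech()` result, the helpers `audio_duration_seconds` and `to_wav_bytes` are hypothetical, and the code assumes float samples in the range [-1.0, 1.0], which the documentation above does not state.

```python
import io
import wave

import numpy as np


def audio_duration_seconds(speech):
    """Duration = samples per channel / samples per second."""
    return speech["audio"].shape[-1] / speech["sampling_rate"]


def to_wav_bytes(speech):
    """Encode the waveform as 16-bit PCM WAV bytes."""
    audio = np.atleast_2d(speech["audio"])  # (nb_channels, audio_length)
    # Assumption: samples are floats in [-1.0, 1.0]; clip before scaling to int16.
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(pcm.shape[0])
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(speech["sampling_rate"])
        # WAV stores samples interleaved per frame, hence the transpose.
        wav_file.writeframes(np.ascontiguousarray(pcm.T).tobytes())
    return buffer.getvalue()


# Stand-in for the value returned by text2speech(): one second of a 440 Hz tone.
sampling_rate = 24000
t = np.arange(sampling_rate) / sampling_rate
speech = {
    "audio": 0.5 * np.sin(2 * np.pi * 440 * t)[np.newaxis, :],
    "sampling_rate": sampling_rate,
}

duration = audio_duration_seconds(speech)
wav_bytes = to_wav_bytes(speech)
```

Writing to an in-memory buffer keeps the example free of file-system side effects; the same bytes can be saved to disk or streamed to a client.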