hf_speech2text

class baf.nlp.speech2text.hf_speech2text.HFSpeech2Text(agent, model_name, load_from_pytorch=False, language=None)

Bases: Speech2Text

A Speech2Text component backed by a Hugging Face model.

It loads a Hugging Face Speech2Text model and uses it to transcribe speech into text.

Parameters:
  • agent (Agent) – the agent instance using this Speech2Text component

  • model_name (str) – the Hugging Face model name to load

  • load_from_pytorch (bool, optional, defaults to False) – whether to load the model weights from a PyTorch checkpoint save file

  • language (str, optional) – the language to use for transcription

_from_pt

whether to load the model weights from a PyTorch checkpoint save file (see the docstring of the pretrained_model_name_or_path argument)

Type:

bool, optional, defaults to False

_model_name

the Hugging Face model name

Type:

str

_processor

the model text processor

_model

the Speech2Text model

_sampling_rate

the sampling rate of the audio data; it must match the sampling rate used to train the model

Type:

int

_forced_decoder_ids

the decoder ids

Type:

list

_asr

the Transformers automatic speech recognition (ASR) pipeline
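
The requirement on _sampling_rate can be checked before audio reaches the model. The following is a minimal sketch using only the Python standard library, not code from this class: the helper names (make_wav_bytes, wav_sampling_rate, matches_model_rate) are hypothetical, the input is assumed to be an uncompressed WAV file, and the 16 kHz value is an assumption that should be confirmed against the model card of the model being loaded.

```python
import io
import math
import struct
import wave

# Assumed value for illustration; confirm it on the model card of the model you load.
EXPECTED_SAMPLING_RATE = 16_000


def make_wav_bytes(sampling_rate: int, seconds: float = 0.1) -> bytes:
    """Build a small mono 16-bit WAV file in memory (a 440 Hz tone) for demonstration."""
    n_frames = int(sampling_rate * seconds)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * 440 * i / sampling_rate)))
        for i in range(n_frames)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit samples
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(frames)
    return buffer.getvalue()


def wav_sampling_rate(audio: bytes) -> int:
    """Read the sampling rate from the header of WAV-encoded bytes."""
    with wave.open(io.BytesIO(audio), "rb") as wav_file:
        return wav_file.getframerate()


def matches_model_rate(audio: bytes, expected: int = EXPECTED_SAMPLING_RATE) -> bool:
    """Return True when the audio was recorded at the rate the model expects."""
    return wav_sampling_rate(audio) == expected
```

Audio that fails this check would need to be resampled before transcription; how (or whether) the component itself handles that is not stated here.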

speech2text(speech)

Transcribe recorded voice audio into its text representation.

Parameters:

speech (bytes) – the recorded voice audio to transcribe

Returns:

the speech transcription

Return type:

str
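
The bytes-in, text-out contract of speech2text can be illustrated with a Transformers ASR pipeline. This is a hedged sketch, not the implementation of HFSpeech2Text: build_asr_pipeline and transcribe are hypothetical helpers, and the sketch assumes the pipeline is given the raw content of an audio file (which Transformers decodes with ffmpeg) and returns a dictionary with a "text" key.

```python
from typing import Any, Callable, Dict


def build_asr_pipeline(model_name: str) -> Callable[[bytes], Dict[str, Any]]:
    """Create a Transformers ASR pipeline (needs `transformers`, a backend, and ffmpeg)."""
    # Imported lazily so the rest of the sketch has no hard dependency.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_name)


def transcribe(asr: Callable[[bytes], Dict[str, Any]], speech: bytes) -> str:
    """Hypothetical stand-in for speech2text: audio file bytes in, transcription out."""
    result = asr(speech)
    return result["text"].strip()


# Example usage (not executed here; the model name and file path are illustrative):
#   asr = build_asr_pipeline("openai/whisper-tiny")
#   with open("recording.wav", "rb") as audio_file:
#       print(transcribe(asr, audio_file.read()))
```

Passing the pipeline in as a callable keeps the transcription step easy to exercise with a stub, without downloading a model.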