Speech-to-text AI Examples

Use the OpenAI Whisper model, through either Huggingface Inference Endpoints or the OpenAI API, to transcribe speech to text.


Generative AI models can transcribe audio to text with high accuracy, stepping into a field where niche players have traditionally offered pricey services. The medical and legal fields are well positioned to leverage these new capabilities.

This example shows two simple ways of transcribing an audio file to text:

  • A Whisper model deployed on Huggingface Inference Endpoints
  • The OpenAI API

The example provides a small FLAC and a small M4A source file, and uses Robocorp Control Room's Vault to store the access credentials. These are the names of the required vaults and keys for each use case:

  • Huggingface Inference Endpoints
    • Vault named Huggingface
    • Key named whisper-url that has the URL of a deployed inference endpoint (which you need to create)
    • Key named api-token that has the API token from your HF account
  • OpenAI
    • Vault named OpenAI
    • Key named key that has the API key
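
To make the role of these credentials concrete, here is a minimal Python sketch of both transcription paths. It is not the example robot's actual code. The function names, the content-type mapping, and the assumption that the Huggingface endpoint accepts raw audio bytes and answers with a JSON object containing a "text" field are illustrative; check them against your own deployed endpoint. The OpenAI path assumes the third-party openai package (version 1 or later) and the "whisper-1" model name.

```python
import json
import urllib.request
from pathlib import Path

# Assumed mapping from file suffix to the Content-Type header value.
CONTENT_TYPES = {".flac": "audio/flac", ".m4a": "audio/m4a"}


def build_hf_request(url: str, token: str, audio_path: str) -> urllib.request.Request:
    """Build a POST request carrying the raw audio bytes and a bearer token.

    url   -> value of the whisper-url key in the Huggingface vault
    token -> value of the api-token key in the Huggingface vault
    """
    path = Path(audio_path)
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": CONTENT_TYPES.get(
            path.suffix.lower(), "application/octet-stream"
        ),
    }
    return urllib.request.Request(
        url, data=path.read_bytes(), headers=headers, method="POST"
    )


def transcribe_with_huggingface(url: str, token: str, audio_path: str) -> str:
    """Send the audio to the endpoint and return the transcribed text."""
    request = build_hf_request(url, token, audio_path)
    with urllib.request.urlopen(request) as response:
        # Assumption: the endpoint replies with JSON like {"text": "..."}.
        return json.loads(response.read())["text"]


def transcribe_with_openai(api_key: str, audio_path: str) -> str:
    """Transcribe with the OpenAI API; api_key is the key entry of the OpenAI vault."""
    # Imported lazily so the Huggingface path works without the openai package.
    from openai import OpenAI

    client = OpenAI(api_key=api_key)
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        )
    return result.text
```

In a robot, the url, token, and api_key arguments would be read from the Huggingface and OpenAI vaults described above rather than hard-coded. Keeping the request construction in its own function makes it possible to inspect the headers and payload without calling the endpoint.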

