Whisper

Whisper is OpenAI's automatic speech recognition (ASR) model, trained on 680,000 hours of multilingual and multitask supervised audio collected from the web.

Capabilities

Transcription: converts audio to text in the same language
Translation: translates speech in other languages directly into English text
Language identification: automatically detects the audio language
Multilingual support: 99 languages, including English, Spanish, French, and German
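
The translation and language-identification capabilities can be sketched against the hosted API as follows. This is a minimal sketch, not a definitive implementation: the helper names `translate_to_english` and `detect_language` are hypothetical, the `client` parameter exists only so a stub can be passed in, and reading `language` from a `verbose_json` response is an assumption about the response shape for `whisper-1`.

```python
def _default_client():
    # Lazy import so the helpers can be exercised with a stub client.
    from openai import OpenAI
    return OpenAI()


def translate_to_english(path, client=None):
    """Translate speech in `path` into English text (hypothetical helper)."""
    client = client or _default_client()
    with open(path, "rb") as f:
        # The translations endpoint always produces English output.
        result = client.audio.translations.create(model="whisper-1", file=f)
    return result.text


def detect_language(path, client=None):
    """Return the language reported for `path` (hypothetical helper)."""
    client = client or _default_client()
    with open(path, "rb") as f:
        # No `language` hint is sent, so the model identifies it itself.
        # Assumption: the verbose_json response exposes a `language` field.
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=f,
            response_format="verbose_json",
        )
    return result.language
```

With a real API key configured, `translate_to_english("audio.mp3")` would return the English text, and `detect_language("audio.mp3")` the language the service reports.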

API usage

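from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# Open the audio in binary mode and send it to the transcription endpoint.
with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
        language="en",  # optional ISO-639-1 hint; omit it to auto-detect
    )
print(transcript.text)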

Local usage (open-source)

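# Install the open-source package (ffmpeg must also be available on the PATH)
pip install openai-whisper
# Transcribe a file from the command line
whisper audio.mp3 --language English --model large-v3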
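
The same local transcription can also be driven from Python. The sketch below assumes the `openai-whisper` package is installed; `transcribe_local` is a hypothetical wrapper, and its `model` parameter exists only so a preloaded (or stub) model can be passed in.

```python
def transcribe_local(path, model=None, model_name="large-v3", language=None):
    """Transcribe `path` with the open-source whisper package.

    Returns (text, language_code). With language=None, Whisper detects the
    language from up to the first 30 seconds of audio.
    """
    if model is None:
        # Lazy import; provided by `pip install openai-whisper`.
        import whisper
        model = whisper.load_model(model_name)
    result = model.transcribe(path, language=language)
    # The result dict carries the full text and the language code used.
    return result["text"].strip(), result["language"]
```

Calling `text, lang = transcribe_local("audio.mp3")` downloads the `large-v3` weights on first use. Loading the model once and passing it through `model=` avoids reloading it for each file.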

Whisper documentation