Whisper
Last updated
The Whisper plugin lets you transcribe audio files using OpenAI's Whisper model.
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.
Go to Settings > Plugins > Whisper, open the Settings tab, and enter your OpenAI API key.
To use the Whisper plugin, make sure you've set up the API key. Then:
Start a new chat and choose an LLM that supports function calling (for example, GPT-4o)
Enable the Whisper plugin
Drag the audio file into the chat input field (not the chat list) and ask the LLM to transcribe it
Audio file uploads are limited to 25 MB. If your file is larger, enable the ffmpeg plugin to downsample it before sending it to the OpenAI server.
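If you'd rather downsample outside the chat, an ffmpeg invocation like the one sketched below usually shrinks speech audio well under the limit. The function name and file names are placeholders; 16 kHz mono at a 64 kbps bitrate is a common choice for speech:

```python
def downsample_cmd(src: str, dst: str) -> list[str]:
    """ffmpeg arguments that re-encode speech audio small enough to upload."""
    return [
        "ffmpeg", "-y",   # overwrite the output file if it exists
        "-i", src,        # input audio
        "-ar", "16000",   # 16 kHz sample rate is plenty for speech
        "-ac", "1",       # mono
        "-b:a", "64k",    # modest bitrate keeps the file well under 25 MB
        dst,
    ]

# To run it: subprocess.run(downsample_cmd("talk.m4a", "talk_small.mp3"), check=True)
```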
Can I use this offline? No. This plugin uses the OpenAI API and requires an Internet connection and a paid OpenAI API account.
Which whisper model does it use?
The Audio API provides two speech to text endpoints, transcriptions and translations, based on our state-of-the-art open source large-v2 Whisper model. [source]
To use the better large-v3 model, please use the Whisper via Groq plugin.
What are the limitations?
File uploads are currently limited to 25 MB, and the following input file types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm. [source]
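Both limits can be checked locally before uploading. A small stdlib-only sketch (the function name and constants are my own, derived from the limits quoted above):

```python
import os

MAX_BYTES = 25 * 1024 * 1024  # OpenAI's 25 MB upload limit
SUPPORTED = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}

def uploadable(path: str) -> bool:
    """True if the file's extension and size fall within OpenAI's upload limits."""
    ext = os.path.splitext(path)[1].lower()
    # Short-circuit on the extension so we never stat an unsupported file
    return ext in SUPPORTED and os.path.getsize(path) <= MAX_BYTES
```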