# Whisper

The Whisper plugin lets you transcribe audio files using OpenAI's Whisper model.

### What is Whisper?

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

### How to set up the Whisper plugin?

Go to Settings > Plugins > Whisper. Select the `Settings` tab, then enter your OpenAI API key.

### How to use the Whisper Plugin?

To use the Whisper plugin, make sure you've set up the API key. Then:

1. Start a new chat. Choose an LLM that supports Function Calling (for example GPT-4o)
2. Enable the Whisper plugin
3. Drag the audio file to the chat input field (not the chat list) and tell the LLM to transcribe it

{% hint style="info" %}
Audio files are limited to 25 MB. Larger files need to be downsampled before they can be transcribed.

To do this, enable the `ffmpeg` plugin, which can downsample the audio file before it is sent to the OpenAI server.
{% endhint %}
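
If you prefer to shrink a file yourself before attaching it, the sketch below shows one way to do so with a locally installed `ffmpeg`. It is only an illustration: the function names, the 16 kHz sample rate, and the 32k bitrate are assumptions chosen for speech audio, and this is not necessarily what the `ffmpeg` plugin runs internally.

```python
import subprocess
from pathlib import Path
from typing import List


def build_downsample_command(
    src: str, dst: str, sample_rate: int = 16000, bitrate: str = "32k"
) -> List[str]:
    """Build an ffmpeg command that re-encodes audio as mono at a lower rate.

    The default sample rate and bitrate are illustrative assumptions.
    """
    return [
        "ffmpeg",
        "-i", src,                # input file
        "-vn",                    # drop any video stream
        "-ac", "1",               # mix down to a single (mono) channel
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-b:a", bitrate,          # target audio bitrate
        dst,                      # output file; format follows the extension
    ]


def downsample(src: str, dst: str) -> Path:
    """Run ffmpeg (must be on PATH) and return the path of the smaller file."""
    subprocess.run(build_downsample_command(src, dst), check=True)
    return Path(dst)
```

For example, `downsample("interview.wav", "interview.mp3")` would write a mono MP3 next to the original, which you can then drag into the chat input field.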

<figure><img src="https://3493584844-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FynYW2xZqA52spY7XgWis%2Fuploads%2FOuujCIYpoaJ9ctYH2ByP%2FCleanShot%202024-06-22%20at%2021.07.45%402x.jpg?alt=media&#x26;token=190b737c-2910-418a-91e3-5a58ad001494" alt=""><figcaption></figcaption></figure>

### FAQ

1. **Can I use this offline?**\
   No. This plugin uses the OpenAI API, so it requires an internet connection and a paid OpenAI API account.
2. **Which Whisper model does it use?**\
   According to OpenAI, the Audio API provides two speech-to-text endpoints, transcriptions and translations, both based on the open source `large-v2` Whisper model. \[[source](https://platform.openai.com/docs/guides/speech-to-text)]\
   To use the newer `large-v3` model, use the [Whisper via Groq](https://docs.boltai.com/docs/plugins/whisper-groq) plugin instead.
3. **What are the limitations?**\
   File uploads are currently limited to 25 MB and the following input file types are supported: `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, and `webm`. \[[source](https://platform.openai.com/docs/guides/speech-to-text)]
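
The limits above can be checked before you attach a file. The following is a minimal sketch, not part of the plugin: `find_problems`, `MAX_BYTES`, and `SUPPORTED_EXTENSIONS` are hypothetical names, and reading "25 MB" as 25 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path
from typing import List

# Assumption: "25 MB" is interpreted as 25 * 1024 * 1024 bytes.
MAX_BYTES = 25 * 1024 * 1024
# File types listed in the FAQ answer above.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def find_problems(filename: str, size_bytes: int) -> List[str]:
    """Return a list of reasons the file would likely be rejected (empty if none)."""
    problems = []
    # Compare extensions case-insensitively and without the leading dot.
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or 'none'}")
    if size_bytes > MAX_BYTES:
        size_mb = size_bytes / (1024 * 1024)
        problems.append(f"file is {size_mb:.1f} MB, over the 25 MB limit")
    return problems
```

For a file on disk, you could call it as `find_problems(p.name, p.stat().st_size)` with `p = Path("interview.wav")`. An empty list suggests the file is within the documented limits.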
