Advanced Voice Mode, Improved Document Analysis and more
BoltAI November Update
First of all, Happy Thanksgiving! I wish you and your family a wonderful day filled with joy, gratitude, and delicious food.
The last couple of month has been productive for me. I’ve shipped a few useful features that you will love.
Let’s dive in!
TL;DR:
Advanced Voice Mode & better voice UIs
New in Document Analysis: native PDF capabilities, OCR support and more
Improved AI Command & better keyboard handling
Improved Inline Whisper with local inference support
Advanced Voice Mode
The most notable change since the last update is the new Advanced Voice Mode (AVM). It works similarly to the AVM in the official ChatGPT app. BoltAI utilizes OpenAI’s Realtime API to let you have real-time voice conversation with the GPT 4o model.
BoltAI supports both OpenAI and Azure OpenAI Service.
Better Voice UIs
I've made a few of improvements to the Voice Input UI:
Better audio visualization
Supports custom keyboard shortcut
Added the ability to cancel the Voice Input session
Retry if the transcription fails
The Read Aloud feature now streams and plays audio in real time. It's much faster and more robust.
Improved Document Analysis
Document Analysis with o1 model family
The OpenAI's new o1 model doesn't support custom System Instruction. In BoltAI, I put your custom system instruction and document content into the user prompt. Enjoy!
Claude's native PDF capabilities
If you're using Claude Sonnet 3.5, you can use the new native PDF capabilities in BoltAI. Find the option in the right sidebar (screenshot below).
Note that when native PDF capabilities option is enabled, Prompt Caching will be temporarily disabled.
OCR support
For PDFs with images and tables, you can "reprocess" the document using OCR for better accuracy. Currently, BoltAI only support OCR for PDF documents.
Improved AI Command
I listented to your feedback and added a few improvements to the AI Command feature:
New Behavior: Open in a new temporary window
When using an AI Command, you can choose to open it in a new temporary window. No need to clean up inline chats aftereward.
Improved performance & keyboard handling
You can now use control+n or control+p to navigate.
Improved input handling for Japanese input source
Improved AI Command performance, faster search & scrolling
Better Inline Whisper
The Inline Whisper feature allows you to use Whisper transcription within any textfield. Press a shortcut keyboard, speak and press the same keyboard shortcut again.
Now you can customize it even more: use a different AI provider, custom prompt or copy transcription to clipboard...
To use a local Whisper instance, follow this guide.
And that's it for now 👋
Happy Thanksgiving!
Last updated