Advanced Voice Mode, Improved Document Analysis and more

BoltAI November Update

First of all, Happy Thanksgiving! I wish you and your family a wonderful day filled with joy, gratitude, and delicious food.

The last couple of month has been productive for me. I’ve shipped a few useful features that you will love.

Let’s dive in!

TL;DR:

  • Advanced Voice Mode & better voice UIs

  • New in Document Analysis: native PDF capabilities, OCR support and more

  • Improved AI Command & better keyboard handling

  • Improved Inline Whisper with local inference support

Advanced Voice Mode

The most notable change since the last update is the new Advanced Voice Mode (AVM). It works similarly to the AVM in the official ChatGPT app. BoltAI utilizes OpenAI’s Realtime API to let you have real-time voice conversation with the GPT 4o model.

BoltAI supports both OpenAI and Azure OpenAI Service.

Read more…

Better Voice UIs

I've made a few of improvements to the Voice Input UI:

  • Better audio visualization

  • Supports custom keyboard shortcut

  • Added the ability to cancel the Voice Input session

  • Retry if the transcription fails

The Read Aloud feature now streams and plays audio in real time. It's much faster and more robust.

Improved Document Analysis

Document Analysis with o1 model family

The OpenAI's new o1 model doesn't support custom System Instruction. In BoltAI, I put your custom system instruction and document content into the user prompt. Enjoy!

Claude's native PDF capabilities

If you're using Claude Sonnet 3.5, you can use the new native PDF capabilities in BoltAI. Find the option in the right sidebar (screenshot below).

Note that when native PDF capabilities option is enabled, Prompt Caching will be temporarily disabled.

OCR support

For PDFs with images and tables, you can "reprocess" the document using OCR for better accuracy. Currently, BoltAI only support OCR for PDF documents.

Read more...

Improved AI Command

I listented to your feedback and added a few improvements to the AI Command feature:

New Behavior: Open in a new temporary window

When using an AI Command, you can choose to open it in a new temporary window. No need to clean up inline chats aftereward.

Improved performance & keyboard handling

  • You can now use control+n or control+p to navigate.

  • Improved input handling for Japanese input source

  • Improved AI Command performance, faster search & scrolling

Better Inline Whisper

The Inline Whisper feature allows you to use Whisper transcription within any textfield. Press a shortcut keyboard, speak and press the same keyboard shortcut again.

Now you can customize it even more: use a different AI provider, custom prompt or copy transcription to clipboard...

To use a local Whisper instance, follow this guide.

And that's it for now 👋

Happy Thanksgiving!

Last updated