Multimodal

OCR

OCR extracts text from images or PDFs.

Quick definition

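OCR (optical character recognition) converts the text in images or PDFs into machine-readable characters, which makes that text available for search and analysis. In multimodal workflows, OCR often supplies the text that a model reasons over alongside the image itself.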
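As a rough illustration of how extracted text enables search, the sketch below looks for a phrase across text that an OCR step has already produced. The `pages` mapping and the `search_ocr_text` helper are hypothetical names used only for this example. They are not part of BoltAI or of any particular OCR library.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so OCR line breaks do not block matches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def search_ocr_text(pages: dict[str, str], query: str) -> list[str]:
    """Return the names of the files whose extracted text contains the query."""
    needle = normalize(query)
    if not needle:
        return []  # an empty query would otherwise match every file
    return [name for name, text in pages.items() if needle in normalize(text)]


# Hypothetical OCR output, keyed by source file (assumed for illustration).
pages = {
    "invoice.png": "Invoice  #1042\nTotal due: 310.00 USD",
    "receipt.pdf": "Thank you for your purchase",
}

matches = search_ocr_text(pages, "total due")
print(matches)
```

Normalizing whitespace and case before matching is one simple way to cope with the irregular spacing and line breaks that OCR output tends to contain.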

Multimodal models align text, vision, and audio signals so one system can reason across modalities. This is what makes workflows that combine text, audio, and images possible.

Example: read the text from a screenshot.

Noisy inputs lead to unreliable results. Provide clear images, clean audio, and explicit instructions.
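One common way to limit the effect of noisy input is to discard words the OCR engine was unsure about. The sketch below assumes the engine reports a per-word confidence on a 0 to 100 scale. The `OcrWord` type, the `keep_confident_words` helper, and the threshold of 60 are illustrative assumptions, not settings taken from BoltAI.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str
    confidence: float  # assumed scale: 0 (no confidence) to 100 (certain)


def keep_confident_words(words: list[OcrWord], min_confidence: float = 60.0) -> str:
    """Join only the words whose confidence meets the threshold."""
    return " ".join(w.text for w in words if w.confidence >= min_confidence)


# Hypothetical recognition result for a blurry image (assumed for illustration).
words = [
    OcrWord("Total", 96.0),
    OcrWord("due:", 91.5),
    OcrWord("3l0.OO", 41.0),  # misread digits, low confidence
    OcrWord("USD", 88.0),
]

cleaned = keep_confident_words(words)
print(cleaned)
```

Filtering drops uncertain text instead of correcting it, so a clearer source image remains the more reliable fix.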

In BoltAI, this appears when working with audio, images, or voice.