India-based AI startup Sarvam AI has launched an advanced multimodal AI model called Sarvam Vision. The model offers document intelligence, Optical Character Recognition (OCR), and visual language understanding across India’s diverse languages and scripts. It surpasses Gemini 3 Pro, GPT 5.2, and other AI models on document intelligence. The Sarvam AI press note states that frontier vision-language models are built for processing modern English documents.
“Much of India's knowledge remains embedded in physical documents, scanned archives, and historical collections. This is knowledge locked in plain sight,” the press note added. “Unlocking this material is essential for long-term preservation, access, and reuse across research, governance, and enterprise workflows.”
Sarvam Vision is backed by the company's in-house 3B-parameter state-space vision-language model, which Sarvam AI claims delivers high-fidelity text extraction and semantic understanding, even in documents with mixed content.
In early benchmark tests, the model outperformed leading AI models on OCR tasks in 22 official Indian languages, including Hindi, Bengali, Tamil, Telugu, Marathi, Malayalam, Kannada, Gujarati, Punjabi, Urdu, and Assamese.
According to Sarvam AI, Sarvam Vision was trained using advanced techniques to improve accuracy, reliability, and understanding across text and visuals. Benchmark results show the model performs competitively with global AI systems and outperforms many of them on Indic OCR tasks.



