Google's new Gemini 3.1 Flash-Lite model is designed for high-volume developer workloads, offering faster responses and lower operating costs, and is accessible through developer platforms and enterprise AI infrastructure tools.
Google has introduced Gemini 3.1 Flash-Lite, a new artificial intelligence model positioned as the fastest and most cost-efficient offering within its Gemini 3 family. The company said the model is designed primarily for developers and enterprise users handling large-scale AI workloads.
Currently available in preview, the model is not yet accessible to general users. Instead, developers can experiment with it through the Gemini application programming interface in Google AI Studio, while enterprise customers can deploy it through Vertex AI.
According to the company, Gemini 3.1 Flash-Lite is engineered to deliver rapid responses while maintaining efficient computational costs, making it suitable for applications that require processing large volumes of requests.
Faster response speeds and improved performance
Google claims the new model improves significantly on the previous Gemini 2.5 Flash model in speed and performance. Internal benchmarks suggest it delivers a time to first token up to 2.5 times faster and roughly 45 percent higher output speed.
Performance comparisons also indicate that the model delivers faster output generation than several competing lightweight AI systems. The company highlighted its strong ranking on benchmarking leaderboards, where it achieved a high performance score in independent testing.
Developers using the model through AI Studio or Vertex AI will be able to access two operating modes: a standard mode and a “thinking” mode. The latter allows users to adjust the amount of processing time the model spends reasoning through complex tasks before generating an output.
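As a rough illustration of how the two modes differ at the request level, the sketch below builds a generateContent-style request body with an explicit reasoning budget. The field names (`thinkingConfig`, `thinkingBudget`) follow the Gemini API as documented for the 2.5-era models; whether the 3.1 Flash-Lite preview keeps the same fields, and its exact model identifier, are assumptions.

```python
# Sketch of a Gemini API request body with an explicit "thinking" budget.
# Field names follow the generateContent REST API as documented for
# Gemini 2.5; the 3.1 Flash-Lite preview may differ (assumption).

def build_request(prompt: str, thinking_budget: int) -> dict:
    """Return a generateContent-style payload with a reasoning-token budget.

    thinking_budget=0 approximates the standard mode (no extended
    reasoning); a positive value caps the tokens the model may spend
    reasoning before it starts generating the answer.
    """
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }

# Standard mode: respond as fast as possible.
fast = build_request("Summarize this support ticket.", thinking_budget=0)
# Thinking mode: allow up to 1024 reasoning tokens for a harder task.
deep = build_request("Debug this stack trace.", thinking_budget=1024)
```

In practice a developer would send this payload through the Gemini API in AI Studio or through Vertex AI, raising the budget only for tasks that benefit from extra reasoning time.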
Built for high-volume AI applications
Google said Gemini 3.1 Flash-Lite is particularly suited for tasks requiring large-scale processing, such as translation, content moderation and automated instruction-based workflows. It can also support more complex operations including building user interface elements, creating dashboards and running simulation tasks.
Cost efficiency is another key focus of the model. Google said pricing starts at $0.25 per million input tokens and $1.50 per million output tokens, making it cheaper to run than earlier models in the Gemini lineup.
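At those rates, estimating the cost of a high-volume workload is simple arithmetic: input tokens times $0.25 per million, plus output tokens times $1.50 per million. A minimal sketch, using illustrative token counts that are not from the article:

```python
# Estimate request cost from the published preview rates:
# $0.25 per million input tokens, $1.50 per million output tokens.
INPUT_RATE = 0.25 / 1_000_000   # dollars per input token
OUTPUT_RATE = 1.50 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for one request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a translation workload averaging 500 input tokens and
# 200 output tokens per request (made-up numbers for illustration).
per_request = estimate_cost(500, 200)
print(f"${per_request:.6f} per request")                     # $0.000425 per request
print(f"${per_request * 1_000_000:,.2f} per 1M requests")    # $425.00 per 1M requests
```

Run at the scale the article describes, the per-token prices dominate total cost, which is why Google is positioning the model for translation, moderation, and other bulk workloads.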
With its emphasis on speed, affordability and scalability, Gemini 3.1 Flash-Lite is intended to help developers and businesses integrate generative AI into applications that require fast responses and continuous high-volume processing.