Google has introduced Gemini 3.1 Flash-Lite, a new AI model positioned for high-volume enterprise workloads and available in preview through the Gemini API in Google AI Studio and Vertex AI. The release delivers a lower price per token and accelerated response times compared with previous ‘Flash’ models, reflecting a strategic push toward cost-efficient, scalable AI infrastructure.

The pricing structure — $0.25 per million input tokens and $1.50 per million output tokens — targets applications where token cost and processing speed materially impact operational expenditure, such as bulk translation, automated content moderation, interface generation, and simulation frameworks. Early benchmark data indicate substantial improvements in latency and throughput versus Gemini 2.5 Flash, with maintained or improved output quality across reasoning and multimodal evaluation metrics.
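The token economics above are easy to project for a concrete workload. The sketch below is a minimal cost estimator using only the preview prices quoted in this article; the workload volumes in the example are illustrative assumptions, not figures from Google.

```python
# Estimate a Gemini 3.1 Flash-Lite bill from the preview prices quoted above:
# $0.25 per million input tokens, $1.50 per million output tokens.

INPUT_PRICE_PER_M = 0.25   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 1.50  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost for a given token volume at the quoted rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical example: a moderation pipeline consuming 2B input and
# producing 200M output tokens per month.
print(estimate_cost(2_000_000_000, 200_000_000))  # 500.0 + 300.0 = 800.0
```

At these rates, input volume dominates cost only when the input-to-output ratio exceeds six to one, which is typical of classification and moderation workloads where responses are short.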

Flash-Lite incorporates adjustable reasoning levels, enabling developers to tune the model’s computational intensity based on task complexity and cost considerations. This flexibility is relevant for enterprises balancing real-time performance with deeper analytic or generation tasks. The model’s performance on standardized reasoning benchmarks suggests it can serve workloads that previously required larger, more expensive models, broadening its applicability in production environments.
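One way this tuning shows up in practice is routing: an application classifies each request by complexity and selects a reasoning level before calling the model. The sketch below is a hypothetical router under assumed level names ("low"/"high") and task categories; none of these identifiers come from the Gemini API itself.

```python
# Hypothetical routing sketch: choose an assumed reasoning level per task so
# that bulk, latency-sensitive work stays on the cheap/fast path while
# complex tasks opt into deeper reasoning. Level names and task categories
# are illustrative assumptions, not Gemini API parameters.

DEEP_TASKS = {"code-review", "multi-step-analysis"}  # assumed categories

def pick_reasoning_level(task: str) -> str:
    """Map a task category to an assumed reasoning level string."""
    if task in DEEP_TASKS:
        return "high"  # deeper reasoning: more latency and cost per call
    return "low"       # fast, inexpensive default for high-volume work

print(pick_reasoning_level("translation"))  # low
print(pick_reasoning_level("code-review"))  # high
```

The routing decision itself is cheap, so the pattern lets an application pay for deeper reasoning only on the fraction of traffic that needs it.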

The preview release underscores Google’s broader enterprise AI strategy of offering differentiated model tiers, from high-capability Pro variants to lightweight, high-throughput options, through integrated tooling and cloud platforms. By addressing latency, token economics, and deployment flexibility, Gemini 3.1 Flash-Lite aims to help organizations scale AI without proportionally scaling cost or infrastructure complexity.

