Red Hat and Amazon Web Services (AWS) have announced an expansion of their collaboration, focused on enabling enterprise-grade generative AI deployments on AWS infrastructure. The initiative aims to give IT decision-makers greater flexibility to run efficient, high-performance AI inference at scale, independent of the underlying hardware architecture.
The accelerating demand for scalable inference driven by generative AI is prompting organizations to reassess their foundational IT infrastructure. Industry analysis supports this focus on optimized hardware, with IDC projecting that 40% of organizations will utilize custom silicon, including specialized AI/ML chips, by 2027 to address rising performance, cost, and specialization requirements. The collaboration between Red Hat and AWS is structured to address this need by integrating Red Hat's platform capabilities with AWS cloud infrastructure and purpose-built AI chipsets, specifically AWS Inferentia2 and AWS Trainium3.
A key component is the enablement of Red Hat AI Inference Server on AWS AI silicon. The server, powered by vLLM, is being adapted to run on AWS Inferentia2 and AWS Trainium3, providing a consistent inference layer that supports a wide range of generative AI models. The integration is designed to help customers achieve lower latency and better price performance, with a projected 30-40% improvement over comparable current GPU-based Amazon EC2 instances when scaling production AI.
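As a rough illustration of the vLLM layer that Red Hat AI Inference Server builds on, the sketch below uses vLLM's offline Python API to generate text. The model name is a placeholder, and nothing here is specific to AWS silicon: selecting Inferentia2 or Trainium3 would be handled by the installed vLLM hardware backend, which is an assumption about the upcoming integration rather than something the announcement specifies.

```python
# Minimal sketch of text generation with vLLM's offline Python API.
# The model name is illustrative; accelerator selection (e.g. AWS
# Inferentia2/Trainium3) would be handled by whichever vLLM hardware
# backend is installed and is not shown here.
from vllm import LLM, SamplingParams

prompts = [
    "Summarize the benefits of purpose-built AI accelerators.",
    "Explain AI inference at scale in one sentence.",
]
sampling_params = SamplingParams(temperature=0.7, max_tokens=128)

# Load an open model; vLLM picks up the available accelerator backend.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```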
To streamline AI operations, Red Hat has collaborated with AWS to develop an AWS Neuron operator for Red Hat OpenShift, Red Hat OpenShift AI, and Red Hat OpenShift Service on AWS. The operator gives customers a supported way to run AI workloads on AWS accelerators from the fully managed application platform, and the expanded support for AWS AI chips is intended to give Red Hat customers on AWS improved access to these high-demand accelerators. In addition, Red Hat has released the amazon.ai Certified Ansible Collection for Red Hat Ansible Automation Platform, designed to enable the orchestration of AI services within AWS environments.
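As a hedged sketch of what an accelerator-backed workload on OpenShift could look like once the Neuron operator and its device plugin are installed, the example below uses the Kubernetes Python client to create a pod that requests one Neuron device. The extended resource name aws.amazon.com/neuron is drawn from the upstream AWS Neuron device plugin, and the image and namespace are placeholders; none of these details are confirmed by the announcement.

```python
# Hypothetical sketch: schedule a pod that requests an AWS Neuron device.
# Assumes the Neuron operator/device plugin advertises the extended
# resource "aws.amazon.com/neuron"; image and namespace are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="neuron-inference-demo"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="inference",
                image="registry.example.com/inference-image:latest",
                resources=client.V1ResourceRequirements(
                    requests={"aws.amazon.com/neuron": "1"},
                    limits={"aws.amazon.com/neuron": "1"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="demo", body=pod)
```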
Red Hat and AWS are also contributing to the open-source community by collaborating on the optimization and upstreaming of an AWS AI chip plugin to vLLM. Red Hat, a major commercial contributor to vLLM, is supporting vLLM enablement on AWS to accelerate both AI inference and training capabilities for users. vLLM forms the basis of llm-d, an open-source project focused on scalable inference, which is now available as a commercially supported feature in Red Hat OpenShift AI 3.
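Because vLLM, and llm-d built on top of it, exposes an OpenAI-compatible HTTP API, a deployed inference endpoint can be queried with any OpenAI-compatible client. The endpoint URL, API key, and model name in the sketch below are placeholders for illustration.

```python
# Sketch of querying an OpenAI-compatible endpoint such as one served by
# vLLM; the base URL, API key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder endpoint
    api_key="EMPTY",                      # vLLM accepts any key unless one is configured
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
    messages=[{"role": "user", "content": "Explain AI inference in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```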
This expanded collaboration builds on a sustained relationship between Red Hat and AWS, now focused on the evolving needs of organizations integrating AI into their hybrid cloud strategies to deliver efficient generative AI outcomes.
The AWS Neuron community operator is currently accessible via the Red Hat OpenShift OperatorHub. Developer preview availability for Red Hat AI Inference Server support for AWS AI chips is anticipated in January 2026.