Microsoft just became the first hyperscale cloud provider to power on NVIDIA's next-generation Vera Rubin NVL72 systems, marking a significant infrastructure leap as enterprise AI shifts from training to inference-heavy workloads. Announced at NVIDIA GTC, the partnership expansion brings Microsoft Foundry Agent Service to general availability, adds NVIDIA Nemotron models to Azure, and deepens Physical AI integration between Microsoft Fabric and NVIDIA Omniverse. The moves signal Microsoft's bet that production AI agents and reasoning-based workloads will reshape cloud infrastructure demands.
Microsoft and NVIDIA are reshuffling the enterprise AI infrastructure deck. At NVIDIA's GTC conference, Microsoft announced that it has become the first hyperscale cloud provider to power on NVIDIA's newest Vera Rubin NVL72 systems in its labs, a significant move as AI workloads shift from training large models to running inference-heavy, reasoning-based agents at scale.
The announcement extends a years-long hardware partnership that's taken on new urgency. According to Microsoft's official blog post, the company has deployed "hundreds of thousands of liquid-cooled Grace Blackwell GPUs" across its global datacenter footprint in less than a year. Now Vera Rubin NVL72 will roll out to Azure's modern, liquid-cooled datacenters over the next few months.
That infrastructure build-out isn't just about raw compute. It's designed to support a different type of AI workload, one where agents reason, plan, and act across enterprise tools and data rather than simply responding to prompts. Microsoft is banking on the expectation that this shift will require purpose-built systems optimized for inference rather than training.