NVIDIA has announced a push to bring partners into a broad expansion of AI computing infrastructure. The company frames the effort around what it describes as a shift in how AI is being used: the industry is moving away from model development and toward running AI at production scale.

According to NVIDIA, this transition is accelerating demand for compute. The company says that what it calls AI factories run continuously to generate tokens at scale, a workload pattern that differs from earlier phases of AI development.

NVIDIA said the infrastructure model it envisions is multi-tenant accelerated computing. According to the company, new capacity must come online quickly, sustain high utilization, and be structured to fit the economics of services operating at token scale.

NVIDIA also noted that emerging AI companies have historically struggled to obtain large-scale compute. The company said it is now inviting partners to take part in building this new generation of AI infrastructure.

The central message of NVIDIA's announcement is that production inference, rather than model training, is becoming the primary driver of compute demand, and that the infrastructure to serve it must be built out rapidly.