Dell Brings OpenAI Codex On-Prem as AI Workloads Quit Cloud

Sophie Zhang — Wed, 20 May 2026 02:00:01 +0200

One developer. One billion tokens. Twenty-four hours. A $3,400 cloud bill.

Dell cited exactly that scenario during its Dell Technologies World keynote in Las Vegas this week to explain why enterprises are pulling AI workloads back from the cloud - and why the company spent the last year building hardware and partnerships to catch that traffic. The conference, held May 18-19, produced one of the denser infrastructure product drops in recent memory.

TL;DR

67% of AI workloads already run outside the cloud, per Dell's own survey data
OpenAI Codex, Google Gemini, Palantir Foundry, Mistral, SpaceXAI Grok now available on Dell AI Factory on-prem
PowerRack integrates compute, networking, storage, and cooling into a single turnkey system - live in 6.5 hours from delivery
Deskside Agentic AI puts local model inference on workstations, with up to 87% lower cost than public cloud over two years
Vector indexing on the AI Data Platform is now 12x faster with NVIDIA Blackwell acceleration

The Cloud Bill That Broke the Budget

Dell's survey data, released with the event, found that 67% of AI workloads currently run outside the cloud - on premises, at the edge, at colocation facilities, or on devices. Of the enterprises surveyed, 88% said they already run at least one AI workload on-prem. That's not a fringe preference.

The billing math is straightforward. A developer running an agentic session with a million-token context window burns tokens fast. At frontier API rates, one billion tokens in 24 hours costs roughly $3,400, or about $3.40 per million tokens. Multiply that by a team of 20 developers running Codex-powered agents against a large private codebase, and the monthly cloud invoice becomes a budget line that draws attention.
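The arithmetic behind that figure can be sketched in a few lines. This is an illustration only: the 22 working days per month, and the idea that every developer sustains the one-billion-token pace every day, are our assumptions rather than Dell's, so the team total is an upper bound and not a forecast.

```python
# Figures quoted in the keynote scenario.
TOKENS_PER_DAY = 1_000_000_000
COST_PER_DAY_USD = 3_400
TEAM_SIZE = 20

# Assumption (not from Dell): working days in a month.
WORKING_DAYS_PER_MONTH = 22


def implied_rate_per_million(cost_usd, tokens):
    """Blended price per one million tokens implied by a bill."""
    return cost_usd / (tokens / 1_000_000)


def team_monthly_cost(cost_per_dev_day, developers, working_days):
    """Monthly bill if every developer sustains the same daily spend."""
    return cost_per_dev_day * developers * working_days


rate = implied_rate_per_million(COST_PER_DAY_USD, TOKENS_PER_DAY)
monthly = team_monthly_cost(COST_PER_DAY_USD, TEAM_SIZE, WORKING_DAYS_PER_MONTH)

print(f"${rate:.2f} per million tokens")   # $3.40 per million tokens
print(f"${monthly:,} per month")           # $1,496,000 per month
```

Real teams will not hit that ceiling every day, but even a tenth of it is a six-figure monthly line item.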

Data sovereignty adds another layer. Financial services firms in the UK and EU, healthcare organizations, and government agencies have compliance mandates that make sending code and operational data to a third-party cloud API complicated at best and impossible at worst. OpenAI's existing Codex offering is API-only - it requires your data to leave your network.

The Dell-OpenAI partnership announced this week changes that. Codex will connect to the Dell AI Data Platform, which runs in the customer's data center. Internal codebases, documentation, operational knowledge, and system records stay on-prem. The API gateway runs close to the data rather than across a public internet connection.

What Changed in the AI Data Platform

Dell's AI Data Platform received a major round of updates at the event. Three storage engines handle different access patterns - PowerScale for high-performance serial file access, Lightning for parallel file workloads, and ObjectScale for object data.

Accelerated Vector Indexing

Vector indexing is now 12x faster with NVIDIA Blackwell acceleration. For teams running retrieval-augmented generation (RAG) over large document sets, that can shrink indexing from an overnight job to one that completes in a few hours or less.
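As a rough sketch of what a 12x figure means for wall-clock time, the helper below applies a speedup to a job of a given length. The `accelerated_fraction` parameter is our assumption, not something Dell published: it models the case where only part of the pipeline benefits from the GPU, which is the standard Amdahl's law caveat for any headline speedup.

```python
def accelerated_hours(baseline_hours, speedup, accelerated_fraction=1.0):
    """Estimate job duration when a fraction of the work runs `speedup` times faster.

    accelerated_fraction is hypothetical: 1.0 means the whole job speeds up,
    0.5 means half of the original runtime is untouched by the accelerator.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    untouched = baseline_hours * (1.0 - accelerated_fraction)
    accelerated = baseline_hours * accelerated_fraction / speedup
    return untouched + accelerated


# A 12-hour overnight job, fully accelerated at 12x.
print(accelerated_hours(12, 12))        # 1.0

# The same job if only half of its runtime is index construction.
print(accelerated_hours(12, 12, 0.5))   # 6.5
```

The gap between the two results is why it is worth profiling an existing indexing pipeline before planning around the vendor number.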

GPU-Accelerated Analytics

SQL analytics on the platform runs up to 6x faster on NVIDIA Blackwell GPUs, with queries executed on the GPU rather than on CPU-bound execution paths.

Digital Twins

NVIDIA Omniverse integration enables digital twin and physical AI workflows - relevant for manufacturing, logistics, and industrial AI applications that need to simulate physical systems before deployment.

# Example: Codex connection to Dell AI Data Platform
codex:
  data_platform:
    endpoint: "https://.internal"
    auth: "enterprise-token"
    repositories:
      - name: "core-api"
        path: "/data/repos/core-api"
      - name: "docs"
        path: "/data/docs"
  context_window: 400000
  agents: 8

This is what the on-prem Codex integration looks like at the configuration level - a data platform endpoint, local repo paths, and agent parallelism that doesn't touch a public API.

PowerRack - The Turnkey Stack

Dell's new PowerRack family is the hardware side of the on-prem story. The problem it solves is integration complexity. Traditionally, rolling out a GPU-dense AI rack means procuring compute from one vendor, networking from another, storage from a third, and then spending days or weeks on integration and validation.

PowerRack bundles compute, networking, storage, cooling, and management into pre-engineered units that Dell verifies at the factory. From delivery to live workloads: six and a half hours.

What's Available Now

PowerRack for compute ships today. It supports multiple GPU generations, includes direct liquid cooling to handle dense power loads, and connects to Dell's Integrated Rack Controller with OpenManage Enterprise for unified management.

What's Coming

PowerRack for networking arrives in September 2026, with 800 Tb/sec of switching capacity per rack using eight Dell PowerSwitch SN6600 switches. PowerRack for storage follows in the second half of 2026, built on Dell Exascale Storage, with PowerFlex adding block storage alongside the existing file and object tiers.

The PowerCool CDU C7000 liquid cooling distribution unit supports up to 220 kW of heat dissipation and accepts warm-water intake, which means it can integrate with building cooling infrastructure that does not supply chilled water.

Deskside Agentic AI

Not every on-prem AI deployment needs a full rack. Dell's Deskside Agentic AI is a local inference platform for developer workstations, combining Dell Pro Max or Precision tower hardware with NVIDIA's NemoClaw software stack.

The hardware range runs from compact GB10-based systems to high-end towers with GB300 processors that can run models up to one trillion parameters locally. The cost argument: Dell says teams can cut cloud spending by up to 87% over two years compared to running equivalent token workloads on public API endpoints.

That's a big number, and it carries assumptions - primarily that token usage is heavy and sustained. Teams running occasional API calls won't see anywhere near that return. But for developers running agentic workflows all day against large codebases, the math does favor local inference at current API pricing.
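A minimal sketch shows how sensitive that kind of figure is to usage. Every input here is hypothetical and is not Dell's cost model: the $50,000 two-year local cost (hardware, power, support), the 240 active days per year, and the daily cloud spend levels were chosen only to show a heavy case landing near the quoted number and a light case going negative.

```python
def two_year_savings(cloud_usd_per_day, active_days_per_year, local_total_usd):
    """Fraction of two-year cloud spend saved by running locally.

    Returns a negative number when the local system costs more than the
    cloud bill it replaces.
    """
    cloud_total = cloud_usd_per_day * active_days_per_year * 2
    if cloud_total <= 0:
        raise ValueError("cloud spend must be positive")
    return 1 - local_total_usd / cloud_total


# Hypothetical inputs, for illustration only.
LOCAL_TOTAL_USD = 50_000
ACTIVE_DAYS = 240

heavy = two_year_savings(800, ACTIVE_DAYS, LOCAL_TOTAL_USD)  # all-day agent runs
light = two_year_savings(20, ACTIVE_DAYS, LOCAL_TOTAL_USD)   # occasional API calls

print(f"heavy usage saves {heavy:.0%}")        # heavy usage saves 87%
print("light usage pays off:", light > 0)      # light usage pays off: False
```

Plugging in a team's actual daily API spend and a real hardware quote gives a quick sanity check before taking the headline percentage at face value.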

The Model Partner Stack

Dell announced a sizable list of AI model providers that will run on Dell AI Factory infrastructure on-prem. The breadth here is the notable part.

Partner	What Runs On-Prem	Integration
OpenAI	Codex coding agent	Dell AI Data Platform connector
Google	Gemini via Google Distributed Cloud	Dell PowerEdge XE9780
SpaceXAI	Grok reasoning + multimodal	Hybrid or on-prem via AI Factory
Palantir	Foundry + AI Platform (AIP)	Dell AI Factory
Reflection	Open source frontier models	Dell AI Factory
Mistral AI	Medium 3.5 (128B)	Dell AI Factory
ServiceNow	AI workflow automation	Dell AI Ecosystem
Hugging Face	Model hub and deployment	Dell AI Ecosystem

The Dell AI Ecosystem Program sits underneath all of these - a validation layer that tests partner software on Dell hardware and publishes certified deployment blueprints.

Google's integration is worth separating from the others. Gemini arrives via Google Distributed Cloud, Google's own on-premises cloud infrastructure product. That means customers who want Gemini 3 Flash on Dell iron get it through a full GDC deployment, not just an API endpoint switch.

Where It Falls Short

The compute PowerRack is the only thing shipping today. Networking and storage modules arrive later, which limits the full turnkey promise for teams that need all three. Customers who order now will manage networking and storage as separate components until those modules ship: networking in September 2026 and storage in the second half of 2026.

On-prem management also introduces complexity that cloud removes. Patching model artifacts, maintaining secure connectors between Codex and internal repositories, and managing model provenance are all problems that become the customer's responsibility rather than OpenAI's. Dell's services team can help, but it's not free.

The 87% cost reduction figure applies to heavy agentic workflows. Dell's own footnote states that the figure assumes sustained, high-volume token usage. Teams running lighter developer workloads should model their actual usage before assuming the deskside system pays for itself quickly.

Finally, there's no single-vendor simplicity here. Customers using Codex on Dell hardware still manage a Dell infrastructure contract, an OpenAI enterprise agreement, and NVIDIA NemoClaw licensing. The integration is verified, but the vendor relationship surface area is wide.

Dell has the right hardware timing. The shift toward on-prem AI is real and documented in its own survey data. What makes this announcement worth watching is the model partner breadth - OpenAI, Google, Mistral, SpaceXAI, and Palantir all in one verified ecosystem is a stronger pull than any single partnership would be. Whether the integration depth matches the partner list depth is the question enterprises should ask before committing to a PowerRack order.
