NVIDIA Drops 110 Open-Source Skills for Physical AI Devs

NVIDIA's Agent Toolkit lands 110+ verified skills on GitHub covering robotics, autonomous vehicles, vision AI, and industrial systems - turning complex physical AI pipelines into single agent calls.

A robotics engineer at a consumer electronics factory needs synthetic defect images to train a visual inspection model. Without automation, this means wiring together Isaac Sim, a Cosmos world model, a labeling pipeline, and a training job - three or four days of Python glue code, environment debugging, and format juggling before a single training image is produced.

With NVIDIA's new Agent Toolkit skills, that same pipeline becomes a single agent call. The skill handles orchestration; the engineer writes a prompt.

That's the premise behind the 110+ open-source agent skills NVIDIA released on June 1 at GTC Taipei. The skills are available now at github.com/nvidia/skills and through skills.sh, compatible with any coding agent - Claude Code, Codex, or anything else that can run shell commands.

TL;DR

  • 110+ verified skills on GitHub spanning robotics, AVs, vision AI, and industrial systems
  • Installs via npx skills add nvidia/skills; each skill includes agent instructions, governance metadata, and a cryptographic signature
  • Builds on Cosmos 3, Isaac Sim, Metropolis, Alpamayo, and Jetson
  • Li Auto runs 1,000+ neural scene reconstructions and 300,000+ rendered frames daily using the underlying pipeline
  • Security layer via NemoClaw and OpenShell is built in

The Problem These Skills Solve

Physical AI development is a pipeline problem. Training a robot to navigate a warehouse is not one task - it's six sequential tasks with incompatible APIs and independent failure modes. Agents can execute steps competently. They can't determine the sequence without guidance.

A skill fills that gap: it's a machine-readable document that tells an agent exactly what tools to call, what outputs to produce, and how to verify results. NVIDIA's skills library is a curated public collection of those documents, covering its own platforms.

"AI agents are revolutionizing software development, and that shift is now coming to physical AI," said Jensen Huang at GTC Taipei. "Developers can now use agents to build the robots, autonomous vehicles, and industrial systems of the future at an incredible pace."

[Image: Industrial robots on a manufacturing floor with precise automated arm movements]
NVIDIA's physical AI agent skills target factories, logistics centers, and autonomous vehicle fleets - environments where simulation-to-reality pipelines determine whether a model works in production. Source: unsplash.com

How the Pipeline Runs

The synthetic data pipeline for a manufacturing inspection model, executed through NVIDIA agent skills, runs like this:

  1. Scene initialization - Agent calls the cosmos-neural-reconstruction skill with a folder of raw camera frames
  2. World model generation - Cosmos 3 reconstructs a 3D scene representation from the input
  3. Defect variation - Agent calls metropolis-defect-image-generation with defect type parameters (scratch, dent, missing component)
  4. Synthetic frame generation - Cosmos renders photorealistic defect images at scale across the reconstructed scene
  5. Video augmentation - Agent calls metropolis-video-augmentation to add lighting variation, occlusion, and sensor noise
  6. Export formatting - Skill handles output to COCO or Pascal VOC format based on the downstream training framework
  7. Validation - Each skill ships with a BENCHMARK.md and Tier-3 evaluation datasets; agent verifies outputs meet quality thresholds
  8. Training job handoff - Skill passes a structured manifest to the training orchestrator
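
To make the hand-off between steps concrete, here is a minimal Python sketch of a runner that executes steps in order and stops at the first failure. Nothing in it comes from NVIDIA's catalog: the step functions, the step labels, and the manifest keys are hypothetical placeholders standing in for real skill calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical step signature: takes the running manifest, returns an updated copy.
Step = Callable[[dict], dict]


@dataclass
class PipelineResult:
    completed: list = field(default_factory=list)
    manifest: dict = field(default_factory=dict)
    failed_step: Optional[str] = None
    error: Optional[str] = None


def run_pipeline(steps, manifest):
    """Run (name, step) pairs in order and stop at the first step that raises."""
    result = PipelineResult(manifest=dict(manifest))
    for name, step in steps:
        try:
            result.manifest = step(result.manifest)
        except Exception as exc:
            # Record where the pipeline stopped instead of continuing on bad data.
            result.failed_step = name
            result.error = str(exc)
            break
        result.completed.append(name)
    return result


# Placeholder steps; a real agent would invoke the installed skills here.
def reconstruct_scene(manifest):
    return {**manifest, "scene": "scene-001"}


def generate_defects(manifest):
    if "scene" not in manifest:
        raise ValueError("no reconstructed scene to render defects into")
    return {**manifest, "defect_types": ["scratch", "dent", "missing component"]}


def export_dataset(manifest):
    return {**manifest, "format": "COCO"}


STEPS = [
    ("scene-reconstruction", reconstruct_scene),
    ("defect-generation", generate_defects),
    ("export", export_dataset),
]

result = run_pipeline(STEPS, {"input_dir": "raw_frames/"})
print(result.completed, result.failed_step)
```

Passing a manifest dictionary from step to step mirrors the structured manifest described in step 8. Stopping at the first exception keeps a failed reconstruction from feeding the later steps.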

The sequence runs without human intervention once the agent receives an initial prompt. Pegatron, which runs contract electronics manufacturing for multiple major hardware companies, reports a 67% reduction in training and deployment time using NVIDIA's synthetic data pipeline.

Step by Step

Installing Skills

Skills install through npm's npx toolchain:

# Install the full NVIDIA skills catalog
npx skills add nvidia/skills

# Install one skill directly, no prompts
npx skills add nvidia/skills --skill metropolis-defect-image-generation --yes

Each installed skill lands with three files: SKILL.md (agent-readable instructions), skill-card.md (governance metadata), and skill.oms.sig (a cryptographic signature verifiable against NVIDIA's root certificate). A skill file is text; a tampered skill is a supply chain vector. NVIDIA is treating provenance as infrastructure from day one. Instant H100-backed access is available through NVIDIA Brev, with cloud integrations from Microsoft Azure, CoreWeave, and Nebius.
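
A quick way to sanity-check an install is to confirm that all three files are present. The Python sketch below does only that. It assumes each skill sits in its own directory, which the announcement does not confirm, and it does not verify the signature, since the signature format and verification tooling are not described here.

```python
import tempfile
from pathlib import Path

# File names as listed in the article.
EXPECTED_FILES = ("SKILL.md", "skill-card.md", "skill.oms.sig")


def missing_skill_files(skill_dir):
    """Return the expected file names that are absent from skill_dir."""
    return [name for name in EXPECTED_FILES if not (skill_dir / name).is_file()]


# Demo against a throwaway directory that deliberately lacks the signature file.
with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "metropolis-defect-image-generation"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text("placeholder instructions\n")
    (skill_dir / "skill-card.md").write_text("placeholder metadata\n")
    missing = missing_skill_files(skill_dir)

print(missing)  # ['skill.oms.sig']
```

A presence check like this catches an incomplete install. It says nothing about whether the files are authentic.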

Cosmos and Reconstruction Skills

The Cosmos 3 world foundation model underpins the reconstruction and generation skills. Neural Reconstruction takes lidar or multi-camera fleet data and builds a simulation-ready 3D scene. Li Auto and DeepRoute.ai use this at production scale - Li Auto reports 1,000+ neural reconstructions and over 300,000 rendered frames daily.
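
For a sense of scale, the reported figures can be turned into rough rates. The Python arithmetic below takes the two numbers at their stated floors and assumes the rendered frames come from those reconstructions, which the announcement does not state, so treat the results as order-of-magnitude estimates only.

```python
# Reported daily floors from the article ("1,000+" and "300,000+").
RECONSTRUCTIONS_PER_DAY = 1_000
FRAMES_PER_DAY = 300_000
SECONDS_PER_DAY = 24 * 60 * 60

# Average frames per reconstruction, assuming frames map evenly onto scenes.
frames_per_reconstruction = FRAMES_PER_DAY / RECONSTRUCTIONS_PER_DAY

# Sustained render rate if the work were spread evenly over 24 hours.
frames_per_second = FRAMES_PER_DAY / SECONDS_PER_DAY

print(f"{frames_per_reconstruction:.0f} frames per reconstruction")
print(f"{frames_per_second:.2f} frames per second")
```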

Isaac Sim and Robotics Skills

For robotics, the skills layer over Isaac Sim and Isaac Lab. An agent can launch a simulation session, author a scene, control robot actuators, capture training data, and run closed-loop evaluation - all through skill calls without custom orchestration code. Isaac Lab skills cover reinforcement learning setup, training runs, evaluation loops, and custom environment development. Agility Robotics, Universal Robots, and 1X Technologies are among the companies using these skills in active projects.

The June 1 announcement also included the Isaac GR00T Reference Humanoid Robot - a Unitree H2-based platform with 75 degrees of freedom and Jetson Thor compute, shipping late 2026. Research institutions including Stanford and ETH Zurich will use it to validate the full skills pipeline on physical hardware.

The NVIDIA Isaac GR00T reference humanoid robot based on the Unitree H2 Plus platform The Isaac GR00T Reference Humanoid Robot closes the sim-to-real loop: train in Isaac Sim using the new skills, confirm on a platform with known hardware characteristics. Source: revolutioninai.com

Autonomous Vehicles and Vision AI

Alpamayo 2 Super, a 32-billion-parameter vision-language-action model for level-4 autonomous driving, gets its own skill layer. AlpaGym connects policy rollouts to high-fidelity simulation; OmniDreams produces photorealistic camera frames in real time that respond to policy actions. Li Auto generates novel driving scenarios at a rate that on-road data collection can't match, addressing the long-tail problem without having to send fleets out in search of rare edge cases.

For vision AI, Metropolis skills cover Defect Image Generation, Video Search and Summarization, and Video Augmentation. Delta Electronics improved inspection detection rates by 17%; Foxconn reports a roughly 3% gain in first-pass manufacturing yield using synthetic defect data from this pipeline.

Pipeline vs. Manual: What Changes

| Step | Manual approach | With NVIDIA skills |
|---|---|---|
| Scene reconstruction | Write custom Omniverse importers, manage file formats | cosmos-neural-reconstruction skill, one call |
| Defect data generation | Script image augmentation, manage GPU memory manually | metropolis-defect-image-generation, declarative parameters |
| RL training setup | Configure Isaac Lab environment by hand | Isaac Lab skills handle environment authoring and training loop |
| Policy evaluation | Write eval harness, manage simulation/policy handoffs | Skills include validation against Tier-3 eval datasets |
| Export and handoff | Custom format conversion per downstream framework | Skill outputs structured manifest in standard formats |

Where It Breaks

Hardware requirements are steep. NVIDIA Brev provides instant H100 access, but running Cosmos reconstruction locally requires at minimum an A100 with 80 GB VRAM. Developers without cloud credits or high-end workstations will hit resource limits fast.

API versioning is a real risk. Skills encode specific library versions and API contracts. As Isaac Sim, Cosmos, and Alpamayo update, skills drift out of sync unless the catalog is actively maintained. NVIDIA maintains Tier-3 eval datasets per skill, but sustaining 110+ skills across six platforms is a significant ongoing commitment.
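
One way a team could watch for this drift is to compare the versions a skill was written against with what is actually installed. The Python sketch below is an illustration under stated assumptions: the package names and version numbers are invented, and it is not known whether skill metadata exposes pinned versions in this form.

```python
def parse_version(text):
    """Turn a dotted version string such as '4.2.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def find_drift(pinned, installed):
    """Report dependencies whose installed major.minor differs from the pin."""
    drift = {}
    for name, want in pinned.items():
        have = installed.get(name)
        if have is None:
            drift[name] = f"pinned {want}, not installed"
        elif parse_version(have)[:2] != parse_version(want)[:2]:
            drift[name] = f"pinned {want}, installed {have}"
    return drift


# Invented names and versions, for illustration only.
pinned = {"sim-platform": "4.2.0", "world-model": "3.0.1", "av-stack": "2.1.0"}
installed = {"sim-platform": "4.2.3", "world-model": "3.1.0"}

drift = find_drift(pinned, installed)
for name, reason in drift.items():
    print(f"{name}: {reason}")
```

Comparing only major and minor components treats patch releases as compatible. That is a policy choice rather than a rule, and a stricter team might require an exact match.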

Agent reliability isn't guaranteed. A skill tells an agent what to do; it can't force the agent to execute correctly. Physical AI pipelines are long enough that accumulated errors - wrong parameter types, mismatched output formats, GPU OOM conditions - can pass silently without a well-tested agent harness. The benchmark files test skill correctness, not agent behavior under real conditions.
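
A harness can make some of these failures loud by checking each step's output before the next step consumes it. As a sketch of that idea, the Python function below runs basic structural checks on a COCO-style detection dataset, one of the export formats named earlier. The sample data is invented, and the checks cover only a small part of what a production harness would need.

```python
def check_coco(dataset):
    """Return a list of structural problems found in a COCO-style dataset dict."""
    problems = []
    for key in ("images", "annotations", "categories"):
        if not isinstance(dataset.get(key), list):
            problems.append(f"missing or non-list section: {key}")
    if problems:
        return problems

    image_ids = {img["id"] for img in dataset["images"]}
    category_ids = {cat["id"] for cat in dataset["categories"]}
    for ann in dataset["annotations"]:
        ann_id = ann.get("id")
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {ann_id}: unknown image_id {ann.get('image_id')}")
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {ann_id}: unknown category_id {ann.get('category_id')}")
        bbox = ann.get("bbox", [])
        # COCO boxes are [x, y, width, height]; width and height must be positive.
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"annotation {ann_id}: invalid bbox {bbox}")
    return problems


# Invented sample: annotation 11 points at a missing image and has zero width.
sample = {
    "images": [{"id": 1, "file_name": "frame_0001.png", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "scratch"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [12, 30, 40, 8]},
        {"id": 11, "image_id": 2, "category_id": 1, "bbox": [5, 5, 0, 10]},
    ],
}

problems = check_coco(sample)
for problem in problems:
    print(problem)
```

Checks like these do not prove the images are useful for training. They catch the mismatched-format class of error before a training job starts.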


Physical AI has a long-standing plumbing problem: the gap between a model capability and a production workflow has historically been filled with brittle custom code that breaks on every library update. Packaging that plumbing into verifiable, agent-executable skills with signed provenance is the right direction. The open question is whether NVIDIA can sustain the maintenance load as the six platforms underneath these skills keep shipping new releases.

About the author: Sophie Zhang, AI Infrastructure & Open Source Reporter

Sophie is a journalist and former systems engineer who covers AI infrastructure, open-source models, and the developer tooling ecosystem.