Holo3-35B-A3B
H Company's open-weight sparse MoE vision-language model purpose-built for desktop computer use, scoring 82.6% on OSWorld-Verified with only 3B active parameters.

Holo3-35B-A3B is a sparse Mixture-of-Experts vision-language model from Paris-based H Company, released March 31, 2026 under Apache 2.0. It's designed for one thing: controlling a computer. The model takes screenshots as input and outputs mouse clicks, keyboard actions, and multi-step plans to complete desktop tasks. With 35B total parameters but only 3B active at inference time, it reaches compute costs closer to a small model while posting benchmark scores that put it ahead of GPT-5.4 and Claude Opus 4.6 on the desktop computer use evaluation that matters most.
TL;DR
- Purpose-built desktop computer use VLM; not a general-purpose assistant
- 82.6% on OSWorld-Verified (as of July 1, 2026), 262K context, Apache 2.0 license
- Beats GPT-5.4 (75.0%) and Claude Opus 4.6 (72.7%) on OSWorld-Verified at roughly one-tenth the inference cost
H Company raised $220 million in a 2024 seed round led by Eric Schmidt, Amazon, and a group of European investors including Bernard Arnault and Xavier Niel. The founding team came largely from DeepMind: Laurent Sifre was a principal scientist who contributed to AlphaGo, AlphaFold, and AlphaStar. The company's focus since inception has been agentic AI - specifically, giving models the perception and decision-making skills to operate software the way a human does.
Holo3 is the first open-weight model they've shipped. The larger closed variant, Holo3-122B-A10B, is available via the H Company platform at $0.40/M input tokens. The open 35B-A3B sits below it with roughly comparable OSWorld performance at lower cost, making it the practical choice for developers building local or cloud-hosted desktop automation.
Key Specifications
| Specification | Details |
|---|---|
| Provider | H Company |
| Model Family | Holo3 |
| Parameters | 35B total, ~3B active (sparse MoE) |
| Architecture | Fine-tuned from Qwen3.5-35B-A3B |
| Context Window | 262,144 tokens |
| Input Price | $0.25/M tokens |
| Output Price | $1.80/M tokens |
| Release Date | March 31, 2026 |
| License | Apache 2.0 |
| Open Source | Yes (HuggingFace: Hcompany/Holo3-35B-A3B) |
| Modality | Vision + text (VLM) |
Benchmark Performance
The computer use leaderboard tracks OSWorld-Verified as the primary desktop AI benchmark. Scores represent accuracy on multi-step task completion in a desktop environment. Human baseline is 72.4%.
| Benchmark | Holo3-35B-A3B | Holo3-122B-A10B | GPT-5.4 | Claude Opus 4.6 |
|---|---|---|---|---|
| OSWorld-Verified | 82.6% | 78.8% | 75.0% | 72.7% |
| AndroidWorld | ~79.3% (Holo3.1) | - | - | - |
The 82.6% OSWorld-Verified score reflects the current state of the model as updated by the Holo3.1 release (June 2026). At original launch, the 35B-A3B scored 77.8% on OSWorld-Verified - already above GPT-5.4's 75.0%.
What's unusual about the leaderboard position: the open 35B-A3B model now ranks above the closed 122B-A10B (78.8%). The gap likely reflects post-launch training improvements in the Holo3.1 update rather than a fundamental architectural advantage of the smaller model.
On agentic AI benchmarks, Holo3-35B-A3B's grounding capabilities - identifying the right pixel or element to click on a complex UI - are a consistent strength on ScreenSpot-Pro and OSWorld-G evaluations, according to H Company's published results. Independent verification of those specific numbers is pending.
Key Capabilities
Holo3-35B-A3B is built for desktop computer use and GUI automation rather than general-purpose chat. It takes screenshot images as input and predicts the next action in a multi-step task sequence. See the web browsing and computer use capability overview for context on how computer use models work in practice.
GUI grounding
The model is trained specifically to locate and interact with small UI elements - buttons, form fields, dropdown menus, scrollbars. This "grounding" task is harder than it looks at benchmark scale: UI elements vary wildly in size and position across applications, and small localization errors cascade into task failures. H Company's training mix includes large-scale synthetic trajectories plus human-annotated examples to cover edge cases.
Multi-step task planning
Beyond single clicks, Holo3-35B-A3B maintains a plan across sequences of actions. H Company assessed it on an internal suite of 486 multi-step tasks spanning e-commerce, business software, collaboration tools, and multi-application workflows. The published OSWorld-Verified numbers are derived from independent evaluation rather than this internal suite.
Mobile and desktop cross-environment support
The Holo3.1 update (June 2026) extended the model to Android environments in addition to desktop. On AndroidWorld, the 35B-A3B model scores 79.3%, up from 67% at the original Holo3 release. The same update added native function-calling protocol support with the original JSON output format, making the model compatible with LangChain, LlamaIndex, and other agent frameworks that expect tool-call syntax.
Local deployment
With quantized checkpoints in Q4 GGUF format (available via Hcompany/Holo-3.1-35B-A3B-GGUF on HuggingFace), the model runs on consumer hardware. H Company demonstrated it on DGX Spark with NVFP4 quantization, achieving roughly 140ms per step with 2x the end-to-end throughput of BF16. Performance in quantized form sits approximately 2 points below the BF16 baseline on OSWorld.
Pricing and Availability
The model weights are freely available under Apache 2.0 on HuggingFace at Hcompany/Holo3-35B-A3B. For cloud inference, H Company runs an API at hub.hcompany.ai with an OpenAI-compatible endpoint.
API pricing (Holo3.1-35B-A3B):
- Input: $0.25/M tokens
- Output: $1.80/M tokens
- Free tier: rate-limited access for testing
The larger closed Holo3-122B-A10B is priced at $0.40/M input and $3.00/M output. For comparison, Claude Operator-level computer use runs around $15.00/M input tokens, putting Holo3's API cost at roughly 1/60th the price for input and 1/8th for output.
Deployment options include vLLM, SGLang, and Docker. The inference API supports vision inputs natively.
Strengths and Weaknesses
Strengths
- OSWorld-Verified 82.6% is the current top score among open-weight models and ranks above GPT-5.4 and Claude Opus 4.6
- Sparse MoE means 3B active parameters at inference despite 35B total weights - inference cost matches a small model
- Apache 2.0 license allows commercial use and self-hosting without royalties
- 262K token context handles long task histories and large screenshots
- Holo3.1 update added function calling, quantized checkpoints, and AndroidWorld support
- Pricing at $0.25/M input is competitive for production computer use pipelines
Weaknesses
- Specialized training means it's not a drop-in replacement for general-purpose models on chat, coding, or reasoning tasks
- OSWorld scores are self-reported by H Company; independent audits of the 82.6% figure aren't yet published
- The original Holo3-122B-A10B flagship now benchmarks lower than the open 35B-A3B on OSWorld-Verified, which raises questions about measurement consistency
- H Company went through significant leadership changes in 2024-2025 (three cofounders departed in August 2024, CEO replaced in June 2025), introducing some execution risk for a young lab
- Local deployment on consumer GPUs requires quantization that costs roughly 2 points on OSWorld accuracy
Related Coverage
- Computer Use Leaderboard - Full OSWorld-Verified rankings where Holo3-35B-A3B currently appears
- Agentic AI Benchmarks Leaderboard - Broader agentic evaluation context
- Web Browsing and Computer Use - How computer use models work and what they're good for
FAQ
What is Holo3-35B-A3B best at?
Desktop and mobile GUI automation. It excels at multi-step tasks like filling forms, navigating software, and completing workflows on a real OS, as measured by OSWorld-Verified (82.6%) and AndroidWorld (79.3%).
Is Holo3-35B-A3B free to use?
Model weights are free under Apache 2.0 via HuggingFace. Cloud API access costs $0.25/M input and $1.80/M output tokens, with a free rate-limited tier for testing.
How does Holo3-35B-A3B compare to Claude Computer Use?
On OSWorld-Verified, Holo3-35B-A3B scores 82.6% vs Claude Opus 4.6's 72.7%. API input pricing is roughly 60x cheaper. Claude has broader general capabilities; Holo3 is purpose-built for computer use tasks.
Can Holo3-35B-A3B run locally?
Yes. Q4 GGUF quantized weights are available via Hcompany/Holo-3.1-35B-A3B-GGUF on HuggingFace. Performance is roughly 2 points below the full BF16 checkpoint on OSWorld benchmarks.
What agent frameworks does Holo3 support?
The Holo3.1 update added native function-calling protocol support with JSON outputs, making it compatible with LangChain, LlamaIndex, and other frameworks that expect standard tool-call syntax.
Sources:
- Holo3-35B-A3B on HuggingFace
- Holo3 launch blog post (H Company)
- Holo3.1 blog post (H Company)
- OSWorld-Verified Benchmark Leaderboard - BenchLM
- Holo3 official page - H Company
- Holo3.1 official page - H Company
- H Company Wikipedia
- H Company $220M seed round - TechCrunch
- H Company API pricing
- Holo3-35B-A3B at LLM Reference
✓ Last verified July 1, 2026
