NVIDIA Unveils Nemotron 3 Nano Omni: Open Multimodal Model with 9x Throughput
NVIDIA's new 30B-A3B mixture-of-experts model unifies vision, audio, and language into a single open system, delivering up to nine times the throughput of comparable open omni models for AI agents.
On April 28, 2026, NVIDIA launched Nemotron 3 Nano Omni, a new open-weight multimodal model that fuses vision, speech, and language into a single system designed for autonomous AI agents. Built on a 30B-A3B hybrid mixture-of-experts architecture with a 256K-token context window, the model accepts text, images, audio, video, documents, charts, and graphical interfaces as input — a significant expansion over previous Nemotron releases, which were text-only across the Nano, Super, and Ultra tiers.
The headline figure is efficiency: NVIDIA says Nemotron 3 Nano Omni delivers up to nine times the throughput of comparable open omni models at the same level of interactivity, translating into lower inference costs and broader scalability. At launch, the model topped six leaderboards for document intelligence and combined audio-video understanding, helped by new components including Conv3D and an Enhanced Visual System (EVS) that improve dense visual reasoning across long video clips and high-resolution screen captures.
Early adopters span both enterprise and AI-native companies. Aible, Applied Scientific Intelligence, Eka Care, Foxconn, H Company, Palantir, and Pyler are already deploying the model in production, while Dell Technologies, Docusign, Infosys, and Oracle are evaluating it. H Company CEO Gautier Cloix said in NVIDIA's announcement that "by building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings — something that wasn't practical before," highlighting the appeal of unified perception for desktop-automation agents.
Nemotron 3 Nano Omni is available immediately through Hugging Face, OpenRouter, build.nvidia.com, and more than 25 partner platforms, with deployment options ranging from on-device and on-prem to public cloud. The release lands as the broader AI ecosystem pivots from chat-first models toward autonomous agents capable of seeing, hearing, and acting — a category in which efficient open multimodal foundations are quickly becoming strategic. By open-sourcing a model that runs roughly an order of magnitude faster than rivals, NVIDIA is moving to anchor the next wave of agent infrastructure around its own software stack.
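For developers, hosted platforms like the ones listed above typically expose models through an OpenAI-compatible chat-completions API, where multimodal inputs are passed as mixed text-and-image content parts. The sketch below shows how such a request payload might be composed; the model identifier, endpoint conventions, and field names are assumptions based on that common format, not confirmed details from NVIDIA's announcement.

```python
# Hypothetical sketch: composing a multimodal chat-completions payload
# for an OpenAI-compatible endpoint (e.g. via OpenRouter or build.nvidia.com).
# The model ID below is an assumption, not a confirmed identifier.
import base64
import json

MODEL_ID = "nvidia/nemotron-3-nano-omni"  # assumed model identifier


def build_request(prompt: str, image_bytes: bytes) -> dict:
    """Return a chat-completions payload pairing a text prompt with an inline image."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                # Mixed content parts: text plus a base64 data-URL image,
                # the usual shape for multimodal OpenAI-style requests.
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            }
        ],
        "max_tokens": 512,
    }


payload = build_request("Summarize this chart.", b"\x89PNG\r\n...")  # placeholder bytes
print(json.dumps(payload)[:80])
```

The payload would then be POSTed to the provider's `/v1/chat/completions` route with an API key; audio and video inputs would use analogous content parts where the serving platform supports them.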