NVIDIA unveils BlueField-4 ICMSP, cuts inference power use
7 days ago • ai-infrastructure
At CES 2026, NVIDIA introduced the BlueField-4-powered Inference Context Memory Storage Platform (ICMSP), a storage-first subsystem that moves inference context off host RAM and onto NVMe SSDs with the goal of cutting power draw and boosting token throughput. NVIDIA published the news in its investor release and developer blog, DDN issued a partner release, and trade press covered the announcement in Tom's Hardware and Forbes.

ICMSP pairs NVIDIA's BlueField-4 DPU with SSD-resident context management for agentic inference workloads. NVIDIA says the platform can deliver up to 5x higher tokens-per-second for certain agent workloads, and claims up to 5x better power efficiency versus host-memory-only inference. NVIDIA positions ICMSP alongside Rubin/Vera Rubin and DDN offload offerings; Tom's Hardware reported Rubin NVL72 claims of up to 5x inference performance and cost-per-token gains.

IT teams should validate vendor numbers in lab tests before deployment, focusing on power, latency, endurance, and integration trade-offs.
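Tokens-per-second claims like the ones above are straightforward to sanity-check in a lab. The sketch below is a minimal, vendor-neutral harness, assuming only that the system under test exposes some `generate(prompt) -> tokens` callable; the stub model and all function names here are hypothetical, not part of any NVIDIA or ICMSP API.

```python
import time


def measure_tokens_per_second(generate, prompts, warmup=1):
    """Time a generate(prompt) -> sequence-of-tokens callable over a
    batch of prompts and return aggregate tokens per second.

    `generate` is a placeholder for whatever inference entry point the
    system under test exposes; swap in the real client call.
    """
    # Warm caches and lazy initialization so they don't skew the timing.
    for p in prompts[:warmup]:
        generate(p)

    start = time.perf_counter()
    total_tokens = 0
    for p in prompts:
        total_tokens += len(generate(p))
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed


# Stand-in "model" for demonstration: tokenizes by whitespace.
def stub_generate(prompt):
    return prompt.split()


rate = measure_tokens_per_second(stub_generate, ["a b c", "d e f g"] * 50)
print(f"{rate:.0f} tokens/sec")
```

Run the same harness against a host-memory-only baseline and the offload configuration to get a comparable ratio, rather than relying on the vendor's 5x figure directly.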
Why It Matters
- Offloading inference context to NVMe SSDs reduces DRAM footprint and lowers host power draw, easing cooling and rack power limits.
- DPU-managed storage offload lets operators scale agentic workloads without linear increases in host memory provisioning.
- Validate latency, SSD endurance (write cycles), and software-stack changes in lab tests; vendor claims are based on early demos and partner systems.
- Early integrations with DDN and Rubin systems show ecosystem momentum, but expect vendor-specific implementations and integration work.
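The offload pattern in the bullets above can be illustrated with a toy context store that keeps recent contexts in RAM and spills older ones to SSD-backed files. This is a simplified sketch of the general spill-to-storage idea, not NVIDIA's ICMSP design; the class and parameter names are hypothetical.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class SpillingContextStore:
    """Toy context store: hold the most recently used contexts in RAM
    and evict older ones to files on an NVMe-backed directory.

    Illustrates the spill-to-SSD pattern only; real systems manage
    serialized KV-cache tensors with DPU-accelerated I/O paths.
    """

    def __init__(self, ram_slots=2, spill_dir=None):
        self.ram = OrderedDict()          # key -> context, LRU order
        self.ram_slots = ram_slots
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="ctx_spill_")

    def _path(self, key):
        return os.path.join(self.spill_dir, f"{key}.ctx")

    def put(self, key, context):
        self.ram[key] = context
        self.ram.move_to_end(key)
        # Evict least-recently-used contexts past the RAM budget to disk.
        while len(self.ram) > self.ram_slots:
            old_key, old_ctx = self.ram.popitem(last=False)
            with open(self._path(old_key), "wb") as f:
                pickle.dump(old_ctx, f)

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        # Cache miss: reload the spilled context and promote it to RAM.
        with open(self._path(key), "rb") as f:
            ctx = pickle.load(f)
        self.put(key, ctx)
        return ctx
```

The lab-validation concern in the bullets maps directly onto this sketch: every eviction is an SSD write (endurance) and every miss is an SSD read (latency), so miss rate under a realistic agent workload is the number to measure.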
Sources
- NVIDIA (Investor Relations / Press Release), Official, Jan 5, 2026
- NVIDIA Developer Blog, Official, Jan 6, 2026
- DDN (Press Release), Official, Jan 6, 2026
- Forbes, Tier-1, Jan 7, 2026
- Tom's Hardware, Tier-1