Streamlining CUB with a Single-Call API

21 January 2026 at 21:28

The C++ template library CUB is a go-to for high-performance GPU primitive algorithms, but its traditional “two-phase” API, which separates memory estimation from allocation, can be cumbersome. While this programming model offers flexibility, it often results in repetitive boilerplate code. This post explains the shift from this API to the new CUB single-call API introduced in CUDA 13.1…
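The excerpt describes the two-phase pattern only in prose. The sketch below mocks that convention on the CPU in plain C++ as an illustration. `reduce_sum`, `reduce_sum_single_call`, and `kBlock` are hypothetical names, not CUB functions.

- **First call:** with a null scratch pointer, the function only reports the required byte count.
- **Second call:** with an allocated buffer, it does the work.
- **Wrapper:** it shows the query, allocate, run sequence that callers otherwise repeat. That sequence is the boilerplate a single-call API is meant to absorb.

This is not the CUDA 13.1 single-call signature, which the excerpt does not give.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU mock of a CUB-style two-phase algorithm (not the real CUB API).
constexpr std::size_t kBlock = 256;  // items summed per "block" (illustrative value)

// Phase 1: temp == nullptr -> only write the required scratch size to temp_bytes.
// Phase 2: temp != nullptr -> use the caller's scratch buffer to compute the sum.
int reduce_sum(void* temp, std::size_t& temp_bytes,
               const int* in, long long* out, std::size_t n) {
    assert(in != nullptr || n == 0);
    const std::size_t blocks = (n + kBlock - 1) / kBlock;
    // Always request at least one slot so the reported size is never zero.
    const std::size_t needed = std::max<std::size_t>(1, blocks) * sizeof(long long);
    if (temp == nullptr) {
        temp_bytes = needed;
        return 0;
    }
    if (temp_bytes < needed) return 1;  // scratch buffer too small
    long long* partial = static_cast<long long*>(temp);
    for (std::size_t b = 0; b < blocks; ++b) {
        const std::size_t end = std::min(n, (b + 1) * kBlock);
        long long s = 0;
        for (std::size_t i = b * kBlock; i < end; ++i) s += in[i];
        partial[b] = s;  // per-block partial sum stored in scratch
    }
    long long total = 0;
    for (std::size_t b = 0; b < blocks; ++b) total += partial[b];
    *out = total;
    return 0;
}

// Single-call wrapper: hides the query / allocate / run sequence from the caller.
int reduce_sum_single_call(const int* in, long long* out, std::size_t n) {
    std::size_t temp_bytes = 0;
    int err = reduce_sum(nullptr, temp_bytes, in, out, n);  // 1. query scratch size
    if (err != 0) return err;
    // 2. allocate scratch; long long elements keep the buffer suitably aligned
    std::vector<long long> scratch((temp_bytes + sizeof(long long) - 1) / sizeof(long long));
    return reduce_sum(scratch.data(), temp_bytes, in, out, n);  // 3. run (vector frees itself)
}
```

Real CUB code follows the same two steps:

1. Call `cub::DeviceReduce::Sum(nullptr, temp_storage_bytes, d_in, d_out, num_items)` to query the size.
2. Call `cudaMalloc` to allocate `temp_storage_bytes`.
3. Call `Sum` again with the allocated pointer.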

Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer

NVIDIA Technical Blog

By: Kyle Aubrey

5 January 2026 at 22:20

AI has entered an industrial phase. What began as systems performing discrete AI model training and human-facing inference has evolved into always-on AI factories that continuously convert power, silicon, and data into intelligence at scale. These factories now underpin applications that generate business plans, analyze markets, conduct deep research, and reason across vast bodies of…

Better Bug Detection: How Compile-Time Instrumentation for Compute Sanitizer Enhances Memory Safety

NVIDIA Technical Blog

By: Mark Stephenson

10 December 2025 at 17:00

CUDA C++ is standard C++ with extensions that enable functions to run on many parallel threads on a GPU. It has facilitated widespread adoption while allowing...

How to Get Started with Neural Shading for Your Game or Application

NVIDIA Technical Blog

By: Shannon Woods

13 November 2025 at 19:55

For the past 25 years, real-time rendering has been driven by continuous hardware improvements. The goal has always been to create the highest fidelity image possible within 16 milliseconds. This has fueled significant innovation in graphics hardware, pipelines, and renderers. But the slowing pace of Moore’s Law mandates the invention of new computational architectures to keep pace with the…
