Advancing GPU Programming with the CUDA Tile IR Backend for OpenAI Triton

NVIDIA Technical Blog

30 January 2026 at 20:01

NVIDIA CUDA Tile is a GPU-based programming model that targets portability for NVIDIA Tensor Cores, unlocking peak GPU performance. One of the great things about CUDA Tile is that you can build your own DSL on top of it. This post shares the work NVIDIA is doing to integrate CUDA Tile as a backend for OpenAI Triton, an open source Python DSL designed to write DL kernels for GPUs.

How to Unlock Local Detail in Coarse Climate Projections with NVIDIA Earth-2

NVIDIA Technical Blog

26 January 2026 at 14:00

Global climate models are good at the big picture—but local climate extremes, like hurricanes and typhoons, often disappear in the details. Those patterns are still there—you just need the right tools to unlock them in high-resolution climate data. Using NVIDIA Earth‑2, this blog post shows you how to downscale coarse climate projections into higher-resolution, bias‑corrected fields—revealing…

@HPCpodcast: DDN’s Paul Bloch on High Performance Storage Strategies for AI Data Centers

Inside HPC & AI News | High-Performance Computing & Artificial Intelligence

23 January 2026 at 20:01

Our special guest today is Paul Bloch, President and Co-founder of DDN, the high performance storage and intelligent data platform company.
AI runs on massive amounts of fast and reliable data, which makes topics related to ....

How to Write High-Performance Matrix Multiply in NVIDIA CUDA Tile

NVIDIA Technical Blog

14 January 2026 at 20:41

This blog post is part of a series designed to help developers learn NVIDIA CUDA Tile programming for building high-performance GPU kernels, using matrix multiplication as a core example. In this post, you’ll learn: Before you begin, be sure your environment meets the following requirements (see the quickstart for more information): Environment requirements: Install…

Learn How NVIDIA cuOpt Accelerates Mixed Integer Optimization using Primal Heuristics

NVIDIA Technical Blog

By: Piotr Sielski

13 January 2026 at 20:32

NVIDIA cuOpt is a GPU-accelerated optimization engine designed to deliver fast, high-quality solutions for large, complex decision-making problems. Mixed integer programming (MIP) is a technique for solving problems that can be modeled by a set of linear constraints, with some of the variables able to assume only integer values. The types of problems that can be modeled as MIP are numerous and…

Multi-Agent Warehouse AI Command Layer Enables Operational Excellence and Supply Chain Intelligence

NVIDIA Technical Blog

By: Tarik Hammadou

9 January 2026 at 14:00

Warehouses have never been more automated, more data-rich, or more operationally demanding than they are now—yet they still rely on systems that can’t keep up. Throughput is rising, SLAs are shrinking, and fleets of AMRs, conveyors, and sensors expand every year. But beneath that technological surface, most sites still rely on a familiar trio: a Warehouse Management System (WMS)…

New Software and Model Optimizations Supercharge NVIDIA DGX Spark

NVIDIA Technical Blog

By: Allen Bourgoyne

5 January 2026 at 22:50

Since its release, NVIDIA has continued to push performance of the Grace Blackwell-powered DGX Spark through continuous software optimization and close collaboration with software partners and the open-source community. These efforts are delivering meaningful gains across inference, training and creative workflows. At CES 2026, the latest DGX Spark software release, combined with new model…

AI Factories, Physical AI, and Advances in Models, Agents, and Infrastructure That Shaped 2025

NVIDIA Technical Blog

By: Michelle Horton

31 December 2025 at 17:30

2025 was another milestone year for developers and researchers working with NVIDIA technologies. Progress in data center power and compute design, AI infrastructure, model optimization, open models, AI agents, and physical AI redefined how intelligent systems are trained, deployed, and moved into the real world. These posts highlight the innovations that resonated most with our readers.

Migrate Apache Spark Workloads to GPUs at Scale on Amazon EMR with Project Aether

NVIDIA Technical Blog

17 December 2025 at 19:00

Data is the fuel of modern business, but relying on older CPU-based Apache Spark pipelines introduces a heavy toll. They’re inherently slow, require large infrastructure, and lead to massive cloud expenditure. As a result, GPU-accelerated Spark is becoming a leading solution, providing lightning-fast performance using parallel processing. This improved efficiency reduces cloud bills and saves…

Solving Large-Scale Linear Sparse Problems with NVIDIA cuDSS

NVIDIA Technical Blog

17 December 2025 at 18:30

Solving large-scale problems in Electronic Design Automation (EDA), Computational Fluid Dynamics (CFD), and advanced optimization workflows has become the norm as chip designs, manufacturing, and multi-physics simulations have grown in complexity. These workloads push traditional solvers and require unprecedented scalability and performance. The NVIDIA CUDA Direct Sparse Solver (cuDSS) is built…

Reducing CUDA Binary Size to Distribute cuML on PyPI

NVIDIA Technical Blog

15 December 2025 at 17:30

Starting with the 25.10 release, pip-installable cuML wheels can now be downloaded directly from PyPI. No more complex installation steps or managing Conda...

NVIDIA CUDA-X Powers the New Sirius GPU Engine for DuckDB, Setting ClickBench Records

NVIDIA Technical Blog

15 December 2025 at 17:18

Sirius, an open-source GPU-native SQL engine, achieved a new performance record on ClickBench, a widely used analytics benchmark. Developed by the University of Wisconsin-Madison with support from NVIDIA engineers, Sirius brings GPU-accelerated analytics to DuckDB. DuckDB has seen rapid adoption among organizations such as DeepSeek, Microsoft, and Databricks due to its simplicity, speed…

How to Train Scientific Agents with Reinforcement Learning

NVIDIA Technical Blog

By: Christian Munley

15 December 2025 at 14:00

The scientific process can be repetitive and tedious, with researchers spending hours digging through papers, managing experiment workflows, or wrangling massive multi-modal datasets. Scientific AI agents can take on much of that busywork, acting as assistants that review literature, generate hypotheses, plan experiments, submit computational jobs, orchestrate lab operations, analyze results…

NVIDIA Kaggle Grandmasters Win Artificial General Intelligence Competition

NVIDIA Technical Blog

5 December 2025 at 18:00

NVIDIA researchers on Friday won a key Kaggle competition many in the field treat as a real-time pulse check on humanity’s progress toward artificial general intelligence (AGI). Ivan Sorokin and Jean-Francois Puget, two members of the Kaggle Grandmasters of NVIDIA (KGMoN), came in first on the Kaggle ARC Prize 2025 public leaderboard with a 27.64% score by building a solution evaluated on…

NVIDIA CUDA 13.1 Powers Next-Gen GPU Programming with NVIDIA CUDA Tile and Performance Gains

NVIDIA Technical Blog

By: Jonathan Bentz

4 December 2025 at 22:20

NVIDIA CUDA 13.1 introduces the largest and most comprehensive update to the CUDA platform since it was invented two decades ago. In this release, you’ll find new features and updates for improving performance and driving accelerated computing, including: To help create software for current and future GPUs, NVIDIA CUDA 13.1 is launching CUDA Tile, which enables you to write…

Simplify GPU Programming with NVIDIA CUDA Tile in Python

NVIDIA Technical Blog

By: Jonathan Bentz

4 December 2025 at 22:20

The release of NVIDIA CUDA 13.1 introduces tile-based programming for GPUs, making it one of the most fundamental additions to GPU programming since CUDA was invented. Writing GPU tile kernels enables you to write your algorithm at a higher level than a single-instruction multiple-thread (SIMT) model, while the compiler and runtime handle the partitioning of work onto threads under the covers.

Focus on Your Algorithm—NVIDIA CUDA Tile Handles the Hardware

NVIDIA Technical Blog

By: Jonathan Bentz

4 December 2025 at 22:20

With its largest advancement since the NVIDIA CUDA platform was invented in 2006, CUDA 13.1 is launching NVIDIA CUDA Tile. This exciting innovation introduces a virtual instruction set for tile-based parallel programming, focusing on the ability to write algorithms at a higher level and abstract away the details of specialized hardware, such as tensor cores. CUDA exposes a single…

Model Quantization: Concepts, Methods, and Why It Matters

NVIDIA Technical Blog

By: Ruixiang Wang

24 November 2025 at 19:23

AI models are becoming increasingly complex, often exceeding the capabilities of available hardware. Quantization has emerged as a crucial technique to address this challenge, enabling resource-intensive models to run on constrained hardware. The NVIDIA TensorRT and Model Optimizer tools simplify the quantization process, maintaining model accuracy while improving efficiency.
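
As a rough illustration of the general idea behind quantization, the sketch below maps floating-point weights to 8-bit integer codes with one shared scale factor, then maps them back. It is a generic symmetric per-tensor scheme written in plain Python and is not taken from the post; the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical, and nothing here reflects how TensorRT or Model Optimizer implement quantization.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integer codes in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any positive scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the signed 8-bit range.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map integer codes back to approximate float values."""
    return [q * scale for q in quantized]


# Hypothetical sample weights, chosen only for illustration.
weights = [0.82, -1.27, 0.05, 0.40, -0.66]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# The round-trip error of each value is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each weight now needs one byte instead of four (for float32), at the cost of a rounding error of at most half the scale per value. Production tools typically add calibration data and finer-grained scales to keep that error from hurting accuracy.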

Breaking Through Reinforcement Learning Training Limits with Scaling Rollouts in BroRL

NVIDIA Technical Blog

19 November 2025 at 21:51

When training large language models (LLMs) with reinforcement learning from verifiable rewards (RLVR), one of the most compelling questions is how to overcome performance plateaus. The previous NVIDIA Research solution, Prolonged Reinforcement Learning (ProRL), showed that adding more reinforcement learning (RL) steps during prolonged training could expand the reasoning boundaries of LLMs.

Training XGBoost Models with GPU-Accelerated Polars DataFrames

NVIDIA Technical Blog

By: Jiaming Yuan

10 November 2025 at 19:30

One of the many strengths of the PyData ecosystem is interoperability, which enables seamlessly moving data between libraries that specialize in exploratory analysis, training, and inference. The latest release of XGBoost introduces exciting new capabilities, including a category re-coder and integration with Polars DataFrames. This provides a streamlined approach to data handling.