Received yesterday — 31 January 2026

Advancing GPU Programming with the CUDA Tile IR Backend for OpenAI Triton

30 January 2026 at 20:01

NVIDIA CUDA Tile is a GPU-based programming model that targets portability for NVIDIA Tensor Cores, unlocking peak GPU performance. One of the great things about CUDA Tile is that you can build your own DSL on top of it. This post shares the work NVIDIA is doing to integrate CUDA Tile as a backend for OpenAI Triton, an open-source Python DSL for writing deep learning (DL) kernels for GPUs.

How to Unlock Local Detail in Coarse Climate Projections with NVIDIA Earth-2

26 January 2026 at 14:00

Global climate models are good at the big picture, but local climate extremes, like hurricanes and typhoons, often disappear in the details. Those patterns are still there; you just need the right tools to unlock them in high-resolution climate data. This blog post shows you how to use NVIDIA Earth-2 to downscale coarse climate projections into higher-resolution, bias-corrected fields, revealing…
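The excerpt does not show the Earth-2 workflow itself, so as a hedged illustration of the two terms only (not Earth-2's method), here is a classical baseline in plain Python: bilinear interpolation refines a coarse grid onto a finer one, and an additive mean-bias correction shifts the result so its mean matches a reference field. The function names, the `factor` parameter, and the toy grids are hypothetical. A baseline like this only smooths between coarse values, so it cannot recover the local extremes the post is about.

```python
def bilinear_upsample(coarse, factor):
    """Interpolate a 2-D grid (list of lists, at least 2x2) onto a grid
    `factor` times finer. Fine point (i, j) maps back to coarse coordinates
    (i / factor, j / factor) and blends the four surrounding coarse cells."""
    rows, cols = len(coarse), len(coarse[0])
    assert rows >= 2 and cols >= 2 and factor >= 1
    out_rows, out_cols = (rows - 1) * factor + 1, (cols - 1) * factor + 1
    fine = [[0.0] * out_cols for _ in range(out_rows)]
    for i in range(out_rows):
        y = i / factor
        y0 = min(int(y), rows - 2)  # top row of the enclosing coarse cell
        ty = y - y0                 # fractional offset inside that cell
        for j in range(out_cols):
            x = j / factor
            x0 = min(int(x), cols - 2)
            tx = x - x0
            top = (1 - tx) * coarse[y0][x0] + tx * coarse[y0][x0 + 1]
            bottom = (1 - tx) * coarse[y0 + 1][x0] + tx * coarse[y0 + 1][x0 + 1]
            fine[i][j] = (1 - ty) * top + ty * bottom
    return fine


def mean_bias_correct(field, reference):
    """Shift `field` so its mean equals the mean of `reference`."""
    cells = [v for row in field for v in row]
    ref_cells = [v for row in reference for v in row]
    bias = sum(cells) / len(cells) - sum(ref_cells) / len(ref_cells)
    return [[v - bias for v in row] for row in field]


coarse = [[0.0, 2.0], [2.0, 4.0]]  # hypothetical 2x2 coarse field
fine = bilinear_upsample(coarse, 2)
print(fine)  # [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
```

Bias correction here is a single additive shift; practical methods typically correct the full distribution rather than just the mean.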

Received before yesterday

How to Write High-Performance Matrix Multiply in NVIDIA CUDA Tile

14 January 2026 at 20:41

This blog post is part of a series designed to help developers learn NVIDIA CUDA Tile programming for building high-performance GPU kernels, using matrix multiplication as a core example.

In this post, you'll learn:

Before you begin, be sure your environment meets the following requirements (see the quickstart for more information):

Environment requirements:

Install…
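The excerpt stops before the post's own code, and the CUDA Tile API is not reproduced here. As a hedged, language-neutral sketch of the tiling idea that fast matrix multiply kernels are built around, the pure-Python function below computes C = A·B one output tile at a time, accumulating partial products while marching along the K dimension in tile-sized steps. The function name and the `tile` parameter are hypothetical; on a GPU, each output tile would roughly correspond to one unit of parallel work, and the small tile-by-tile products are the kind of work a tile compiler can map onto Tensor Cores.

```python
def tiled_matmul(a, b, tile=2):
    """Compute a @ b (lists of lists) one (tile x tile) output block at a time."""
    m, k = len(a), len(a[0])
    k2, n = len(b), len(b[0])
    if k != k2:
        raise ValueError("inner dimensions must match")
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):        # row offset of the output tile
        for j0 in range(0, n, tile):    # column offset of the output tile
            # Per-tile accumulator; edge tiles shrink to fit ragged shapes.
            acc = [[0.0] * min(tile, n - j0) for _ in range(min(tile, m - i0))]
            for k0 in range(0, k, tile):  # walk K one tile at a time
                for i in range(len(acc)):
                    for j in range(len(acc[0])):
                        for kk in range(k0, min(k0 + tile, k)):
                            acc[i][j] += a[i0 + i][kk] * b[kk][j0 + j]
            # Write the finished tile back to C.
            for i in range(len(acc)):
                for j in range(len(acc[0])):
                    c[i0 + i][j0 + j] = acc[i][j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(tiled_matmul(a, b))  # [[58.0, 64.0], [139.0, 154.0]]
```

Keeping the accumulator local to one output tile is the design point: each tile of A and B is reused across a whole block of outputs instead of being reloaded per element.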

Learn How NVIDIA cuOpt Accelerates Mixed Integer Optimization using Primal Heuristics

13 January 2026 at 20:32

NVIDIA cuOpt is a GPU-accelerated optimization engine designed to deliver fast, high-quality solutions for large, complex decision-making problems. Mixed integer programming (MIP) is a technique for solving problems that can be modeled by a set of linear constraints, with some of the variables able to assume only integer values. The types of problems that can be modeled as MIP are numerous and…
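To make the MIP definition concrete, here is a tiny hypothetical problem: maximize 5x + 4y subject to 6x + 4y ≤ 24 and x + 2y ≤ 6, with x and y non-negative integers. The sketch solves it by brute-force enumeration, which only works for toy sizes; it does not use cuOpt, and cuOpt's primal heuristics are not shown. It also shows why integrality makes the problem hard: the LP relaxation's optimum is fractional, and simply rounding it is either infeasible or suboptimal, which is the gap primal heuristics try to close by finding good feasible integer solutions quickly.

```python
from itertools import product

# Hypothetical toy MIP: maximize 5x + 4y
#   subject to 6x + 4y <= 24,  x + 2y <= 6,  x, y >= 0 and integer.
OBJECTIVE = (5, 4)
CONSTRAINTS = [((6, 4), 24), ((1, 2), 6)]  # (coefficients, right-hand side)


def feasible(point):
    """True if `point` satisfies every linear <= constraint."""
    return all(
        sum(c * v for c, v in zip(coeffs, point)) <= rhs
        for coeffs, rhs in CONSTRAINTS
    )


def objective(point):
    return sum(c * v for c, v in zip(OBJECTIVE, point))


def solve_by_enumeration(upper_bound=10):
    """Try every integer point in [0, upper_bound]^2 (toy problems only)."""
    best = None
    for point in product(range(upper_bound + 1), repeat=2):
        if feasible(point) and (best is None or objective(point) > objective(best)):
            best = point
    return best


best = solve_by_enumeration()
print(best, objective(best))  # (4, 0) 20

# The LP relaxation's optimum is fractional: x = 3, y = 1.5 (objective 21).
# Rounding y up is infeasible; rounding it down is feasible but suboptimal.
print(feasible((3, 2)), feasible((3, 1)), objective((3, 1)))  # False True 19
```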

Multi-Agent Warehouse AI Command Layer Enables Operational Excellence and Supply Chain Intelligence

9 January 2026 at 14:00

Warehouses have never been more automated, more data-rich, or more operationally demanding than they are now, yet they still rely on systems that can't keep up. Throughput is rising, SLAs are shrinking, and fleets of autonomous mobile robots (AMRs), conveyors, and sensors expand every year. But beneath that technological surface, most sites still rely on a familiar trio: a Warehouse Management System (WMS)…

New Software and Model Optimizations Supercharge NVIDIA DGX Spark

5 January 2026 at 22:50

Since its release, NVIDIA has continued to push the performance of the Grace Blackwell-powered DGX Spark through continuous software optimization and close collaboration with software partners and the open-source community. These efforts are delivering meaningful gains across inference, training, and creative workflows. At CES 2026, the latest DGX Spark software release, combined with new model…

AI Factories, Physical AI, and Advances in Models, Agents, and Infrastructure That Shaped 2025

31 December 2025 at 17:30

2025 was another milestone year for developers and researchers working with NVIDIA technologies. Progress in data center power and compute design, AI infrastructure, model optimization, open models, AI agents, and physical AI redefined how intelligent systems are trained, deployed, and moved into the real world. These posts highlight the innovations that resonated most with our readers.

Migrate Apache Spark Workloads to GPUs at Scale on Amazon EMR with Project Aether

17 December 2025 at 19:00

Data is the fuel of modern business, but relying on older CPU-based Apache Spark pipelines takes a heavy toll. They're inherently slow, require large infrastructure, and lead to massive cloud expenditure. As a result, GPU-accelerated Spark is becoming a leading solution, providing lightning-fast performance through parallel processing. This improved efficiency reduces cloud bills and saves…

Solving Large-Scale Linear Sparse Problems with NVIDIA cuDSS

17 December 2025 at 18:30

Solving large-scale problems in Electronic Design Automation (EDA), Computational Fluid Dynamics (CFD), and advanced optimization workflows has become the norm as chip designs, manufacturing, and multi-physics simulations have grown in complexity. These workloads push traditional solvers to their limits and require unprecedented scalability and performance. The NVIDIA CUDA Direct Sparse Solver (cuDSS) is built…
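The excerpt does not show the cuDSS API, and nothing below uses it. As a hedged illustration of what a direct sparse solver exploits, here is the Thomas algorithm, a classical direct method for tridiagonal systems: because each row has at most three non-zeros, Gaussian elimination collapses to O(n) work instead of O(n^3) for a dense solve. General-purpose direct solvers such as cuDSS are designed for far more general sparsity patterns and are much more involved; the function and argument names here are hypothetical.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Directly solve a tridiagonal system A x = rhs in O(n).

    lower[i] is A[i][i-1] (lower[0] unused), diag[i] is A[i][i],
    upper[i] is A[i][i+1] (upper[-1] unused). Assumes no pivoting is
    needed, e.g. A is diagonally dominant.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal after forward elimination
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination touches only 3 entries per row
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]], chosen so the solution is [1, 2, 3].
x = thomas_solve([0.0, -1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0, 0.0], [0.0, 0.0, 4.0])
print([round(v, 10) for v in x])  # [1.0, 2.0, 3.0]
```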

NVIDIA CUDA-X Powers the New Sirius GPU Engine for DuckDB, Setting ClickBench Records

15 December 2025 at 17:18

Sirius, an open-source GPU-native SQL engine, achieved a new performance record on ClickBench, a widely used analytics benchmark. Developed by the University of Wisconsin-Madison with support from NVIDIA engineers, Sirius brings GPU-accelerated analytics to DuckDB. DuckDB has seen rapid adoption among organizations such as DeepSeek, Microsoft, and Databricks due to its simplicity, speed…

How to Train Scientific Agents with Reinforcement Learning

15 December 2025 at 14:00

The scientific process can be repetitive and tedious, with researchers spending hours digging through papers, managing experiment workflows, or wrangling massive multi-modal datasets. Scientific AI agents can take on much of that busywork, acting as assistants that review literature, generate hypotheses, plan experiments, submit computational jobs, orchestrate lab operations, analyze results…

NVIDIA Kaggle Grandmasters Win Artificial General Intelligence Competition

5 December 2025 at 18:00

NVIDIA researchers on Friday won a key Kaggle competition many in the field treat as a real-time pulse check on humanity’s progress toward artificial general intelligence (AGI). Ivan Sorokin and Jean-Francois Puget, two members of the Kaggle Grandmasters of NVIDIA (KGMoN), came in first on the Kaggle ARC Prize 2025 public leaderboard with a 27.64% score by building a solution evaluated on…

NVIDIA CUDA 13.1 Powers Next-Gen GPU Programming with NVIDIA CUDA Tile and Performance Gains

4 December 2025 at 22:20

NVIDIA CUDA 13.1 introduces the largest and most comprehensive update to the CUDA platform since it was invented two decades ago. In this release, you'll find new features and updates for improving performance and driving accelerated computing, including:

To help create software for current and future GPUs, NVIDIA CUDA 13.1 is launching CUDA Tile, which enables you to write…

Simplify GPU Programming with NVIDIA CUDA Tile in Python

4 December 2025 at 22:20

The release of NVIDIA CUDA 13.1 introduces tile-based programming for GPUs, one of the most fundamental additions to GPU programming since CUDA was invented. Writing GPU tile kernels enables you to write your algorithm at a higher level than the single-instruction, multiple-thread (SIMT) model, while the compiler and runtime handle the partitioning of work onto threads under the covers.
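The difference between the two models can be sketched in plain Python. This is not CUDA Tile syntax; the function names, the `TILE` size, and the SAXPY example (out = a·x + y) are hypothetical choices for illustration. In the SIMT-style version, you write code for one thread and compute that thread's element index and bounds check by hand. In the tile-style version, the kernel body is a single operation on a whole tile, and a stand-in "runtime" loop decides how the data is split into tiles, which is the part the post says the compiler and runtime take over.

```python
TILE = 4  # hypothetical tile size


# SIMT style: the kernel body describes what ONE thread does to ONE element.
def saxpy_simt_thread(block_idx, thread_idx, block_dim, a, x, y, out):
    i = block_idx * block_dim + thread_idx  # hand-computed global index
    if i < len(x):                          # explicit per-thread bounds check
        out[i] = a * x[i] + y[i]


def launch_simt(a, x, y, block_dim=4):
    out = [0.0] * len(x)
    num_blocks = (len(x) + block_dim - 1) // block_dim
    for b in range(num_blocks):      # stands in for the hardware scheduler
        for t in range(block_dim):
            saxpy_simt_thread(b, t, block_dim, a, x, y, out)
    return out


# Tile style: the kernel body describes an operation on a whole tile.
def saxpy_tile(a, x_tile, y_tile):
    return [a * xv + yv for xv, yv in zip(x_tile, y_tile)]


def launch_tiles(a, x, y):
    # Stand-in for the compiler/runtime: split data into tiles, run the tile
    # kernel on each, and handle the ragged last tile by slicing.
    out = []
    for start in range(0, len(x), TILE):
        out.extend(saxpy_tile(a, x[start:start + TILE], y[start:start + TILE]))
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0] * 6
print(launch_tiles(2.0, x, y))  # [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
print(launch_simt(2.0, x, y) == launch_tiles(2.0, x, y))  # True
```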

Focus on Your Algorithm—NVIDIA CUDA Tile Handles the Hardware

4 December 2025 at 22:20

CUDA 13.1, the largest advancement to the NVIDIA CUDA platform since it was invented in 2006, launches NVIDIA CUDA Tile. This innovation introduces a virtual instruction set for tile-based parallel programming, focusing on the ability to write algorithms at a higher level and abstract away the details of specialized hardware, such as Tensor Cores. CUDA exposes a single…

Model Quantization: Concepts, Methods, and Why It Matters

24 November 2025 at 19:23

AI models are becoming increasingly complex, often exceeding the capabilities of available hardware. Quantization has emerged as a crucial technique to address this challenge, enabling resource-intensive models to run on constrained hardware. The NVIDIA TensorRT and Model Optimizer tools simplify the quantization process, maintaining model accuracy while improving efficiency.
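A minimal sketch of the core idea behind quantization, using symmetric per-tensor INT8 quantization in plain Python. It does not use the TensorRT or Model Optimizer APIs, and the weights and function names are hypothetical; real toolchains add calibration, per-channel scales, and other refinements. The scale maps the largest absolute value to 127, so every value is stored as an 8-bit integer and dequantization recovers it to within half a quantization step.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: q = round(v / scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # largest |v| maps to 127
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Map integer codes back to approximate real values."""
    return [qi * scale for qi in q]


weights = [0.02, -1.27, 0.5, 0.9]  # hypothetical FP32 weights
q, scale = quantize_int8(weights)
print(q)  # [2, -127, 50, 90]

# Rounding error is at most half a step (scale / 2) per value.
restored = dequantize_int8(q, scale)
print(max(abs(w - r) for w, r in zip(weights, restored)) <= scale / 2)  # True
```

Storing `q` takes a quarter of the memory of FP32 values, which is the efficiency the paragraph refers to; the accuracy cost is the rounding error bounded above.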

Breaking Through Reinforcement Learning Training Limits with Scaling Rollouts in BroRL

19 November 2025 at 21:51

When training large language models (LLMs) with reinforcement learning from verifiable rewards (RLVR), one of the most compelling questions is how to overcome performance plateaus. The previous NVIDIA Research solution, Prolonged Reinforcement Learning (ProRL), showed that adding more reinforcement learning (RL) steps during prolonged training could expand the reasoning boundaries of LLMs.
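In RLVR, the reward comes from a programmatic check of the model's output rather than from a learned reward model. A minimal sketch of such a verifier for math-style answers follows; the `Answer:` output convention and the function name are hypothetical assumptions, not details taken from ProRL or BroRL.

```python
import re


def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 if the completion's final answer matches ground truth, else 0.0.

    Assumes (hypothetically) the model states answers as 'Answer: <value>';
    only the last such answer counts.
    """
    matches = re.findall(r"Answer:\s*(\S+)", completion)
    if not matches:
        return 0.0  # unparseable output earns no reward
    final = matches[-1].rstrip(".")  # tolerate a trailing period
    return 1.0 if final == ground_truth.strip() else 0.0


print(verifiable_reward("6 * 7 = 42. Answer: 42.", "42"))  # 1.0
print(verifiable_reward("Answer: 41", "42"))               # 0.0
```

Because a check like this is cheap and deterministic, it can score every sampled rollout for a prompt, which matters when training generates many rollouts.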

Training XGBoost Models with GPU-Accelerated Polars DataFrames

10 November 2025 at 19:30

One of the many strengths of the PyData ecosystem is interoperability, which lets you move data seamlessly between libraries that specialize in exploratory analysis, training, and inference. The latest release of XGBoost introduces new capabilities, including a category re-coder and integration with Polars DataFrames, which together streamline data handling.
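The excerpt names a category re-coder without describing it. As a generic, hedged illustration of the problem such a feature addresses (not XGBoost's implementation or API), the sketch below records the category-to-code mapping seen during training and re-applies it at inference, so the same label always gets the same integer code even when the inference data contains a different set of categories. The function names and the `-1` code for unseen labels are hypothetical.

```python
def fit_category_codes(train_labels):
    """Assign integer codes to categories in order of first appearance."""
    codes = {}
    for label in train_labels:
        codes.setdefault(label, len(codes))
    return codes


def recode(labels, codes, unknown=-1):
    """Map labels to the codes learned at training time; unseen -> `unknown`."""
    return [codes.get(label, unknown) for label in labels]


def naive_codes(labels):
    """Naive alternative: encode each dataset independently (sorted order)."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}


codes = fit_category_codes(["red", "green", "blue", "green"])
print(recode(["blue", "red", "purple"], codes))  # [2, 0, -1]

# Independent encodings disagree: "red" is 2 in training but 1 at inference.
print(naive_codes(["red", "green", "blue"])["red"], naive_codes(["blue", "red"])["red"])  # 2 1
```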
