Advancing GPU Programming with the CUDA Tile IR Backend for OpenAI Triton

30 January 2026 at 20:01

NVIDIA CUDA Tile is a GPU-based programming model that targets portability for NVIDIA Tensor Cores, unlocking peak GPU performance. One of the great things...

NVIDIA CUDA Tile is a GPU-based programming model that targets portability for NVIDIA Tensor Cores, unlocking peak GPU performance. One of the great things about CUDA Tile is that you can build your own DSL on top of it. This post shares the work NVIDIA is doing to integrate CUDA Tile as a backend for OpenAI Triton, an open source Python DSL designed to write DL kernels for GPUs.

Source

How to Write High-Performance Matrix Multiply in NVIDIA CUDA Tile

NVIDIA Technical Blog

By:Jinman Xie

14 January 2026 at 20:41

This blog post is part of a series designed to help developers learn NVIDIA CUDA Tile programming for building high-performance GPU kernels, using matrix...

This blog post is part of a series designed to help developers learn NVIDIA CUDA Tile programming for building high-performance GPU kernels, using matrix multiplication as a core example. In this post, you’ll learn: Before you begin, be sure your environment meets the following requirements (see the quickstart for more information): Environment requirements: Install…

Source

NVIDIA CUDA 13.1 Powers Next-Gen GPU Programming with NVIDIA CUDA Tile and Performance Gains

NVIDIA Technical Blog

By:Jonathan Bentz

4 December 2025 at 22:20

NVIDIA CUDA 13.1 introduces the largest and most comprehensive update to the CUDA platform since it was invented two decades ago. In this release,... Decorative image.

NVIDIA CUDA 13.1 introduces the largest and most comprehensive update to the CUDA platform since it was invented two decades ago. In this release, you’ll find new features and updates for improving performance and driving accelerated computing, including: To help create software for current and future GPUs, NVIDIA CUDA 13.1 is launching CUDA Tile, which enables you to write…

Source

Simplify GPU Programming with NVIDIA CUDA Tile in Python

NVIDIA Technical Blog

By:Jonathan Bentz

4 December 2025 at 22:20

The release of NVIDIA CUDA 13.1 introduces tile-based programming for GPUs, making it one of the most fundamental additions to GPU programming since CUDA was... Decorative image.

The release of NVIDIA CUDA 13.1 introduces tile-based programming for GPUs, making it one of the most fundamental additions to GPU programming since CUDA was invented. Writing GPU tile kernels enables you to write your algorithm at a higher level than a single-instruction multiple-thread (SIMT) model, while the compiler and runtime handle the partitioning of work onto threads under the covers.

Source

Focus on Your Algorithm—NVIDIA CUDA Tile Handles the Hardware

NVIDIA Technical Blog

By:Jonathan Bentz

4 December 2025 at 22:20

With its largest advancement since the NVIDIA CUDA platform was invented in 2006, CUDA 13.1 is launching NVIDIA CUDA Tile. This exciting innovation introduces a... CUDA Tile example.

With its largest advancement since the NVIDIA CUDA platform was invented in 2006, CUDA 13.1 is launching NVIDIA CUDA Tile. This exciting innovation introduces a virtual instruction set for tile-based parallel programming, focusing on the ability to write algorithms at a higher level and abstract away the details of specialized hardware, such as tensor cores. CUDA exposes a single…

Source

Reading view