
Establishing a Scalable Sparse Ecosystem with the Universal Sparse Tensor

30 January 2026 at 18:00

Sparse tensors are vectors, matrices, and higher-dimensional generalizations with many zeros. They are crucial in fields such as scientific computing, signal processing, and deep learning because of their efficiency in storage, computation, and power consumption. Despite their benefits, handling sparse tensors manually or through existing libraries is often cumbersome, error-prone, nonportable…
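The post's Universal Sparse Tensor interface is not reproduced in this excerpt; as a rough, hedged illustration of the storage savings sparse formats provide, the sketch below builds a COO-format matrix with SciPy (a stand-in chosen for this example, not the article's API) and compares its footprint to the dense equivalent.

# Generic COO (coordinate) sparse storage with SciPy; not the Universal Sparse Tensor API.
import numpy as np
from scipy.sparse import coo_matrix

rng = np.random.default_rng(0)
n = 4_000
dense = np.zeros((n, n), dtype=np.float32)

# Fill roughly 0.06% of the entries with nonzero values.
rows = rng.integers(0, n, size=10_000)
cols = rng.integers(0, n, size=10_000)
dense[rows, cols] = rng.standard_normal(10_000).astype(np.float32)

sparse = coo_matrix(dense)  # keeps only (row, col, value) triples

dense_mb = dense.nbytes / 1e6
sparse_mb = (sparse.data.nbytes + sparse.row.nbytes + sparse.col.nbytes) / 1e6
print(f"dense:  {dense_mb:.1f} MB")   # 64.0 MB
print(f"sparse: {sparse_mb:.1f} MB")  # well under 1 MB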

Source

How to Write High-Performance Matrix Multiply in NVIDIA CUDA Tile

14 January 2026 at 20:41

This blog post is part of a series designed to help developers learn NVIDIA CUDA Tile programming for building high-performance GPU kernels, using matrix multiplication as a core example. Before you begin, be sure your environment meets the requirements listed in the quickstart: Install…
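The post's actual CUDA Tile kernel is not shown in this excerpt; as a hedged, CPU-only sketch of the tiling idea it is built around, the following NumPy code decomposes a matrix multiply into fixed-size tiles. The tile size and matrix shapes are arbitrary choices for illustration.

# CPU-only sketch of a tile-decomposed matrix multiply in NumPy.
# Illustrates the tiling concept only; it is not the CUDA Tile API.
import numpy as np

TILE = 64  # tile edge length (assumed to divide the matrix dimensions)

def tiled_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    m, k = a.shape
    k2, n = b.shape
    assert k == k2 and m % TILE == 0 and n % TILE == 0 and k % TILE == 0
    c = np.zeros((m, n), dtype=a.dtype)
    # Each (i, j) output tile accumulates products of A's row tiles and B's column tiles.
    for i in range(0, m, TILE):
        for j in range(0, n, TILE):
            for p in range(0, k, TILE):
                c[i:i+TILE, j:j+TILE] += a[i:i+TILE, p:p+TILE] @ b[p:p+TILE, j:j+TILE]
    return c

a = np.random.rand(256, 256).astype(np.float32)
b = np.random.rand(256, 256).astype(np.float32)
assert np.allclose(tiled_matmul(a, b), a @ b, atol=1e-3)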

Source

Advanced Large-Scale Quantum Simulation Techniques in cuQuantum SDK v25.11

16 December 2025 at 18:00

Simulating large-scale quantum computers has become more difficult as the quality of quantum processing units (QPUs) improves. Validating the results is key to ensuring that, once the devices scale beyond what is classically simulable, we can still trust the outputs. Similarly, when generating large-scale datasets for various AI models that aim to aid in the operation of quantum processors…
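cuQuantum's own APIs are not shown in this excerpt; as a hedged, minimal illustration of what "classically simulable" means and why it has a hard limit, the sketch below simulates a two-qubit Bell state with a plain NumPy statevector and prints how statevector memory grows as qubits are added.

# Minimal statevector sketch in NumPy; not cuQuantum's API.
# A full statevector holds 2**n complex amplitudes, so memory doubles with each qubit.
import numpy as np

def bell_state() -> np.ndarray:
    # |00> -> (|00> + |11>) / sqrt(2) via a Hadamard on qubit 0 and a CNOT.
    h = np.array([[1, 1], [1, -1]], dtype=np.complex128) / np.sqrt(2)
    cnot = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 1, 0]], dtype=np.complex128)
    state = np.zeros(4, dtype=np.complex128)
    state[0] = 1.0  # start in |00>
    state = np.kron(h, np.eye(2)) @ state
    return cnot @ state

print(bell_state())  # amplitudes ~0.707 on |00> and |11>

for n in (20, 30, 40, 50):
    gib = (2**n * 16) / 2**30  # 16 bytes per complex128 amplitude
    print(f"{n} qubits -> {gib:,.2f} GiB of statevector")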

Source

Simplify GPU Programming with NVIDIA CUDA Tile in Python

4 December 2025 at 22:20

The release of NVIDIA CUDA 13.1 introduces tile-based programming for GPUs, making it one of the most fundamental additions to GPU programming since CUDA was invented. Writing GPU tile kernels enables you to write your algorithm at a higher level than a single-instruction multiple-thread (SIMT) model, while the compiler and runtime handle the partitioning of work onto threads under the covers.
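CUDA Tile's Python API is not shown in this excerpt; for contrast, the hedged sketch below is a conventional SIMT kernel written with Numba (it requires an NVIDIA GPU), where the developer computes a per-thread index and picks the launch geometry by hand, the bookkeeping that tile-level programming is described as handling for you.

# A conventional SIMT kernel in Numba CUDA: the programmer indexes individual threads.
# Shown only as the baseline that tile-level programming abstracts away; this is not CUDA Tile.
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)          # global thread index, computed per thread
    if i < out.size:          # guard against threads past the end of the array
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.arange(n, dtype=np.float32)
b = np.ones(n, dtype=np.float32)
out = np.empty_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)  # Numba copies the arrays to the GPU implicitly

assert np.allclose(out, a + b)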

Source

Focus on Your Algorithm: NVIDIA CUDA Tile Handles the Hardware

4 December 2025 at 22:20

CUDA 13.1 launches NVIDIA CUDA Tile, the platform's largest advancement since CUDA was invented in 2006. It introduces a virtual instruction set for tile-based parallel programming, letting developers write algorithms at a higher level and abstract away the details of specialized hardware, such as Tensor Cores. CUDA exposes a single…

Source

Achieve CUTLASS C++ Performance with Python APIs Using CuTe DSL

13 November 2025 at 20:30

CuTe, a core component of CUTLASS 3.x, provides a unified algebra for describing data layouts and thread mappings, and abstracts complex memory access patterns into composable mathematical operations. While CUTLASS 3.x and CuTe have empowered kernel developers to achieve peak performance on Tensor Cores through intuitive abstractions, the extensive use of C++ templates has resulted in high…
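The CuTe DSL itself is not reproduced in this excerpt; as a hedged sketch of the layout algebra it describes (a layout as a shape plus strides mapping logical coordinates to memory offsets), the plain-Python snippet below computes offsets for row-major and column-major views of the same 4x8 tile.

# Plain-Python sketch of the shape/stride layout idea behind CuTe; not the CuTe DSL itself.
# A layout maps a logical coordinate to a linear memory offset: offset = sum(coord[i] * stride[i]).

def offset(coord: tuple[int, ...], stride: tuple[int, ...]) -> int:
    return sum(c * s for c, s in zip(coord, stride, strict=True))

shape = (4, 8)                 # a 4x8 tile of elements
row_major = (shape[1], 1)      # strides (8, 1): consecutive columns are adjacent in memory
col_major = (1, shape[0])      # strides (1, 4): consecutive rows are adjacent in memory

print(offset((2, 3), row_major))  # 2*8 + 3*1 = 19
print(offset((2, 3), col_major))  # 2*1 + 3*4 = 14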

Source
