
Received yesterday – 31 January 2026

Advancing GPU Programming with the CUDA Tile IR Backend for OpenAI Triton

30 January 2026 at 20:01

NVIDIA CUDA Tile is a tile-based GPU programming model that targets portability across NVIDIA Tensor Cores, unlocking peak GPU performance. One of the great things about CUDA Tile is that you can build your own DSL on top of it. This post shares the work NVIDIA is doing to integrate CUDA Tile as a backend for OpenAI Triton, an open source Python DSL designed for writing deep learning kernels for GPUs.
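
For a sense of the layer Triton occupies, here is the canonical Triton vector-add kernel; a backend such as CUDA Tile would slot in below this Python source, which stays unchanged. A minimal sketch, assuming PyTorch CUDA tensors:

```python
# Minimal Triton vector-add kernel (the canonical tutorial example);
# a backend such as CUDA Tile sits below this Python layer.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # which block am I?
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)                    # one program per block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```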

Source

Practical Security Guidance for Sandboxing Agentic Workflows and Managing Execution Risk

30 January 2026 at 16:13

AI coding agents enable developers to work faster by streamlining tasks and driving automated, test-driven development. However, they also introduce a significant, often overlooked attack surface: they run tools from the command line with the same permissions and entitlements as the user, making them computer-use agents, with all the risks that entails. The primary threat to these tools is…
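
To make the risk concrete, one common mitigation shape is a wrapper that refuses to inherit the user's full shell: allowlist the binaries a tool call may invoke, scrub the environment, and cap resources. A minimal Linux-oriented sketch of that principle (illustrative only, not the post's guidance, and no substitute for container- or VM-level isolation):

```python
# Illustrative sketch only: run an agent's tool call with an allowlist,
# a scrubbed environment, and CPU/memory caps (Linux/POSIX). Real
# sandboxing (containers, seccomp, VMs) is stronger; this shows the idea.
import os
import resource
import shlex
import subprocess

ALLOWED = {"ls", "cat", "grep", "python3"}   # hypothetical allowlist

def limit_resources():
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))            # 5s of CPU
    resource.setrlimit(resource.RLIMIT_AS, (512 << 20,) * 2)   # 512 MB

def run_tool(command: str) -> str:
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"blocked: {argv[:1]}")
    os.makedirs("/tmp/agent-sandbox", exist_ok=True)
    proc = subprocess.run(
        argv,
        capture_output=True, text=True, timeout=10,
        env={"PATH": "/usr/bin:/bin"},       # no user secrets leak in
        cwd="/tmp/agent-sandbox",            # confined working directory
        preexec_fn=limit_resources,          # apply rlimits in the child
    )
    return proc.stdout
```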

Source

Ensuring Balanced GPU Allocation in Kubernetes Clusters with Time-Based Fairshare

28 January 2026 at 17:00

NVIDIA Run:ai v2.24 introduces time-based fairshare, a new scheduling mode that brings fair-share scheduling with time awareness for over-quota resources to Kubernetes clusters. This capability, built on the open source KAI Scheduler that powers NVIDIA Run:ai, addresses a long-standing challenge in shared GPU infrastructure. Consider two teams with equal priority sharing a cluster.
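
The intuition behind time-based fairshare can be shown in a few lines: among teams contending for over-quota GPUs, grant the next one to whichever team has the lowest time-decayed usage. A toy sketch of that idea, with a hypothetical decay factor; it is not the KAI Scheduler implementation:

```python
# Toy illustration of time-aware fair share: grant the next over-quota
# GPU to the team with the lowest time-decayed usage. Not KAI Scheduler.
from dataclasses import dataclass

DECAY = 0.5  # hypothetical per-tick decay; older usage counts for less

@dataclass
class Team:
    name: str
    usage: float = 0.0  # decayed over-quota GPU-hours consumed

def tick(teams):
    for t in teams:
        t.usage *= DECAY          # forget old consumption over time

def grant_gpu(teams):
    winner = min(teams, key=lambda t: t.usage)
    winner.usage += 1.0
    return winner.name

teams = [Team("alpha"), Team("beta")]
for step in range(6):
    tick(teams)
    print(step, grant_gpu(teams), [round(t.usage, 2) for t in teams])
```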

Source

Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core

28 January 2026 at 16:28

This post introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core used for LLM post-training or DiT pre-training. It dynamically selects the CP size per microbatch to efficiently handle variable-length sequences, achieving up to 1.48x speedup on real-world datasets. In large-scale model training, an often-overlooked bottleneck arises from the…
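
The scheduling decision itself is compact: for each microbatch, pick the smallest context-parallel (CP) group that keeps the per-GPU share of the sequence under an activation-memory budget. A hedged sketch of that selection rule, with made-up budget numbers; Megatron Core's actual policy may differ:

```python
# Hedged sketch of the Dynamic-CP decision: pick the smallest
# power-of-two CP size whose per-rank share of the sequence fits a
# token budget. Illustrative only, not Megatron Core's implementation.

MAX_TOKENS_PER_RANK = 8192   # hypothetical activation-memory budget
MAX_CP = 8                   # largest CP group the cluster supports

def pick_cp_size(seq_len: int) -> int:
    cp = 1
    while cp < MAX_CP and seq_len // cp > MAX_TOKENS_PER_RANK:
        cp *= 2              # split the sequence across more GPUs
    return cp

# Short microbatches stay on one GPU; long ones spread across a CP group.
for seq_len in (2048, 16384, 131072):
    print(seq_len, "->", pick_cp_size(seq_len))
```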

Source

Updating Classifier Evasion for Vision Language Models

28 January 2026 at 16:19

Advances in AI architectures have unlocked multimodal functionality, enabling transformer models to process multiple forms of data in the same context. For instance, vision language models (VLMs) can generate output from combined image and text input, enabling developers to build systems that interpret graphs, process camera feeds, or operate with traditionally human interfaces like desktop…
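
For background on what classifier evasion looks like in code, the classic fast gradient sign method (FGSM) perturbs an input along the sign of the loss gradient so the prediction flips; the post's subject is how this style of attack carries over to VLMs. A generic PyTorch sketch of FGSM, not the post's method:

```python
# Classic FGSM evasion sketch in PyTorch: nudge the image along the sign
# of the loss gradient so the classifier's prediction changes. Background
# illustration only; the post adapts evasion attacks to VLMs.
import torch
import torch.nn.functional as F

def fgsm(model, image, label, eps=4 / 255):
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adv = image + eps * image.grad.sign()   # one signed-gradient step
    return adv.clamp(0, 1).detach()         # stay a valid image
```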

Source

Accelerating Diffusion Models with an Open, Plug-and-Play Offering

27 January 2026 at 19:00

Recent advances in large-scale diffusion models have revolutionized generative AI across multiple domains, from image synthesis to audio generation, 3D asset creation, molecular design, and beyond. These models have demonstrated unprecedented capabilities in producing high-quality, diverse outputs across various conditional generation tasks. Despite these successes…

Source

Adaptive Inference in NVIDIA TensorRT for RTX Enables Automatic Optimization

26 January 2026 at 21:00

Deploying AI applications across diverse consumer hardware has traditionally forced a trade-off. You can optimize for specific GPU configurations and achieve peak performance at the cost of portability. Alternatively, you can build generic, portable engines and leave performance on the table. Bridging this gap often requires manual tuning, multiple build targets, or accepting compromises.
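
One way to picture "adaptive inference" is a portable execution path plus a per-GPU specialization that is built lazily and cached on first use. The sketch below is purely conceptual, using torch.compile as a stand-in for tuning; it is not the TensorRT for RTX API:

```python
# Conceptual sketch only: one portable artifact, plus a lazily built,
# per-GPU specialization cached on first use. Illustrates the "build
# once, adapt at run time" idea, not the TensorRT for RTX API.
import functools
import torch

def generic_path(x):              # portable fallback, works everywhere
    return x @ x.T

@functools.lru_cache(maxsize=None)
def tuned_path_for(device_name: str):
    # Hypothetical stand-in for just-in-time tuning on this GPU.
    print(f"specializing once for {device_name}")
    return torch.compile(generic_path)

def run(x: torch.Tensor):
    if torch.cuda.is_available():
        return tuned_path_for(torch.cuda.get_device_name(0))(x)
    return generic_path(x)        # no GPU: stay on the portable path
```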

Source

Received before yesterday

How to Train an AI Agent for Command-Line Tasks with Synthetic Data and Reinforcement Learning

15 January 2026 at 16:00

What if your computer-use agent could learn a new Command Line Interface (CLI) – and operate it safely without ever writing files or free-typing shell commands? In Part 1 of our series on building a computer-use agent, we built a custom Bash computer-use agent using NVIDIA Nemotron in just one hour. In this sequel, we’ll take it further by teaching the same reasoning model with no prior…
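
The "no free-typed shell commands" constraint is the interesting design choice: the model emits a structured action, and only verbs from a closed schema ever reach the shell. A hedged sketch with a hypothetical three-verb schema:

```python
# Hedged sketch of a constrained CLI action space: the model emits JSON,
# validated against a closed verb set before execution, so the agent
# never free-types shell commands. The schema is hypothetical.
import json
import subprocess

SCHEMA = {
    "list":   ["ls", "-l"],
    "status": ["git", "status", "--short"],
    "disk":   ["df", "-h"],
}

def execute(model_output: str) -> str:
    action = json.loads(model_output)        # e.g. {"verb": "status"}
    verb = action.get("verb")
    if verb not in SCHEMA:
        return f"rejected: unknown verb {verb!r}"
    result = subprocess.run(SCHEMA[verb], capture_output=True, text=True)
    return result.stdout

print(execute('{"verb": "disk"}'))
```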

Source

Reimagining LLM Memory: Using Context as Training Data Unlocks Models That Learn at Test-Time

9 January 2026 at 16:58

We keep seeing LLMs with larger context windows in the news, along with promises that they can hold entire conversation histories, volumes of books, or multiple codebases in view at once. And yet, these models still repeat the same mistakes. We still have to copy and paste the earlier context back into the chat for LLMs to “get it”. A smart co-worker would pick up on these patterns, adapt…
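
The idea in the title, treating context as training data, can be sketched as a few gradient steps of next-token prediction on the prompt before generating, so the adaptation lives in the weights rather than only in the KV cache. A toy sketch assuming a Hugging Face-style model interface, not the post's actual method:

```python
# Toy sketch of "context as training data": take a few gradient steps of
# next-token prediction on the context before generating, so the model
# adapts at test time. Hedged illustration, not the post's method.
import torch

def test_time_adapt(model, context_ids: torch.Tensor, steps: int = 3):
    opt = torch.optim.SGD(model.parameters(), lr=1e-5)
    for _ in range(steps):
        out = model(context_ids, labels=context_ids)  # HF-style LM loss
        opt.zero_grad()
        out.loss.backward()
        opt.step()
    return model  # generate as usual; the context is now "in the weights"
```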

Source

Multi-Agent Warehouse AI Command Layer Enables Operational Excellence and Supply Chain Intelligence

9 January 2026 at 14:00

Warehouses have never been more automated, more data-rich, or more operationally demanding than they are now – yet they still rely on systems that can’t keep up. Throughput is rising, SLAs are shrinking, and fleets of AMRs, conveyors, and sensors expand every year. But beneath that technological surface, most sites still rely on a familiar trio: a Warehouse Management System (WMS)…

Source

Build an AI Catalog System That Delivers Localized, Interactive Product Experiences

9 January 2026 at 14:00

E-commerce catalogs often contain sparse product data: generic images, a basic title, and a short description. This limits discoverability, engagement, and conversion. Manual enrichment doesn’t scale because it relies on catalog managers to manually write descriptions, apply tags, and categorize products. The process is slow, inconsistent, and error-prone. This tutorial shows developers…
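
The enrichment loop the tutorial motivates can be sketched as one VLM call per product: send the image plus the sparse fields, ask for structured attributes back. In the sketch below, the endpoint URL and model name are placeholders, not anything the post specifies:

```python
# Hedged sketch of automated catalog enrichment: send the product image
# plus sparse fields to a vision language model and get structured
# attributes back. Endpoint and model name are hypothetical placeholders.
import base64
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

def enrich(image_path: str, title: str) -> dict:
    img = base64.b64encode(open(image_path, "rb").read()).decode()
    resp = client.chat.completions.create(
        model="example-vlm",                 # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Product title: {title}. Return JSON with "
                         "'description', 'tags', and 'category'."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{img}"}},
            ],
        }],
    )
    return json.loads(resp.choices[0].message.content)
```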

Source

Delivering Massive Performance Leaps for Mixture of Experts Inference on NVIDIA Blackwell

8 January 2026 at 19:43

As AI models continue to get smarter, people can rely on them for an expanding set of tasks. This leads users – from consumers to enterprises – to interact with AI more frequently, meaning that more tokens need to be generated. To serve these tokens at the lowest possible cost, AI platforms need to deliver the best possible token throughput per watt. Through extreme co-design across GPUs, CPUs…

Source

Accelerating LLM and VLM Inference for Automotive and Robotics with NVIDIA TensorRT Edge-LLM

8 January 2026 at 17:28

Large language models (LLMs) and multimodal reasoning systems are rapidly expanding beyond the data center. Automotive and robotics developers increasingly want to run conversational AI agents, multimodal perception, and high-level planning directly on the vehicle or robot – where latency, reliability, and the ability to operate offline matter most. While many existing LLM and vision language…

Source

Introducing NVIDIA BlueField-4-Powered Inference Context Memory Storage Platform for the Next Frontier of AI

6 January 2026 at 17:30

AI‑native organizations increasingly face scaling challenges as agentic AI workflows drive context windows to millions of tokens and models scale toward trillions of parameters. These systems currently rely on agentic long‑term memory for context that persists across turns, tools, and sessions so agents can build on prior reasoning instead of starting from scratch on every request.
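
The access pattern this storage tier serves is easy to picture: session-keyed context blobs (for example, serialized KV cache or agent memory) that must outlive a single request. A toy sketch of that pattern only; a real platform would use the storage and inference stack, not pickle files:

```python
# Toy sketch of an inference context-memory tier: session-keyed blobs
# (e.g., serialized KV cache) persisted so agents can resume prior
# reasoning across turns. Illustrates the access pattern only.
import pathlib
import pickle

STORE = pathlib.Path("/tmp/context-store")   # hypothetical backing store
STORE.mkdir(exist_ok=True)

def save_context(session_id: str, context: object) -> None:
    (STORE / f"{session_id}.pkl").write_bytes(pickle.dumps(context))

def load_context(session_id: str, default=None):
    path = STORE / f"{session_id}.pkl"
    return pickle.loads(path.read_bytes()) if path.exists() else default

save_context("agent-42", {"turns": 3, "facts": ["user prefers metric"]})
print(load_context("agent-42"))
```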

Source

Scaling Power-Efficient AI Factories with NVIDIA Spectrum-X Ethernet Photonics

6 January 2026 at 16:59

NVIDIA is bringing the world’s first optimized Ethernet networking with co-packaged optics to AI factories, enabling scale-out and scale-across on the NVIDIA Rubin platform with NVIDIA Spectrum-X Ethernet Photonics, the flagship switch for multi-trillion-parameter AI infrastructure. This blog post explores key optimizations and innovations in the protocol and hardware of Spectrum-X Ethernet…

Source

Open Source AI Tool Upgrades Speed Up LLM and Diffusion Models on NVIDIA RTX PCs

AI developer activity on PCs is exploding, driven by the rising quality of small language models (SLMs) and diffusion models, such as FLUX.2, GPT-OSS-20B, and Nemotron 3 Nano. At the same time, AI PC frameworks, including ComfyUI, llama.cpp, Ollama, and Unsloth, are making functional advances, doubling in popularity over the past year as the number of developers using PC-class models has grown…

Source

New Software and Model Optimizations Supercharge NVIDIA DGX Spark

5 January 2026 at 22:50

Since the release of the Grace Blackwell-powered DGX Spark, NVIDIA has continued to push its performance through continuous software optimization and close collaboration with software partners and the open-source community. These efforts are delivering meaningful gains across inference, training, and creative workflows. At CES 2026, the latest DGX Spark software release, combined with new model…

Source

Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer

5 January 2026 at 22:20

AI has entered an industrial phase. What began as systems performing discrete AI model training and human-facing inference has evolved into always-on AI factories that continuously convert power, silicon, and data into intelligence at scale. These factories now underpin applications that generate business plans, analyze markets, conduct deep research, and reason across vast bodies of…

Source

Accelerate AI Inference for Edge and Robotics with NVIDIA Jetson T4000 and NVIDIA JetPack 7.1

5 January 2026 at 22:10

NVIDIA is introducing the NVIDIA Jetson T4000, bringing high-performance AI and real-time reasoning to a wider range of robotics and edge AI applications. Optimized for tighter power and thermal envelopes, T4000 delivers up to 1200 FP4 TFLOPs of AI compute and 64 GB of memory, providing an ideal balance of performance, efficiency, and scalability. With its energy-efficient design and production…

Source

How to Build a Voice Agent with RAG and Safety Guardrails

5 January 2026 at 22:06

Building an agent is more than just “call an API” – it requires stitching together retrieval, speech, safety, and reasoning components so they behave like one cohesive system. Each layer has its own interface, latency constraints, and integration challenges, and you start to feel them as soon as you move beyond a simple prototype. In this tutorial, you’ll learn how to build a voice-powered RAG…
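
The overall shape of such a system is a short pipeline: speech recognition, an input guardrail, retrieval, the LLM, an output guardrail, then speech synthesis. The sketch below shows that shape with every component as a hypothetical callable; it names no specific SDK:

```python
# Hedged sketch of the voice-agent pipeline shape: ASR -> guardrail ->
# retrieval -> LLM -> TTS, each behind a narrow interface. All component
# functions are hypothetical stand-ins, not a specific SDK.

def voice_agent_turn(audio: bytes,
                     asr, guardrail, retriever, llm, tts) -> bytes:
    text = asr(audio)                        # speech -> text
    if not guardrail(text):                  # refuse unsafe inputs early
        return tts("Sorry, I can't help with that.")
    docs = retriever(text, top_k=4)          # ground the answer (RAG)
    answer = llm(question=text, context=docs)
    if not guardrail(answer):                # check the output side too
        return tts("Sorry, I can't share that.")
    return tts(answer)                       # text -> speech
```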

Source
