
Received yesterday — 31 January 2026

Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core

28 January 2026 at 16:28

This post introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core used for LLM post-training or DiT pre-training. It dynamically selects the CP size per microbatch to efficiently handle variable-length sequences, achieving up to 1.48x speedup on real-world datasets. In large-scale model training, an often-overlooked bottleneck arises from the…
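The idea of selecting a CP size per microbatch can be sketched with a simple length-based heuristic. The token budget, the power-of-two constraint, and the function name below are illustrative assumptions, not the actual Megatron Core scheduling logic:

```python
# Hypothetical sketch: pick a context-parallel (CP) size per microbatch so
# that each CP rank handles roughly the same bounded number of tokens.
# Thresholds here are illustrative, not the real Megatron Core heuristic.

def select_cp_size(seq_len: int, tokens_per_rank: int = 4096, max_cp: int = 8) -> int:
    """Smallest power-of-two CP size that keeps per-rank tokens under budget."""
    cp = 1
    while cp < max_cp and seq_len / cp > tokens_per_rank:
        cp *= 2
    return cp

# Short sequences run with CP=1 and avoid communication overhead;
# only long sequences pay for a larger CP group.
batch = [1024, 4096, 9000, 30000]
cp_sizes = [select_cp_size(s) for s in batch]
print(cp_sizes)  # [1, 1, 4, 8]
```

The payoff is that a microbatch of short sequences no longer inherits the large CP group that only its longest neighbor needed.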

Source

Updating Classifier Evasion for Vision Language Models

28 January 2026 at 16:19

Advances in AI architectures have unlocked multimodal functionality, enabling transformer models to process multiple forms of data in the same context. For instance, vision language models (VLMs) can generate output from combined image and text input, enabling developers to build systems that interpret graphs, process camera feeds, or operate with traditionally human interfaces like desktop…

Source

Received before yesterday

How to Train an AI Agent for Command-Line Tasks with Synthetic Data and Reinforcement Learning

15 January 2026 at 16:00

What if your computer-use agent could learn a new command-line interface (CLI) and operate it safely without ever writing files or free-typing shell commands? In Part 1 of our series on building a computer-use agent, we built a custom Bash computer-use agent using NVIDIA Nemotron in just one hour. In this sequel, we’ll take it further by teaching the same reasoning model with no prior…

Source

Reimagining LLM Memory: Using Context as Training Data Unlocks Models That Learn at Test-Time

9 January 2026 at 16:58

We keep seeing LLMs with larger context windows in the news, along with promises that they can hold entire conversation histories, volumes of books, or multiple codebases in view at once. And yet, these models still repeat the same mistakes. We still have to copy and paste the earlier context back into the chat for LLMs to “get it”. A smart co-worker would pick up on these patterns, adapt…

Source

Multi-Agent Warehouse AI Command Layer Enables Operational Excellence and Supply Chain Intelligence

9 January 2026 at 14:00

Warehouses have never been more automated, more data-rich, or more operationally demanding than they are now—yet they still rely on systems that can’t keep up. Throughput is rising, SLAs are shrinking, and fleets of AMRs, conveyors, and sensors expand every year. But beneath that technological surface, most sites still rely on a familiar trio: a Warehouse Management System (WMS)…

Source

Build an AI Catalog System That Delivers Localized, Interactive Product Experiences

9 January 2026 at 14:00

E-commerce catalogs often contain sparse product data: generic images, a basic title, and a short description. This limits discoverability, engagement, and conversion. Manual enrichment doesn’t scale because it relies on catalog managers to write descriptions, apply tags, and categorize products by hand. The process is slow, inconsistent, and error-prone. This tutorial shows developers…

Source

Delivering Massive Performance Leaps for Mixture of Experts Inference on NVIDIA Blackwell

8 January 2026 at 19:43

As AI models continue to get smarter, people can rely on them for an expanding set of tasks. This leads users—from consumers to enterprises—to interact with AI more frequently, meaning that more tokens need to be generated. To serve these tokens at the lowest possible cost, AI platforms need to deliver the best possible token throughput per watt. Through extreme co-design across GPUs, CPUs…

Source

Accelerating LLM and VLM Inference for Automotive and Robotics with NVIDIA TensorRT Edge-LLM

8 January 2026 at 17:28

Large language models (LLMs) and multimodal reasoning systems are rapidly expanding beyond the data center. Automotive and robotics developers increasingly want to run conversational AI agents, multimodal perception, and high-level planning directly on the vehicle or robot – where latency, reliability, and the ability to operate offline matter most. While many existing LLM and vision language…

Source

Open Source AI Tool Upgrades Speed Up LLM and Diffusion Models on NVIDIA RTX PCs


AI developer activity on PCs is exploding, driven by the rising quality of small language models (SLMs) and diffusion models, such as FLUX.2, GPT-OSS-20B, and Nemotron 3 Nano. At the same time, AI PC frameworks, including ComfyUI, llama.cpp, Ollama, and Unsloth, are making functional advances, doubling in popularity over the past year as the number of developers using PC-class models has grown…

Source

New Software and Model Optimizations Supercharge NVIDIA DGX Spark

5 January 2026 at 22:50

Since the release of DGX Spark, NVIDIA has continued to push the performance of the Grace Blackwell-powered system through continuous software optimization and close collaboration with software partners and the open-source community. These efforts are delivering meaningful gains across inference, training, and creative workflows. At CES 2026, the latest DGX Spark software release, combined with new model…

Source

Accelerate AI Inference for Edge and Robotics with NVIDIA Jetson T4000 and NVIDIA JetPack 7.1

5 January 2026 at 22:10

NVIDIA is introducing the NVIDIA Jetson T4000, bringing high-performance AI and real-time reasoning to a wider range of robotics and edge AI applications. Optimized for tighter power and thermal envelopes, T4000 delivers up to 1200 FP4 TFLOPs of AI compute and 64 GB of memory, providing an ideal balance of performance, efficiency, and scalability. With its energy-efficient design and production…

Source

How to Build a Voice Agent with RAG and Safety Guardrails

5 January 2026 at 22:06

Building an agent is more than just “call an API”—it requires stitching together retrieval, speech, safety, and reasoning components so they behave like one cohesive system. Each layer has its own interface, latency constraints, and integration challenges, and you start to feel them as soon as you move beyond a simple prototype. In this tutorial, you’ll learn how to build a voice-powered RAG…
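The layers named above can be wired together as one pipeline. Every component below is a stand-in stub under illustrative names; a real agent would plug in ASR, a retriever, a guardrail model, an LLM, and TTS behind these functions:

```python
# Hedged sketch of a voice RAG agent pipeline: transcribe, check guardrails,
# retrieve context, then generate. All components are illustrative stubs.

BLOCKLIST = {"password", "exploit"}  # toy guardrail policy

def transcribe(audio: bytes) -> str:
    return audio.decode()  # stub: pretend the audio is already text

def guardrail_ok(text: str) -> bool:
    # A real guardrail would call a safety model, not a keyword list.
    return not any(word in text.lower() for word in BLOCKLIST)

def retrieve(query: str) -> str:
    return "docs: returns are accepted within 30 days"  # stub retrieval

def generate(prompt: str) -> str:
    return f"answer based on [{prompt}]"  # stub LLM call

def voice_agent(audio: bytes) -> str:
    text = transcribe(audio)
    if not guardrail_ok(text):
        return "I can't help with that."
    context = retrieve(text)
    return generate(f"{context} | question: {text}")

print(voice_agent(b"what is the return policy?"))
```

The point of the sketch is the ordering: the guardrail check sits between speech and retrieval, so unsafe requests never reach the knowledge base or the model.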

Source

Inside NVIDIA Nemotron 3: Techniques, Tools, and Data That Make It Efficient and Accurate

15 December 2025 at 14:00

Agentic AI systems increasingly rely on collections of cooperating agents—retrievers, planners, tool executors, verifiers—working together across large contexts and long time spans. These systems demand models that deliver fast throughput, strong reasoning accuracy, and persistent coherence over large inputs. They also require a level of openness that allows developers to customize, extend…

Source

Enabling Horizontal Autoscaling of Enterprise RAG Components on Kubernetes

12 December 2025 at 21:00

Today’s best AI agents rely on retrieval-augmented generation (RAG) to produce more accurate results. A RAG system uses a knowledge base to augment the context supplied to large language models (LLMs). A typical design pattern includes a RAG server that accepts prompt queries, consults a vector database for the nearest context vectors, and then forwards the query with the appended context to an…
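The design pattern described above can be sketched end to end. The toy embedder, the in-memory store, and the `answer()` stub are all illustrative assumptions; a real deployment would use an embedding model, a vector database, and an LLM endpoint:

```python
# Minimal sketch of the RAG server pattern: accept a query, find the nearest
# context vectors, append that context, and forward the prompt to an LLM.
import math

def embed(text: str) -> list[float]:
    # Toy bag-of-characters embedding; a real server calls an embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def nearest(query_vec: list[float], store: list[str], k: int = 1) -> list[str]:
    # Rank documents by cosine similarity to the query vector.
    scored = sorted(store, key=lambda doc: -sum(a * b for a, b in zip(query_vec, embed(doc))))
    return scored[:k]

def answer(prompt: str) -> str:
    return f"LLM({len(prompt)} chars of prompt)"  # stub for the final LLM call

store = ["GPUs accelerate matrix math", "Kubernetes autoscaling uses HPA"]
query = "how does autoscaling work"
context = nearest(embed(query), store, k=1)
print(answer(f"Context: {context}\nQuestion: {query}"))
```

Autoscaling the components of such a system then becomes a matter of scaling each stage (embedding, retrieval, generation) independently as load shifts.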

Source

Getting Started with Edge AI on NVIDIA Jetson: LLMs, VLMs, and Foundation Models for Robotics

11 December 2025 at 16:00

Running advanced AI and computer vision workloads on small, power-efficient devices at the edge is a growing challenge. Robots, smart cameras, and autonomous machines need real-time intelligence to see, understand, and react without depending on the cloud. The NVIDIA Jetson platform meets this need with compact, GPU-accelerated modules and developer kits purpose-built for edge AI and robotics.

Source

Optimizing Inference for Long Context and Large Batch Sizes with NVFP4 KV Cache

8 December 2025 at 17:00

Quantization is one of the strongest levers for large-scale inference. By reducing the precision of weights, activations, and KV cache, we can reduce the memory footprint and compute cost—directly improving throughput, latency, and achievable context length. This blog introduces NVFP4 KV cache quantization, a new KV format that enables significant performance gains on NVIDIA Blackwell GPUs.
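The general mechanism of block-scaled 4-bit quantization can be illustrated in a few lines. The actual NVFP4 format (its FP4 value grid, block size, and scale encoding) is NVIDIA-specific; here a signed 4-bit integer grid with a per-block absmax scale stands in:

```python
# Illustrative sketch of block-scaled 4-bit quantization for a KV cache.
# A symmetric integer grid [-7, 7] stands in for the real FP4 value set.

def quantize_block(values: list[float], levels: int = 7):
    # One shared scale per block, chosen from the block's absolute maximum.
    scale = max(abs(v) for v in values) / levels or 1.0
    q = [max(-levels, min(levels, round(v / scale))) for v in values]
    return q, scale

def dequantize_block(q: list[int], scale: float) -> list[float]:
    return [x * scale for x in q]

block = [0.12, -0.5, 0.33, 0.02]
q, scale = quantize_block(block)
restored = dequantize_block(q, scale)
# 4-bit codes plus one scale per block: roughly 4x smaller than FP16,
# which is what buys longer context and larger batches at the same memory.
print(q, [round(v, 3) for v in restored])
```

The per-block scale is the key trade: one extra value per block keeps quantization error local, so an outlier in one block cannot crush the precision of every other block.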

Source

NVIDIA Kaggle Grandmasters Win Artificial General Intelligence Competition

5 December 2025 at 18:00

NVIDIA researchers on Friday won a key Kaggle competition many in the field treat as a real-time pulse check on humanity’s progress toward artificial general intelligence (AGI). Ivan Sorokin and Jean-Francois Puget, two members of the Kaggle Grandmasters of NVIDIA (KGMoN), came in first on the Kaggle ARC Prize 2025 public leaderboard with a 27.64% score by building a solution evaluated on…

Source

NVIDIA-Accelerated Mistral 3 Open Models Deliver Efficiency, Accuracy at Any Scale 

2 December 2025 at 18:10

The new Mistral 3 open model family delivers industry-leading accuracy, efficiency, and customization capabilities for developers and enterprises. Optimized for everything from NVIDIA GB200 NVL72 to edge platforms, the Mistral 3 models were trained on NVIDIA Hopper GPUs and are now available through Mistral AI on Hugging Face. Developers can choose from a variety of options for deploying…

Source
