Updating Classifier Evasion for Vision Language Models
Advances in AI architectures have unlocked multimodal functionality, enabling transformer models to process multiple forms of data in the same context. For instance, vision language models (VLMs) can generate output from combined image and text input, enabling developers to build systems that interpret graphs, process camera feeds, or operate with traditionally human interfaces like desktop…
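As a minimal sketch of the combined image-and-text input described above, the snippet below packs a prompt and an image into a single multimodal chat message. It assumes an OpenAI-style content-list schema; the exact field names vary by provider, and the helper name is ours:

```python
import base64


def build_vlm_message(prompt: str, image_bytes: bytes) -> dict:
    """Pack a text prompt and an image into one multimodal chat message.

    Assumes an OpenAI-style content-list layout (a list mixing "text" and
    "image_url" parts); other VLM APIs use different but analogous schemas.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # Inline the image as a base64 data URL.
                "image_url": {"url": f"data:image/png;base64,{encoded}"},
            },
        ],
    }


# Placeholder bytes stand in for a real screenshot or camera frame.
msg = build_vlm_message("Describe the chart in this image.", b"\x89PNG...")
```

Because both modalities land in the same message, the model attends to the image and the text jointly, which is what makes the downstream use cases above (graph reading, camera feeds, desktop interfaces) possible.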