Jan 12, 2026 ∙ 5 min
LLM Inference Benchmarking - GenAI-Perf and vLLM
Learn how to benchmark LLM inference using NVIDIA GenAI-Perf and vLLM on GPU infrastructure. This guide walks developers and platform teams through setting up a single-node inference stack, measuring latency and token throughput, understanding KV cache and batching behavior, and building a Grafana-based observability pipeline. Ideal for engineers learning LLM inferencing and teams operating production-grade inference platforms.
Oct 8, 2025 ∙ 6 min
Deploying DeepSeek-V3.2-Exp on NVIDIA H200: Lessons Learned
Deploying DeepSeek-V3.2-Exp on NVIDIA H200 wasn't plug-and-play; it was an engineering workout. This guide walks through GPU selection, environment setup with PyTorch + CUDA 12.8, vLLM configuration, warm-up behavior, and common NCCL issues. Includes a validated Prometheus + Grafana monitoring setup, test scripts, and real metrics from a working 8×H200 run. Perfect for developers reproducing high-end MoE inference.
Aug 9, 2025 ∙ 4 min
The Strategic Importance of DevOps and Cloud in the AI Era
As artificial intelligence continues to reshape industries, the foundational technologies enabling its deployment—cloud infrastructure...
Chandan Kumar, AI Platform Architect
