2026-01-08
4 min
RTX 4090 VRAM Limits: What Models Actually Fit
A single RTX 4090 can't run Llama 3 70B at usable speeds. Here's the VRAM math, the quantization tradeoffs, and what actually works on 24GB.
2026-01-03
6 min
Llama 70B VRAM Requirements: RTX 4090, 3090, A100
Tested Llama 3 70B on RTX 4090, 3090, and A100. Exact VRAM breakdown for FP16 vs Q4 quantization, KV cache overhead, and why OOM errors happen.
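The VRAM math both posts refer to can be sketched in a few lines. This is a back-of-envelope estimate, not the articles' measured numbers: the helper names are hypothetical, and the Llama 3 70B dimensions (80 layers, 8 KV heads via GQA, head dim 128) are assumed from the publicly released model config.

```python
# Rough VRAM estimate: weights + KV cache, ignoring activations and
# framework overhead. All figures are approximations, not measurements.

def weight_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory for model weights alone."""
    return n_params * bytes_per_param / 1e9

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, bytes_per_elem: float) -> float:
    """KV cache: one K and one V vector per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

# Assumed Llama 3 70B shape: 80 layers, 8 KV heads, head_dim 128 (GQA).
fp16 = weight_gb(70e9, 2.0)             # FP16: 2 bytes/param -> ~140 GB
q4 = weight_gb(70e9, 0.5)               # Q4: ~0.5 bytes/param -> ~35 GB
kv = kv_cache_gb(80, 8, 128, 8192, 2)   # FP16 KV cache at 8k context

print(f"FP16 weights: {fp16:.0f} GB")   # ~140 GB: six 24 GB cards' worth
print(f"Q4 weights:   {q4:.1f} GB")     # ~35 GB: still over a 4090's 24 GB
print(f"KV @ 8k ctx:  {kv:.1f} GB")     # ~2.7 GB on top of the weights
```

Even before the KV cache, the Q4 weights alone exceed the 4090's 24 GB, which is why a 70B model spills to CPU RAM (or OOMs) on a single card.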