local-llm

2 articles tagged “local-llm”.

2026-01-08
4 min

RTX 4090 VRAM Limits: What Models Actually Fit

A single RTX 4090 can't run Llama 3 70B at usable speeds. Here's the VRAM math, quantization tradeoffs, and what actually works on 24GB.
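
A rough sketch of the kind of VRAM math the article covers: weight memory scales with parameter count times bytes per parameter. The figures below are back-of-the-envelope estimates, not the article's measured numbers, and the ~4.5 bits for Q4 is an assumption that accounts for quantization scales.

```python
# Rough VRAM estimate for model weights alone (illustrative, not measured).
# weights_gb = parameters * bits_per_parameter / 8 / 1e9

PARAMS_70B = 70e9

def weight_vram_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB for a given quantization width."""
    return params * bits_per_param / 8 / 1e9

# Assumed widths: FP16 = 16 bits, Q8 = 8 bits, Q4 ~= 4.5 bits incl. scales.
for label, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4.5)]:
    print(f"{label}: {weight_vram_gb(PARAMS_70B, bits):.0f} GB")

# FP16: 140 GB, Q8: 70 GB, Q4: ~39 GB -- all above the 24 GB on an
# RTX 4090, which is why 70B won't fit on one card without offloading.
```

Even at aggressive 4-bit quantization, the weights alone exceed 24GB before any activation or cache overhead, which is the core of the argument above.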

2026-01-03
6 min

Llama 70B VRAM Requirements: RTX 4090, 3090, A100

Tested Llama 3 70B on RTX 4090, 3090, and A100. Exact VRAM breakdown for FP16 vs Q4 quantization, KV cache overhead, and why OOM errors happen.
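
As a companion sketch to the KV cache overhead mentioned above: cache size grows linearly with context length. A minimal estimate, assuming Llama 3 70B's published grouped-query attention layout (80 layers, 8 KV heads, head dimension 128):

```python
# KV cache size estimate for a Llama 3 70B-style GQA model.
# Assumed architecture: 80 layers, 8 KV heads, head dim 128, FP16 cache.

def kv_cache_gb(tokens: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """Factor of 2 covers both keys and values; FP16 elements by default."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 1e9

for ctx in (2048, 8192, 32768):
    print(f"{ctx:>6} tokens: {kv_cache_gb(ctx):.2f} GB")

# ~0.67 GB at 2k, ~2.68 GB at 8k, ~10.74 GB at 32k context -- overhead
# that sits on top of the weights and helps explain long-context OOMs.
```

This per-token cost sits on top of the weight memory from the first sketch, which is why a model that loads fine can still OOM once the context fills up.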