local-llm

2 articles tagged “local-llm”.

2026-01-08
4 min

RTX 4090 VRAM Limits: What Models Actually Fit

A single RTX 4090 can't run Llama 3 70B at usable speeds. Here's the VRAM math, quantization tradeoffs, and what actually works on 24GB.
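
A rough sketch of the kind of VRAM math the article covers: weight memory scales with parameter count times bytes per parameter. The figures below are back-of-the-envelope estimates, not the article's measured numbers, and the ~4.5 bits for Q4 is an assumption that accounts for quantization scales.

```python
# Rough VRAM estimate for model weights alone (illustrative, not measured).
# weights_gb = parameters * bits_per_parameter / 8 / 1e9

PARAMS_70B = 70e9

def weight_vram_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB for a given quantization width."""
    return params * bits_per_param / 8 / 1e9

# Assumed widths: FP16 = 16 bits, Q8 = 8 bits, Q4 ~= 4.5 bits incl. scales.
for label, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4.5)]:
    print(f"{label}: {weight_vram_gb(PARAMS_70B, bits):.0f} GB")

# FP16: 140 GB, Q8: 70 GB, Q4: ~39 GB -- all above the 24 GB on an
# RTX 4090, which is why 70B won't fit on one card without offloading.
```

Even at aggressive 4-bit quantization, the weights alone exceed 24GB before any activation or cache overhead, which is the core of the argument above.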

2026-01-03
6 min

Llama 70B VRAM Requirements: RTX 4090, 3090, A100

Tested Llama 3 70B on RTX 4090, 3090, and A100. Exact VRAM breakdown for FP16 vs Q4 quantization, KV cache overhead, and why OOM errors happen.
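
As a companion sketch to the KV cache overhead mentioned above: cache size grows linearly with context length. A minimal estimate, assuming Llama 3 70B's published grouped-query attention layout (80 layers, 8 KV heads, head dimension 128):

```python
# KV cache size estimate for a Llama 3 70B-style GQA model.
# Assumed architecture: 80 layers, 8 KV heads, head dim 128, FP16 cache.

def kv_cache_gb(tokens: int, layers: int = 80, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """Factor of 2 covers both keys and values; FP16 elements by default."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 1e9

for ctx in (2048, 8192, 32768):
    print(f"{ctx:>6} tokens: {kv_cache_gb(ctx):.2f} GB")

# ~0.67 GB at 2k, ~2.68 GB at 8k, ~10.74 GB at 32k context -- overhead
# that sits on top of the weights and helps explain long-context OOMs.
```

This per-token cost sits on top of the weight memory from the first sketch, which is why a model that loads fine can still OOM once the context fills up.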