Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.
-
Updated
Jun 5, 2026 - Python
Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.
Use your NVIDIA GPU's VRAM as swap space on Linux. Built for laptops with soldered memory and no upgrade path. If you have an RTX card sitting there with 8GB of VRAM and you're getting swapped to SSD, this puts that VRAM to work
Estimate whether a Hugging Face model fits and fine-tunes on your local GPU.
MacOS menu‑bar utility to adjust Apple Silicon GPU VRAM allocation
Social AI Agent Blueprint. Powered by vram.ai
First open-source TurboQuant KV cache compression for LLM inference. Drop-in for HuggingFace. pip install turboquant.
Plug-and-play homelab dashboard in one container — GPU, local-AI model VRAM, Docker, systemd and host health. One page, no Prometheus/Grafana.
Rust block device in userspace
Find which LLMs actually fit on your hardware. Client-side GPU detection, quantization-aware memory estimation, and speed predictions.
Add a description, image, and links to the vram topic page so that developers can more easily learn about it.
To associate your repository with the vram topic, visit your repo's landing page and select "manage topics."