Spinning Up Local LLMs

Mon, 25 May 2026 10:51:00 +0800

1. Introduction

I have an old desktop with an NVIDIA RTX 2070 SUPER (8GB VRAM) sitting around, quietly collecting dust and judging me. So naturally, I decided to give it a job: run a local LLM server with llama.cpp, then wire it up to an AI coding agent.

No cloud tokens, no monthly bill anxiety, no sending prompts halfway across the planet. Just one old GPU, a quantized model, and a little bit of stubbornness.

Llama-Cpp on Mathscantor's Cybersecurity Blog
