LLM | debene.dev

Abandoned code repositories floating in cosmic space

The Project I Didn't Abandon

My laptop has a ~/projects folder. Most of it is a graveyard. Not because the ideas were bad — I’d still build some of them if I sat down today. They’re dead because I get excited by a technical problem, work on it for two weekends, hit the part that stops being fun, and drift to the next thing. The codebase stays. The git log doesn’t. I’m 40, a Cloud Architect with ~18 years across IBM and AWS, and I have ADHD. Diagnosed late, lived with it longer. The pattern above isn’t laziness — it’s a specific shape of attention. Hyperfocus until the dopamine of novelty runs out, then gravitational pull toward whatever’s next. Anyone with this wiring recognizes the feeling: the moment a project transitions from “fun problem” to “ten unsexy decisions in a row,” part of your brain leaves the room. ...

80 CPU cores running llama-bench at 99% utilization

Running Modern LLMs on a 2016 IBM POWER8 in 2026

What Are We Even Doing Here?

It’s 2026. Most people run LLMs on NVIDIA H100s, AMD MI300X, or at least a decent gaming GPU. I’m running them on a 2016 IBM POWER8 server with 160 hardware threads and zero CUDA cores. Why? Because I can. And because nobody else has published POWER8 LLM benchmarks in 2026. And because alternative architectures deserve love too.

This post covers:

- Building llama.cpp on ppc64le with GCC 16
- Running Qwen 2.5 7B (text + vision) on POWER8
- NUMA tuning discoveries (spoiler: conventional wisdom is wrong)
- Multimodal inference (yes, vision models work too)
- Full reproducibility (Gentoo USE flags, build commands, everything)

TL;DR: Got 6.81 tokens/s on text generation and fully functional vision inference. POWER8 reads license plates better than some humans. ...

Apple Silicon vs IBM POWER8: A Tale of Two Architectures Running LLMs in 2026

Last week I published benchmarks of running Qwen 2.5 7B on a 2016 IBM POWER8. The results were surprisingly good: 6.81 tokens/s on CPU-only inference with 80 threads hammering away. But then came the inevitable question: How does it compare to modern hardware? So I ran the same benchmarks on my daily driver: a Mac Studio with Apple M2 Max. Same model (Qwen 2.5 7B Q4_K_M), same quantization, different decade. Here’s what I found. ...