80 CPU cores running llama-bench at 99% utilization

Running Modern LLMs on a 2016 IBM POWER8 in 2026

What Are We Even Doing Here?
It’s 2026. Most people run LLMs on NVIDIA H100s, AMD MI300X, or at least a decent gaming GPU. I’m running them on a 2016 IBM POWER8 server with 160 hardware threads and zero CUDA cores. Why? Because I can. And because nobody else has published POWER8 LLM benchmarks in 2026. And because alternative architectures deserve love too. This post covers: building llama.cpp on ppc64le with GCC 16; running Qwen 2.5 7B (text + vision) on POWER8; NUMA tuning discoveries (spoiler: conventional wisdom is wrong); multimodal inference (yes, vision models work too); and full reproducibility (Gentoo USE flags, build commands, everything). TL;DR: Got 6.81 tokens/s on text generation and fully functional vision inference. POWER8 reads license plates better than some humans. ...

May 14, 2026 · 8 min · Felipe De Bene
IBM POWER8 S822LC server

Who Says Elephants Can't Dance? POWER8 vs Intel i9-12900K Showdown

A Tale of Two Philosophies: When 160 Threads Meet Modern Silicon
I have a problem. I see weird computer hardware on eBay, and I buy it. Last year’s victim: an IBM POWER8 S822LC server from 2015. Cost: $50. Shipping: $200. The look on my partner’s face when it arrived: priceless. Everyone told me it was a relic, a curiosity, basically e-waste with RGB lights (okay, it doesn’t have RGB, but it should). But staring at those specs (160 hardware threads via SMT-8), I couldn’t help but wonder: could raw, embarrassing parallelism compete with modern single-thread supremacy? ...

November 6, 2025 · 10 min · Felipe De Bene