
Apple Silicon vs IBM POWER8: A Tale of Two Architectures Running LLMs in 2026

Last week I published benchmarks of running Qwen 2.5 7B on a 2016 IBM POWER8. The results were surprisingly good: 6.81 tokens/s on CPU-only inference with 80 threads hammering away. But then came the inevitable question: How does it compare to modern hardware?

So I ran the same benchmarks on my daily driver: a Mac Studio with Apple M2 Max. Same model (Qwen 2.5 7B Q4_K_M), same quantization, different decade. Here’s what I found. ...