RTX 5090 Mobile: First LLM Benchmarks Are In
The first benchmarks for the RTX 5090 Mobile GPU are out, and the results are promising for on-the-go LLM inference. Hardware Canucks ran early tests on a Razer Blade 16 laptop equipped with a 135W RTX 5090 GPU, revealing significant performance gains over the RTX 4090 Mobile. Given that this is the first consumer laptop GPU with 24GB of VRAM, it opens new possibilities for running large-scale quantized LLMs locally. But does it make sense as an LLM inference machine? Let’s break it down.
LLM Performance Benchmarks
The tests were conducted on a Razer Blade 16 laptop with a 135W RTX 5090 GPU, measuring its inference performance at a 4K context across several quantized LLMs.
Model | Quantization | File size | Speed (t/s) |
---|---|---|---|
Llama 3.1 8B | 4-bit | 4.92 GB | 110 |
Qwen2.5 Coder 14B | 4-bit | 8.99 GB | 61.72 |
DeepSeek R1 Distill Qwen 32B | 4-bit | 19.9 GB | 23.35 |
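Token generation at batch size 1 is largely memory-bandwidth-bound: every generated token requires streaming the full set of model weights from VRAM. That makes these numbers easy to sanity-check with a rough roofline estimate. Here's a minimal sketch – the 896 GB/s bandwidth figure and the file sizes come from the article; treating weight reads as the only memory traffic is a simplifying assumption:

```python
# Rough roofline check: batch-1 decode speed on a bandwidth-bound GPU is
# approximately bandwidth / bytes_read_per_token (weights dominate the traffic).
BANDWIDTH_GBS = 896  # RTX 5090 Mobile memory bandwidth, GB/s

models = {
    # name: (quantized file size in GB, measured t/s from the table above)
    "Llama 3.1 8B (4-bit)": (4.92, 110.00),
    "Qwen2.5 Coder 14B (4-bit)": (8.99, 61.72),
    "DeepSeek R1 Distill Qwen 32B (4-bit)": (19.9, 23.35),
}

for name, (size_gb, measured) in models.items():
    ceiling = BANDWIDTH_GBS / size_gb  # theoretical max t/s if purely bandwidth-bound
    print(f"{name}: ceiling ~{ceiling:.0f} t/s, measured {measured} t/s "
          f"({measured / ceiling:.0%} of roofline)")
```

The measured speeds land at roughly 50–60% of the theoretical ceiling across all three models, which is a plausible efficiency for a 135W mobile part and suggests the benchmark numbers are internally consistent.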
Key Takeaways
The RTX 5090 Mobile delivers roughly 30% faster inference than the 175W RTX 4090 Mobile in the tested workloads. One of its key advantages is its 24GB of VRAM, which lets it run 4-bit 32B models like QwQ 32B and DeepSeek R1 Distill Qwen 32B – something the RTX 4090 Mobile, with only 16GB of VRAM, simply cannot handle.
Additionally, its 896 GB/s memory bandwidth puts it roughly on par with the desktop RTX 3090 (936 GB/s), which should keep token generation speeds high even at longer context lengths. Prompt-processing benchmarks have yet to be published, but the initial results already point to a major leap over previous mobile GPUs, positioning the RTX 5090 Mobile as a compelling choice for high-performance LLM inference on the go.
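To see why the 24GB matters, consider what a 4-bit 32B model actually needs at a 4K context. Here's a back-of-envelope sketch – the Qwen2.5-32B architecture parameters (64 layers, 8 KV heads, head dim 128) and the fp16 KV cache are assumptions, not figures from the benchmark:

```python
# Estimate total VRAM for DeepSeek R1 Distill Qwen 32B (4-bit) at 4K context.
# KV cache per token = 2 (K and V) * n_layers * n_kv_heads * head_dim * bytes_per_elem
n_layers, n_kv_heads, head_dim = 64, 8, 128  # assumed Qwen2.5-32B architecture
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * 2  # fp16 KV cache

n_ctx = 4096
kv_cache_gb = kv_bytes_per_token * n_ctx / 1024**3
weights_gb = 19.9  # 4-bit file size from the benchmark table

print(f"KV cache at {n_ctx} tokens: {kv_cache_gb:.2f} GB")
print(f"Total (weights + KV): {weights_gb + kv_cache_gb:.1f} GB vs 24 GB VRAM")
# -> roughly 21 GB: fits on the 5090 Mobile, impossible on a 16 GB 4090 Mobile.
```

Under these assumptions the model plus KV cache comes to about 21 GB – comfortably inside 24GB, hopelessly beyond 16GB.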
Does It Make Sense for LLM Enthusiasts?
At launch, RTX 5090 laptops are only available for pre-order, with the cheapest model – the Lenovo Legion Pro 7i 16″ – priced at $4,000 (Intel Core Ultra 9, 32GB RAM, 1TB SSD). Given that a desktop RTX 5090 alone currently sells for $3,999, a complete laptop at a similar price might seem like a good deal – if you specifically need a mobile LLM workstation.
Price table for laptops with the 24GB RTX 5090 Mobile GPU:
Model | CPU | RAM | Display | Price |
---|---|---|---|---|
Lenovo Legion Pro 7i 16″ | Intel Core Ultra 9 24-Core | 32GB DDR5 | 16″ 2560×1600 OLED 240Hz | $3,999.99 |
ASUS ROG Strix SCAR 16″ | Intel Core Ultra 9 275HX 24-Core | 32GB DDR5 | 16″ 2560×1600 240Hz Nebula HDR | $4,299.99 |
Gigabyte AORUS MASTER 16″ | Intel Core Ultra 9 275HX 24-Core | 32GB DDR5 | 16″ 2560×1600 OLED 240Hz | $4,299.99 |
Gigabyte AORUS MASTER 18″ | Intel Core Ultra 9 275HX 24-Core | 64GB DDR5 | 18″ 2560×1600 Mini-LED 240Hz | $4,399.99 |
ASUS ROG Strix SCAR 18″ | Intel Core Ultra 9 275HX 24-Core | 32GB DDR5 | 18″ 2560×1600 240Hz Nebula HDR | $4,499.99 |
MSI Raider A18 HX A9W 18″ | AMD Ryzen 9 9955HX3D 16-Core | 64GB DDR5 | 18″ 3840×2400 Mini-LED 120Hz | $4,899.00 |
MSI Raider 18 HX AI 18″ | Intel Core Ultra 9 285HX 24-Core | 64GB DDR5 | 18″ 3840×2400 Mini-LED 120Hz | $4,899.00 |
However, if raw performance-per-dollar is your priority, a desktop RTX 3090 ($800 secondhand as of March 2025) in a Ryzen 5 7600 build offers better value for local LLM inference at around $1,500 total.
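A quick calculation makes that value gap concrete. Bandwidth-per-dollar is only a crude proxy for batch-1 inference value, and the prices below are the ones quoted in this article – the bandwidth figures are the cards' published specs:

```python
# Memory bandwidth per dollar as a rough proxy for local LLM inference value.
options = {
    # name: (memory bandwidth GB/s, approx. total system price USD)
    "RTX 3090 desktop build": (936, 1500),  # $800 used card + Ryzen 5 7600 build
    "RTX 5090 Mobile laptop": (896, 4000),  # cheapest pre-order (Legion Pro 7i)
}
for name, (bw, price) in options.items():
    print(f"{name}: {bw / price:.2f} GB/s per dollar")
# The desktop build delivers roughly 2.8x the bandwidth per dollar.
```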
MacBook vs. PC Laptop for LLMs
The MacBook Pro with M4 Max (128GB unified memory) can hold far larger models and much longer contexts, but its 546 GB/s memory bandwidth falls short of the 896 GB/s available on the RTX 5090 Mobile.
While the MacBook can fit larger models in memory, it lags in raw generation speed on 32B models, where the RTX 5090 Mobile excels. If your primary focus is handling huge models with extensive context, Apple Silicon still holds the advantage; for 4-bit quantized models at high tokens-per-second, the RTX 5090 Mobile is the faster option.
Pricing also plays a key role – while the 128GB MacBook Pro costs around $1000 more than a laptop with an RTX 5090, the 48GB version is similarly priced. Even with 48GB of unified memory, the MacBook would still offer more context capacity for 32B models than the RTX 5090-equipped laptop, making it a strong alternative for users prioritizing large-scale inference over raw speed.
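You can put rough numbers on that context-capacity gap with the same KV-cache arithmetic as before. The Qwen2.5-32B architecture parameters and the usable-memory fractions below are assumptions (macOS reserves part of unified memory for the system; CUDA has its own overhead):

```python
# Approximate max context for a 4-bit 32B model under different memory budgets.
n_layers, n_kv_heads, head_dim = 64, 8, 128  # assumed Qwen2.5-32B architecture
kv_bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * 2  # fp16 K and V
weights_gb = 19.9  # 4-bit file size from the benchmark table

budgets = {
    "RTX 5090 Mobile (24 GB VRAM)": 24 * 0.95,  # small reserve for CUDA overhead
    "MacBook Pro 48 GB (unified)": 48 * 0.75,   # assumed GPU-usable share on macOS
}
for name, usable_gb in budgets.items():
    free_bytes = (usable_gb - weights_gb) * 1024**3
    print(f"{name}: ~{int(free_bytes / kv_bytes_per_token):,} tokens of context")
```

Under these assumptions the 48GB MacBook fits several times more KV cache alongside a 32B model than the 24GB RTX 5090 Mobile does – which is exactly the trade-off: capacity on Apple Silicon, speed on the NVIDIA GPU.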
Conclusion
The RTX 5090 Mobile opens a new category of laptop LLM performance: it is the first consumer laptop GPU with 24GB of VRAM, and it significantly outpaces the RTX 4090 Mobile, bringing 24GB-class inference to a portable form factor.
However, for price-conscious LLM enthusiasts, a desktop RTX 3090 or used RTX 4090 remains the better value option.
Should You Buy It?
- Yes, if you need a high-performance mobile LLM workstation with 24GB VRAM.
- No, if you want the best performance-per-dollar – desktop builds remain the better choice.
What do you think? Will you be considering an RTX 5090-powered laptop for LLM inference, or is desktop still king? Drop your thoughts below!