If you’re looking to get into local LLM inference, choosing the right GPU isn’t just about raw power. It’s about finding the best balance between VRAM, memory bandwidth, and price-to-performance. Unlike gaming, where factors like clock speeds and ray tracing matter, LLM inference depends mainly on memory. VRAM capacity decides which models fit on the card, and memory bandwidth largely decides how fast tokens are generated, because each new token requires reading the model weights from VRAM. A GPU that offers great LLM performance per dollar may not always be the best choice for gaming.
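As a rough illustration of why bandwidth matters, the sketch below estimates a ceiling on decode speed by dividing a card’s memory bandwidth by the model’s size. It assumes single-stream generation in which every token reads all weights once, plus an approximate 4.9 GB footprint for an 8B Q4_K_M model (`MODEL_SIZE_GB`, an assumption). The bandwidth figures are the cards’ published specs, and the measured tokens/s come from the tables below. Treat it as a back-of-the-envelope model, not a predictor.

```python
# Rough ceiling on single-stream decode speed for a memory-bound model.
# Assumption: each generated token reads every weight from VRAM once, so
# tokens/s <= bandwidth / model size. KV-cache traffic and kernel overhead
# keep real numbers below this ceiling.

MODEL_SIZE_GB = 4.9  # assumed size of an 8B Q4_K_M model file (approximate)

# name -> (spec memory bandwidth in GB/s, measured tok/s from the tables)
CARDS = {
    "RTX 3060": (360, 53.12),
    "RTX 3090": (936, 101.74),
    "RTX 4090": (1008, 130.58),
    "RTX 5090": (1792, 226.10),
}


def decode_ceiling(bandwidth_gb_s: float, model_gb: float = MODEL_SIZE_GB) -> float:
    """Upper bound on tokens/s if every token streams all weights once."""
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    for name, (bandwidth, measured) in CARDS.items():
        ceiling = decode_ceiling(bandwidth)
        print(f"{name}: ceiling ~{ceiling:.0f} tok/s, measured {measured} tok/s "
              f"({measured / ceiling:.0%} of ceiling)")
```

For these four cards the measured speeds follow the same order as memory bandwidth and stay below the ceiling, as you would expect once KV-cache reads and kernel overhead are counted.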
This analysis compares GeForce GPUs by how fast they run an 8B model in 4-bit quantization (Q4_K_M), using both used-market (eBay) and current retail prices as of March 2025. Our key metric is dollars per token per second ($ per tok/s): a card’s price divided by its measured throughput, where lower is better. It helps anyone interested in local LLM inference get the most out of their budget without overpaying for features they don’t need.
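The tables divide each card’s price by its measured throughput, so the cost columns are in dollars per token/s, where lower is better; tokens/s per dollar is simply the reciprocal. A minimal sketch of both views, using the RTX 3060 eBay row as a check (the function names are illustrative):

```python
def dollars_per_tps(price_usd: float, tokens_per_s: float) -> float:
    """Table metric: dollars paid per token/s of throughput (lower is better)."""
    return price_usd / tokens_per_s


def tps_per_dollar(price_usd: float, tokens_per_s: float) -> float:
    """Inverse view: tokens/s bought per dollar (higher is better)."""
    return tokens_per_s / price_usd


# RTX 3060 on eBay (12GB table): $250 for 53.12 tok/s.
print(round(dollars_per_tps(250, 53.12), 2))        # 4.71
print(round(tps_per_dollar(250, 53.12) * 100, 2))   # 21.25 tok/s per $100
```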
Analysis of GeForce GPUs for LLM Inference in March 2025
We compiled benchmark data for 8B Q4_K_M inference speeds and compared it against GPU pricing. The GPUs included in this analysis cover both past and present generations, with a focus on consumer-friendly cards that enthusiasts are likely to consider.
12GB VRAM GPUs
GPU | Tokens/s | eBay | Current Retail | $ per tok/s (eBay) | $ per tok/s (Retail) |
---|---|---|---|---|---|
RTX 3060 | 53.12 | $250 | $330 | 4.71 | 6.21 |
RTX 3070 Ti | 68.94 | $350 | N/A | 5.08 | N/A |
RTX 4070 | 80.11 | $700 | $847 | 8.74 | 10.57 |
RTX 5070 | 100.45 | $750 | $620 | 7.47 | 6.18 |
16GB VRAM GPUs
GPU | Tokens/s | eBay | Current Retail | $ per tok/s (eBay) | $ per tok/s (Retail) |
---|---|---|---|---|---|
RTX 4060 Ti | 50.87 | $600 | $707 | 11.79 | 13.90 |
RTX 4070 Ti | 82.21 | $1,010 | $1,298 | 12.29 | 15.79 |
RTX 4080 | 106.22 | $1,400 | $1,750 | 13.18 | 16.47 |
RTX 5070 Ti | 114.71 | N/A | $940 | N/A | 8.19 |
RTX 5080 | 119.90 | $1,707 | $1,540 | 14.24 | 12.84 |
22GB / 24GB / 32GB VRAM GPUs
GPU | Tokens/s | eBay | Current Retail | $ per tok/s (eBay) | $ per tok/s (Retail) |
---|---|---|---|---|---|
RTX 2080 Ti | 72.01 | $550 | N/A | 7.64 | N/A |
RTX 3090 | 101.74 | $950 | $2,200 | 9.34 | 21.63 |
RTX 4090 | 130.58 | $2,300 | $3,000 | 17.62 | 22.98 |
RTX 5090 | 226.10 | $3,800 | $3,999 | 16.81 | 17.69 |
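To double-check the cost columns, or to re-rank the cards when prices move, the sketch below transcribes the three tables and sorts them by $ per tok/s for either price source. The tier labels and the `rank` helper are illustrative; prices listed as N/A are skipped.

```python
from typing import List, Optional, Tuple

# (VRAM tier, GPU, tok/s, eBay $, retail $) transcribed from the tables;
# None marks a price listed as N/A.
Row = Tuple[str, str, float, Optional[float], Optional[float]]
ROWS: List[Row] = [
    ("12GB", "RTX 3060", 53.12, 250, 330),
    ("12GB", "RTX 3070 Ti", 68.94, 350, None),
    ("12GB", "RTX 4070", 80.11, 700, 847),
    ("12GB", "RTX 5070", 100.45, 750, 620),
    ("16GB", "RTX 4060 Ti", 50.87, 600, 707),
    ("16GB", "RTX 4070 Ti", 82.21, 1010, 1298),
    ("16GB", "RTX 4080", 106.22, 1400, 1750),
    ("16GB", "RTX 5070 Ti", 114.71, None, 940),
    ("16GB", "RTX 5080", 119.90, 1707, 1540),
    ("22GB+", "RTX 2080 Ti", 72.01, 550, None),
    ("22GB+", "RTX 3090", 101.74, 950, 2200),
    ("22GB+", "RTX 4090", 130.58, 2300, 3000),
    ("22GB+", "RTX 5090", 226.10, 3800, 3999),
]

PRICE_COLUMN = {"ebay": 3, "retail": 4}


def rank(rows: List[Row], market: str, tier: Optional[str] = None) -> List[Tuple[str, float]]:
    """Return (GPU, $ per tok/s) pairs, cheapest first, skipping N/A prices."""
    col = PRICE_COLUMN[market]
    scored = [
        (row[1], row[col] / row[2])
        for row in rows
        if row[col] is not None and (tier is None or row[0] == tier)
    ]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    for market in ("ebay", "retail"):
        print(f"{market}:")
        for gpu, cost in rank(ROWS, market):
            print(f"  {gpu:12} ${cost:6.2f} per tok/s")
```

Recomputed this way, the figures agree with the published columns to within one cent of rounding; the RTX 3060 comes out cheapest on eBay and the RTX 5070 cheapest at retail.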
Key Observations
Best Budget High-VRAM Choice: RTX 3090
The RTX 3090 remains a solid budget option for high VRAM, offering 101.74 tokens per second and 24GB of VRAM. Used on eBay ($950), it costs $9.34 per tok/s, the lowest of any card with 24GB or more in this comparison. However, at its retail price of $2,200, its cost rises sharply to $21.63 per tok/s.
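Because VRAM is what makes the 3090 attractive, it can also help to look at cost per gigabyte of VRAM. A small sketch using the eBay prices from the 22GB/24GB table; note that the 2080 Ti entry is a 22GB modified card (stock cards have 11GB), and the 32GB figure for the RTX 5090 is its stock specification rather than a number taken from the table.

```python
# (GPU, VRAM in GB, eBay price in USD). VRAM for the first three follows the
# 22GB/24GB table heading; the 32GB figure for the 5090 is its stock spec.
CARDS = [
    ("RTX 2080 Ti (22GB mod)", 22, 550),
    ("RTX 3090", 24, 950),
    ("RTX 4090", 24, 2300),
    ("RTX 5090", 32, 3800),
]


def dollars_per_gb(price_usd: float, vram_gb: float) -> float:
    """Used-market cost of each GB of VRAM."""
    return price_usd / vram_gb


if __name__ == "__main__":
    for name, vram, price in sorted(CARDS, key=lambda c: dollars_per_gb(c[2], c[1])):
        print(f"{name}: ${dollars_per_gb(price, vram):.2f} per GB")
```

The modded 2080 Ti is cheapest per GB but also the slowest card in that table; among cards with 24GB or more, the 3090 costs well under half as much per GB as the 4090 or 5090.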
Best Price-to-Performance for 12GB VRAM GPUs: RTX 5070
Among 12GB cards, the RTX 5070 is the fastest at 100.45 tokens per second and, at retail, also the cheapest per unit of throughput ($6.18 per tok/s retail, $7.47 on eBay). The RTX 3060 is close behind at retail ($6.21 per tok/s) and is the cheapest card in the entire analysis on eBay ($4.71 per tok/s), making it a good fit for tight budgets or lower power limits.
Diminishing Returns: RTX 4070 Ti & 4080
While both the RTX 4070 Ti (82.21 tokens/s) and RTX 4080 (106.22 tokens/s) offer good raw performance, their current retail prices ($1,298 and $1,750) push their cost to $15.79 and $16.47 per tok/s, respectively. Unless found at MSRP or lower, they are not the best value choices.
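One way to put a number on "unless found at MSRP or lower" is to ask what price each card would need to match a reference card’s $ per tok/s. The sketch uses the RTX 5070 Ti at retail as that reference, an illustrative choice; any row from the tables could be substituted.

```python
def break_even_price(tokens_per_s: float, target_dollars_per_tps: float) -> float:
    """Highest price at which a card still matches the target $ per tok/s."""
    return tokens_per_s * target_dollars_per_tps


# Reference point (an assumption for illustration): RTX 5070 Ti at retail.
TARGET = 940 / 114.71  # about $8.19 per tok/s

if __name__ == "__main__":
    for gpu, tps in [("RTX 4070 Ti", 82.21), ("RTX 4080", 106.22)]:
        price = break_even_price(tps, TARGET)
        print(f"{gpu}: matches the 5070 Ti at or below ${price:,.0f}")
```

With this reference, the 4070 Ti would need to sell for about $674 and the 4080 for about $870.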
RTX 4090: High Performance, Poor Price Efficiency
The RTX 4090 delivers 130.58 tokens/s, second only to the RTX 5090 in this comparison, but at $3,000 retail its cost of $22.98 per tok/s is the highest of any card listed. Those who need maximum performance may justify the price, but it’s not the best value for most users.
RTX 5000 Series: Mixed Outlook
- The RTX 5070 Ti (114.71 tokens/s, $940 retail) costs $8.19 per tok/s, the best retail figure among 16GB cards, making it a strong mid-tier contender.
- The RTX 5080 (119.90 tokens/s, $1,540 retail) sits at $12.84 per tok/s. It is only about 4.5% faster than the 5070 Ti for roughly 64% more money, which makes it the weaker buy of the two.
- The RTX 5090 (226.10 tokens/s, $3,999 retail) offers top-tier performance but only moderate efficiency ($17.69 per tok/s retail, $16.81 on eBay). That is better than the 4090, but not groundbreaking.
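The step from the 5070 Ti to the 5080 is easier to judge as a marginal cost: extra dollars paid per extra tok/s gained. A sketch using the retail figures above (the `marginal_cost` helper is illustrative):

```python
from typing import Tuple

Card = Tuple[float, float]  # (price in USD, tokens/s)


def marginal_cost(base: Card, upgrade: Card) -> float:
    """Extra dollars paid per extra tok/s when stepping up from base to upgrade."""
    (p0, t0), (p1, t1) = base, upgrade
    if t1 <= t0:
        raise ValueError("upgrade must be faster than base")
    return (p1 - p0) / (t1 - t0)


# Retail figures from the tables.
RTX_5070_TI = (940, 114.71)
RTX_5080 = (1540, 119.90)
RTX_5090 = (3999, 226.10)

if __name__ == "__main__":
    print(f"5070 Ti -> 5080: ${marginal_cost(RTX_5070_TI, RTX_5080):.0f} per extra tok/s")
    print(f"5070 Ti -> 5090: ${marginal_cost(RTX_5070_TI, RTX_5090):.0f} per extra tok/s")
```

This gives roughly $116 per extra tok/s for the 5080 step, versus roughly $27 per extra tok/s for the much larger jump to the 5090.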
Final Takeaway
If you’re budget-conscious, a used RTX 3090 remains the best deal for high VRAM. Among 12GB cards, the RTX 5070 offers the best retail price-to-performance ratio. For pure performance, the RTX 5090 and 4090 lead, but their cost per tok/s is high. The RTX 4070 Ti and 4080 are overpriced at current retail prices, making them weaker choices unless found at MSRP.
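The takeaway boils down to a simple rule: among cards that meet your VRAM requirement and fit your budget, pick the lowest $ per tok/s. A minimal sketch of that rule; the `Card` type, the choice of eBay or retail price for each card, and the 32GB spec for the 5090 (not stated in the tables) are assumptions for illustration.

```python
from typing import List, NamedTuple, Optional


class Card(NamedTuple):
    name: str
    vram_gb: int
    tokens_per_s: float
    price_usd: float  # whichever price (eBay or retail) you can actually buy at


# Tokens/s and prices from the tables; the price source chosen per card is an
# illustrative assumption, and 32GB for the 5090 is its stock spec.
CARDS: List[Card] = [
    Card("RTX 3060", 12, 53.12, 250),      # eBay
    Card("RTX 5070", 12, 100.45, 620),     # retail
    Card("RTX 5070 Ti", 16, 114.71, 940),  # retail
    Card("RTX 3090", 24, 101.74, 950),     # eBay
    Card("RTX 4090", 24, 130.58, 2300),    # eBay
    Card("RTX 5090", 32, 226.10, 3800),    # eBay
]


def best_value(cards: List[Card], min_vram_gb: int, budget_usd: float) -> Optional[Card]:
    """Cheapest card per tok/s that meets the VRAM floor and fits the budget."""
    eligible = [c for c in cards if c.vram_gb >= min_vram_gb and c.price_usd <= budget_usd]
    return min(eligible, key=lambda c: c.price_usd / c.tokens_per_s, default=None)


if __name__ == "__main__":
    pick = best_value(CARDS, min_vram_gb=24, budget_usd=1000)
    print(pick.name if pick else "no card fits")
```

With a 24GB floor and a $1,000 budget it returns the used RTX 3090; lowering the floor to 16GB returns the RTX 5070 Ti instead.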
Final Thoughts
If you’re buying new, an RTX 4070 Ti at MSRP is a reasonable mainstream choice, although at current retail prices the RTX 5070 Ti delivers more throughput per dollar ($8.19 per tok/s). If you’re looking for the best mix of price, performance, and VRAM, a used RTX 3090 should be your go-to option. Given its high retail price, the RTX 4090 is only worth considering if purchased at MSRP or slightly above.
Users should keep an eye out for potential price drops on the new RTX 5000 series cards, as these may shift the market. In the meantime, a multi-GPU setup built from used RTX 3090s remains one of the strongest options for LLM inference on a budget.