If you’re looking to get into local LLM inference, choosing the right GPU isn’t just about raw power—it’s about finding the best balance between VRAM, memory bandwidth, and price-to-performance efficiency. Unlike gaming, where factors like clock speeds and ray tracing matter, LLM workloads prioritize memory capacity and compute efficiency. A GPU that offers great LLM performance per dollar may not always be the best choice for gaming.

This analysis breaks down GeForce GPUs by how well they run an 8B model at 4-bit quantization (Q4_K_M), comparing used (eBay) prices against current retail prices as of March 2025. Our key metric is cost per unit of throughput, expressed as dollars per token/s (lower is better), helping those interested in local LLM inference get the most out of their budget without overpaying for features inference doesn't need.
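
To make the metric concrete, here is a minimal Python sketch that divides a card's price by its measured throughput; the values are taken from the RTX 3060 row in the tables below, so the output should match its efficiency figures.

```python
def cost_per_tps(price_usd: float, tokens_per_second: float) -> float:
    """Dollars paid per token/s of generation throughput (lower is better)."""
    return price_usd / tokens_per_second

# RTX 3060: 53.12 tokens/s, $250 used (eBay) vs. $330 at current retail.
print(f"eBay:   ${cost_per_tps(250, 53.12):.2f} per token/s")   # ~4.71
print(f"Retail: ${cost_per_tps(330, 53.12):.2f} per token/s")   # ~6.21
```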

Analysis of GeForce GPUs for LLM Inference in March 2025

We compiled benchmark data for 8B Q4_K_M inference speeds and compared it against GPU pricing. The GPUs included in this analysis cover both past and present generations, with a focus on consumer-friendly cards that enthusiasts are likely to consider.
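
If you want to sanity-check your own card against these numbers, a rough throughput measurement is easy to script. The sketch below is one way to do it using the llama-cpp-python bindings (an assumption on our part; the benchmarks here may have been collected with a different runtime) against any 8B Q4_K_M GGUF. The model path is a placeholder.

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python (built with CUDA support)

# Placeholder path: point this at any 8B Q4_K_M GGUF you have locally.
MODEL_PATH = "models/your-8b-model.Q4_K_M.gguf"

# n_gpu_layers=-1 offloads every layer to the GPU.
llm = Llama(model_path=MODEL_PATH, n_gpu_layers=-1, n_ctx=2048, verbose=False)

start = time.perf_counter()
out = llm("Explain why VRAM matters for local LLM inference.", max_tokens=256)
elapsed = time.perf_counter() - start

# Rough end-to-end figure: prompt processing is lumped in with generation,
# so dedicated tools (e.g. llama-bench) will report somewhat higher decode speeds.
generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.2f} tokens/s")
```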

8GB–12GB VRAM GPUs

| GPU         | Tokens/s | eBay Price | Current Retail | $ per tok/s (eBay) | $ per tok/s (Retail) |
|-------------|----------|------------|----------------|--------------------|----------------------|
| RTX 3060    | 53.12    | $250       | $330           | 4.71               | 6.21                 |
| RTX 3070 Ti | 68.94    | $350       | N/A            | 5.08               | N/A                  |
| RTX 4070    | 80.11    | $700       | $847           | 8.74               | 10.57                |
| RTX 5070    | 100.45   | $750       | $620           | 7.47               | 6.18                 |

12GB–16GB VRAM GPUs

| GPU         | Tokens/s | eBay Price | Current Retail | $ per tok/s (eBay) | $ per tok/s (Retail) |
|-------------|----------|------------|----------------|--------------------|----------------------|
| RTX 4060 Ti | 50.87    | $600       | $707           | 11.79              | 13.90                |
| RTX 4070 Ti | 82.21    | $1,010     | $1,298         | 12.29              | 15.79                |
| RTX 4080    | 106.22   | $1,400     | $1,750         | 13.18              | 16.47                |
| RTX 5070 Ti | 114.71   | N/A        | $940           | N/A                | 8.19                 |
| RTX 5080    | 119.90   | $1,707     | $1,540         | 14.24              | 12.84                |

22GB–32GB VRAM GPUs

| GPU         | Tokens/s | eBay Price | Current Retail | $ per tok/s (eBay) | $ per tok/s (Retail) |
|-------------|----------|------------|----------------|--------------------|----------------------|
| RTX 2080 Ti | 72.01    | $550       | N/A            | 7.64               | N/A                  |
| RTX 3090    | 101.74   | $950       | $2,200         | 9.34               | 21.63                |
| RTX 4090    | 130.58   | $2,300     | $3,000         | 17.62              | 22.98                |
| RTX 5090    | 226.10   | $3,800     | $3,999         | 16.81              | 17.69                |
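
The observations that follow fall straight out of these numbers. As a quick cross-check, the short sketch below recomputes the retail efficiency column (values may differ from the tables by a cent of rounding) and ranks the cards by value.

```python
# (tokens/s, current retail price in USD) taken from the tables above
retail = {
    "RTX 3060":    (53.12,   330),
    "RTX 4060 Ti": (50.87,   707),
    "RTX 4070":    (80.11,   847),
    "RTX 4070 Ti": (82.21,  1298),
    "RTX 4080":    (106.22, 1750),
    "RTX 3090":    (101.74, 2200),
    "RTX 4090":    (130.58, 3000),
    "RTX 5070":    (100.45,  620),
    "RTX 5070 Ti": (114.71,  940),
    "RTX 5080":    (119.90, 1540),
    "RTX 5090":    (226.10, 3999),
}

# Rank by dollars per token/s, cheapest throughput first.
for gpu, (tps, price) in sorted(retail.items(), key=lambda kv: kv[1][1] / kv[1][0]):
    print(f"{gpu:<11} {price / tps:6.2f} $ per tok/s")
```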

Key Observations

Best Budget High-VRAM Choice: RTX 3090

The RTX 3090 remains a solid high-VRAM budget option, offering 101.74 tokens per second and 24GB of VRAM. At its typical eBay price ($950), it comes in at $9.34 per token/s, the best balance of affordability, performance, and memory capacity in this roundup. At its $2,200 retail price, however, its efficiency drops sharply to $21.63 per token/s.

Best Price-to-Performance for 12GB VRAM GPUs: RTX 5070

Among 12GB cards, the RTX 5070 is the best performer, pushing 100.45 tokens per second while keeping cost per token/s low ($6.18 at retail, $7.47 on eBay). The RTX 3060 ($6.21 per token/s at retail) also remains a budget-friendly choice for lower-power builds or tighter budgets.

Diminishing Returns: RTX 4070 Ti & 4080

While both the RTX 4070 Ti (82.21 tokens/s) and RTX 4080 (106.22 tokens/s) offer good raw performance, their current high retail prices ($1,298 and $1,750) push them to $15.79 and $16.47 per token/s, respectively. Unless found at MSRP or below, they are not the best value choices.

RTX 4090: Top Performance, Poor Price Efficiency

The RTX 4090 remains the fastest previous-generation consumer GPU at 130.58 tokens/s, but at $3,000 retail its efficiency ($22.98 per token/s) is far worse than that of lower-tier options. Those who need that level of performance may justify the cost, but it's not the best value for most users.

RTX 5000 Series: Mixed Outlook

  • The RTX 5070 Ti (114.71 tokens/s, $940 retail) shows solid efficiency at $8.19 per token/s, making it a strong mid-tier contender.
  • The RTX 5080 (119.90 tokens/s, $1,540 retail) sits at $12.84 per token/s, noticeably less attractive than the 5070 Ti.
  • The RTX 5090 (226.10 tokens/s, $3,999 retail) offers top-tier performance but only moderate efficiency ($17.69 per token/s at retail, $16.81 on eBay): better than the 4090, but not groundbreaking.

Final Takeaway

If you're budget-conscious, a used RTX 3090 remains the best deal for high VRAM. For 12GB users, the RTX 5070 offers the best price-to-performance ratio. For pure performance, the RTX 4090 and 5090 dominate, but their cost per token/s is high. The RTX 4070 Ti and 4080 are overpriced at current retail, making them weaker choices unless found at MSRP.

Final Thoughts

If you're buying new, the RTX 5070 Ti is the strongest mainstream choice at current prices, with the RTX 4070 Ti only worth considering if it turns up at or below MSRP. If you're looking for the best mix of price, performance, and VRAM, a used RTX 3090 should be your go-to option. The RTX 4090 is only worth considering at MSRP or slightly above, given its high retail price.

Users should keep an eye out for potential price drops on the new RTX 5000 series cards, as these may shift the market dynamics. In the meantime, a multi-GPU setup featuring used RTX 3090s remains one of the strongest contenders for LLM inference on a budget.
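
For anyone going the multi-3090 route, splitting a model across two cards is straightforward in most runtimes. The sketch below shows one way to do it with llama-cpp-python's tensor_split option (the equivalent of llama.cpp's --tensor-split flag); the model path is a placeholder, and the even split assumes two identical 24GB cards.

```python
from llama_cpp import Llama

# Placeholder: a GGUF too large to fit comfortably in a single card's VRAM.
MODEL_PATH = "models/your-model.Q4_K_M.gguf"

llm = Llama(
    model_path=MODEL_PATH,
    n_gpu_layers=-1,          # offload every layer to the GPUs
    tensor_split=[0.5, 0.5],  # spread the weights evenly across two RTX 3090s
    n_ctx=4096,
)

out = llm("Summarize the trade-offs of multi-GPU inference in one sentence.", max_tokens=128)
print(out["choices"][0]["text"])
```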