This isn't necessarily true with LoRAs - a 4090 can train/compute the alpaca dataset with LoRA in under 6 hours (it might be 3, I forget what it was).
So finetuning with LoRA and a few other methods is fine on higher-end consumer hardware like a 4090 and finishes in a reasonable amount of time. IMO it's definitely worth it if you're experimenting with this, especially for inference.
The base training though, yeah, I totally agree with you - train in the cloud, don't buy hardware when you need a month of 8x A100s or whatnot.
My perspective was for people who have other uses for them e.g. gaming or local inference.
From a pure finance standpoint you're definitely right - you should rent and not buy a dedicated card. I think you'd need a few thousand hours to break even which is a few months 24/7.
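Here's a napkin version of that break-even estimate, as a rough sketch. The card price, rental rate, power draw and electricity price below are placeholder assumptions, not quotes from any provider, so plug in your own numbers.

```python
def break_even_hours(card_price_usd, rental_usd_per_hour,
                     power_watts=0.0, electricity_usd_per_kwh=0.0):
    """Hours of use at which buying a card costs the same as renting one.

    All inputs are assumptions supplied by the caller. Resale value,
    the rest of the PC, and idle time are deliberately ignored.
    """
    # Running your own card still costs electricity per hour of use.
    own_cost_per_hour = (power_watts / 1000.0) * electricity_usd_per_kwh
    saving_per_hour = rental_usd_per_hour - own_cost_per_hour
    if saving_per_hour <= 0:
        raise ValueError("renting is never more expensive per hour; no break-even")
    return card_price_usd / saving_per_hour


def hours_to_months_24_7(hours):
    """Convert hours of continuous use to months, taking a month as 30 days."""
    return hours / (24 * 30)


if __name__ == "__main__":
    # Placeholder figures: a $1600 card vs. a $0.50/hour rental.
    hours = break_even_hours(1600, 0.50)
    print(f"{hours:.0f} hours, about {hours_to_months_24_7(hours):.1f} months 24/7")

    # Same again, assuming 450 W under load at $0.15 per kWh.
    hours = break_even_hours(1600, 0.50, power_watts=450, electricity_usd_per_kwh=0.15)
    print(f"{hours:.0f} hours, about {hours_to_months_24_7(hours):.1f} months 24/7")
```

With those placeholder inputs you land in the "few thousand hours" range. The answer moves a lot with the rental rate you assume, which is probably why people in this thread get such different results.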
It's more that I'm building a gaming PC this summer, and I can either target 1440p (4070) for about $2k or 4K (4090) for about $5k. If I can do a lot more with a 4090 over a 4070 it might make sense, but I know a lot of CS students use Google Colab these days, so I may just rely on that.
I'd seriously recommend the 4090 over the 4070 if you want to do finetuning/inference locally. And I highly recommend 64GB of ram.
The 24GB of VRAM alone is 100% worth it. If you want to do local ML stuff you _need_ that 24GB of VRAM.
64GB of ram + 24GB of vram lets you run a lot of the medium size models at decent speeds. I don't use Colab personally but AFAIK it should work fine for you if you don't want to do it locally.
Also worth noting is the newer ray tracing rendering that Cyberpunk is doing. You should check out the demos, IMO it looks sick. It only runs at 18fps on a 4090, so it's only playable on a 4090 + DLSS, and I'm not sure if the newer rendering tech will be super achievable on any of the other cards - if that's of interest to you.
Get a 13900 or 7950X and 64 gb of ram. You can run llama 30B and 65B, slowly but surely. Play with that before buying a gpu. If you really, really see yourself getting into this, then go ahead and get a 3090 or 4090. But otherwise get a cheaper nvidia card and wait for things to develop a little more. You can still play with ML and CUDA but you'll have cash left for when 50X0s drop, and that will probably be right around when this stuff will really be getting hot (if the current plateau doesn't hold).
Llama is basically an auto-complete right now. We're celebrating baby's first steps. It's not really worth the $600-1000 jump up from cards that can run all current games 4k60.
Can you calculate that for me on a napkin? Every calculation I make for training, and certainly for inference, has me breaking even in well under a year, and after that it’s vastly cheaper if I buy the hardware myself.
More VRAM => larger models. IME it is absolutely worth maxing out VRAM for the significant improvement in quality, especially with LLaMA (though a 4090 on its own can't hold the largest 65-billion-parameter model, even with 4-bit quantization).
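The arithmetic behind "More VRAM => larger models" can be sketched like this. It only counts the weights plus a flat overhead factor for activations and cache; the 20% overhead is my assumption, not a measured number, and the parameter counts are the nominal model names (30B, 65B), not exact counts.

```python
def weight_memory_gb(n_params_billion, bits_per_weight):
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits(n_params_billion, bits_per_weight, memory_gb, overhead_fraction=0.2):
    """Rough check of whether a model fits in a given amount of memory.

    overhead_fraction is an assumed allowance for activations, KV cache
    and runtime buffers; real usage depends on context length and software.
    """
    needed = weight_memory_gb(n_params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed <= memory_gb


if __name__ == "__main__":
    for size in (30, 65):
        gb = weight_memory_gb(size, 4)
        print(f"{size}B at 4-bit: {gb:.1f} GB of weights, "
              f"fits in 24 GB VRAM: {fits(size, 4, 24)}, "
              f"fits in 64 GB RAM: {fits(size, 4, 64)}")
```

Under these assumptions the 65B weights at 4 bits are already over 24 GB before any overhead, which matches the point above, while 64 GB of system RAM has room for them if you accept CPU speeds.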
That said, I recommend renting a cloud GPU for a few hours and trying the larger models on them before buying a GPU of your own, just to see if the models meet your requirements.
I'm trying to decide whether it's worthwhile to splurge on a more expensive 4090 vs a 4070 or whatever.