Hacker News

Is there a difference in the quality of LLM one could train or run on a GPU with 8, 12, 16, or all the way up to 24GB of VRAM?

I'm trying to decide whether it's worthwhile to splurge on a more expensive 4090 vs a 4070 or whatever.



It makes zero economic sense to buy a GPU to train a model. If you want to train a model, train it in the cloud.


This isn't necessarily true with LoRAs - a 4090 can train/compute the alpaca dataset with LoRA in under 6 hours (it might be 3, I forget what it was).

So finetuning with LoRAs and a few other methods is fine on higher-end consumer hardware like a 4090 and finishes in a reasonable amount of time - IMO definitely worth it if you're experimenting with this, especially for inference.

The base training though, yeah, I totally agree with you - train in the cloud, don't buy hardware when you need a month of 8x A100s or whatnot.


Even with LoRA the economics are not in your favor to buy a 4090 instead of cloud training.


Actually yeah you're definitely right.

My perspective was for people who have other uses for them, e.g. gaming or local inference. From a pure finance standpoint you're definitely right - you should rent and not buy a dedicated card. I think you'd need a few thousand hours to break even, which is a few months of 24/7 use.
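A napkin sketch of that break-even estimate, in case it helps. The card price and hourly rental rate below are assumed placeholders, not quoted prices, and the function name is just for illustration; plug in whatever you actually see.

```python
def break_even_hours(card_price_usd, rental_usd_per_hour, power_usd_per_hour=0.0):
    """Hours of GPU use after which buying has cost the same as renting."""
    # Each hour on your own card saves the rental fee, minus what you pay for electricity.
    saving_per_hour = rental_usd_per_hour - power_usd_per_hour
    if saving_per_hour <= 0:
        raise ValueError("renting is not more expensive per hour, so there is no break-even")
    return card_price_usd / saving_per_hour


# Assumed placeholder inputs: a $1,600 card vs. a $0.50/hour cloud rental.
hours = break_even_hours(card_price_usd=1600, rental_usd_per_hour=0.50)
months_24_7 = hours / (24 * 30)
print(f"{hours:.0f} hours, about {months_24_7:.1f} months of 24/7 use")
```

With those assumed inputs it comes out to 3200 hours, i.e. a bit over four months of running around the clock. It ignores resale value and any other use you get out of the card.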


It's more that I'm building a gaming PC this summer, and I can either target 1440p (4070) for $2k or 4K (4090) for $5k. If I can do a lot more with a 4090 over a 4070 it might make sense, but I know a lot of CS students use Google Colab these days, so I may just rely on that.


I'd seriously recommend the 4090 over the 4070 if you want to do finetuning/inference locally. And I highly recommend 64GB of RAM.

The 24GB of VRAM is 100% worth it alone. If you want to do local ML stuff you _need_ that 24GB of VRAM.

64GB of RAM + 24GB of VRAM lets you run a lot of the medium-sized models at decent speeds. I don't use Colab personally but AFAIK it should work fine for you if you don't want to do it locally.

Also worth noting is the newer ray tracing rendering that Cyberpunk is doing. You should check out the demos, IMO it looks sick. It only runs at 18fps on a 4090, so it's only playable on a 4090 + DLSS, and I'm not sure if the newer rendering tech will be super achievable on any of the other cards - if that's of interest to you.


Get a 13900 or 7950X and 64GB of RAM. You can run LLaMA 30B and 65B, slowly but surely. Play with that before buying a GPU. If you really, really see yourself getting into this, then go ahead and get a 3090 or 4090. But otherwise get a cheaper NVIDIA card and wait for things to develop a little more. You can still play with ML and CUDA, but you'll have cash left for when the 50X0s drop, and that will probably be right around when this stuff will really be getting hot (if the current plateau doesn't hold).

Llama is basically an auto-complete right now. We're celebrating baby's first steps. It's not really worth the $600-1000 jump up from cards that can run all current games 4k60.


Fair enough.


Can you calculate that for me on a napkin? Every calculation I make for training, and certainly for inference, has me breaking even after well under a year, and after that it’s vastly cheaper if I buy the hardware myself.


You can infer on your local machine.

I doubt it will take you a year to train a model.

Vicuna-13B cost $300 to train/fine-tune [0]. They trained on an A100 which costs $10k [1].
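Napkin version of that comparison, taking both figures above at face value (it ignores electricity, how many GPUs the run actually used, and resale value, so treat it as a rough ratio only):

```python
# Both numbers are the ones quoted in this comment, taken at face value.
fine_tune_cost_usd = 300    # reported cost of one Vicuna-13B fine-tune
a100_price_usd = 10_000     # quoted price of one A100

# How many rented fine-tuning runs the purchase price of one card would pay for.
runs_to_match_card = a100_price_usd / fine_tune_cost_usd
print(f"One A100 costs about as much as {runs_to_match_card:.0f} rented fine-tuning runs")
```

That works out to roughly 33 runs before the card pays for itself.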

[0]: https://vicuna.lmsys.org/

[1]: https://github.com/lm-sys/FastChat/blob/main/scripts/train-v...


More VRAM => larger models. IME it is absolutely worth maxing out VRAM for the significant improvement in quality, especially with LLaMA (though even with a 4090, you won't be able to run the largest 65-billion-parameter model, not even with 4-bit quantization).

That said, I recommend renting a cloud GPU for a few hours and trying the larger models on it before buying a GPU of your own, just to see if the models meet your requirements.
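A rough way to check what fits, as a sketch: it counts only the memory for the weights (parameters times bits per weight) and assumes the nominal LLaMA sizes of 7B, 13B, 30B and 65B. Real usage is higher once you add the KV cache, activations and framework overhead, so a model that barely "fits" here may still not run.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


VRAM_GB = 24  # a 4090

for size in (7, 13, 30, 65):  # nominal LLaMA sizes, in billions of parameters
    need = weight_gb(size, bits_per_weight=4)
    verdict = "fits" if need < VRAM_GB else "does not fit"
    print(f"LLaMA {size}B at 4-bit: ~{need:.1f} GB of weights, {verdict} in {VRAM_GB} GB")
```

At 4 bits the 65B model needs about 32.5 GB for weights alone, which is why it is out of reach for a single 24GB card, while 30B comes in around 15 GB.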


But it should fit easily on an Apple MBP or Studio with 96GB or 128GB of unified memory.


llama.cpp runs on your processor and uses a lot of RAM, so splurging on a GPU isn't going to help the performance.



