URL: https://huggingface.co/Glint-Research/Glint-1.3/discussions/2


scaling laws

#2
by encryptedoreo - opened

10 billion tokens on a sub-1M-param model? LLMs aren't that overparameterised lmao. Plus you have a 5090, you can easily go up to 100M params for pretraining, so why don't you?
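For a rough sense of the numbers in this post, here is a minimal sketch of the tokens-per-parameter arithmetic. It assumes the commonly cited compute-optimal rule of thumb of about 20 training tokens per parameter (from the Chinchilla scaling-law work); that ratio is an approximation, not a hard limit, and says nothing about how good a deliberately over-trained small model can get. The 1M figure is taken as the upper bound of "sub 1M", and the function names are made up for illustration.

```python
# Assumption: rule-of-thumb compute-optimal ratio of ~20 tokens per parameter.
TOKENS_PER_PARAM_RULE_OF_THUMB = 20


def tokens_per_param(tokens: int, params: int) -> float:
    """Return how many training tokens each parameter sees."""
    return tokens / params


def overtraining_factor(tokens: int, params: int) -> float:
    """Return how far past the rule-of-thumb ratio a training run goes."""
    return tokens_per_param(tokens, params) / TOKENS_PER_PARAM_RULE_OF_THUMB


TRAINING_TOKENS = 10_000_000_000  # 10 billion tokens, as stated in the thread

# Model sizes mentioned in the thread: sub-1M (upper bound), 50M, 100M.
for params in (1_000_000, 50_000_000, 100_000_000):
    ratio = tokens_per_param(TRAINING_TOKENS, params)
    factor = overtraining_factor(TRAINING_TOKENS, params)
    print(f"{params:>11,} params: {ratio:>8,.0f} tokens/param "
          f"({factor:,.0f}x the rule of thumb)")
```

Under these assumptions, a model at the 1M upper bound sees at least 10,000 tokens per parameter, while a 100M model on the same data would see 100.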

I can't speak for them, but it is probably more about efficiency and rapid prototyping.

> 10 billion tokens on a sub-1M-param model? LLMs aren't that overparameterised lmao. Plus you have a 5090, you can easily go up to 100M params for pretraining, so why don't you?

I like experimenting with small models, and this is a quick way to do that.
The largest model is 50M :P
The goal is to see how good a super small model can get

i completely support the premise of this but i would be very surprised if it could even form sentences

This model kinda already does

> 10 billion tokens on a sub-1M-param model? LLMs aren't that overparameterised lmao. Plus you have a 5090, you can easily go up to 100M params for pretraining, so why don't you?

that's what i'm trying to tell him, but he really doesn't want to
