Voozh

scaling laws

by encryptedoreo - opened 11 days ago

10 billion tokens on a sub-1M-param model? LLMs aren't that overparameterised lmao. Plus you have a 5090, so you could easily go up to 100M params for pretraining. Why don't you?
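For readers who want the arithmetic behind this question, here is a rough sketch of the tokens-per-parameter ratios involved. The 10 billion token count and the 1M, 50M, and 100M model sizes come from this thread; the figure of roughly 20 tokens per parameter is the commonly cited compute-optimal rule of thumb from the Chinchilla paper (Hoffmann et al., 2022) and is an assumption brought in for comparison, not something the thread states. The function names are made up for illustration.

```python
# Assumption (not from this thread): the ~20 tokens-per-parameter
# compute-optimal rule of thumb associated with the Chinchilla paper.
CHINCHILLA_TOKENS_PER_PARAM = 20


def tokens_per_param(tokens: float, params: float) -> float:
    """How many training tokens each parameter sees."""
    return tokens / params


def compute_optimal_params(tokens: float,
                           ratio: float = CHINCHILLA_TOKENS_PER_PARAM) -> float:
    """Model size the rule of thumb would suggest for a given token budget."""
    return tokens / ratio


TOKENS = 10e9  # 10 billion tokens, as stated in the thread

# "1M" is an upper bound for the sub-1M model, so its real ratio is even higher.
for label, params in [("1M", 1e6), ("50M", 50e6), ("100M", 100e6)]:
    ratio = tokens_per_param(TOKENS, params)
    print(f"{label:>4} params: {ratio:,.0f} tokens per parameter")

suggested = compute_optimal_params(TOKENS)
print(f"Rule-of-thumb size for 10B tokens: {suggested / 1e6:,.0f}M params")
```

Under that assumption, a sub-1M model sits at more than 10,000 tokens per parameter, hundreds of times past the rule-of-thumb ratio. The rule describes compute-optimal training only; it does not say a heavily over-trained tiny model learns nothing.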

Enderchef

GlintResearch org 11 days ago

I can't speak for them, but it is probably more about efficiency and rapid prototyping.

CompactAI

GlintResearch org 5 days ago

> 10 billion tokens on a sub-1M-param model? LLMs aren't that overparameterised lmao. Plus you have a 5090, so you could easily go up to 100M params for pretraining. Why don't you?

I like experimenting with small models, and this is a quick way to do that.
The largest model is 50M :P
The goal is to see how good a super small model can get

encryptedoreo

3 days ago

I completely support the premise of this, but I would be very surprised if it could even form sentences.

CompactAI

GlintResearch org 3 days ago

This model kinda already does

AxionLab-official

3 days ago

> 10 billion tokens on a sub-1M-param model? LLMs aren't that overparameterised lmao. Plus you have a 5090, so you could easily go up to 100M params for pretraining. Why don't you?

That's what I'm trying to tell him, but he really doesn't want to.

URL: https://huggingface.co/Glint-Research/Glint-1.3/discussions/2
