URL: https://huggingface.co/Glint-Research/Blink-1


blink-1

we teach tiny neural networks to think. sometimes they surprise us. sometimes they just output zeros. blink is mostly the second one, and we love it anyway.

blink is the smallest model glint research makes. 1,087 parameters. not 1.087 billion, not 1.087 million. one thousand and eighty-seven actual numbers. on the blog it shows up as "the 1k debug model", the thing we break first so we don't break the 50M model. but it trained on 100 billion tokens of byte-level fineweb-edu, which works out to about 92 million tokens for every single parameter (100 billion over 1,087). an absurd ratio we ran on purpose to find out what a 1k model does when undertraining is no longer the excuse.
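to put "absurd" in numbers, here is a quick check. it compares against the common ~20 tokens-per-parameter rule of thumb for compute-optimal training. that constant is a general heuristic and not a blink number; it is only here as a yardstick.

```python
# back-of-envelope check of the tokens-per-parameter ratio quoted above
TRAIN_TOKENS = 100_000_595_968   # exact count from the training table
PARAMS = 1_087
RULE_OF_THUMB = 20               # ~20 tokens/param, a general heuristic, not from this card

tokens_per_param = TRAIN_TOKENS / PARAMS
overtrain_factor = tokens_per_param / RULE_OF_THUMB

print(f"{tokens_per_param / 1e6:.1f}M tokens per parameter")     # 92.0M tokens per parameter
print(f"~{overtrain_factor / 1e6:.1f}M x the rule of thumb")      # ~4.6M x the rule of thumb
```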

this is the finished 100B run (runs/blink_20260621_202506, status: done). the weights here are the slerp-tournament champion, the actual thing the merge produced. we did not cherry-pick a lucky checkpoint and call it the model.

at a glance

parameters 1,087
vocab 257 (256 bytes + EOS). byte model, no tokenizer
training tokens 100,000,595,968 (~100B)
hardware one RTX 5090
thinking none. cannot afford it. (see below)
status done. champion released.

what it is

  • 1,087 parameters, byte vocabulary. there is no tokenizer to download, just the weights. input and output are raw UTF-8 bytes.
  • dim=3. three. the entire residual stream is a 3-vector.
  • 1 shared transformer block, looped 2×, with a rank-2 LoRA per loop and a per-iteration embedding. the "depth" is the loop, not a layer stack (prelude=0, coda=0).
  • 1 attention head, RoPE (base 1e7), SwiGLU FFN (hidden=6), RMSNorm, tied input/output embeddings.
  • subquadratic sparse attention (SSA), 1M-context-capable wiring (block_size=32, top-2 blocks, local window 16). at 1k params it basically never needs it, but the wiring is there.
  • no thinking. the 1k budget physically cannot fit the COCONUT-style thinking LoRA. even one think step pushes it over its size class. the 1M model gets it. blink does not. documented, not forgotten.
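a toy sketch of "the depth is the loop" follows. one set of shared weights is reused on every pass, while each pass gets its own rank-2 LoRA delta and its own embedding row. ASSUMPTIONS, not taken from infer.py: a single 3x3 linear map stands in for the whole attention + SwiGLU block, and the per-iteration embedding is simply added to the residual stream before each pass. in this toy the shared matrix (9 numbers) is paid for once, and each pass adds a rank-2 LoRA (3x2 + 2x3 = 12 numbers) plus a 3-number embedding row.

```python
# toy "depth is the loop" sketch. ASSUMPTIONS, not from infer.py: one 3x3 linear
# map stands in for the whole attention + SwiGLU block, and the per-iteration
# embedding is added to the residual stream before each pass.
DIM, RANK, LOOPS = 3, 2, 2


def matvec(m, v):
    """m @ v for a list-of-rows matrix and a list vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def lora_delta(b, a):
    """B @ A for B (DIM x RANK) and A (RANK x DIM): a DIM x DIM update of rank <= RANK."""
    return [[sum(b[i][r] * a[r][j] for r in range(RANK)) for j in range(DIM)]
            for i in range(DIM)]


def looped_forward(x, w_shared, loras, loop_embed):
    """run the same shared weights LOOPS times; only the LoRA and embedding change."""
    for i in range(LOOPS):
        b, a = loras[i]
        delta = lora_delta(b, a)
        w_eff = [[w_shared[r][c] + delta[r][c] for c in range(DIM)]
                 for r in range(DIM)]
        h = [xi + ei for xi, ei in zip(x, loop_embed[i])]      # tag which pass this is
        x = [hi + yi for hi, yi in zip(h, matvec(w_eff, h))]   # residual update
    return x
```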

files

file what it is
blink-1-base.pt pretrain slerp champion. the base model. ~13 KB.
blink-1-instruct.pt same model, SFT'd on ChatML + instruct slerp. formats like a chatbot. reasons like a 1k model. ~13 KB.
config.json architecture + training metadata.
infer.py run it. the whole model is in this one file. no install beyond torch, no private repo.
requirements.txt torch. that's the whole list.

each weight file is ~13 KB. you read that right. the model fits in a single network packet.
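some rough arithmetic on that packet joke follows. the storage dtype is an assumption (fp32 and bf16 are shown side by side). we read the ~13 KB on disk as 13 KiB, which includes the checkpoint container and the saved config on top of the raw numbers. the short version: it fits in one IPv4 datagram (max 65,535 bytes), but not in one frame on a typical 1,500-byte-MTU link.

```python
# raw tensor size under two assumed dtypes, vs. common packet limits
PARAMS = 1_087
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}   # which one the .pt uses is an assumption

for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: {PARAMS * nbytes:,} bytes")   # fp32: 4,348 bytes / bf16: 2,174 bytes

FILE_BYTES = 13 * 1024       # "~13 KB" from the files table, read as KiB
IPV4_MAX_PACKET = 65_535     # max total length of an IPv4 datagram
ETHERNET_MTU = 1_500         # typical link MTU; larger datagrams get fragmented
print(FILE_BYTES <= IPV4_MAX_PACKET, FILE_BYTES <= ETHERNET_MTU)   # True False
```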

benchmarks (be kind)

same harness as the tiny-lm leaderboard. these are the actual released champion weights (the files in this repo). we did not grab the best checkpoint we could find and pretend it was the model:

metric | base champion | instruct champion | note
WikiText-2 byte PPL | 71.35 | 70.44 | lower is better; a uniform guess over all 257 symbols scores 257
BLiMP | 52.84% | 52.46% | chance is 50%, so it learned some grammar
ARC-Easy (5-shot) | 26.60% | 26.80% | chance is 25%
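what "byte PPL" measures: the textbook definition is exp of the mean negative log-likelihood per target byte. the harness code isn't shown on this card, so treat this as the standard formula and not its implementation. the sketch also converts to bits per byte, which makes the uniform-guess ceiling and the champion's score easier to compare.

```python
import math


def byte_ppl(byte_logprobs):
    """perplexity per byte, given natural-log probabilities of each target byte."""
    mean_nll = -sum(byte_logprobs) / len(byte_logprobs)
    return math.exp(mean_nll)


def bits_per_byte(ppl):
    return math.log2(ppl)


# a model that spreads probability evenly over all 257 symbols
uniform = [math.log(1 / 257)] * 1000
print(round(byte_ppl(uniform), 6))       # 257.0
print(round(bits_per_byte(71.35), 2))    # 6.16, the released base champion
```

so the base champion needs ~6.16 bits per byte, where a uniform guess needs ~8.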

for context: the best single checkpoint on each metric hit 41.0 byte PPL (step 2000), 53.18% BLiMP (step 4000), and 26.0% ARC (step 8000). no single checkpoint held all three at once. the champion is the slerp merge that trades away WikiText PPL (71.35, against the step-2000 best of 41.0) to keep grammar and reasoning above chance everywhere. we report the champion because the champion is what you download. the recipe, for the record:

gen1_gen0_step_00004000+step_00025542@t0.50 + gen0_step_00004000+step_00010000@t0.65 @t0.80
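for anyone who hasn't met slerp merging, here is a minimal sketch of spherical linear interpolation over flattened weight vectors. this is the generic technique, not glint's tournament code. it measures the angle on the normalized vectors, interpolates the raw ones, and falls back to plain lerp when they are (nearly) parallel. the nesting at the bottom is one reading of the recipe string above (two inner merges, then an outer merge at t=0.80). that reading is an assumption, and A..D are hypothetical stand-in vectors for the named checkpoints.

```python
import math


def slerp(a, b, t, eps=1e-8):
    """spherical linear interpolation between two flat weight vectors (t=0 -> a, t=1 -> b)."""
    lerp = [(1 - t) * x + t * y for x, y in zip(a, b)]
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return lerp
    cos = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b)) / norm))
    omega = math.acos(cos)          # angle between the two weight vectors
    so = math.sin(omega)
    if so < eps:                    # (anti)parallel: the slerp formula is ill-defined
        return lerp
    wa = math.sin((1 - t) * omega) / so
    wb = math.sin(t * omega) / so
    return [wa * x + wb * y for x, y in zip(a, b)]


# hypothetical stand-ins for the four checkpoints named in the recipe
A, B = [1.0, 0.0, 0.5], [0.0, 1.0, 0.5]
C, D = [1.0, 1.0, 0.0], [1.0, -1.0, 0.2]
merged = slerp(slerp(A, B, 0.50), slerp(C, D, 0.65), 0.80)
print(merged)
```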

honesty clause, house style: WikiText byte PPL rises during training (the step-2000 41.0 degrades toward ~82 by the end). a 3-dimensional residual stream cannot hold both "predict the next byte of wikipedia" and "predict the next byte of filtered web text" at once, so as it commits to the training distribution it forgets the eval one. this is just what 1k parameters do, and we are not going to pretend otherwise. BLiMP and ARC staying above chance is, frankly, more than we expected.

don't trust the numbers? the harness and the weights are in the same place you got this from. run it yourself. you have a colab. we'll wait.

training

tokens 100,000,595,968 (~100B)
quality filter fineweb-edu int_score >= 25th pct (--blink, 25% quality)
steps 23,842 (+ SFT + slerp tournament)
seq len 4096
optimizer NAdamW, WSD schedule, peak LR 1.5e-3
hardware single RTX 5090
final train loss 2.809
throughput ~6.7 steps/s
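the table's numbers check out against each other. 100,000,595,968 tokens over 23,842 steps is exactly 2^22 tokens per step, which at seq len 4096 is 1024 sequences per step. the wall-clock line assumes the ~6.7 steps/s held for the whole pretrain. the `wsd_lr` function is a hypothetical warmup-stable-decay schedule. only the 1.5e-3 peak comes from this card; the warmup/decay fractions, the floor, and the linear shapes are placeholders.

```python
# numbers from the training table above
TOKENS = 100_000_595_968
STEPS = 23_842
SEQ_LEN = 4_096
PEAK_LR = 1.5e-3
STEPS_PER_SEC = 6.7  # "~6.7 steps/s", assumed constant for the whole run

tokens_per_step = TOKENS // STEPS
assert tokens_per_step * STEPS == TOKENS  # the division is exact
seqs_per_step = tokens_per_step // SEQ_LEN
print(tokens_per_step, seqs_per_step)                            # 4194304 1024
print(f"~{STEPS / STEPS_PER_SEC / 60:.0f} min of pretraining")   # ~59 min of pretraining


def wsd_lr(step, total_steps, peak_lr=PEAK_LR,
           warmup_frac=0.01, decay_frac=0.2, min_lr=0.0):
    """hypothetical warmup-stable-decay learning rate.

    ASSUMED shape: linear warmup to peak_lr, a flat plateau, then linear decay
    to min_lr over the last decay_frac of training. only peak_lr is from the card.
    """
    warmup = max(1, int(total_steps * warmup_frac))
    decay_start = int(total_steps * (1 - decay_frac))
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    if step < decay_start:
        return peak_lr
    progress = min(1.0, (step - decay_start) / max(1, total_steps - decay_start))
    return peak_lr + (min_lr - peak_lr) * progress
```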

usage

blink is a byte model: input and output are raw UTF-8 bytes (vocab 257 = 256 bytes + EOS=256). there is no tokenizer to load. the only thing you need is torch and infer.py. the entire model is in that one file, so nobody has to clone our private repo to run a 13 KB model.
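a minimal sketch of what "no tokenizer" means in practice, using only the card's vocab description (ids 0-255 are bytes, 256 is EOS). `encode` and `decode` are hypothetical helper names, not infer.py's API. replacing invalid UTF-8 with U+FFFD is our own choice: a 1k model can and will emit broken byte sequences.

```python
EOS = 256  # vocab 257 = 256 byte values + EOS


def encode(text: str) -> list[int]:
    """text -> ids: every UTF-8 byte is its own id (0-255)."""
    return list(text.encode("utf-8"))


def decode(ids: list[int]) -> str:
    """ids -> text: stop at the first EOS, replace invalid UTF-8 with U+FFFD."""
    out = bytearray()
    for i in ids:
        if i == EOS:
            break
        out.append(i)
    return out.decode("utf-8", errors="replace")


print(encode("hé"))                   # [104, 195, 169]
print(decode([104, 105, EOS, 33]))    # hi
```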

the easy way

```bash
pip install torch
python infer.py                          # base, default prompts
python infer.py --instruct               # instruct (ChatML) variant
python infer.py --prompt "the " -n 120
```

in python

infer.py exposes the loader and the sampler directly. import them:

```python
from infer import load_model, generate

# load_model returns (model, ModelConfig); the model is already in .eval() mode
model, config = load_model("blink-1-base.pt")
print("params:", sum(p.numel() for p in model.parameters()))  # -> 1087

# recommended base decoding. seed just makes it reproducible.
for prompt in ["the ", "I think ", "once upon "]:
    print(repr(generate(model, prompt, temperature=0.5, top_k=5,
                        repetition_penalty=1.1, seed=0)))

# the instruct variant speaks ChatML and wants tighter decoding:
chat, _ = load_model("blink-1-instruct.pt")
prompt = "<|im_start|>user\nhello<|im_end|>\n<|im_start|>assistant\n"
print(repr(generate(chat, prompt, temperature=0.2, top_k=10,
                    repetition_penalty=1.5, seed=0)))
```

recommended decoding (what we use): base wants temperature=0.5, top_k=5, repetition_penalty=1.1; instruct wants temperature=0.2, top_k=10, repetition_penalty=1.5.
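if you want to see what those three knobs do to a single step, here is a hedged sketch of one sampling step over 257 logits. it follows the common CTRL-style recipe: the repetition penalty first (divide positive logits of already-seen ids, multiply negative ones), then temperature, then top-k, then a softmax over the survivors. we have not claimed this matches infer.py's `generate` line for line, and `sample_next` is a hypothetical name.

```python
import math
import random


def sample_next(logits, history, temperature=0.5, top_k=5,
                repetition_penalty=1.1, rng=random):
    """pick the next byte id from raw logits (a list of 257 floats)."""
    logits = list(logits)
    for tok in set(history):                  # penalize ids we've already emitted
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty
    scaled = [l / temperature for l in logits]
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]   # softmax over the k survivors
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
fake_logits = [rng.gauss(0.0, 1.0) for _ in range(257)]
print(sample_next(fake_logits, history=[116, 104, 101], rng=rng))
```

with top_k=1 this collapses to greedy decoding. a stronger repetition penalty is what stops a greedy-ish sampler from printing "t t t" forever.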

what you actually get back

honesty clause again. this is real output from the released base weights with the settings above (8 loops through the shared block, seed 0):

the -> 'ar n n c tiseos t at or areeeat ton al teat t t t sin s t n n t s at'
I think -> 'reos c let neas t at n t an t t ton ar teat t t t sin s t r n t s at'
once upon -> 'reat ritet nesan rat n t n t at the al teat t t t sin s t n rat s at'

it is 1,087 parameters. it gets the spaces about right and occasionally lands a real fragment ("the", "at", "let"). coherent words are the exception. that is the point.

raw load, if you insist

the model classes (ModelConfig, Blink) live in infer.py too, if you want to poke at the weights yourself:

```python
import torch
from infer import ModelConfig, Blink

ck = torch.load("blink-1-base.pt", map_location="cpu", weights_only=False)
state_dict = ck["model"]
config = ck["model_config"]  # dim=3, shared_loops=2, ...
# the loop count lives in the weights (=8)
max_loops = state_dict["loop_embed.weight"].shape[0]

model = Blink(ModelConfig(**config), max_loops)
model.load_state_dict(state_dict)
model.eval()
```

limitations

  • it is 1,087 parameters. it does not know facts. it barely knows words.
  • coherent multi-word output is the exception, not the rule.
  • WikiText perplexity degrades during training (see above).
  • english/byte only. no multilingual claims, no code claims, no claims of any kind really.
  • do not deploy this for anything. deploy it for fun.

why

we did not build blink to be good. we built it to find the floor. blink is the floor probe. if 92M tokens-per-parameter can't push a 1k model past chance on grammar, that tells you something real about the architecture. it pushed past chance. that tells you something too.

credits

glint research. lane, enderchefcoder, eclipse-senpai, armand0e, pedrodev2026, king-dahmanus, datdanboi25. built with help from fable and a couple other models, plus human brains. seven people in a trench coat pretending to be a lab.

/lane glint research, 2026. a 13 KB model, 100 billion tokens, one 5090, and no shame about any of it.
