
URL: https://huggingface.co/allura-org/GLM4-32B-Neon-v2

allura-org/GLM4-32B-Neon-v2 · Hugging Face


Image by CalamitousFelicitousness

GLM-4-32B-0414 Neon v2

RP finetune of GLM-4-32B-0414. Feels nice, lots of personality, lots of variety, if a bit quirky sometimes. Pretty smart, but sometimes plays dumb for a swipe; just let it be itself. Nice prose, not too Claude-ish or Gemini-ish. A bit of structural repetition happens sometimes, but that's how modern LLMs are, so ¯\_(ツ)_/¯. Seems to like JSON-formatted system prompts.

Model was trained by Auri.


Training notes

Model was trained on a dataset of 77M tokens of synthetic RP and short-story generation data for one epoch. Training took around 28 hours on a 4x RTX 3090 workstation, generously provided by OwenArli. Went with some sane defaults for the training config; QLoRA plus CCE (Cut Cross-Entropy) and sequence parallelism made it possible to fit 16k context into 96GB of VRAM. Overall, it trained more smoothly than the 9B. I still have the NaN eval/loss issue and am still not sure why.

Huge thanks to ArliAI for providing compute and collaborating on this run!
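For a rough sense of scale, the training figures above can be turned into an average throughput. This is back-of-envelope arithmetic only, not a measured number from the run: it assumes exactly 28 hours and ignores padding, packing and evaluation overhead.

```python
# Back-of-envelope numbers from the training notes (approximate: the card
# says "around 28 hours"; padding/packing/eval overhead is ignored).
DATASET_TOKENS = 77_000_000   # 77M tokens, one epoch
TRAIN_HOURS = 28              # "around 28 hours"
NUM_GPUS = 4                  # 4x RTX 3090
VRAM_PER_GPU_GB = 24          # each RTX 3090 has 24 GB

def throughput(tokens: int, hours: float, gpus: int) -> tuple[float, float]:
    """Return (total tokens/s, tokens/s per GPU) for one pass over the data."""
    seconds = hours * 3600
    total = tokens / seconds
    return total, total / gpus

total_tps, per_gpu_tps = throughput(DATASET_TOKENS, TRAIN_HOURS, NUM_GPUS)
print(f"~{total_tps:.0f} tok/s total, ~{per_gpu_tps:.0f} tok/s per GPU")
print(f"Pooled VRAM: {NUM_GPUS * VRAM_PER_GPU_GB} GB")
```

That works out to roughly 764 tokens/s across the workstation (about 191 per GPU), and the 4x 24 GB cards account for the 96GB mentioned above.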

Format

Model responds to GLM4 instruct formatting, exactly like its base model. Backends struggle to add the BOS token automatically, so you'll need to add it yourself (the [gMASK]<sop> prefix at the start of the template below). The Jinja template should work for chat completions.

[gMASK]<sop><|system|>
{system_prompt}<|user|>
{prompt}<|assistant|>
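When talking to a raw-completion endpoint, the template above has to be assembled by hand, including the [gMASK]<sop> prefix. Below is a minimal sketch; the helper names are hypothetical, and extending the pattern to multi-turn history (each turn as "<|role|>" plus a newline plus its content) is an assumption, since the card only shows a single system/user pair.

```python
# Hypothetical helpers; the format follows the GLM4 template shown above.
GLM4_PREFIX = "[gMASK]<sop>"  # the BOS prefix that backends may fail to add

def build_glm4_prompt(messages: list[dict[str, str]]) -> str:
    """Render messages into a raw GLM4 prompt that ends with <|assistant|>.

    Each message is {"role": "system" | "user" | "assistant", "content": str}.
    Assumption: extra turns follow the same "<|role|>\\n{content}" pattern.
    """
    parts = [GLM4_PREFIX]
    for msg in messages:
        role = msg["role"]
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        parts.append(f"<|{role}|>\n{msg['content']}")
    parts.append("<|assistant|>")  # the model continues from here
    return "".join(parts)

def ensure_prefix(prompt: str) -> str:
    """Prepend [gMASK]<sop> only if it is missing, so it is never doubled."""
    return prompt if prompt.startswith(GLM4_PREFIX) else GLM4_PREFIX + prompt

print(build_glm4_prompt([
    {"role": "system", "content": "You are a helpful roleplay partner."},
    {"role": "user", "content": "Hi!"},
]))
```

The ensure_prefix check matters if your backend does add the BOS after all: a doubled prefix is as wrong as a missing one.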

Recommended Samplers

Nothing special, just classics.

Temperature - 1
Min-P - 0.1
Repetition Penalty - 1.03
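To illustrate what these three settings do to the next-token distribution, here is a toy sketch in pure Python. It uses the common definitions: a CTRL-style repetition penalty (as in Hugging Face transformers, dividing positive logits and multiplying negative ones), temperature scaling, then Min-P, which drops any token whose probability is below min_p times the top token's probability. The order of these steps is an assumption; real backends let you reorder samplers and may differ.

```python
import math

def apply_repetition_penalty(logits, previous_ids, penalty=1.03):
    """CTRL-style penalty: make already-seen tokens slightly less likely."""
    out = list(logits)
    for tok in set(previous_ids):
        # Dividing a positive logit or multiplying a negative one both lower it.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def min_p_filter(probs, min_p=0.1):
    """Zero out tokens below min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

def sampling_distribution(logits, previous_ids, temperature=1.0, min_p=0.1, rep_pen=1.03):
    """Recommended values: temperature 1, Min-P 0.1, repetition penalty 1.03.
    Assumed order: penalty -> temperature -> Min-P."""
    penalized = apply_repetition_penalty(logits, previous_ids, rep_pen)
    probs = softmax([x / temperature for x in penalized])
    return min_p_filter(probs, min_p)

# Toy vocabulary of 5 tokens with known probabilities and no history.
logits = [math.log(p) for p in (0.5, 0.3, 0.15, 0.04, 0.01)]
dist = sampling_distribution(logits, previous_ids=[])
print([round(p, 3) for p in dist])
```

With Min-P 0.1 and a top token at 0.5, the cutoff is 0.05, so the 0.04 and 0.01 tokens are removed and the remaining three are renormalized; at temperature 1 the distribution is otherwise left as the model produced it.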

Example master import for SillyTavern (using the Shingane-v1 system prompt by Steelskull)

Running on KoboldCPP and other backends

To run GGUFs correctly, you need the most recent version of KoboldCPP, and you need to either pass --overridekv glm4.rope.dimension_count=int:64 on the command line or put glm4.rope.dimension_count=int:64 into the overridekv box in the GUI (under the Tokens tab, at the very bottom).

Thanks to DaringDuck and tofumagnate for info on how to apply this fix.
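If you launch KoboldCPP from a script, the override can be passed as a normal argument. A minimal sketch: the override string is quoted from the card, while the executable name ("koboldcpp"; yours may be koboldcpp.exe or python koboldcpp.py) and the GGUF file name are hypothetical placeholders.

```python
import shlex

# Override quoted from the card; executable and model file names are assumptions.
ROPE_OVERRIDE = "glm4.rope.dimension_count=int:64"

def koboldcpp_command(model_path: str, executable: str = "koboldcpp") -> list[str]:
    """Build an argv list that launches KoboldCPP with the GLM4 rope override."""
    return [executable, "--model", model_path, "--overridekv", ROPE_OVERRIDE]

cmd = koboldcpp_command("GLM4-32B-Neon-v2-Q4_K_M.gguf")  # hypothetical file name
print(shlex.join(cmd))
# To actually start the server: subprocess.run(cmd, check=True)
```

Building an argv list rather than a shell string avoids quoting problems if the model path contains spaces.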

Should work out of the box on vLLM >= 0.8.5.
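With vLLM's OpenAI-compatible server, the chat completions endpoint applies the model's Jinja chat template for you, so no manual BOS handling is needed. Below is a minimal sketch of a request using the recommended samplers. It assumes a local server started with vllm serve allura-org/GLM4-32B-Neon-v2 on the default port 8000; min_p and repetition_penalty are vLLM extensions to the OpenAI request schema.

```python
import json
import urllib.request

# Assumption: local vLLM OpenAI-compatible server on its default port.
VLLM_URL = "http://localhost:8000/v1/chat/completions"

def chat_payload(system_prompt: str, user_message: str, max_tokens: int = 512) -> dict:
    """OpenAI-style chat request with the card's recommended samplers."""
    return {
        "model": "allura-org/GLM4-32B-Neon-v2",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": 1.0,
        "min_p": 0.1,               # vLLM extension
        "repetition_penalty": 1.03,  # vLLM extension
        "max_tokens": max_tokens,
    }

def send_chat(payload: dict, url: str = VLLM_URL) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (needs a running server):
# print(send_chat(chat_payload("You are a narrator.", "Describe a neon-lit street.")))
```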

ExLlamaV2 currently doesn't properly support GLM-4-32B, unlike the 9B. EXL3 should work, but it's untested.

The latest versions of the llama.cpp server should also run GGUFs out of the box.
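For llama.cpp's raw /completion endpoint no chat template is applied, so the prompt is written in the GLM4 format from the Format section, with [gMASK]<sop> added by hand. A minimal sketch, assuming llama-server is running locally on its default port 8080; if your build already prepends the prefix, drop it from the string to avoid doubling it.

```python
import json
import urllib.request

# Assumption: llama-server running locally on its default port.
LLAMA_URL = "http://localhost:8080/completion"

def completion_payload(system_prompt: str, user_message: str, n_predict: int = 512) -> dict:
    # Raw prompt in GLM4 format; the [gMASK]<sop> prefix is added manually
    # because backends may not add it automatically.
    prompt = f"[gMASK]<sop><|system|>\n{system_prompt}<|user|>\n{user_message}<|assistant|>"
    return {
        "prompt": prompt,
        "temperature": 1.0,
        "min_p": 0.1,
        "repeat_penalty": 1.03,  # llama.cpp's name for repetition penalty
        "n_predict": n_predict,
    }

def send_completion(payload: dict, url: str = LLAMA_URL) -> str:
    """POST the payload and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["content"]

# Usage (needs a running server):
# print(send_completion(completion_payload("You are a narrator.", "Hi!")))
```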


Special Thanks

Once again, huge kudos to OwenArli for providing compute and helping with tuning along the way!

Big thanks to Artus for providing free inference for the pre-release showcase of this model!

And big thanks to the BeaverAI community for giving feedback and helping to figure out optimal settings!


Training config

Model size: 33B params
Tensor type: BF16 (Safetensors)
