URL: https://huggingface.co/allura-org/Koto-22B-PT

Koto 22B (Pretrained)

Koto-22B-PT is a depth-upscaled version of Mistral-Nemo-Base-2407, healed and trained on almost a billion tokens of creative writing data.

Usage

This model is intended only for raw text completion settings, such as cowriting. Instruct prompting will not work. Multi-turn roleplay will not work.

It was trained at a 32k-token context length, but because not all samples were that long, we expect that in the best case you can get ~16k tokens of effective context.

We found that 1.5-1.55 temperature and 0.05-0.1 min_p worked best, but YMMV!
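
To make the interaction between those two settings concrete, here is a minimal pure-Python sketch of temperature scaling followed by min_p filtering. It is an illustration only, not the sampler of any particular backend: the function names `min_p_distribution` and `sample_token` are hypothetical, and applying temperature before min_p is an assumption (backends differ in sampler order).

```python
import math
import random


def min_p_distribution(logits, temperature=1.5, min_p=0.05):
    """Return {token_index: probability} after temperature scaling and min_p filtering.

    Assumption: temperature is applied before min_p; real backends may order these differently.
    """
    # Temperature above 1 flattens the distribution, giving unlikely tokens more weight.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # min_p keeps only tokens at least min_p times as likely as the top token.
    cutoff = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= cutoff}
    # Renormalize so the surviving probabilities sum to 1.
    kept_total = sum(kept.values())
    return {i: p / kept_total for i, p in kept.items()}


def sample_token(logits, temperature=1.5, min_p=0.05, rng=random):
    """Draw one token index from the filtered distribution."""
    dist = min_p_distribution(logits, temperature, min_p)
    indices = list(dist.keys())
    weights = list(dist.values())
    return rng.choices(indices, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [5.0, 4.0, 0.0, -5.0]  # made-up values for illustration
    print(min_p_distribution(toy_logits, temperature=1.5, min_p=0.05))
    print(sample_token(toy_logits, temperature=1.5, min_p=0.05))
```

The high temperature widens the pool of candidate tokens, while min_p trims the long tail relative to the most likely token, which is roughly why the two are usually tuned together.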

Datasets

Some of the data used to train this model includes:

  • Most of The Anarchist Library, a repository for anarchist manifestos and writing (see allura-org/the-anarchist-library)
  • A random sample of public domain books from Project Gutenberg
  • Furry (anthro and feral) storytelling and smut
  • A small subset of known high-quality books and story data

Acknowledgements

  • thank you to @takeshimaxfj on twitter for drawing the art used in the model card!
  • thank you very much to mango/deltavector for providing the compute used to train this model
  • thanks to curse for testing, ideas
  • thanks to toasty for some data, ideas
  • thanks to everyone else in allura for moral support

ilya <3

Call for Help

if you would like to help build on this model (instruct/RP SFT, further annealing on higher quality data, etc)...
please join our discord or our matrix! <3

Technical Appendix

Safetensors · Model size: 22B params · Tensor type: BF16