Koto 22B (Pretrained)


Koto-22B-PT is a depth-upscaled version of Mistral-Nemo-Base-2407, healed and trained on almost a billion tokens of creative writing data.

Usage

This model is intended only for raw text completion settings, such as co-writing. Instruct-style prompts will not work, and neither will multi-turn roleplay.

It was trained at a 32k context length, but since not all training samples were that long, we expect ~16k tokens of effective context in the best case.
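For long co-writing sessions, one way to stay inside that effective window is to keep only the most recent part of the document and leave room for the continuation. The sketch below is a hypothetical helper, not part of any released tooling. It assumes "~16k" means 16,384 tokens and that you already have the prompt as a list of token ids from the model's tokenizer.

```python
def trim_prompt(token_ids, max_new_tokens, effective_context=16_384):
    """Keep only the most recent tokens so prompt + generation fits the budget.

    Hypothetical helper: effective_context=16_384 is an assumption based on
    the ~16k effective context estimate, not a hard limit of the model.
    """
    if max_new_tokens >= effective_context:
        raise ValueError("max_new_tokens must be smaller than the effective context")
    budget = effective_context - max_new_tokens
    # Drop the oldest tokens: for raw completion, the tail of the text
    # is what the model continues from.
    if len(token_ids) > budget:
        return list(token_ids[-budget:])
    return list(token_ids)
```

For example, `trim_prompt(ids, max_new_tokens=512)` keeps at most 15,872 prompt tokens. Cutting at a paragraph boundary instead of an arbitrary token usually gives the model a cleaner starting point.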

We found that a temperature of 1.5-1.55 and a min_p of 0.05-0.1 worked best, but YMMV!
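A temperature this high only stays coherent because min_p prunes the long tail. The sketch below shows the interaction on a single logits vector: temperature flattens the distribution, then every token whose probability is below `min_p` times the top token's probability is discarded before sampling. The function names are hypothetical, and the order (temperature first, then min_p) is an assumption. It is one common ordering, but some backends let you apply temperature last, which gives different results.

```python
import math
import random


def min_p_filter(logits, temperature=1.5, min_p=0.05):
    """Return (token_index, probability) pairs that survive temperature + min_p.

    Assumed order: temperature scaling first, then the min_p cutoff.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature > 1 flattens the distribution, boosting unlikely tokens.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # min_p: the cutoff scales with the top token's probability.
    threshold = min_p * max(probs)
    kept = [(i, p) for i, p in enumerate(probs) if p >= threshold]
    # Renormalize the survivors so they sum to 1.
    kept_total = sum(p for _, p in kept)
    return [(i, p / kept_total) for i, p in kept]


def min_p_sample(logits, temperature=1.5, min_p=0.05, rng=None):
    """Draw one token index from the filtered distribution."""
    rng = rng or random.Random()
    kept = min_p_filter(logits, temperature, min_p)
    indices = [i for i, _ in kept]
    weights = [p for _, p in kept]
    return rng.choices(indices, weights=weights, k=1)[0]
```

Because the cutoff is relative to the top token, min_p prunes aggressively when the model is confident and lets more candidates through when it is not. If you use Hugging Face transformers, recent versions accept `min_p` alongside `temperature` in `generate(..., do_sample=True)`, so you would not normally write this yourself.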

Datasets

Some of the data used to train this model includes:

Most of The Anarchist Library, a repository for anarchist manifestos and writing (see allura-org/the-anarchist-library)
A random sample of public domain books from Project Gutenberg
Furry (anthro and feral) storytelling and smut
A small subset of known high-quality books and story data

Acknowledgements

thank you to @takeshimaxfj on twitter for drawing the art used in the model card!
thank you very much to mango/deltavector for providing the compute used to train this model
thanks to curse for testing and ideas
thanks to toasty for some data and ideas
thanks to everyone else in allura for moral support

ilya <3

Call for Help

if you would like to help build on this model (instruct/RP SFT, further annealing on higher-quality data, etc.), please join our discord or our matrix! <3

Technical Appendix


Safetensors

Model size: 22B params

Tensor type: BF16


Model tree for allura-org/Koto-22B-PT

Base model: mistralai/Mistral-Nemo-Base-2407 (one of 94 listed finetunes)

Finetunes of this model: 1

Quantizations of this model: 7

Paper for allura-org/Koto-22B-PT: arXiv 2312.15166 (published Dec 23, 2023)

URL: https://huggingface.co/allura-org/Koto-22B-PT
