
URL: https://huggingface.co/qikp/pika-2



pika

🎉 You are looking at pika 2, which incorporates the following changes:

  • 🇬🇧 Usage of an English dataset
  • 🤏 Smaller size (8K vocabulary)
  • 🚫 No unknown token

pika is a simple tokenizer released under public-domain-like terms.

Special Tokens

  • End-of-Sequence token: [EOS]
  • Padding token: [PAD]
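
As a rough illustration of how these two tokens are usually applied when batching, the sketch below appends [EOS] to each sequence and right-pads the shorter ones with [PAD]. It works on plain token strings rather than ids. The helper name `pad_batch` and the example pieces are hypothetical, and right-padding is an assumption, not something the model card specifies.

```python
EOS = "[EOS]"  # end-of-sequence token named in the model card
PAD = "[PAD]"  # padding token named in the model card


def pad_batch(sequences):
    """Append EOS to every sequence, then right-pad to a common length.

    `sequences` is a list of token lists. Returns (padded, mask), where
    mask holds 1 for real tokens (including EOS) and 0 for padding.
    """
    with_eos = [list(seq) + [EOS] for seq in sequences]
    width = max((len(seq) for seq in with_eos), default=0)
    padded = [seq + [PAD] * (width - len(seq)) for seq in with_eos]
    mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in with_eos]
    return padded, mask


# Hypothetical pieces for illustration; real pika output will differ.
batch, mask = pad_batch([["hel", "lo"], ["hi"]])
print(batch)
print(mask)
```

The mask lets downstream code ignore the [PAD] positions while still treating [EOS] as a real token.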

Training

pika was trained on the first 6K rows of a Cosmopedia sample.

Limitations

Because it was trained on a small corpus, pika may split words into more, smaller pieces than expected. Some less common special tokens are also absent; add them manually if you need them.
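
A minimal sketch of what adding a missing special token amounts to: each new token is given the next free id after the existing vocabulary, and tokens that are already present keep their ids. The toy vocabulary, its ids, and the helper `add_special_tokens` below are hypothetical stand-ins, not pika's real data. With the Hugging Face `tokenizers` library, the corresponding method is `Tokenizer.add_special_tokens`.

```python
def add_special_tokens(vocab, new_tokens):
    """Return a copy of `vocab` with each missing token given the next free id."""
    extended = dict(vocab)
    next_id = max(extended.values(), default=-1) + 1
    for token in new_tokens:
        if token not in extended:  # existing tokens keep their ids
            extended[token] = next_id
            next_id += 1
    return extended


# Toy vocabulary with made-up ids; not pika's real vocabulary.
toy_vocab = {"[EOS]": 0, "[PAD]": 1, "hel": 2, "lo": 3}
extended = add_special_tokens(toy_vocab, ["[BOS]", "[PAD]", "[SEP]"])
print(extended)
```

If the tokenizer feeds a model, the model's embedding table also has to be resized to cover the new ids.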
