🌟 Buying me coffee is a direct way to show support for this project.

smol_llama-4x220M-MoE

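smol_llama-4x220M-MoE is a Mixture of Experts (MoE) made with the following models using LazyMergekit:

- BEE-spoke-data/smol_llama-220M-openhermes
- BEE-spoke-data/beecoder-220M-python
- BEE-spoke-data/zephyr-220m-sft-full
- BEE-spoke-data/zephyr-220m-dpo-full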

💻 Usage

!pip install -qU transformers bitsandbytes accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "Isotonic/smol_llama-4x220M-MoE"

# Load the tokenizer once and hand it to the pipeline so the same instance is reused.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.bfloat16},
)

# Format the conversation with the tokenizer's chat template, then sample a completion.
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
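
The generation call above samples with `temperature`, `top_k`, and `top_p`. As a rough illustration of what those three parameters do to the next-token distribution, here is a minimal pure-Python sketch. The function name `filter_distribution` and the toy logits are hypothetical, and the library's own implementation may differ in details such as tie-breaking.

```python
import math

def filter_distribution(logits, temperature=0.7, top_k=50, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax over the scaled logits.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-k: keep only the k most likely tokens (all of them if fewer than k exist).
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)
    # Top-p: keep the shortest prefix whose renormalised cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break
    # Renormalise the surviving tokens so they form a distribution again.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# Toy example with four made-up logits.
result = filter_distribution([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=3, top_p=0.9)
print(sorted(result))
```

Lower `top_k` or `top_p` values shrink the set of candidate tokens, which tends to make output more conservative; higher values leave more of the tail available for sampling.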

🧩 Configuration

experts:
  - source_model: BEE-spoke-data/smol_llama-220M-openhermes
    positive_prompts:
      - "reasoning"
      - "logic"
      - "problem-solving"
      - "critical thinking"
      - "analysis"
      - "synthesis"
      - "evaluation"
      - "decision-making"
      - "judgment"
      - "insight"

  - source_model: BEE-spoke-data/beecoder-220M-python
    positive_prompts:
      - "program"
      - "software"
      - "develop"
      - "build"
      - "create"
      - "design"
      - "implement"
      - "debug"
      - "test"
      - "code"
      - "python"
      - "programming"
      - "algorithm"
      - "function"

  - source_model: BEE-spoke-data/zephyr-220m-sft-full
    positive_prompts:
      - "storytelling"
      - "narrative"
      - "fiction"
      - "creative writing"
      - "plot"
      - "characters"
      - "dialogue"
      - "setting"
      - "emotion"
      - "imagination"
      - "scene"
      - "story"
      - "character"

  - source_model: BEE-spoke-data/zephyr-220m-dpo-full
    positive_prompts:
      - "chat"
      - "conversation"
      - "dialogue"
      - "discuss"
      - "ask questions"
      - "share thoughts"
      - "explore ideas"
      - "learn new things"
      - "personal assistant"
      - "friendly helper"
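
The configuration pairs each expert with prompts that describe the inputs it should handle. As a toy illustration of how a router in a Mixtral-style MoE layer might then pick experts for a token, the sketch below scores a hidden state against one gate vector per expert and keeps the top two. The gate vectors, the three-dimensional hidden state, and the `route` function are invented for illustration; this is not mergekit's implementation, and the gate mode used for this merge is not stated in the configuration.

```python
import math

# Short labels for the four experts in the configuration above.
EXPERTS = ["openhermes", "beecoder-python", "zephyr-sft", "zephyr-dpo"]

# Hypothetical gate vectors, one per expert (real ones live in the model's router weights).
GATES = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.5],
]

def route(hidden, gates=GATES, top_k=2):
    """Return [(expert_index, weight), ...] for the top_k experts; weights sum to 1."""
    # Router logit for each expert: dot product of the hidden state with its gate vector.
    scores = [sum(h * g for h, g in zip(hidden, gate)) for gate in gates]
    # Keep the top_k highest-scoring experts.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the selected logits gives the mixing weights.
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# A made-up hidden state that aligns most strongly with the second gate vector.
routing = route([0.1, 2.0, 0.3])
for index, weight in routing:
    print(f"{EXPERTS[index]}: {weight:.2f}")
```

In a real layer the outputs of the selected experts would be combined using these weights, so only two of the four feed-forward blocks run for each token.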

Model size: 0.6B params (Safetensors)
Tensor type: BF16
