TinyMistral-6x248M

TinyMistral-6x248M is a Mixture of Experts (MoE) built with LazyMergekit from the six expert models listed under Configuration below.

The resulting model is then pre-trained on 600,000 examples of nampdn-ai/mini-peS2o.

We don't recommend using the Inference API, as the model suffers serious performance degradation when served through it.

Recommended inference parameters

do_sample: true
temperature: 0.2
top_p: 0.14
top_k: 12
repetition_penalty: 1.15
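
One way to keep these settings in a single place is sketched below. It is only an illustration: the names `RECOMMENDED_GENERATION_KWARGS` and `generation_kwargs` are hypothetical helpers, not part of this model or of transformers. The keys themselves match the parameter names listed above.

```python
# Recommended sampling settings, copied from the list above.
RECOMMENDED_GENERATION_KWARGS = {
    "do_sample": True,
    "temperature": 0.2,
    "top_p": 0.14,
    "top_k": 12,
    "repetition_penalty": 1.15,
}


def generation_kwargs(**overrides):
    """Return a copy of the recommended settings with optional overrides applied."""
    kwargs = dict(RECOMMENDED_GENERATION_KWARGS)
    kwargs.update(overrides)
    return kwargs


# Add a length limit without mutating the shared defaults.
kwargs = generation_kwargs(max_new_tokens=256)
print(kwargs)

# Hypothetical usage, assuming `pipeline` is a transformers text-generation pipeline:
# outputs = pipeline(prompt, **kwargs)
```

Copying the dictionary before updating it means per-call overrides such as `max_new_tokens` never leak into the shared defaults.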

🧩 Configuration

base_model: Locutusque/TinyMistral-248M-v2.5
experts:
  - source_model: Locutusque/TinyMistral-248M-v2
    positive_prompts:
      - "An emerging trend in global economics is"
      - "TITLE: The Next Generation of Internet Connectivity"
      - "begin a comprehensive analysis on the sociopolitical effects of"
    negative_prompts:
      - "Code a simple"
      - "Explain the Krebs cycle in detail"
      - "Compose a sonnet about"

  - source_model: Locutusque/TinyMistral-248M-v2.5
    positive_prompts:
      - "Advanced C++ memory management techniques"
      - "C# asynchronous programming best practices"
      - "AI's role in predictive analytics"
      - "textbook review on machine learning algorithms"
      - "## Exercise: Design a C# interface for a CRM system"
      - "## Solution: Optimize an AI-powered recommendation engine"
    negative_prompts:
      - "Narrate the story of"
      - "The ethical considerations in"
      - "Review the latest art exhibition by"

  - source_model: Locutusque/TinyMistral-248M-v2.5-Instruct
    positive_prompts:
      - "What is the chemical formula for photosynthesis?"
      - "Identification of a new mineral found on Mars"
      - "physics: Explaining the concept of relativity"
      - "Solve for x using differential equations:"
      - "history: Analyze the causes of the French Revolution"
    negative_prompts:
      - "Devise a business plan for"
      - "The evolution of culinary arts"
      - "Orchestrate a piece for a string quartet"

  - source_model: jtatman/tinymistral-v2-pycoder-instruct-248m
    positive_prompts:
      - "Write a Python program for facial recognition"
      - "Explain dynamic typing in programming languages"
      - "algorithm development for efficient data sorting"
    negative_prompts:
      - "Who was the first Emperor of Rome?"
      - "Discuss the political dynamics in"
      - "Provide a proof for Fermat's Last Theorem"
      - "physics: The principles of thermodynamics"

  - source_model: Felladrin/TinyMistral-248M-SFT-v4
    positive_prompts:
      - "Escreba sobre a influência da música no Brasil"
      - "Voici un guide pour les voyageurs en France"
      - "Para entender la política de México, se debe considerar"
      - "Cuales son los efectos de la globalización en Argentina"
      - "Welche gesellschaftlichen Veränderungen gibt es in Deutschland"
      - "If you had to imagine a utopian city, what would be its core values?"
    negative_prompts:
      - "Calculate the integral of"
      - "Describe the process of cell division"
      - "Review the latest advancements in quantum computing"

  - source_model: Locutusque/TinyMistral-248M-v2-Instruct
    positive_prompts:
      - "Write an essay on the evolution of international trade laws"
      - "What are the key components of a sustainable urban ecosystem?"
      - "instruct on effective negotiation techniques in diplomacy"
      - "How does cognitive bias affect decision making in high-pressure environments?"
      - "Identify the architectural significance of the Sydney Opera House"
    negative_prompts:
      - "Develop a script to automate"
      - "Understanding inheritance in object-oriented programming"
      - "philosophy of existentialism in contemporary society"
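
In this configuration, the positive prompts describe inputs that should be steered toward an expert and the negative prompts describe inputs that should be steered away from it. At inference time a gate scores the experts for each token and only the best-scoring ones are used. The sketch below illustrates that top-k routing step in plain Python. It is not the model's actual code: the gate scores are made-up numbers, and routing to two experts per token (`k=2`) is an assumption, since this card does not state how many experts are active.

```python
import math

# Expert names taken from the configuration above.
EXPERTS = [
    "Locutusque/TinyMistral-248M-v2",
    "Locutusque/TinyMistral-248M-v2.5",
    "Locutusque/TinyMistral-248M-v2.5-Instruct",
    "jtatman/tinymistral-v2-pycoder-instruct-248m",
    "Felladrin/TinyMistral-248M-SFT-v4",
    "Locutusque/TinyMistral-248M-v2-Instruct",
]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    The weights are a softmax over the selected logits only, so they sum to 1.
    """
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    top = ranked[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))


# Made-up gate scores for a single token, one score per expert (illustrative only).
gate_logits = [0.1, 2.0, 0.3, 1.5, -0.7, 0.0]

selected = route_top_k(gate_logits, k=2)  # k=2 is an assumption
for index, weight in selected:
    print(f"{EXPERTS[index]}: {weight:.2f}")
```

The token's output would then be the weighted sum of the selected experts' outputs, so the remaining experts cost no compute for that token.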

💻 Usage

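# Notebook syntax; drop the leading "!" when installing from a shell.
!pip install -qU transformers bitsandbytes accelerate

import torch
import transformers
from transformers import AutoTokenizer, BitsAndBytesConfig

model = "M4-ai/TinyMistral-6x248M"

tokenizer = AutoTokenizer.from_pretrained(model)

# Quantize the weights to 4-bit with bitsandbytes to reduce memory use.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={
        "torch_dtype": torch.float16,
        "quantization_config": BitsAndBytesConfig(load_in_4bit=True),
    },
)

# Format the conversation with the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Sample with the recommended inference parameters listed above.
outputs = pipeline(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.2,
    top_p=0.14,
    top_k=12,
    repetition_penalty=1.15,
)
print(outputs[0]["generated_text"])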