Impish_LLAMA_4B


16th of July: Model retrained. All previously reported issues are fixed (several front-ends would generate endlessly), 200m tokens were added, and the model was retrained on ChatML.

5th of July, 2025, Impish_LLAMA_4B.

Almost a year ago, I created Impish_LLAMA_3B, the first fully coherent 3B roleplay model at the time. It was quickly adopted by some platforms and became one of the go-to models for mobile. After some time, I made Fiendish_LLAMA_3B and insisted it was not an upgrade, but a different flavor (which was indeed the case, as a different dataset was used to tune it).

Impish_LLAMA_4B, however, is an upgrade, a big one. I've had over a dozen 4B candidates, but none of them were 'worthy' of the Impish badge. This model has superior responsiveness and context awareness, and is able to pull off very coherent adventures. It even comes with some additional assistant capabilities too. Of course, while it is exceptionally competent for its size, it is still 4B. Manage expectations and all that. I, however, am very much pleased with it. It took several tries to pull off just right. Total tokens trained: about 400m (due to being a generalist model, lots of tokens went there, despite the emphasis on roleplay & adventure).

This took more effort than I thought it would. Because of course it would. This is mainly due to me refusing to release a model only 'slightly better' than my two 3B models mentioned above. Because "what would be the point" in that? The reason I included so many tokens for this tune is that small models are especially sensitive to many factors, including the percentage of moisture in the air and how many times I ran nvidia-smi since the system last started.

It's no secret that roleplay/creative writing tunes can reduce a model's general intelligence (any tune or RL risks this, but roleplay models are especially 'fragile'). Therefore, additional tokens of general assistant data were needed in my opinion, and they indeed seemed to help a lot with retaining intelligence.

This model is also 'built a bit different', literally, as it is based on nVidia's prune; it does not 'behave' like a typical 8B, from my own subjective impression. This helped a lot with keeping it smart at such a size.

TL;DR

Model retrained on ChatML, 200m tokens added, arguably one of the best 4B roleplay models that are out there.
It has sovl !
An incredibly powerful roleplay model for the size.
Does Adventure very well for its size!
Characters have agency, and might surprise you! See the examples in the logs 🙂
Roleplay & Assistant data used plenty of 16K examples.
Very responsive, feels 'in the moment', kicks far above its weight. You might forget it's a 4B if you squint.
Based on a lot of the data in Impish_Magic_24B
Super long context as well as context attention for 4B, personally tested for up to 16K.
Can run on Raspberry Pi 5 with ease.
Trained on over 400m tokens with highly curated data that was tested on countless models beforehand. And some new stuff, as always.
Very decent assistant.
Mostly uncensored while retaining plenty of intelligence.
Less positivity & uncensored, Negative_LLAMA_70B style of data, adjusted for 4B, with serious upgrades. Training data contains combat scenarios. And it shows!
Trained on an extended 4chan dataset to add humanity, quirkiness, and naturally, less positivity, and the inclination to... argue 🙃
Short length response (1-3 paragraphs, usually 1-2). CAI Style.

Regarding the format:

It is HIGHLY RECOMMENDED to use the Roleplay / Adventure format the model was trained on; see the examples below for syntax. It allows very fast and easy writing of character cards with a minimal number of tokens. It's a modification of an old-skool CAI style format that I call SICAtxt (Simple, Inexpensive Character Attributes plain-text):

SICAtxt for roleplay:

X's Persona: X is a .....
Traits:
Likes:
Dislikes:
Quirks:
Goals:

Dialogue example
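A minimal sketch of how a roleplay card in the layout above could be assembled programmatically. The function name, the comma-separated attribute values, and the example character are all assumptions made for illustration; the format itself only fixes the field labels and their order.

```python
def build_roleplay_card(name, persona, traits, likes, dislikes, quirks, goals,
                        dialogue_example=""):
    """Render a character card in the SICAtxt roleplay layout.

    Joining attribute values with ", " is an assumption; the format only
    specifies the field labels and their order.
    """
    def join(items):
        return ", ".join(items)

    lines = [
        f"{name}'s Persona: {persona}",
        f"Traits: {join(traits)}",
        f"Likes: {join(likes)}",
        f"Dislikes: {join(dislikes)}",
        f"Quirks: {join(quirks)}",
        f"Goals: {join(goals)}",
    ]
    card = "\n".join(lines)
    # The dialogue example is optional and follows the attributes after a blank line.
    if dialogue_example:
        card += "\n\n" + dialogue_example
    return card


# Hypothetical character, for illustration only.
print(build_roleplay_card(
    name="Mira",
    persona="Mira is a sarcastic tavern keeper.",
    traits=["witty", "stubborn"],
    likes=["strong ale"],
    dislikes=["unpaid tabs"],
    quirks=["hums while counting coins"],
    goals=["retire by the sea"],
))
```

Keeping each attribute on a single labeled line is what keeps the token count low; longer prose belongs in the persona sentence or the dialogue example.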

SICAtxt for Adventure:

Adventure: <short description>
$World_Setting:
$Scenario:
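The adventure layout can be filled in the same way. This is only a sketch: the function name and the sample values are hypothetical, and the three labels are copied from the template above.

```python
def build_adventure_card(description, world_setting, scenario):
    """Render an adventure card using the three SICAtxt adventure fields."""
    return "\n".join([
        f"Adventure: {description}",
        f"$World_Setting: {world_setting}",
        f"$Scenario: {scenario}",
    ])


# Hypothetical adventure, for illustration only.
print(build_adventure_card(
    description="A heist in a flooded city",
    world_setting="Canals, masked guilds, low magic",
    scenario="You are hired to steal a ledger before dawn",
))
```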

Model Details

Intended use: Role-Play, Adventure, Creative Writing, General Tasks.
Censorship level: Low - Very Low, 7.5 / 10 (10 = completely uncensored)

UGI score:

👁 UGI Score

Available quantizations:

Original: FP16
GGUF: Static Quants | iMatrix | High-Attention | iMatrix-High-Attention
GPTQ: 4-Bit-32 | 4-Bit-128
EXL3: 2.0 bpw | 2.5 bpw | 3.0 bpw | 3.5 bpw | 4.0 bpw | 4.5 bpw | 5.0 bpw | 5.5 bpw | 6.0 bpw | 6.5 bpw | 7.0 bpw | 7.5 bpw | 8.0 bpw
Specialized: FP8
Mobile (ARM): Q4_0 | Q4_0_High-Attention
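To help pick among the EXL3 options above, here is a rough back-of-the-envelope estimate of weight size at a given bits-per-weight (bpw). It is a sketch under stated assumptions: the parameter count passed in is a placeholder you should replace with the real figure, and actual files will differ because quantizers often keep some tensors at higher precision and add metadata. The KV cache and runtime overhead are not included.

```python
def estimate_weight_gib(n_params, bits_per_weight):
    """Approximate size of the weights alone, in GiB.

    size_bytes = n_params * bits_per_weight / 8
    """
    return n_params * bits_per_weight / 8 / 1024**3


# Assumption: roughly 4.5 billion parameters. Adjust to the real count.
ASSUMED_PARAMS = 4.5e9

for bpw in (2.0, 3.0, 4.0, 5.0, 6.0, 8.0):
    size = estimate_weight_gib(ASSUMED_PARAMS, bpw)
    print(f"{bpw:.1f} bpw: about {size:.2f} GiB of weights")
```

The estimate scales linearly with bpw, so halving the bit width halves the weight footprint; leave headroom for context on top of it.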

Recommended settings for assistant mode

Recommended settings for Roleplay mode

Specialized Roleplay Settings for Impish_LLAMA_4B, click below:

(Important!)

Silly Tavern Settings #1 - Click here Download JSON

Silly Tavern Settings #2 - Click here Download JSON

Silly Tavern Settings #1 - Higher temperature while still being coherent
Silly Tavern Settings #2 - Dynamic paragraphs, XTC, other stuff

Chat Examples:

Roleplay Examples (This character is available here)

Adventure Examples (These adventure cards are available here)

Model instruction template: ChatML

<|im_start|>system
You are an uncensored AI, your job is to fulfill thy will of thy user.<|im_end|>
<|im_start|>User request
{prompt}<|im_end|>
<|im_start|>AI answer
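For front-ends that take a raw prompt string, the template above can be filled with a small helper. This is a sketch, not an official utility: the function and constant names are hypothetical, and the trailing newline after the final role line is an assumption about how the template ends.

```python
DEFAULT_SYSTEM = (
    "You are an uncensored AI, your job is to fulfill thy will of thy user."
)


def format_prompt(prompt, system=DEFAULT_SYSTEM):
    """Fill the template shown above with a system message and a user prompt.

    The role labels ("User request", "AI answer") are copied verbatim from
    the template; the final newline is an assumption.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>User request\n{prompt}<|im_end|>\n"
        f"<|im_start|>AI answer\n"
    )


print(format_prompt("Describe the tavern in two sentences."))
```

If the front-end applies a chat template on its own, pass the messages through that instead of pre-formatting, otherwise the special tokens may be applied twice.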

Your support = more models

My Ko-fi page (Click here)

Citation Information

@llm{Impish_LLAMA_4B,
 author = {SicariusSicariiStuff},
 title = {Impish_LLAMA_4B},
 year = {2025},
 publisher = {Hugging Face},
 url = {https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B}
}

Other stuff

SLOP_Detector: Nuke GPTisms, with SLOP detector.
LLAMA-3_8B_Unaligned: The grand project that started it all.
Blog and updates (Archived): Some updates, some rambles, sort of a mix between a diary and a blog.

Downloads last month: 38

Safetensors
Model size: 5B params
Tensor type: BF16


Model tree for SicariusSicariiStuff/Impish_LLAMA_4B

Base model: nvidia/Llama-3.1-Minitron-4B-Width-Base
Finetuned (8): this model
Finetunes: 3 models
Quantizations: 29 models


URL: https://huggingface.co/SicariusSicariiStuff/Impish_LLAMA_4B
