ALLaM-7B-Instruct — Curated 2M
LoRA adapter for ALLaM-7B-Instruct-preview, fine-tuned on the
human-curated Arabic instruction dataset CIDAR under a fixed
budget of 2M training tokens. One of six adapters from a
controlled study comparing human-curated versus synthetic Arabic
instruction data under matched token budgets.
Model Details
- Base model: humain-ai/ALLaM-7B-Instruct-preview
- Adapter type: LoRA (QLoRA, 4-bit NF4)
- Training data: CIDAR (human-curated)
- Token budget: 2M tokens
- Language: Arabic
Training Configuration
| Setting | Value |
|---|---|
| Quantization | 4-bit NF4 (QLoRA) |
| LoRA rank / alpha | 16 / 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Optimizer | Paged AdamW (8-bit) |
| Learning rate | 2e-4 |
| LR scheduler | cosine, 100 warmup steps |
| Epochs | 3 |
| Effective batch size | 16 (2 × 8 grad. accum.) |
| Max sequence length | 512 |
| Precision | fp16 |
| Seed | 42 |
| Hardware | NVIDIA A100 (40GB) |
Evaluation
Evaluated with lm-evaluation-harness on seven Arabic benchmarks (ACVA 5-shot; others zero-shot). Accuracy:
| Benchmark | Score |
|---|---|
| Arab Culture | 0.366 |
| AlGhafa | 0.593 |
| AraDiCE | 0.587 |
| ACVA | 0.776 |
| Arabic Exams | 0.501 |
| ArabicMMLU | 0.640 |
| OpenAI MMLU (Ar) | 0.435 |
Usage
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = "humain-ai/ALLaM-7B-Instruct-preview"
tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True, device_map="auto")
model = PeftModel.from_pretrained(model, "ManarAlrabie/arabic-llm-curated-2m")
Intended Use & Limitations
Research artifact for studying instruction-data quality vs. quantity in Arabic LLM fine-tuning. Not intended for production. As a single-seed fine-tune of a 7B model, outputs may contain inaccuracies.
Citation
Associated paper is under review; citation will be added upon publication. Until then, please link to this repository.
- Downloads last month
- 11
Model tree for ManarAlrabie/arabic-llm-curated-2m
Base model
humain-ai/ALLaM-7B-Instruct-preview