Indonesian Regional Languages Identifier
Fine-tuned XLM-RoBERTa model that identifies 12 languages: Indonesian, 10 regional languages of Indonesia, and English.
Supported Languages
- 🇮🇩 Indonesian (Bahasa Indonesia)
- Acehnese (Bahasa Aceh)
- Balinese (Basa Bali)
- Banjarese (Bahasa Banjar)
- Buginese (Basa Ugi)
- Javanese (Basa Jawa)
- Madurese (Basa Madhura)
- Minangkabau (Baso Minang)
- Ngaju (Basa Ngaju)
- Sundanese (Basa Sunda)
- Toba Batak (Hata Batak Toba)
- 🇬🇧 English
Model Performance
- Accuracy: 0.9783
- F1 Macro: 0.9783
- F1 Weighted: 0.9783
- Precision: 0.9785
- Recall: 0.9783
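To make the relationship between these metrics concrete, here is a minimal pure-Python sketch of how accuracy, macro F1, and weighted F1 are computed. The labels in it are toy, made-up values rather than the model's evaluation data, and the helper name `f1_scores` is hypothetical. It illustrates that macro and weighted F1 coincide when every class has the same number of samples, which may be why the two reported values are identical (an assumption, since the card does not state the class distribution).

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (accuracy, macro_f1, weighted_f1) for two equal-length label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    per_class_f1 = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class_f1[label] = 2 * tp / denom if denom else 0.0
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro: plain average over classes. Weighted: average weighted by class support.
    macro_f1 = sum(per_class_f1.values()) / len(labels)
    weighted_f1 = sum(per_class_f1[l] * support[l] for l in labels) / len(y_true)
    return accuracy, macro_f1, weighted_f1


# Toy labels for illustration only (two samples per class, one mistake).
y_true = ["javanese", "javanese", "sundanese", "sundanese", "english", "english"]
y_pred = ["javanese", "sundanese", "sundanese", "sundanese", "english", "english"]

accuracy, macro_f1, weighted_f1 = f1_scores(y_true, y_pred)
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} weighted_f1={weighted_f1:.4f}")
```

With unbalanced classes the two F1 averages would generally diverge, so comparing them is a quick check on whether rare languages are being classified worse than common ones.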
Usage
from transformers import pipeline

# Load the model from the Hugging Face Hub
classifier = pipeline("text-classification", model="nahiar/xlm-roberta-indonesian-languages")

# Single prediction
result = classifier("Sugeng enjing, piye kabare?")
print(result)
# Example output (the exact score will vary): [{'label': 'javanese', 'score': 0.9876}]

# Batch prediction: pass a list of strings to get one result per text
texts = [
    "Selamat pagi, apa kabar?",
    "Wilujeng enjing, kumaha damang?",
    "Good morning, how are you?",
]
results = classifier(texts)
for text, result in zip(texts, results):
    print(f"{text} -> {result['label']} ({result['score']:.4f})")
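The classifier always returns one of its known labels, even for text in a language it was not trained on. One way to handle this is a confidence threshold, sketched below. The helper `identify_language`, the `min_score` value of 0.80, and the `"unknown"` fallback are illustrative assumptions, not part of the model or the `transformers` API. The `fake_classifier` function is a stand-in that mimics the pipeline's output shape so the sketch runs offline.

```python
def identify_language(classifier, text, min_score=0.80, fallback="unknown"):
    """Return (label, score); the label is `fallback` when the score is below `min_score`."""
    # For a single string, the pipeline returns a list containing one dict.
    prediction = classifier(text)[0]
    label, score = prediction["label"], prediction["score"]
    if score < min_score:
        return fallback, score
    return label, score


def fake_classifier(text):
    """Hypothetical stand-in with the same output shape as the real pipeline."""
    score = 0.99 if len(text.split()) >= 3 else 0.55
    return [{"label": "javanese", "score": score}]


print(identify_language(fake_classifier, "Sugeng enjing, piye kabare?"))
print(identify_language(fake_classifier, "ok"))
```

With the real model, pass the `classifier` object created with `pipeline(...)` in place of `fake_classifier`. The threshold is best tuned on held-out data rather than taken from this sketch.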
Training Details
- Base Model: xlm-roberta-base
- Training Samples: 6000
- Validation Samples: 1200
- Epochs: 5
- Learning Rate: 2e-05
- Batch Size: 16
- Training Date: 2025-11-24 07:04:09 (run timestamp 20251124_070409)
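As a rough sanity check on these settings, the number of optimizer steps can be worked out from the sample count, batch size, and epoch count. The sketch below assumes a single device, no gradient accumulation, and the same batch size for validation. None of these is stated in the card.

```python
import math

# Values taken from the training details above.
train_samples = 6000
val_samples = 1200
epochs = 5
batch_size = 16

# Assumes one device and no gradient accumulation (not stated in the card).
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
eval_batches = math.ceil(val_samples / batch_size)

print(f"steps per epoch: {steps_per_epoch}")
print(f"total training steps: {total_steps}")
print(f"validation batches per evaluation: {eval_batches}")
```

Under those assumptions, the run amounts to 375 steps per epoch and 1875 steps overall, which is useful when choosing warmup or logging intervals.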
Citation
If you use this model, please cite:
@misc{indonesian-language-id,
  author    = {Raihan Hidayatullah Djunaedi},
  title     = {Indonesian Regional Languages Identifier},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/nahiar/xlm-roberta-indonesian-languages}
}
Model Files
- Repository: nahiar/xlm-roberta-indonesian-languages
- Format: Safetensors
- Model size: 0.3B params
- Tensor type: F32
- Base model: FacebookAI/xlm-roberta-base