URL: https://huggingface.co/tartuNLP/mmBERT-small-m-edu-classifier

Multilingual Educational Content Classifier

Trained on full documents, each up to 8192 tokens long. The training split of tartuNLP/fineweb-c-combined-resample was used; that dataset is itself a mix and resample of HuggingFaceFW/fineweb-edu-llama3-annotations and data-is-better-together/fineweb-c.

Labels

{0: '❗ Problematic Content ❗', 1: 'None', 2: 'Minimal', 3: 'Basic', 4: 'Good', 5: 'Excellent'}
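
The label mapping above can be applied to model output roughly as follows. This is a minimal sketch, not the authors' documented usage: it assumes the checkpoint loads with `AutoModelForSequenceClassification` and exposes six logits in the label order shown, and that the installed `transformers` version supports the underlying architecture. The helper names `label_name` and `classify` are hypothetical.

```python
# Label mapping copied from the model card.
ID2LABEL = {
    0: "❗ Problematic Content ❗",
    1: "None",
    2: "Minimal",
    3: "Basic",
    4: "Good",
    5: "Excellent",
}

MODEL_ID = "tartuNLP/mmBERT-small-m-edu-classifier"


def label_name(class_id: int) -> str:
    """Map a predicted class id to its human-readable label."""
    return ID2LABEL[class_id]


def classify(text: str, max_length: int = 8192) -> str:
    """Return the predicted label for one document.

    Assumption: the checkpoint is a six-way sequence classifier.
    Heavy imports are done lazily so the mapping above can be used
    without torch/transformers installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    # Truncate to the document length stated in the card (8192 tokens).
    inputs = tokenizer(
        text, truncation=True, max_length=max_length, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return label_name(int(logits.argmax(dim=-1).item()))
```

Calling `classify("some document text")` downloads the weights on first use and returns one of the six label strings.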

Classification Report

Evaluated on the development set of tartuNLP/fineweb-c-combined-resample, organized so that each language appears at least once.

              precision    recall  f1-score   support

           0       0.89      0.78      0.83       602
           1       0.65      0.88      0.75       916
           2       0.41      0.29      0.34       345
           3       0.40      0.30      0.34       179
           4       0.53      0.15      0.23       127
           5       0.55      0.39      0.45        44

    accuracy                           0.66      2213
   macro avg       0.57      0.46      0.49      2213
weighted avg       0.65      0.66      0.64      2213

Confusion Matrix

[[471 114  10   6   0   1]
 [ 33 806  59  13   5   0]
 [ 10 204 101  28   2   0]
 [  7  72  37  53   8   2]
 [  7  35  27  28  19  11]
 [  2   7  10   6   2  17]]
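
The classification report can be re-derived from this confusion matrix, which is a useful sanity check when comparing models. The sketch below assumes the usual scikit-learn orientation (rows are true labels, columns are predicted labels); the row sums match the reported support column, which is consistent with that assumption. The function names are illustrative only.

```python
# Confusion matrix copied from the model card.
# Assumed orientation: CONFUSION[true_label][predicted_label].
CONFUSION = [
    [471, 114, 10, 6, 0, 1],
    [33, 806, 59, 13, 5, 0],
    [10, 204, 101, 28, 2, 0],
    [7, 72, 37, 53, 8, 2],
    [7, 35, 27, 28, 19, 11],
    [2, 7, 10, 6, 2, 17],
]


def per_class_metrics(cm):
    """Return a list of (precision, recall, f1, support), one per class."""
    n = len(cm)
    rows = []
    for k in range(n):
        true_positives = cm[k][k]
        support = sum(cm[k])                          # row sum: true count
        predicted = sum(cm[i][k] for i in range(n))   # column sum: predicted count
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((precision, recall, f1, support))
    return rows


def accuracy(cm):
    """Fraction of documents on the diagonal of the matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


if __name__ == "__main__":
    for k, (p, r, f, s) in enumerate(per_class_metrics(CONFUSION)):
        print(f"{k}  {p:.2f}  {r:.2f}  {f:.2f}  {s}")
    print(f"accuracy  {accuracy(CONFUSION):.2f}")
```

Read this way, the largest off-diagonal entries sit in column 1: 204 of the 345 class-2 documents and 72 of the 179 class-3 documents are predicted as class 1, which accounts for the low recall on the middle classes.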

Model size: 0.1B params (Safetensors)
Tensor type: BF16
