
URL: https://huggingface.co/MLRS/BERTu



BERTu

A Maltese monolingual model pre-trained from scratch on the Korpus Malti v4.0 using the BERT (base) architecture.
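Because the checkpoint is a standard BERT-base encoder, it should be loadable with the generic Hugging Face `transformers` Auto classes. The sketch below assumes that `transformers` and `torch` are installed and that the `MLRS/BERTu` repository can be downloaded from the Hub. `embed` is a hypothetical helper name. Mean pooling over token vectors is an illustrative choice and is not prescribed by this card.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "MLRS/BERTu"


def embed(sentences: list[str]) -> torch.Tensor:
    """Hypothetical helper: mean-pooled sentence vectors from BERTu's last layer."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID)  # encoder only, no task head
    model.eval()
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_size)
    # Average token vectors, ignoring padding positions (illustrative pooling choice).
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)


if __name__ == "__main__":
    vectors = embed(["Malta hija gżira fil-Mediterran.", "Il-lingwa Maltija hija lingwa Semitika."])
    print(vectors.shape)  # torch.Size([2, 768]) for a BERT-base encoder
```

For repeated calls, load the tokenizer and model once and reuse them. For downstream tasks, a task head (for example via `AutoModelForTokenClassification`) would normally be fine-tuned on top of the encoder.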

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.


Citation

This work was first presented in "Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese" (DeepLo 2022). Cite it as follows:

@inproceedings{BERTu,
 title = "Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and {BERT} Models for {M}altese",
 author = "Micallef, Kurt and
 Gatt, Albert and
 Tanti, Marc and
 van der Plas, Lonneke and
 Borg, Claudia",
 booktitle = "Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing",
 month = jul,
 year = "2022",
 address = "Hybrid",
 publisher = "Association for Computational Linguistics",
 url = "https://aclanthology.org/2022.deeplo-1.10",
 doi = "10.18653/v1/2022.deeplo-1.10",
 pages = "90--101",
}
Model size: approximately 0.1B parameters, distributed in Safetensors format (tensor types I64 and F32).


Evaluation results

Self-reported scores:

  • Unlabelled Attachment Score (UAS) on Maltese Universal Dependencies Treebank (MUDT): 92.31
  • Labelled Attachment Score (LAS) on Maltese Universal Dependencies Treebank (MUDT): 88.14
  • UPOS accuracy on MLRS POS dataset: 98.58
  • XPOS accuracy on MLRS POS dataset: 98.54
  • Span-based F1 on WikiAnn (Maltese): 86.77
  • Macro-averaged F1 on Maltese Sentiment Analysis Dataset: 78.96
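The metrics above follow standard definitions. UAS counts tokens whose predicted head is correct. LAS additionally requires the correct dependency label. Span-based F1 scores exact matches of (start, end, type) entity spans. Macro-averaged F1 is the unweighted mean of per-class F1. The sketch below illustrates these definitions on toy data. It is a simplified illustration, not the evaluation code used to produce the reported scores: official scorers, such as the CoNLL 2018 UD script, handle details like tokenization mismatches that are ignored here. All function names are hypothetical.

```python
def attachment_scores(gold, pred):
    """UAS and LAS in %. gold/pred: one (head_index, deprel) pair per token."""
    assert len(gold) == len(pred) and gold, "need aligned, non-empty token lists"
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head only
    las_hits = sum(g == p for g, p in zip(gold, pred))  # head and label
    return 100 * uas_hits / len(gold), 100 * las_hits / len(gold)


def _f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def span_f1(gold_spans, pred_spans):
    """Exact-match span F1 in %. Spans are sets of (start, end, entity_type)."""
    tp = len(gold_spans & pred_spans)
    return 100 * _f1(tp, len(pred_spans) - tp, len(gold_spans) - tp)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 in %, over classes seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        scores.append(_f1(tp, fp, fn))
    return 100 * sum(scores) / len(scores)


if __name__ == "__main__":
    uas, las = attachment_scores(
        [(2, "nsubj"), (0, "root"), (2, "obj")],
        [(2, "nsubj"), (0, "root"), (2, "obl")],  # right head, wrong label
    )
    print(uas, round(las, 2))  # 100.0 66.67
    print(span_f1({(0, 2, "PER"), (5, 6, "LOC")}, {(0, 2, "PER"), (5, 6, "ORG")}))  # 50.0
    print(round(macro_f1(["pos", "neg", "neg", "pos"], ["pos", "neg", "pos", "pos"]), 2))
```

The toy parse shows why LAS can never exceed UAS: every labelled hit is also an unlabelled hit.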