URL: https://huggingface.co/datasets/ltg/norbelebele

Protecting the integrity of FLORES+ (and NorBelebele) for evaluation

This repository is publicly accessible, but you have to accept the conditions to access its files and content.


NorBelebele

This dataset is based on the official facebook/belebele dataset, but its text passages have been corrected to substantially reduce the number of Norwegian language errors. The numerous problems with the Bokmål translations in FLORES were described and corrected by Petter Mæhlum et al. in "Improved Norwegian Bokmål Translations for FLORES". The corrected passages themselves are taken from the openlanguagedata/flores_plus project.
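Because the dataset derives from facebook/belebele, a reasonable starting point is to treat each row as a passage, a question, and four answer options. The sketch below is illustrative only: it assumes NorBelebele keeps the upstream field names (`flores_passage`, `question`, `mc_answer1` to `mc_answer4`, `correct_answer_num`), which this card does not confirm. The helper names (`load_norbelebele`, `format_example`, `gold_label`) are hypothetical, and no configuration or split name is passed because the card does not list any.

```python
from typing import Dict, Optional


def load_norbelebele(token: Optional[str] = None):
    """Load the gated dataset from the Hub.

    The access conditions must be accepted on the dataset page first.
    A configuration name may also be required if the repository defines
    several; none is given here because the card does not name one.
    """
    # Imported lazily so the formatting helpers work without `datasets`.
    from datasets import load_dataset

    return load_dataset("ltg/norbelebele", token=token)


def format_example(example: Dict[str, str]) -> str:
    """Render one row as a multiple-choice prompt.

    Field names are assumed to match facebook/belebele.
    """
    options = [example[f"mc_answer{i}"] for i in range(1, 5)]
    lines = [example["flores_passage"], "", example["question"]]
    lines += [f"{label}. {text}" for label, text in zip("ABCD", options)]
    return "\n".join(lines)


def gold_label(example: Dict[str, str]) -> str:
    """Map the 1-based `correct_answer_num` to a letter label."""
    return "ABCD"[int(example["correct_answer_num"]) - 1]
```

If the schema does differ from the upstream dataset, only the field names inside `format_example` and `gold_label` should need adjusting.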

Contact

David Samuel (davisamu@ifi.uio.no)

Citation

@inproceedings{bandarkar-etal-2024-belebele,
 title = "The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants",
 author = "Bandarkar, Lucas and
 Liang, Davis and
 Muller, Benjamin and
 Artetxe, Mikel and
 Shukla, Satya Narayan and
 Husa, Donald and
 Goyal, Naman and
 Krishnan, Abhinandan and
 Zettlemoyer, Luke and
 Khabsa, Madian",
 editor = "Ku, Lun-Wei and
 Martins, Andre and
 Srikumar, Vivek",
 booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
 month = aug,
 year = "2024",
 address = "Bangkok, Thailand",
 publisher = "Association for Computational Linguistics",
 url = "https://aclanthology.org/2024.acl-long.44/",
 doi = "10.18653/v1/2024.acl-long.44",
 pages = "749--775",
}
@inproceedings{maehlum-etal-2025-improved,
 title = "Improved {N}orwegian {B}okm{\r{a}}l Translations for {FLORES}",
 author = "M{\ae}hlum, Petter and
 N{\ae}ss Evensen, Anders and
 Scherrer, Yves",
 editor = "Haddow, Barry and
 Kocmi, Tom and
 Koehn, Philipp and
 Monz, Christof",
 booktitle = "Proceedings of the Tenth Conference on Machine Translation",
 month = nov,
 year = "2025",
 address = "Suzhou, China",
 publisher = "Association for Computational Linguistics",
 url = "https://aclanthology.org/2025.wmt-1.86/",
 doi = "10.18653/v1/2025.wmt-1.86",
 pages = "1124--1132",
 ISBN = "979-8-89176-341-8",
}