URL: https://huggingface.co/AgentPublic/camembert-base-squadFR-fquad-piaf


camembert-base-squadFR-fquad-piaf

Description

French question-answering model: the base CamemBERT model fine-tuned on a combination of three French Q&A datasets:

  1. PIAFv1.1
  2. FQuADv1.0
  3. SQuAD-FR (SQuAD automatically translated to French)

Training hyperparameters

python run_squad.py \
--model_type camembert \
--model_name_or_path camembert-base \
--do_train --do_eval \
--train_file data/SQuAD+fquad+piaf.json \
--predict_file data/fquad_valid.json \
--per_gpu_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 4 \
--max_seq_length 384 \
--doc_stride 128 \
--save_steps 10000
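
To illustrate how `--max_seq_length 384` and `--doc_stride 128` interact, here is a minimal sketch of the sliding-window split that SQuAD-style preprocessing typically applies to contexts longer than the model input. It is not the code of `run_squad.py` itself; the function name `sliding_windows` and the default of 4 special tokens (assuming a RoBERTa-style pair encoding, `<s> question </s></s> context </s>`) are assumptions made for illustration.

```python
def sliding_windows(n_context_tokens, n_question_tokens,
                    max_seq_length=384, doc_stride=128, n_special_tokens=4):
    """Return (start, length) spans of context tokens, one per input window.

    n_special_tokens=4 is an assumption (RoBERTa-style pair encoding).
    """
    # Room left for context tokens once the question and special tokens are placed.
    max_context = max_seq_length - n_question_tokens - n_special_tokens
    if max_context <= 0:
        raise ValueError("question is too long for max_seq_length")

    spans = []
    start = 0
    while start < n_context_tokens:
        length = min(max_context, n_context_tokens - start)
        spans.append((start, length))
        if start + length == n_context_tokens:
            break  # this window reaches the end of the context
        # Advance by doc_stride so that consecutive windows overlap.
        start += min(length, doc_stride)
    return spans


# A 1000-token context with a 10-token question is split into overlapping windows.
for start, length in sliding_windows(1000, 10):
    print(start, start + length)
```

Under these assumptions, each window holds at most 370 context tokens and consecutive windows start 128 tokens apart, so an answer that falls near a window boundary still appears whole in a neighbouring window.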

Evaluation results

FQuAD v1.0 Evaluation

{"f1": 79.81, "exact_match": 55.14}

SQuAD-FR Evaluation

{"f1": 80.61, "exact_match": 59.54}
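
For readers unfamiliar with these two metrics, the following is a simplified sketch of how SQuAD-style exact match and token-level F1 are usually computed for a single prediction. The normalization here (lowercasing, stripping ASCII punctuation, collapsing whitespace) is an assumption; official evaluation scripts may normalize differently, for example by also removing articles, so this sketch is not expected to reproduce the scores above exactly.

```python
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace (simplified)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    if not pred_tokens or not truth_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == truth_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Peintre français.", "peintre français"))
print(f1_score("un peintre français", "peintre français"))
```

Dataset-level figures such as those reported above are typically the per-question scores averaged over the evaluation set and expressed as percentages.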

Usage

from transformers import pipeline

nlp = pipeline('question-answering', model='AgentPublic/camembert-base-squadFR-fquad-piaf', tokenizer='AgentPublic/camembert-base-squadFR-fquad-piaf')

nlp({
 'question': "Qui est Claude Monet?",
 'context': "Claude Monet, né le 14 novembre 1840 à Paris et mort le 5 décembre 1926 à Giverny, est un peintre français et l’un des fondateurs de l'impressionnisme."
})
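
The call above returns a dictionary with the keys `score`, `start`, `end` and `answer`. As a minimal sketch of how that result might be consumed, the helper below returns the answer only when the model's confidence clears a threshold. The name `answer_or_none` and the default `min_score=0.5` are illustrative assumptions, not part of the model card or of the `transformers` API.

```python
def answer_or_none(qa, question, context, min_score=0.5):
    """Return the predicted answer string, or None if the score is too low.

    `qa` is any callable with the question-answering pipeline's interface,
    such as the `nlp` object created above. `min_score` is a hypothetical
    threshold chosen for illustration; tune it on your own data.
    """
    result = qa(question=question, context=context)
    if result["score"] < min_score:
        return None
    return result["answer"]
```

With the pipeline from this section, it could be called as `answer_or_none(nlp, "Qui est Claude Monet?", context)`, where `context` holds the passage to search.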

Acknowledgments

This work was performed using HPC resources from GENCI–IDRIS (Grant 2020-AD011011224).

Citations

PIAF

@inproceedings{KeraronLBAMSSS20,
 author = {Rachel Keraron and
 Guillaume Lancrenon and
 Mathilde Bras and
 Fr{\'{e}}d{\'{e}}ric Allary and
 Gilles Moyse and
 Thomas Scialom and
 Edmundo{-}Pavel Soriano{-}Morales and
 Jacopo Staiano},
 title = {Project {PIAF:} Building a Native French Question-Answering Dataset},
 booktitle = {{LREC}},
 pages = {5481--5490},
 publisher = {European Language Resources Association},
 year = {2020}
}

FQuAD

@article{dHoffschmidt2020FQuADFQ,
 title={FQuAD: French Question Answering Dataset},
 author={Martin d'Hoffschmidt and Maxime Vidal and Wacim Belblidia and Tom Brendl{\'e} and Quentin Heinrich},
 journal={ArXiv},
 year={2020},
 volume={abs/2002.06071}
}

SQuAD-FR

@MISC{kabbadj2018,
 author = "Kabbadj, Ali",
 title = "Something new in French Text Mining and Information Extraction (Universal Chatbot): Largest Q&A French training dataset (110 000+)",
 editor = "linkedin.com",
 month = "November",
 year = "2018",
 url = "\url{https://www.linkedin.com/pulse/something-new-french-text-mining-information-chatbot-largest-kabbadj/}",
 note = "[Online; posted 11-November-2018]",
}

CamemBERT

HF model card: https://huggingface.co/camembert-base

@inproceedings{martin2020camembert,
 title={CamemBERT: a Tasty French Language Model},
 author={Martin, Louis and Muller, Benjamin and Su{\'a}rez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, {\'E}ric Villemonte and Seddah, Djam{\'e} and Sagot, Beno{\^\i}t},
 booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
 year={2020}
}
