URL: https://huggingface.co/oliverguhr/fullstop-punctuation-multilingual-base

oliverguhr/fullstop-punctuation-multilingual-base


Work in progress

Classification report over all languages

              precision    recall  f1-score   support

           0       0.99      0.99      0.99  47903344
           .       0.94      0.95      0.95   2798780
           ,       0.85      0.84      0.85   3451618
           ?       0.88      0.85      0.87     88876
           -       0.61      0.32      0.42    157863
           :       0.72      0.52      0.60    103789

    accuracy                           0.98  54504270
   macro avg       0.83      0.75      0.78  54504270
weighted avg       0.98      0.98      0.98  54504270
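
The gap between the macro and weighted averages comes from class imbalance: label 0 (no punctuation) accounts for most of the support, so the weaker `-` and `:` classes barely move the weighted figures. The sketch below recomputes both averages from the per-class rows of the report. The function names are illustrative and not part of the model's code, and because the inputs are the two-decimal values printed above, the results only approximate the reported averages.

```python
# Per-class rows copied from the report: label -> (precision, recall, f1, support).
# The metric values are already rounded to two decimals, so results are approximate.
REPORT = {
    "0": (0.99, 0.99, 0.99, 47903344),
    ".": (0.94, 0.95, 0.95, 2798780),
    ",": (0.85, 0.84, 0.85, 3451618),
    "?": (0.88, 0.85, 0.87, 88876),
    "-": (0.61, 0.32, 0.42, 157863),
    ":": (0.72, 0.52, 0.60, 103789),
}


def macro_average(report):
    """Unweighted mean of each metric: every class counts equally."""
    n = len(report)
    return tuple(sum(row[i] for row in report.values()) / n for i in range(3))


def weighted_average(report):
    """Mean of each metric weighted by class support (token count)."""
    total = sum(row[3] for row in report.values())
    return tuple(
        sum(row[i] * row[3] for row in report.values()) / total for i in range(3)
    )


total_support = sum(row[3] for row in REPORT.values())
macro = macro_average(REPORT)
weighted = weighted_average(REPORT)

if __name__ == "__main__":
    print("total support:", total_support)
    print("macro avg (P, R, F1):   ", [f"{v:.3f}" for v in macro])
    print("weighted avg (P, R, F1):", [f"{v:.3f}" for v in weighted])
```

The per-class supports sum to the total shown in the accuracy row, which is a quick consistency check on the table.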

How to cite us

@inproceedings{guhr-EtAl:2021:fullstop,
 title={FullStop: Multilingual Deep Models for Punctuation Prediction},
 author = {Guhr, Oliver and Schumann, Anne-Kathrin and Bahrmann, Frank and Böhme, Hans Joachim},
 booktitle = {Proceedings of the Swiss Text Analytics Conference 2021},
 month = {June},
 year = {2021},
 address = {Winterthur, Switzerland},
 publisher = {CEUR Workshop Proceedings}, 
 url = {http://ceur-ws.org/Vol-2957/sepp_paper4.pdf}
}
@misc{https://doi.org/10.48550/arxiv.2301.03319,
 doi = {10.48550/ARXIV.2301.03319},
 url = {https://arxiv.org/abs/2301.03319},
 author = {Vandeghinste, Vincent and Guhr, Oliver},
 keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, I.2.7},
 title = {FullStop: Punctuation and Segmentation Prediction for Dutch with Transformers},
 publisher = {arXiv},
 year = {2023}, 
 copyright = {Creative Commons Attribution Share Alike 4.0 International}
}
Model size: 0.3B params (Safetensors)
Tensor types: I64, F32
