URL: https://huggingface.co/docling-project/CodeFormulaV2


Code Formula Model

The Code Formula Model processes an image of a code snippet or formula at 120 DPI and outputs its content.
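
If a crop comes from a page rendered at a different resolution, it may help to rescale it to the 120 DPI the model expects. The sketch below only does the arithmetic for the target pixel size; the function name `size_at_120_dpi` and the idea of rescaling before inference are illustrative assumptions, not part of the model card.

```python
from typing import Tuple

TARGET_DPI = 120  # resolution stated in the model card


def size_at_120_dpi(width_px: int, height_px: int, source_dpi: float) -> Tuple[int, int]:
    """Return the pixel size a crop would have if re-rendered at 120 DPI.

    Hypothetical helper: assumes the crop was rendered at `source_dpi`
    and that uniform scaling is acceptable.
    """
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    scale = TARGET_DPI / source_dpi
    # Keep at least one pixel per side so tiny crops stay valid images.
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))
```

The resulting size can be passed to any image library's resize call; which resampling filter works best for this model is not stated in the source.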

  • Code Snippets:
    The model identifies the programming language and outputs the code, respecting the indentation shown in the given image. The output format is:
    "<_<programming language>_> <content of the image>"
    Example:
    "<_Java_> System.out.println("Hello World.");"

  • Formulas:
    The model generates the corresponding LaTeX code.
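
Downstream code usually needs to separate the language tag from the snippet itself. The following is a minimal sketch that parses the `<_language_>` prefix described above; the helper name `split_language_tag` is hypothetical, and it assumes that formula outputs carry no such tag and are returned unchanged.

```python
import re
from typing import Optional, Tuple

# Matches a leading "<_Language_>" tag plus at most one whitespace character,
# so any further indentation of the code itself is preserved.
_LANG_TAG = re.compile(r"^\s*<_(?P<lang>[^<>]+?)_>\s?")


def split_language_tag(output: str) -> Tuple[Optional[str], str]:
    """Return (language, content); language is None when no tag is present."""
    match = _LANG_TAG.match(output)
    if match is None:
        # Assumption: untagged output is a formula (LaTeX) or plain text.
        return None, output
    return match.group("lang").strip(), output[match.end():]
```

For the example above, `split_language_tag('<_Java_> System.out.println("Hello World.");')` yields the language `Java` and the bare statement as content.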

This model was trained using the following two datasets:

  1. https://huggingface.co/datasets/ds4sd/SynthFormulaNet
  2. https://huggingface.co/datasets/ds4sd/SynthCodeNet

References

@techreport{Docling,
 author = {Deep Search Team},
 month = {8},
 title = {{Docling Technical Report}},
 url = {https://arxiv.org/abs/2408.09869},
 eprint = {2408.09869},
 doi = {10.48550/arXiv.2408.09869},
 version = {1.0.0},
 year = {2024}
}

@article{nassar2025smoldocling,
 title={SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion},
 author={Nassar, Ahmed and Marafioti, Andres and Omenetti, Matteo and Lysak, Maksym and Livathinos, Nikolaos and Auer, Christoph and Morin, Lucas and de Lima, Rafael Teixeira and Kim, Yusik and Gurbuz, A Said and others},
 journal={arXiv preprint arXiv:2503.11576},
 year={2025}
}

Model size: 0.3B params (Safetensors)
Tensor type: BF16