UD-parsers
AI & ML interests
Web as a corpus, Large Language Models, Machine Translation, Language Technologies, Natural Language Processing, Internet Archive, CommonCrawl
Continually pre-trained models
Language-specific LLMs continually pre-trained from fully open English base models
HPLT 3.0 T5 models
monolingual encoder-decoder language models
2508-wds
Llama-2b ablation models released as part of HPLT 3.0
HPLT 2.0 Uni-Direction Translation Models
HPLT's MT releases. https://github.com/hplt-project/HPLT-MT-Models
Multilingual Translation Models
Translation models trained on OPUS data including HPLT datasets
Large Language Models
MaLA-LM
MaLA-LM: Massive Language Adaptation of Large Language Models
HPLT 3.0 GPT-BERT models
monolingual GPT-BERT language models
2508-datasets
Llama-2b ablation models released as part of HPLT 3.0
2505-deduplication
Llama-2b ablation models released as part of HPLT 3.0
HPLT 2.0 Monolingual reference models
This is a collection of decoder-only language models trained on HPLT2.0_cleaned.
HPLT 2.0 Bert models
monolingual encoder-only language models
HPLT 1.2 Uni-Direction Translation Models
HPLT's MT releases. https://github.com/hplt-project/HPLT-MT-Models
HPLT 1.2 Bert models
monolingual encoder-only language models
