URL: https://huggingface.co/nthakur/collections

⇱ nthakur (Nandan Thakur)

Nandan Thakur

nthakur

https://thakur-nandan.github.io

AI & ML interests

NLP, IR, QA

Recent Activity

new activity 22 days ago

BeIR/fiqa:VORTEXRAG: 7-Layer RAG — EM 74.8 on QA benchmarks, solves Semantic Drift [open source]

new activity 22 days ago

BeIR/hotpotqa:VORTEXRAG: 7-Layer RAG — EM 74.8 on QA benchmarks, solves Semantic Drift [open source]

new activity 22 days ago

BeIR/nq:VORTEXRAG: 7-Layer RAG — EM 74.8 on QA benchmarks, solves Semantic Drift [open source]

Organizations

Castorini
BEIR
Tevatron
University of Waterloo
INCOME
Poison Texts
MIRACL
Vectara
TREC Retrieval-Augmented Generation
FreshStack
RLHN
Nandan's Backup
(ORBIT) Open-web Reasoning for Information Retrieval Tasks
Multilingual Agentic Search Track (MAST)

nthakur's collections (5)

🏜️MIRAGE-Bench [NAACL'25]

Dataset Collection from the MIRAGE-Bench paper

🌐 NoMIRACL Dataset [EMNLP'24]

A collection of multilingual relevance assessment datasets, along with SFT fine-tuned models (Mistral-7B and Llama-3 8B).

GPL BEIR Datasets [NAACL'22]

Generative Pseudo Labeling training datasets for all domains in BEIR.

Multilingual SFT & DPO Datasets

These SFT and DPO datasets were either translated from English using Mistral-7B-Instruct-v0.2 or taken from other sources.

🦢SWIM-IR Dataset [NAACL'24]

29 million Synthetic Wikipedia-based Multilingual Retrieval Training Pairs.
