
AI & ML interests

Natural Language Processing, Machine Learning, and Computer Vision

Recent Activity

k-m-irfan updated a dataset 9 days ago

MBZUAI/longshot-bench

SarfrazAhmad739 updated a dataset 20 days ago

MBZUAI/TABVERSE

hasaniqbal777 updated a dataset 21 days ago

MBZUAI/UrduMMLU

Papers

CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training

MBZUAI's collections (22)

Arabic Sentence Segmentation Shared Task 2026

Arabic sentence segmentation datasets for the Arabic Sentence Segmentation Shared Task 2026

MediX-R1

Open-Ended Medical Reinforcement Learning

OpenEarthAgent

The OpenEarthAgent Collection brings together the OpenEarthAgent model and its accompanying large-scale tool-augmented geospatial reasoning data.

Video-R2

VideoMathQA

VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

Updated Jun 6, 2025 • 2.1k • 535 • 11

CASS

Large-scale dataset and model suite for cross-architecture GPU code transpilation between CUDA and HIP at both source and assembly levels

BiMediX2

BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

VideoGPT+

VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

Video-ChatGPT

Video-ChatGPT is a video conversation model capable of generating meaningful conversations about videos.

PALO

PALO: A Polyglot Large Multimodal Model for 5B People

GeoChat

GeoChat is the first grounded Large Vision Language Model, specifically tailored to Remote Sensing (RS) scenarios.

Video-CoM

Video-CoM: Interactive Video Reasoning via Chain of Manipulations

DeepfakeJudge

A framework for deepfake detection and reasoning supervision at scale.

MedMO

Medical Foundation Model

FinMMEval Lab @CLEF'2026

Training datasets for FinMMEval Lab @CLEF'2026

NADI 2025 Sub-task 3 datasets

Official training and dev datasets for NADI 2025 Subtask 3 (Diacritic Restoration) Shared Task

GeoPixel

Pixel Grounding Large Multimodal Model in Remote Sensing

ArTST - Arabic Text Speech Transformer

Open source project for Arabic Speech Recognition and Generation

GLaMM

Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with corresponding object segmentation masks.

LLaVA++ (LLaMA-3 and Phi-3-Mini)

Extending Visual Capabilities of LLaVA with LLaMA-3 and Phi-3

MobiLlama

Collection of MobiLlama Language Models.

Satmae++

Collection of ViT models trained using SatMAE++ approach.

URL: https://huggingface.co/MBZUAI/collections

MBZUAI (Mohamed Bin Zayed University of Artificial Intelligence)

MediX-R1 Medical AI Demo

ArtstTTS

ArtstASR

LLaVA++ (LLaMA-3-V)

LLaVA++ (Phi-3-V)