Dixtral_QA: BUT-FIT Diarization-Conditioned Voxtral for Spoken QA
This repository hosts Dixtral_QA, developed by BUT Speech@FIT. Dixtral couples the Voxtral-Mini-3B spoken-language model with the DiCoW diarization-conditioned encoder, giving the LLM target-speaker awareness in multi-talker audio.
This checkpoint is tuned for spoken question answering over conversational/meeting audio. For pure target-speaker transcription, use Dixtral_TS-ASR instead.
Model Usage
from transformers import AutoModel, AutoProcessor
MODEL_NAME = "BUT-FIT/Dixtral_QA"
model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(MODEL_NAME)
For full inference pipelines (diarization → FDDT masks → generation), see the Dixtral GitHub repository.
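To make the middle step of that pipeline more concrete, here is a minimal, illustrative sketch of how diarization segments could be turned into per-frame conditioning masks for one target speaker. It assumes the four frame categories described for DiCoW (silence, target, non-target, overlap). The function name `stno_masks`, the segment format, and the default frame rate of 50 frames per second are assumptions made for this example; they are not taken from the Dixtral repository, whose actual mask construction may differ.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec), hypothetical format


def stno_masks(
    diarization: Dict[str, List[Segment]],
    target: str,
    duration: float,
    frame_rate: float = 50.0,  # assumed encoder frame rate, not confirmed
) -> Dict[str, List[float]]:
    """Build hard silence/target/non-target/overlap masks for one speaker."""
    n_frames = int(round(duration * frame_rate))

    def activity(segments: List[Segment]) -> List[bool]:
        # Mark every frame covered by at least one segment as active.
        active = [False] * n_frames
        for start, end in segments:
            first = max(0, int(round(start * frame_rate)))
            last = min(n_frames, int(round(end * frame_rate)))
            for i in range(first, last):
                active[i] = True
        return active

    target_active = activity(diarization.get(target, []))

    # A frame counts as "other" if any non-target speaker is active in it.
    other_active = [False] * n_frames
    for speaker, segments in diarization.items():
        if speaker == target:
            continue
        other_active = [a or b for a, b in zip(other_active, activity(segments))]

    masks: Dict[str, List[float]] = {
        "silence": [], "target": [], "non_target": [], "overlap": [],
    }
    for t, o in zip(target_active, other_active):
        masks["silence"].append(float(not t and not o))
        masks["target"].append(float(t and not o))
        masks["non_target"].append(float(o and not t))
        masks["overlap"].append(float(t and o))
    return masks


# Toy example: two speakers, one frame per second for readability.
example = stno_masks({"A": [(0, 2)], "B": [(1, 3)]}, target="A",
                     duration=4, frame_rate=1.0)
```

Each frame falls into exactly one category, so the four masks sum to 1 at every frame. A real diarizer would typically produce soft probabilities rather than the hard 0/1 values used here.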
Model Details
- Base Model: Voxtral-Mini-3B-2507
- Encoder: DiCoW v3 large
- Training Datasets:
Contact
- Email: ipoloka@fit.vut.cz
- Affiliation: BUT Speech@FIT, Brno University of Technology
- GitHub: BUTSpeechFIT