
URL: https://huggingface.co/BUT-FIT/Dixtral_QA



🧠 Dixtral_QA: BUT-FIT Diarization-Conditioned Voxtral for Spoken QA

This repository hosts Dixtral_QA, developed by BUT Speech@FIT. Dixtral couples the Voxtral-Mini-3B spoken-language model with the DiCoW diarization-conditioned encoder, giving the LLM target-speaker awareness in multi-talker audio.

This checkpoint is tuned for spoken question answering over conversational/meeting audio. For pure target-speaker transcription, use Dixtral_TS-ASR instead.

🛠️ Model Usage

from transformers import AutoModel, AutoProcessor

MODEL_NAME = "BUT-FIT/Dixtral_QA"
model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(MODEL_NAME)

➡️ For full inference pipelines (diarization → FDDT masks → generation), see the Dixtral GitHub repository.
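To make the middle step of that pipeline more concrete, here is a minimal sketch of how diarization output could be turned into per-frame conditioning masks for one target speaker. It assumes the masks follow the silence / target / non-target / overlap (STNO) scheme described for DiCoW; the function name `stno_masks`, the `(speaker, start, end)` segment format, and the default `frame_rate` of 50 frames per second are illustrative assumptions, not the repository's actual API. Refer to the Dixtral GitHub repository for the real implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical segment format: (speaker label, start in seconds, end in seconds)
Segment = Tuple[str, float, float]


def stno_masks(
    segments: List[Segment],
    target: str,
    duration: float,
    frame_rate: float = 50.0,  # assumed frame rate, not taken from the model card
) -> Dict[str, List[float]]:
    """Build one-hot silence/target/non-target/overlap masks per frame."""
    n_frames = int(round(duration * frame_rate))
    target_active = [False] * n_frames
    other_active = [False] * n_frames

    # Mark which frames are covered by the target and by any other speaker.
    for speaker, start, end in segments:
        first = max(0, int(round(start * frame_rate)))
        last = min(n_frames, int(round(end * frame_rate)))
        active = target_active if speaker == target else other_active
        for i in range(first, last):
            active[i] = True

    # Every frame falls into exactly one of the four classes.
    masks: Dict[str, List[float]] = {
        "silence": [], "target": [], "non_target": [], "overlap": []
    }
    for t, o in zip(target_active, other_active):
        masks["silence"].append(float(not t and not o))
        masks["target"].append(float(t and not o))
        masks["non_target"].append(float(o and not t))
        masks["overlap"].append(float(t and o))
    return masks


# Toy example: speaker A talks 0.0-0.6 s, speaker B talks 0.4-0.8 s.
example = stno_masks([("A", 0.0, 0.6), ("B", 0.4, 0.8)], target="A",
                     duration=1.0, frame_rate=10.0)
```

Swapping `target="A"` for `target="B"` yields the masks for the other speaker from the same diarization output, which is what lets a single recording be queried once per participant.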


📦 Model Details


📬 Contact

📧 Email: ipoloka@fit.vut.cz
🏢 Affiliation: BUT Speech@FIT, Brno University of Technology
🔗 GitHub: BUTSpeechFIT

Model size: 5B params (Safetensors)
Tensor type: BF16
