
URL: https://huggingface.co/prithivMLmods/Speech-Emotion-Classification

Speech-Emotion-Classification

Speech-Emotion-Classification is a fine-tuned version of facebook/wav2vec2-base-960h for multi-class audio classification, trained to detect emotions in speech. The model uses the Wav2Vec2ForSequenceClassification architecture to assign one of eight emotion classes to an audio signal, and reaches 0.8379 accuracy on the 1,999 test samples reported below.

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations: https://arxiv.org/pdf/2006.11477

Classification Report (test set):

              precision    recall  f1-score   support

       Anger     0.8314    0.9346    0.8800       306
        Calm     0.7949    0.8857    0.8378        35
     Disgust     0.8261    0.8287    0.8274       321
        Fear     0.8303    0.7377    0.7812       305
       Happy     0.8929    0.7764    0.8306       322
     Neutral     0.8423    0.9303    0.8841       287
         Sad     0.7749    0.7825    0.7787       308
   Surprised     0.9478    0.9478    0.9478       115

    accuracy                         0.8379      1999
   macro avg     0.8426    0.8530    0.8460      1999
weighted avg     0.8392    0.8379    0.8367      1999
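
The macro and weighted averages in the report can be re-derived from the per-class rows. The sketch below hard-codes the numbers from the report (no model or dataset is needed) and recomputes both kinds of average; the names `report`, `macro_avg`, and `weighted_avg` are illustrative, not part of the model card.

```python
# Per-class rows copied from the classification report:
# label -> (precision, recall, f1-score, support)
report = {
    "Anger":     (0.8314, 0.9346, 0.8800, 306),
    "Calm":      (0.7949, 0.8857, 0.8378, 35),
    "Disgust":   (0.8261, 0.8287, 0.8274, 321),
    "Fear":      (0.8303, 0.7377, 0.7812, 305),
    "Happy":     (0.8929, 0.7764, 0.8306, 322),
    "Neutral":   (0.8423, 0.9303, 0.8841, 287),
    "Sad":       (0.7749, 0.7825, 0.7787, 308),
    "Surprised": (0.9478, 0.9478, 0.9478, 115),
}

# Total number of evaluated samples.
total = sum(row[3] for row in report.values())


def macro_avg(col):
    """Unweighted mean of one metric column across the classes."""
    return sum(row[col] for row in report.values()) / len(report)


def weighted_avg(col):
    """Mean of one metric column, weighted by each class's support."""
    return sum(row[col] * row[3] for row in report.values()) / total


# Column order: 0 = precision, 1 = recall, 2 = f1-score
macro = [macro_avg(c) for c in range(3)]
weighted = [weighted_avg(c) for c in range(3)]

print("support:     ", total)
print("macro avg:   ", [round(v, 4) for v in macro])
print("weighted avg:", [round(v, 4) for v in weighted])
```

The support-weighted recall is the same quantity as overall accuracy, which is why both appear as 0.8379 in the report. The macro averages treat the small Calm class (35 samples) the same as the larger ones, so they differ slightly from the weighted figures.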

๐Ÿ‘ download.png

๐Ÿ‘ download (1).png


Label Space: 8 Classes

Class 0: Anger 
Class 1: Calm 
Class 2: Disgust 
Class 3: Fear 
Class 4: Happy 
Class 5: Neutral 
Class 6: Sad 
Class 7: Surprised

Install Dependencies

pip install gradio transformers torch librosa hf_xet

Inference Code

import gradio as gr
from transformers import Wav2Vec2ForSequenceClassification, Wav2Vec2FeatureExtractor
import torch
import librosa

# Load model and processor
model_name = "prithivMLmods/Speech-Emotion-Classification"
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_name)
processor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)

# Label mapping
id2label = {
    "0": "Anger",
    "1": "Calm",
    "2": "Disgust",
    "3": "Fear",
    "4": "Happy",
    "5": "Neutral",
    "6": "Sad",
    "7": "Surprised"
}

def classify_audio(audio_path):
    # Load the audio and resample it to 16 kHz
    speech, sample_rate = librosa.load(audio_path, sr=16000)

    # Convert the waveform into model inputs
    inputs = processor(
        speech,
        sampling_rate=sample_rate,
        return_tensors="pt",
        padding=True
    )

    # Run inference without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Map each class index to its label and rounded probability
    prediction = {
        id2label[str(i)]: round(probs[i], 3) for i in range(len(probs))
    }

    return prediction

# Gradio Interface
iface = gr.Interface(
    fn=classify_audio,
    inputs=gr.Audio(type="filepath", label="Upload Audio (WAV, MP3, etc.)"),
    outputs=gr.Label(num_top_classes=8, label="Emotion Classification"),
    title="Speech Emotion Classification",
    description="Upload an audio clip to classify the speaker's emotion from voice signals."
)

if __name__ == "__main__":
    iface.launch()
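
The post-processing step inside classify_audio (softmax over the logits, then a label-to-probability dictionary) can be tried without downloading the model. The following is a minimal sketch using only the standard library; `logits_to_prediction` is a hypothetical helper and `example_logits` are made-up numbers, not real model output.

```python
import math

# Same label order as the Label Space section (integer keys for brevity).
id2label = {
    0: "Anger", 1: "Calm", 2: "Disgust", 3: "Fear",
    4: "Happy", 5: "Neutral", 6: "Sad", 7: "Surprised",
}


def logits_to_prediction(logits):
    """Apply softmax to raw logits and return {label: probability}."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return {id2label[i]: round(e / norm, 3) for i, e in enumerate(exps)}


# Made-up logits for illustration only (assumption, not model output).
example_logits = [0.2, -1.0, 0.1, 0.0, 3.5, 0.4, -0.3, 1.2]

prediction = logits_to_prediction(example_logits)

# The predicted emotion is the label with the highest probability.
top_label = max(prediction, key=prediction.get)
print(top_label, prediction[top_label])
```

Because the dictionary carries a probability for every class, a caller can also apply a confidence threshold or inspect the runner-up label instead of taking only the top one.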

Original Label

"id2label": {
    "0": "ANG",
    "1": "CAL",
    "2": "DIS",
    "3": "FEA",
    "4": "HAP",
    "5": "NEU",
    "6": "SAD",
    "7": "SUR"
},
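
The short codes in the config line up index by index with the readable names used in the inference code. As a rough sketch, the readable mapping can be derived from the config codes rather than typed by hand; `config_id2label` is copied from the fragment above, while `code_to_name` and `readable_id2label` are illustrative names introduced here.

```python
# Short codes as stored in the model config (copied from the fragment above).
config_id2label = {
    "0": "ANG", "1": "CAL", "2": "DIS", "3": "FEA",
    "4": "HAP", "5": "NEU", "6": "SAD", "7": "SUR",
}

# Readable names, paired with the codes by matching class index
# against the Label Space section.
code_to_name = {
    "ANG": "Anger", "CAL": "Calm", "DIS": "Disgust", "FEA": "Fear",
    "HAP": "Happy", "NEU": "Neutral", "SAD": "Sad", "SUR": "Surprised",
}

# Build the readable id2label mapping used by the inference code.
readable_id2label = {
    idx: code_to_name[code] for idx, code in config_id2label.items()
}

for idx, name in readable_id2label.items():
    print(f"Class {idx}: {name}")
```

Deriving the mapping this way keeps the display names tied to the class indices in the config, so a reordered or extended label set would surface as a missing key rather than a silently wrong label.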

Intended Use

Speech-Emotion-Classification is designed for:

  • Speech Emotion Analytics: Analyze speaker emotions in call centers, interviews, or therapeutic sessions.
  • Conversational AI Personalization: Adjust voice assistant responses based on detected emotion.
  • Mental Health Monitoring: Support emotion recognition in voice-based wellness or teletherapy apps.
  • Voice Dataset Curation: Tag or filter speech datasets by emotion for research or model training.
  • Media Annotation: Automatically annotate podcasts, audiobooks, or videos with speaker emotion metadata.

Model size: 94.6M params (Safetensors, tensor type F32)
