![]() |
VOOZH | about |
Our daily lives are full of various audio in the form of sounds and speeches. Analyzing them is now a crucial task for numerous industries like Music, Crime Investigation, speech recognition, etc. Audio data analysis involves the exploration and interpretation of sound signals by extracting valuable insights, recognizing patterns, and making informed decisions. This multifaceted field encompasses a variety of fundamental concepts and techniques, contributing to a deeper comprehension of audio data. In this article, we will see various techniques to understand audio data.
Audio is the representation of sound as a set of electrical impulses or digital data. It is the process of converting sound into an electrical signal that may be stored, transferred, or processed. The electrical signal is subsequently transformed back into sound, which a listener may hear.
Audio is a way to communicate our lives, and it influences how we connect with one another and perceive the world around us.
For this implementation, we need an audio data file (.wav). Let's understand audio data using 'Recording.wav', which can be directly downloaded from here. You can use any audio file (.wav) at your convenience.
At first, we will import all the required Python libraries, like Librosa, NumPy, Matplotlib and SciPy.
To understand the audio data, first we need to understand the Sampling and sampling rate.
Since human voice has audible frequencies below 8 kHz, sampling speech at 16 kHz is adequate. A faster sample rate just raises the computing cost of processing these files.
In the code, we will use Librosa module to sample audio file implicitly. It will first read the audio file and then convert it into a waveform (a sequence of amplitude values) that can be processed digitally.
Output:
Sampling Rate: 48000 HzThe sampling rate for the audio used is 48000 Hz.
Output:
Listen the Audio files at Double Sampling Rate:
Output:
Output:
we will observe from the above audio that while changing the sampling rate audio are distorted.
Amplitude and Bit depth are two important concepts to understand the intensity of Audio Signal.
In the code snippet, calculates the amplitude range of the waveform. The soundlife library to stores the audio file in variable audio_data.
The code line bit_depth = audio_data.dtype.itemsize calculates the bit depth of the audio data by examining the data type of the audio_data array and finding its item size in bytes.
Output:
Amplitude Range: 0.292144775390625
Bit Depth: 8 bits
So, our audio file is in 8 bits category of bit depth.
Knowing the amplitude range and bit depth helps us to gain useful insights into the characteristics of an audio file, which is important in audio processing and analysis tasks.
A waveform is the graphical representation of of an audio signal in the time domain where each point on the waveform represents the amplitude of the audio signal at a specific point in time. It will help us to understand how the audio signal varies over time by its revealing features like sound duration, pauses and amplitude changes.
In the code snippet, we have plot waveform leveraging librosa and matplotlib.
Output:
Frequency Spectrum is a representation of how the energy in an audio signal is distributed across different frequencies which can be calculated by applying a mathematical transformation like the Fast Fourier Transform (FFT) to the audio signal. It is very useful for identifying musical notes, detecting harmonics or filtering specific frequency components.
The code snippet computes Fast Fourier Transform and plots frequency spectrum of audio using matplotlib.
Output:
The plot displays the frequency of the audio signal, which allows us to observe the dominant frequencies and their amplitudes.
Spectrogram is a time-frequency representation of an audio signal which provides a 2D visualization of how the frequency content of the audio signal changes over time.
In spectrogram, the dark regions indicate less presence of a frequency and in the other hand bright regions indicate strong presence of a frequency at a certain time. This will help is various tasks like speech recognition, musical analysis and identifying sound patterns.
The code snippet computes and plots the audio waveform using matplotlib library and scipy.signal.spectrogram function. In the code, epsilon defines a small constant to avoid division by zero. Epsilon is added to the spectrogram values before taking logarithm to prevent issues with very small values.
Output:
The plot displays spectrogram, which represents how the frequencies in the audio signal change over time. The color intensity represents whether the frequency is high or low at each time point.
We can conclude that various types of visualization tasks help us to understand the behavior of audio signals very effectively. Also understanding audio signal is very essential task for audio classification, music genre classification and speed recognition.