I recently found the book Music by the Numbers by Eli Maor and, inspired by a conversation with a friend who is interested in learning a musical instrument, thought it would be fun to visualize some mathematical and musical concepts using Python.
I am by no means a gifted musician, nor am I an expert in signal processing, but I hope this post will help you understand some mathematical concepts using music and appreciate music more from a different perspective. The book mentioned above is a fun read (for one, I didn’t know Euler was tutored by Johann Bernoulli) and this post aims to provide some visual and acoustic aid using Python.
What is Sound?
How do we make a sound? You can clap, knock or sing, and all of those actions cause some form of vibration. The vibration propagates through a medium (air, water, etc.) as a wave into our ears. However, we cannot hear every vibration because some frequencies fall outside our hearing range. We perceive higher frequency as higher pitch, and different musical notes have different frequencies. The plot below shows the sound waves of middle C and the D just above it.
As you can see, the D wave takes less time to finish one cycle, so it has a higher frequency.
Making a Sound in Python
For a sound to have a particular pitch, we need to know its frequency. Wikipedia has a great table mapping each key on the piano to a frequency. In general, if we use the convention that the A above middle C (A4) has a frequency of 440 Hz (the much-debated concert pitch), we can derive the frequency of any note using this formula:
$$f(n) = 440 \times 2^{(n - 49)/12} \ \text{Hz}$$
where n is the rank of the key (A4 is the 49th key).
This post has great example code on how to calculate note frequency and make a sound wave. The code here is quite similar, just expanded to 88 keys instead of a single octave:
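A minimal sketch of such a function is given here, assuming equal temperament with A4 = 440 Hz as the 49th key. The function name `get_piano_notes` and the note-naming scheme (`C4` for middle C, `#` for sharps) are assumptions made for illustration.

```python
def get_piano_notes():
    """Map note names (A0 ... C8) to frequencies in Hz for an 88-key piano."""
    # Assumed naming scheme: letter, optional '#', then the octave number.
    octave = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
    base_freq = 440.0  # A4, the 49th key

    # Key 1 is A0 and key 88 is C8; octaves 1 to 7 are complete.
    keys = ['A0', 'A#0', 'B0']
    keys += [name + str(i) for i in range(1, 8) for name in octave]
    keys += ['C8']

    # Each key is one semitone (a factor of 2 ** (1 / 12)) above the previous one.
    return {key: base_freq * 2 ** ((n - 49) / 12)
            for n, key in enumerate(keys, start=1)}


note_freqs = get_piano_notes()
middle_c_freq = note_freqs['C4']  # roughly 261.63 Hz
```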
The function above returns a dictionary that maps each note name to its corresponding frequency in hertz. Now we want to make a wave of middle C:
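One way to generate such a wave with NumPy is sketched here. The function name `get_sine_wave` and the default values (44,100 samples per second, amplitude 4096) are assumptions chosen for illustration.

```python
import numpy as np


def get_sine_wave(frequency, duration, sample_rate=44100, amplitude=4096):
    """Return `duration` seconds of a sine wave at `frequency` Hz."""
    # Sample times in seconds, spaced exactly 1 / sample_rate apart.
    t = np.arange(int(sample_rate * duration)) / sample_rate
    return amplitude * np.sin(2 * np.pi * frequency * t)


# Two seconds of middle C (approximately 261.63 Hz).
middle_c = get_sine_wave(261.63, duration=2)
```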
Here duration is in seconds, the sample rate determines the quality of the sound, and the amplitude determines the volume. A sample rate of 44.1 kHz (44,100 samples per second) is quite common for consumer audio.
Using the two functions above and SciPy’s wavfile module, we can create a .wav file of middle C.
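A self-contained sketch of that step follows. It regenerates the sine wave inline so it runs on its own, and the output filename `pure_c.wav` is an arbitrary choice. The samples are cast to 16-bit integers, a common PCM format for .wav files.

```python
import numpy as np
from scipy.io import wavfile

sample_rate = 44100
duration = 2          # seconds
frequency = 261.63    # middle C, in Hz
amplitude = 4096      # well inside the int16 range of +/- 32767

t = np.arange(sample_rate * duration) / sample_rate
wave = amplitude * np.sin(2 * np.pi * frequency * t)

# wavfile.write takes the filename, the sample rate and the sample array.
wavfile.write('pure_c.wav', rate=sample_rate, data=wave.astype(np.int16))
```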
And here’s the output audio.
Making Realistic Sound
The audio above is pretty boring to say the least, and no one in their right mind would consider it music. It lacks many of the characteristics that we associate with the sound of musical instruments, and it’s too consistent throughout. First, let’s compare it with a middle C played on an actual piano.
The same note sounds different on a piano than on a violin because of timbre (tone quality or character). When we press a key on the piano, it doesn’t produce just one sound (or one wave) but also many others that we call overtones (harmonics). Different instruments produce different overtones, creating the timbre that allows us to distinguish one instrument from another even when they play the same note.
Once again using SciPy, we will load the piano audio and plot the sound wave using Matplotlib.
That looks nothing like the pure sine wave in the first section at all even though they are supposed to be of the same note! It’s not as smooth and the magnitude changes.
The wave is not "smooth" because of the overtones. The sound wave above is actually a combination of waves whose frequencies are integer multiples (including 1, for the note itself) of middle C’s frequency. Middle C in this case is our fundamental. Which notes are those overtones? It turns out that the C one octave above (C5) has a frequency exactly twice that of middle C, so that’s one of our candidates. In fact, each C has exactly twice the frequency of the C below it. But when we press the middle C key, it doesn’t sound as if we pressed all the C’s at the same time. This is because the overtones are a lot fainter than the fundamental. To mimic the sound of a piano, we need to know how strongly to apply each overtone relative to the fundamental.
One common technique in signal processing for separating an input signal into its component frequencies (our sound wave is a discrete version of a signal) is the Fourier transform. Here’s a beautifully visualized video by 3Blue1Brown on the Fourier transform if you are not familiar with it. For now, just think of it as an operation that lets us sort sound waves by frequency. We will apply the fast Fourier transform (FFT) to our piano audio and plot the frequency spectrum.
The spectrum shows the highest peak (our most prominent signal) just slightly above 250 Hz. This makes sense, as middle C has a frequency of ~261.63 Hz. The next peak is just above 500 Hz (C5 has a frequency of ~523.25 Hz), and so on. The peaks are also evenly spaced, which matches my earlier claim that the overtone frequencies are multiples of the fundamental’s.
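The same analysis can be tried without a recording. The sketch below builds a synthetic stand-in (a fundamental plus two fainter harmonics with made-up amplitudes, not values measured from a piano) and uses NumPy's real-input FFT to locate the strongest frequency.

```python
import numpy as np

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate  # one second of sample times
fundamental = 261.63                      # middle C, in Hz

# Stand-in for a recording; the amplitudes 1.0, 0.5 and 0.25 are invented.
signal = (1.00 * np.sin(2 * np.pi * 1 * fundamental * t)
          + 0.50 * np.sin(2 * np.pi * 2 * fundamental * t)
          + 0.25 * np.sin(2 * np.pi * 3 * fundamental * t))

# rfft returns the non-negative frequency half of the spectrum of a real signal;
# rfftfreq gives the frequency in Hz that belongs to each bin.
magnitude = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)

# The strongest bin should sit within one bin width (1 Hz here) of the fundamental.
peak_freq = freqs[np.argmax(magnitude)]
```

Plotting `magnitude` against `freqs` (for example with Matplotlib) should show evenly spaced peaks near 262, 523 and 785 Hz.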
The piece of code below calculates the ratio of each overtone’s magnitude to the fundamental’s in the piano sample so that we can apply those ratios to our pure sine waves.
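A hedged sketch of this calculation: for each harmonic, take the largest FFT magnitude within a few hertz of the expected frequency and divide by the fundamental's peak. The function name `overtone_ratios`, the `tolerance` window and the synthetic test signal are assumptions for illustration, not values from the piano recording.

```python
import numpy as np


def overtone_ratios(signal, sample_rate, fundamental, n_harmonics=3, tolerance=5.0):
    """Peak magnitude near each harmonic, relative to the fundamental's peak."""
    magnitude = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)
    peaks = []
    for k in range(1, n_harmonics + 1):
        # Only look at bins within +/- tolerance Hz of the k-th harmonic.
        window = np.abs(freqs - k * fundamental) <= tolerance
        peaks.append(magnitude[window].max())
    peaks = np.array(peaks)
    return peaks / peaks[0]


# Synthetic check: 262 Hz falls exactly on an FFT bin for a one-second signal.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
test_signal = (1.00 * np.sin(2 * np.pi * 262 * t)
               + 0.50 * np.sin(2 * np.pi * 524 * t)
               + 0.25 * np.sin(2 * np.pi * 786 * t))
ratios = overtone_ratios(test_signal, sample_rate, fundamental=262.0)
```

On the synthetic signal, the recovered ratios should match the amplitudes that were put in (1, 0.5, 0.25). A real recording needs a tolerance wide enough to cover slight detuning.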
Now we apply those ratios to our fundamental note and its overtones.
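One way to do this is sketched below: sum sine waves at integer multiples of the fundamental, each weighted by its ratio. The function name `apply_overtones` and the example ratios are hypothetical, and the weights are rescaled to sum to 1 so the result stays within the chosen amplitude.

```python
import numpy as np


def apply_overtones(frequency, duration, ratios, sample_rate=44100, amplitude=4096):
    """Sum sine waves at 1x, 2x, 3x, ... `frequency`, weighted by `ratios`."""
    t = np.arange(int(sample_rate * duration)) / sample_rate
    weights = np.asarray(ratios, dtype=float)
    weights = weights / weights.sum()  # keeps the sum within +/- amplitude

    wave = np.zeros_like(t)
    for k, weight in enumerate(weights, start=1):
        wave += weight * np.sin(2 * np.pi * k * frequency * t)
    return amplitude * wave


# Made-up ratios for the fundamental and its first two overtones.
note = apply_overtones(261.63, duration=2, ratios=[1.0, 0.5, 0.25])
```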
We are getting close, but the sound is still too consistent throughout (the volume remains unchanged). When we press an actual piano key, the sound starts out soft before quickly getting louder, then diminishes over time. One model that describes how the sound changes is ADSR (attack, decay, sustain and release). Essentially, it describes a sound as going through four stages: an initial rise, a descent to a lower level, a period held at that level, and finally a fall to zero. ADSR models some instruments better than others, but it should be sufficient in our case. Below is my implementation of ADSR with exponential weights for a smoother and more realistic sound. Of course, you could just use simple linear weights if you wish.
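For comparison, the simpler linear variant can be sketched as follows. This is not the exponential implementation described above. The function name `adsr_envelope` and the default stage lengths (given as fractions of the note) are assumptions for illustration.

```python
import numpy as np


def adsr_envelope(n_samples, attack=0.05, decay=0.15, sustain_level=0.6, release=0.3):
    """Linear ADSR weights in [0, 1]; stage lengths are fractions of the note."""
    n_attack = int(n_samples * attack)
    n_decay = int(n_samples * decay)
    n_release = int(n_samples * release)
    n_sustain = n_samples - n_attack - n_decay - n_release
    if n_sustain < 0:
        raise ValueError("attack + decay + release must not exceed 1")

    return np.concatenate([
        np.linspace(0.0, 1.0, n_attack, endpoint=False),           # attack: rise to peak
        np.linspace(1.0, sustain_level, n_decay, endpoint=False),  # decay: fall to sustain
        np.full(n_sustain, sustain_level),                         # sustain: hold steady
        np.linspace(sustain_level, 0.0, n_release),                # release: fade out
    ])


# Shape one second of a 440 Hz sine wave with the envelope.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
shaped = np.sin(2 * np.pi * 440.0 * t) * adsr_envelope(len(t))
```

Multiplying the wave by the envelope sample by sample is all that is needed; swapping the `linspace` segments for exponential curves would give smoother transitions.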
The sound from the same instrument changes from room to room, so the ADSR stages may vary. Moreover, if we know how sound bounces around in a concert hall, in theory we can model that and apply it to our clip to make it sound like we are playing the piano in that hall. More formally, the acoustic characteristics of a given location are captured by its impulse response function (IRF), and if we convolve the IRF with the input sound wave, we get the output sound as if it were played in that location.
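The convolution step can be illustrated with a toy impulse response. The one below (the direct sound followed by two fainter echoes) is invented for illustration and is not measured from any real room; `scipy.signal.fftconvolve` performs the convolution.

```python
import numpy as np
from scipy.signal import fftconvolve

sample_rate = 44100

# Dry input: half a second of a 440 Hz sine wave.
t = np.arange(sample_rate // 2) / sample_rate
dry = np.sin(2 * np.pi * 440.0 * t)

# Toy impulse response: direct sound, then echoes after 0.1 s and 0.2 s.
impulse_response = np.zeros(sample_rate // 2)
impulse_response[0] = 1.0
impulse_response[sample_rate // 10] = 0.5
impulse_response[sample_rate // 5] = 0.25

# Each output sample is a weighted sum of delayed copies of the input.
wet = fftconvolve(dry, impulse_response, mode='full')
```

The output is longer than the input because the echoes ring on after the dry sound ends. A measured impulse response from a real hall could be loaded from a .wav file and used the same way.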
Putting It Together
Now we apply the overtones and ADSR weights to the pure sine wave we produced above.
Note that the characteristics of the overtones and amplitudes used here are particular to the recording I have (an old Steinway in a small room). Play around with the overtones and ADSR parameters and see what works best. The resulting audio is still quite far from an actual instrument but it’s much better than what we started with.
And of course, as a bonus, the "hello world" of music, Twinkle Twinkle Little Star written in Python.
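A bare-bones sketch of the first two phrases is shown here, using plain sine waves rather than the overtone and ADSR treatment above. The helper name `tone`, the tempo and the amplitude are assumptions for illustration.

```python
import numpy as np

SAMPLE_RATE = 44100
# Semitone offsets from A4 (440 Hz) for the notes this tune needs.
SEMITONES = {'C4': -9, 'D4': -7, 'E4': -5, 'F4': -4, 'G4': -2, 'A4': 0}


def tone(note, beats, beat_seconds=0.5, amplitude=4096):
    """Sine wave for `note`, lasting `beats` beats of `beat_seconds` each."""
    freq = 440.0 * 2 ** (SEMITONES[note] / 12)
    t = np.arange(int(SAMPLE_RATE * beats * beat_seconds)) / SAMPLE_RATE
    return amplitude * np.sin(2 * np.pi * freq * t)


# (note, length in beats) for "Twinkle, twinkle, little star, how I wonder what you are".
melody = [('C4', 1), ('C4', 1), ('G4', 1), ('G4', 1), ('A4', 1), ('A4', 1), ('G4', 2),
          ('F4', 1), ('F4', 1), ('E4', 1), ('E4', 1), ('D4', 1), ('D4', 1), ('C4', 2)]

# Play the notes one after another by joining their sample arrays.
song = np.concatenate([tone(note, beats) for note, beats in melody])
```

The result can be saved with `scipy.io.wavfile.write('twinkle.wav', SAMPLE_RATE, song.astype(np.int16))`. Without an envelope, repeated notes run into each other, which is one more reason to apply ADSR weights to each note.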
Conclusion
I glossed over a lot of important and sometimes complicated concepts, as this is intended to be a light read. I have no intention of taking the joy out of music or scaring musicians away with complex formulas and code; I merely try to characterize what we hear using a more systematic way of thinking (mathematics). With a few lines of code using NumPy and SciPy, you can easily analyze sound even if you don’t have a trained musician’s ears or perfect pitch.
You can see all the code on my GitHub.
More music theory in Part 2.