5 Awesome NumPy Functions That Can Save You in a Pinch
Avoid Getting Stuck with 5 Simple Functions
Overview of Your Journey
- Setting the Stage
- 1 – Quick Filtering
- 2 – Reshaping Yourself Out of Trouble
- 3 – Restructuring Your Shape
- 4 – Find Unique Values
- 5 – Combine Arrays
- Wrapping Up
Setting the Stage
When doing data science in Python, the package NumPy is omnipresent. Whether you are developing machine learning models with Scikit-Learn or plotting in Matplotlib, you’re sure to have a few NumPy arrays lying around in your code.
When I started with data science in Python, I had a poor grasp of what could be done with NumPy. Over the years, I have sharpened my NumPy skills and become a better data scientist because of it.
Being good at manipulating NumPy arrays can save your life…or at least an hour of frustrating searching. The five NumPy functions I give you here can help you when things get tough 🔥
Throughout the blog post, I assume you have installed NumPy and have already imported it with the alias np:
import numpy as np
I recommend having some prior exposure to NumPy before reading this blog post. If you are completely new to NumPy, then you can check out NumPy’s Beginners Guide or this YouTube video series on NumPy.
1 – Quick Filtering
You can use the where function to quickly filter an array based on a condition. Say you have an audio signal represented as a one-dimensional array:
# Audio Signal (in Hz)
signal = np.array([23, 50, 900, 12, 1100, 10, 2746, 9, 8])
Let’s say that you want to filter out everything in signal below 20 Hz by replacing it with 0. To do this efficiently in NumPy you can write:
# Filter the signal
filtered_signal = np.where(signal >= 20, signal, 0)
# Print out the result
print(filtered_signal)
>>> [  23   50  900    0 1100    0 2746    0    0]
The where function takes three arguments:
- The first argument (in our example signal >= 20) gives the condition you want to use for the filtering.
- The second argument (in our example signal) specifies the value to use where the condition is satisfied.
- The third argument (in our example 0) specifies the value to use where the condition is not satisfied.
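Note that where keeps the length of the array unchanged: values that fail the condition are replaced, not dropped. If you want a shorter array instead, boolean masking is one option. Here is a minimal sketch reusing the signal array from above (the name kept is only for illustration):

```python
import numpy as np

signal = np.array([23, 50, 900, 12, 1100, 10, 2746, 9, 8])

# np.where replaces values below 20 Hz with 0 and keeps the original length
filtered_signal = np.where(signal >= 20, signal, 0)

# Boolean masking drops those values instead, so the result is shorter
kept = signal[signal >= 20]

print(filtered_signal.shape, kept.shape)
```

Which of the two you want depends on whether later steps rely on the positions in the array staying aligned.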
As a second example, assume you have an array high_pitch indicating where the pitch of the signal should be raised:
# Audio Signal (in Hz)
signal = np.array([23, 50, 900, 760, 12])
# Raising pitch
high_pitch = np.array([True, False, True, True, False])
To raise the pitch of signal wherever the corresponding entry of high_pitch is True, you can simply write:
# Creating a high-pitch signal
high_pitch_signal = np.where(high_pitch, signal + 1000, signal)
# Printing out the result
print(high_pitch_signal)
>>> [1023   50 1900 1760   12]
That was easy 😃
2 – Reshaping Yourself Out of Trouble
Often you have an array with the correct elements, but with the wrong shape. More specifically, assume you have the following one-dimensional array:
my_array = np.array([5, 3, 17, 4, 3])
print(my_array.shape)
>>> (5,)
Here you can see that the array is one-dimensional. Suppose you want to feed my_array into another function that expects a two-dimensional input. This happens surprisingly often with libraries like Scikit-Learn! To do this, you can use the reshape function:
my_array = np.array([5, 3, 17, 4, 3]).reshape(5, 1)
print(my_array.shape)
>>> (5, 1)
Now my_array is properly two-dimensional. You can think of my_array as a matrix with five rows and a single column.
If you want to go back to my_array being one-dimensional, then you can write:
my_array = my_array.reshape(5)
print(my_array.shape)
>>> (5,)
Pro Tip: As a shorthand, you can use the NumPy function squeeze to remove all dimensions that have length one. Hence you could have used the squeeze function instead of the reshape function above.
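To make the round trip concrete, here is a small sketch. It also uses -1 as a placeholder dimension, which asks NumPy to infer that size from the number of elements, so you do not have to hard-code the 5. The names column and flat are only for illustration:

```python
import numpy as np

my_array = np.array([5, 3, 17, 4, 3])

# -1 asks NumPy to infer that dimension from the number of elements
column = my_array.reshape(-1, 1)
print(column.shape)  # (5, 1)

# squeeze drops every axis of length one
flat = column.squeeze()
print(flat.shape)  # (5,)
```

Keep in mind that squeeze removes all length-one axes, so on an array with several such axes it may flatten more than you intended.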
3 – Restructuring Your Shape
You will sometimes need to reshuffle the dimensions you already have. An example will make this clear:
Say you have represented an RGB image of size 1280×720 (this is the size of YouTube thumbnails) as a NumPy array called my_image. Your image has the shape (720, 1280, 3). The number 3 comes from the fact that there are 3 colour channels: red, green, and blue.
How do you rearrange my_image so that the RGB channels populate the first dimension? You can do that easily with the moveaxis function:
restructured = np.moveaxis(my_image, [0, 1, 2], [2, 0, 1])
print(restructured.shape)
>>> (3, 720, 1280)
With this simple command you have restructured the image. The two lists in moveaxis specify the source and destination positions of the axes.
Pro Tip: NumPy has other functions such as swapaxes and transpose that also deal with restructuring arrays. The moveaxis function is the one I use most of the time, since you only need to name the axes you want to move.
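If you want to try this yourself, the sketch below uses an all-zeros placeholder in place of a real image (an assumption, since the post does not load one). It moves only the channel axis and shows the equivalent transpose call. The names channels_first and also_channels_first are illustrative:

```python
import numpy as np

# Placeholder image filled with zeros; a real image would be loaded from disk
my_image = np.zeros((720, 1280, 3), dtype=np.uint8)

# Move only the channel axis (the last one) to the front;
# the remaining axes keep their original order
channels_first = np.moveaxis(my_image, -1, 0)

# The same permutation written with transpose
also_channels_first = np.transpose(my_image, (2, 0, 1))

print(channels_first.shape, also_channels_first.shape)
```

Both calls give the same axis order here, so the choice between them is mostly a matter of readability.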
Why Are Reshaping and Restructuring Different?
Many people think that reshaping with the reshape function and restructuring with the moveaxis function are the same. Yet, they work in different ways 😦
The best way to see this is with an example: Say that you have the matrix:
matrix = np.array([[1, 2], [3, 4], [5, 6]])
# The matrix looks like this:
1 2
3 4
5 6
If you use the moveaxis function to switch the two axes, then you get:
restructured_matrix = np.moveaxis(matrix, [0, 1], [1, 0])
# The restructured matrix looks like this:
1 3 5
2 4 6
However, if you use the reshape function, then you get:
reshaped_matrix = matrix.reshape(2, 3)
# The reshaped matrix looks like this:
1 2 3
4 5 6
The reshape function simply reads the elements row by row and starts a new row whenever the current one is full.
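You can check the difference directly. The short sketch below builds both results from the same matrix and compares them:

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4], [5, 6]])

restructured_matrix = np.moveaxis(matrix, [0, 1], [1, 0])
reshaped_matrix = matrix.reshape(2, 3)

# Both results have shape (2, 3), but the contents differ
print(restructured_matrix.shape == reshaped_matrix.shape)
print(np.array_equal(restructured_matrix, reshaped_matrix))

# Swapping the two axes of a 2-D array is the same as transposing it
print(np.array_equal(restructured_matrix, matrix.T))
```

So a matching shape alone does not tell you that the data ended up where you wanted it.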
4 – Find Unique Values
The unique function is a sweet utility function for finding the unique elements of an array. Say that you have an array representing the favourite cities of people sampled from a poll:
# Favorite cities
cities = np.array(["Paris", "London", "Vienna", "Paris", "Oslo", "London", "Paris"])
Then you can use the unique function to get the unique values in the array cities:
unique_cities = np.unique(cities)
print(unique_cities)
>>> ['London' 'Oslo' 'Paris' 'Vienna']
Notice that the unique function returns the values in sorted order, not in the order they originally appeared in (e.g. Oslo is before Paris, even though Paris appears first in cities).
With polls, it is really common to draw bar charts. In those charts, the categories are the poll options while the height of each bar represents the number of votes that option got. To get that information, you can use the optional argument return_counts as follows:
unique_cities, counts = np.unique(cities, return_counts=True)
print(unique_cities)
>>> ['London' 'Oslo' 'Paris' 'Vienna']
print(counts)
>>> [2 1 3 1]
The unique function saves you from writing a lot of annoying loops 😍
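Once you have the two arrays, pairing them up is straightforward. The sketch below is one possible way to do it: it builds a plain dictionary of votes and picks the most popular city with argmax. The names votes and winner are only for illustration:

```python
import numpy as np

cities = np.array(["Paris", "London", "Vienna", "Paris", "Oslo", "London", "Paris"])
unique_cities, counts = np.unique(cities, return_counts=True)

# Pair each city with its vote count
votes = {str(city): int(count) for city, count in zip(unique_cities, counts)}

# argmax gives the position of the largest count
winner = str(unique_cities[np.argmax(counts)])

print(votes)
print(winner)
```

The two arrays unique_cities and counts can also be passed directly as the categories and heights of a bar chart.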
5 – Combine Arrays
Sometimes, you will be working with many arrays at the same time. Then it is often convenient to combine the arrays into a single "master" array. Doing this in NumPy is easy with the concatenate function.
Let’s say that you have two one-dimensional arrays:
array1 = np.arange(10)
array2 = np.arange(10, 20)
Then you can combine them into a longer one-dimensional array with concatenate:
# The arrays are passed together as one sequence (here a tuple)
long_array = np.concatenate((array1, array2))
print(long_array)
>>> [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
Combining Our Tools
What if you wanted to stack array1 and array2 on top of each other? You are then looking to create a two-dimensional array that looks like this:
[[ 0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]]
You can first reshape array1 and array2 into two-dimensional arrays with the reshape function:
array1 = array1.reshape(10, 1)
array2 = array2.reshape(10, 1)
Now you can use the optional axis parameter in the concatenate function to combine them correctly:
stacked_array = np.concatenate((array1, array2), axis=1)
print(stacked_array)
>>>
[[ 0 10]
[ 1 11]
[ 2 12]
[ 3 13]
[ 4 14]
[ 5 15]
[ 6 16]
[ 7 17]
[ 8 18]
[ 9 19]]
Almost there… You can now use the moveaxis function to finish the job:
stacked_array = np.moveaxis(stacked_array, [0, 1], [1, 0])
print(stacked_array)
>>>
[[ 0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]]
Awesome! I hope this example showed you how some of the different tools you have just learned can come together.
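As a side note, NumPy also has dedicated helpers for this particular case. The sketch below is an alternative to the steps above, not a replacement for understanding them. It starts again from the original one-dimensional arrays and uses stack and vstack (the name also_stacked is only for illustration):

```python
import numpy as np

array1 = np.arange(10)
array2 = np.arange(10, 20)

# stack joins the arrays along a new first axis
stacked_array = np.stack((array1, array2))

# vstack gives the same result for one-dimensional inputs
also_stacked = np.vstack((array1, array2))

print(stacked_array.shape)
```

Both give the same two-row array as the reshape, concatenate, and moveaxis route.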
Wrapping Up
You should now feel comfortable using NumPy for a few tricky situations. If you need to learn more about NumPy, then check out the NumPy documentation.
Like my writing? Check out my blog posts Type Hints, Formatting with Black, Underscores in Python, and 5 Dictionary Tips for more Python content. If you are interested in data science, programming, or anything in between, then feel free to add me on LinkedIn and say hi ✋