NumPy is an open-source Python library used for numerical computing and handling large multi-dimensional arrays efficiently. In interviews, questions on NumPy are often asked to evaluate your understanding of array operations, mathematical functions and performance optimization. Below are some of the most frequently asked interview questions covering key NumPy topics.
NumPy is used for numerical and scientific computing. It offers support for arrays, matrices and a variety of mathematical operations that can effectively operate on these arrays.
We can create NumPy arrays using various methods. Here are some common ways to create NumPy arrays:
Here are some of NumPy's main features:
To calculate the dot product of two NumPy arrays, we can use either the numpy.dot() function or the @ operator:
1. Using numpy.dot() function:
a: The first input array (NumPy array).
b: The second input array (NumPy array).
2. Using the @ operator
For two 1-D arrays, both methods return the dot product as a scalar value; for 2-D arrays, both perform matrix multiplication.
NumPy offers two ways to copy an array: a shallow copy (a view) and a deep copy. The main differences between them are:
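Here is a minimal sketch of both approaches, assuming two small 1-D arrays a and b chosen only for illustration:

```python
import numpy as np

# Illustrative input arrays (hypothetical values)
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# 1. numpy.dot() function: 1*4 + 2*5 + 3*6
dot_func = np.dot(a, b)

# 2. @ operator (same result for 1-D arrays)
dot_op = a @ b

print(dot_func, dot_op)
```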
| Feature | Shallow Copy | Deep Copy |
|---|---|---|
| Definition | A new array that is a view of the original array's data. | A completely new and independent array with its own copy of the data. |
| Memory | References the same memory location as the original array. | Allocates new memory, duplicating the data. |
| Duplication | No actual duplication of data; only references. | Full duplication of data is created. |
| Effect of Changes | Changes in the original array reflect in the shallow copy and vice versa. | Changes in the original array do not affect the deep copy and vice versa. |
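The difference in the table can be checked with a short sketch. The array values are illustrative, and np.shares_memory() is used here only to confirm whether two arrays point at the same data:

```python
import numpy as np

original = np.array([1, 2, 3])

shallow = original.view()   # shallow copy: new object, same data buffer
deep = original.copy()      # deep copy: new object, separate data buffer

# Modify the original and observe which copy follows the change
original[0] = 99

print("Shallow copy:", shallow)   # reflects the change
print("Deep copy:", deep)         # keeps the old value
print(np.shares_memory(original, shallow), np.shares_memory(original, deep))
```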
We can reshape a NumPy array using the reshape() method or the np.reshape() function. Both change the dimensions of the array while keeping its elements unchanged, so the total number of elements must stay the same.
1. Using the reshape() method:
2. Using the np.reshape() function:
In both cases, original_array is the existing NumPy array you want to reshape and new_shape is a tuple specifying the desired shape of the new array.
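A minimal sketch of both forms, assuming a 6-element original_array and a new_shape of (2, 3) picked for illustration:

```python
import numpy as np

original_array = np.arange(6)   # six elements: 0..5
new_shape = (2, 3)              # 2 rows x 3 columns = 6 elements

# 1. reshape() method
by_method = original_array.reshape(new_shape)

# 2. np.reshape() function
by_function = np.reshape(original_array, new_shape)

print(by_method)
print(by_function)
```

If new_shape does not account for exactly the same number of elements, NumPy raises a ValueError.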
To perform element-wise operations on NumPy arrays, you can use standard arithmetic operators. NumPy automatically applies these operations element-wise when you use them with arrays of the same shape.
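The sketch below illustrates these operators. The arrays a and b are assumed values, chosen to be consistent with the output that follows:

```python
import numpy as np

# Assumed example arrays of the same shape
a = np.array([1, 2, 3, 4, 5])
b = np.array([6, 7, 8, 9, 10])

print("Addition:", a + b)
print("Subtraction:", a - b)
print("Multiplication:", a * b)
print("Division:", a / b)
print("Power:", a ** 2)
```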
Output:
Addition: [ 7 9 11 13 15]
Subtraction: [-5 -5 -5 -5 -5]
Multiplication: [ 6 14 24 36 50]
Division: [0.16666667 0.28571429 0.375 0.44444444 0.5 ]
Power: [ 1 4 9 16 25]
NumPy provides a wide range of functions for generating random numbers. You can generate random numbers from various probability distributions, set seeds for reproducibility and more. Here are some common ways to generate random numbers with NumPy:
1. Using np.random.rand()
Generating a Random Float between 0 and 1 using np.random.rand()
2. Using np.random.randint()
Generating a Random Integer within a Range using np.random.randint().
3. Using np.random.randn()
4. Using np.random.seed()
We can set a seed using np.random.seed() to ensure that the generated random numbers are reproducible.
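The four functions above can be sketched together as follows. The seed value 42 and the range 1 to 10 are arbitrary choices for illustration:

```python
import numpy as np

np.random.seed(42)                       # fix the seed for reproducibility

random_float = np.random.rand()          # float in [0, 1)
random_int = np.random.randint(1, 10)    # integer in [1, 10)
normal_samples = np.random.randn(3)      # 3 samples from the standard normal

# Re-seeding with the same value replays the same sequence
np.random.seed(42)
repeat_float = np.random.rand()

print(random_float, random_int, normal_samples)
print(random_float == repeat_float)
```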
We can create a NumPy array from a Python list using the np.array() constructor provided by NumPy.
We can access elements in a NumPy array based on specific conditions using boolean indexing. Applying a condition to an array produces a boolean mask of True and False values, and indexing with that mask returns only the elements where the mask is True.
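A minimal sketch of boolean indexing, assuming an input array of 1 to 5 that is consistent with the output below:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])   # assumed example data

mask = arr > 3          # boolean mask: True where the condition holds
selected = arr[mask]    # keep only elements where mask is True

print("Selected Elements (greater than 3):", selected)
```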
Output:
Selected Elements (greater than 3): [4 5]
NumPy provides many data types (dtypes) that specify the kind of data stored in an array. The dtype controls how the data is stored in memory and handled during operations. Some common data types supported by NumPy include:
We can concatenate two NumPy arrays vertically (along the rows) using the np.vstack() function or the np.concatenate() function with the axis parameter set to 0. Here's how to do it with both methods:
1. Using np.vstack()
2. Using np.concatenate() with axis
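Both methods can be sketched as follows, assuming two small 2-D arrays with the same number of columns (the values are illustrative):

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])   # shape (2, 2)
bottom = np.array([[5, 6]])        # shape (1, 2)

# 1. np.vstack() takes a sequence of arrays
stacked = np.vstack((top, bottom))

# 2. np.concatenate() with axis=0 joins along the rows
concatenated = np.concatenate((top, bottom), axis=0)

print(stacked)
print(concatenated)
```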
Matrix inversion in NumPy refers to the process of finding the inverse of a square matrix. The identity matrix is produced when multiplying the original matrix by the inverse of the matrix. In other words, if A is a square matrix and A^(-1) is its inverse, then A * A^(-1) = I, where I is the identity matrix.
NumPy provides a convenient function called numpy.linalg.inv() to compute the inverse of a square matrix. Here's how you can use it:
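A minimal sketch, assuming the 3x3 matrix shown in the output below. The final check multiplies the matrix by its inverse to confirm the result is (numerically) the identity matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [0, 1, 4],
                   [5, 6, 0]])

# Raises numpy.linalg.LinAlgError if the matrix is singular
inverse = np.linalg.inv(matrix)

print("Original Matrix:")
print(matrix)
print("Inverse Matrix:")
print(inverse)

# A @ A^(-1) should be close to the identity matrix
print(np.allclose(matrix @ inverse, np.eye(3)))
```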
Output:
Original Matrix:
[[ 1 2 3]
[ 0 1 4]
[ 5 6 0]]
Inverse Matrix:
[[-24. 18. 5.]
[ 20. -15. -4.]
[ -5. 4. 1.]]
In NumPy, the var function is used to compute the variance of elements in an array or along a specified axis. Variance is a measure of the spread or dispersion of data points.
The arithmetic mean (average) in NumPy can be calculated using numpy.mean(). It sums the elements along a specified axis, or over the whole array if no axis is given, and divides that sum by the number of elements.
You can convert a multidimensional array to a 1D array, also known as flattening the array, using several methods. Two common ones are flatten() and ravel().
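Here is a short sketch of numpy.mean() and numpy.var() on an assumed 2x3 array. By default np.var() computes the population variance (ddof=0):

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])   # illustrative values

overall_mean = np.mean(data)           # mean of all six elements
column_means = np.mean(data, axis=0)   # one mean per column

overall_var = np.var(data)             # population variance of all elements
row_vars = np.var(data, axis=1)        # one variance per row

print(overall_mean, column_means)
print(overall_var, row_vars)
```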
1. Using flatten():
Output:
one dimensional array [1 2 3 4 5 6 7 8 9]
2. Using ravel():
Output:
one dimensional array [1 2 3 4 5 6 7 8 9]
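Both of these methods flatten the multidimensional array into a 1D array. The primary difference between them is that flatten() always returns a copy, while ravel() returns a view of the original array whenever possible.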
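The copy-versus-view behaviour of the two methods can be checked with a small sketch. The 3x3 array is an assumed example, and np.shares_memory() is used only to inspect whether the data buffer is shared:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

flat_copy = arr.flatten()   # always a new, independent array
flat_view = arr.ravel()     # a view here, because arr is contiguous

print(np.shares_memory(arr, flat_copy))   # False
print(np.shares_memory(arr, flat_view))   # True
```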
Identifying and removing outliers in a NumPy array involves several steps. Outliers are data points that significantly deviate from the majority of the data and can adversely affect the results of data analysis. Here's a general approach to identify and remove outliers:
Identifying Outliers:
1. Calculate Descriptive Statistics: Compute basic statistics like the mean and standard deviation of the array to understand the central tendency and spread of the data.
Output:
Outliers: [300]
2. Using IQR: IQR (Interquartile Range) is the difference between the 75th percentile (Q3) and the 25th percentile (Q1), representing the spread of the middle 50% of the data.
Output:
Outliers: [ 10 300]
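Both approaches can be sketched as below. The data array and the thresholds (a z-score above 2 and the usual 1.5 x IQR fences) are assumptions for this example, not the values behind the outputs above:

```python
import numpy as np

# Hypothetical data with one obvious outlier
data = np.array([10, 12, 12, 13, 12, 11, 14, 13, 15, 300])

# 1. Mean / standard deviation (z-score) method
z_scores = np.abs((data - data.mean()) / data.std())
z_outliers = data[z_scores > 2]          # assumed threshold of 2

# 2. IQR method
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = data[(data < lower) | (data > upper)]

# Removing the outliers: keep only values inside the IQR fences
cleaned = data[(data >= lower) & (data <= upper)]

print("Z-score outliers:", z_outliers)
print("IQR outliers:", iqr_outliers)
print("Cleaned data:", cleaned)
```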
We can remove null (NaN) values by building a boolean mask with numpy.isnan() and keeping only the elements where the mask is False.
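A minimal sketch, assuming an input array consistent with the output below:

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, np.nan, 6])

nan_mask = np.isnan(arr)     # True where the value is NaN
filtered = arr[~nan_mask]    # invert the mask to keep valid values

print("Original Array:", arr)
print("Filtered Array (without NaNs):", filtered)
```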
Output:
Original Array: [ 1. 2. nan 4. nan 6.]
Filtered Array (without NaNs): [1. 2. 4. 6.]
We can filter out missing or null data using a masked array or a boolean mask.
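Here is a sketch of the masked-array route, assuming input data consistent with the output below. np.ma.masked_invalid() masks NaN and infinite values, and compressed() returns only the unmasked ones:

```python
import numpy as np

data = np.array([1, 2, np.nan, 4, 5])   # assumed example data

masked = np.ma.masked_invalid(data)     # mask NaN / inf entries
valid = masked.compressed()             # 1-D array of unmasked values

print(valid)
```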
Output:
[1. 2. 4. 5.]
In NumPy, both slicing and indexing are fundamental operations for accessing and manipulating elements in arrays, but they differ in several ways:
| Feature | Slicing | Indexing |
|---|---|---|
| Definition | Extracts a range/subset of elements from an array. | Accesses specific elements or subsets from an array. |
| Syntax | Uses a colon (:) inside square brackets (e.g., arr[1:5]). | Uses square brackets with index values (e.g., arr[2], arr[1, 3]). |
| Output | Produces a contiguous block of elements. | Produces a single element or a set of specific elements. |
| Use Case | When you want a continuous slice of data. | When you want random or specific positions. |
| Example | arr[2:6] → elements from index 2 to 5. | arr[0] → first element, arr[[1,3,5]] → elements at indices 1, 3, and 5. |
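The examples in the table can be tried on an assumed array of 0 to 9. The sketch also shows one practical difference: a basic slice is a view of the original data, while indexing with a list of positions returns a copy:

```python
import numpy as np

arr = np.arange(10)         # [0 1 2 ... 9]

sliced = arr[2:6]           # slicing: elements at indices 2..5
first = arr[0]              # indexing: a single element
picked = arr[[1, 3, 5]]     # indexing with a list of positions

print(sliced, first, picked)
print(np.shares_memory(arr, sliced))   # slice is a view
print(np.shares_memory(arr, picked))   # list-indexed result is a copy
```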
We can create a NumPy array with the same values using various functions and methods depending on your specific needs. Here are a few common approaches:
1. Using numpy.full():
You can use the numpy.full() function to create an array filled with a specific value. This function takes two arguments: the shape of the array and the fill value.
2. Using Broadcasting:
If you want to create an array of the same value repeated multiple times, you can use broadcasting with NumPy.
3. Using list comprehension:
You can also create an array with the same values using a list comprehension and then converting it to a NumPy array.
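The three approaches can be sketched as follows, assuming a fill value of 7 and small shapes chosen for illustration:

```python
import numpy as np

# 1. numpy.full(shape, fill_value)
filled = np.full((2, 3), 7)

# 2. Broadcasting: add a scalar to an array of zeros
broadcasted = np.zeros((2, 3), dtype=int) + 7

# 3. List comprehension converted to an array
from_list = np.array([7 for _ in range(5)])

print(filled)
print(broadcasted)
print(from_list)
```

Of the three, numpy.full() states the intent most directly; the list comprehension builds a Python list first and is the slowest for large arrays.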
A masked array in NumPy is a special type of array that includes an additional Boolean mask, which marks certain elements as invalid or masked. This allows you to work with data that has missing or invalid values without having to modify the original data. Masked arrays are particularly useful when dealing with real-world datasets that may have missing or unreliable data points.
Example: Creating and Using a Masked Array
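The following sketch assumes that -999 is used as the marker for missing data, consistent with the output below. np.ma.masked_equal() masks every element equal to that marker:

```python
import numpy as np

data = np.array([1, 2, -999, 4, 5])

# Mask every entry equal to the sentinel value -999
masked_data = np.ma.masked_equal(data, -999)

print("Original Data:", data)
print("Masked Data:", masked_data)
print("Mean (ignoring masked values):", masked_data.mean())
```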
Output:
Original Data: [ 1 2 -999 4 5]
Masked Data: [1 2 -- 4 5]
Mean (ignoring masked values): 3.0
Broadcasting in NumPy is the ability of NumPy to perform arithmetic operations on arrays of different shapes and sizes without explicitly replicating the data.
1. Broadcasting Scalar
Output:
[11 12 13 14 15]
2. Arrays with Different Shapes
Output:
[[11 22 33]
[14 25 36]]
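The two cases above can be reproduced with a short sketch. The input values are assumptions chosen to match the outputs shown:

```python
import numpy as np

# 1. Broadcasting a scalar across a 1-D array
arr = np.array([1, 2, 3, 4, 5])
scalar_result = arr + 10

# 2. Broadcasting a (3,) array across each row of a (2, 3) array
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([10, 20, 30])
row_result = matrix + row

print(scalar_result)
print(row_result)
```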
To sort a NumPy array in ascending order we use numpy.sort(); for descending order we can reverse the indices returned by numpy.argsort(). Here's how to do it:
1. Ascending Order: You can use the numpy.sort() function to sort your array in ascending order. The function will return a new sorted array, while still leaving the original array unchanged.
Output:
Ascending: [1 2 3 4 5]
2. Sorting in Descending Order: To sort a NumPy array in descending order, you can use the numpy.argsort() function to obtain the indices that would sort the array in ascending order and then reverse those indices to sort in descending order.
Output:
Descending: [ 5. 4. 3. 2. 1. nan]
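Both steps can be sketched on an assumed array without NaN values (NaN handling would need extra care and is not covered here):

```python
import numpy as np

arr = np.array([3, 1, 5, 2, 4])   # illustrative unsorted data

# 1. Ascending: np.sort() returns a new sorted array
ascending = np.sort(arr)

# 2. Descending: reverse the ascending sort indices from np.argsort()
descending = arr[np.argsort(arr)[::-1]]

print("Ascending:", ascending)
print("Descending:", descending)
```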
NumPy arrays offer several advantages over Python lists when it comes to numerical and scientific computing. Here are some key reasons why NumPy arrays are often preferred:
| Feature | reshape() | resize() |
|---|---|---|
| Definition | Returns a new view or copy of the array with a new shape. | Modifies the array itself (in-place) to match the new shape. |
| Original Array | Does not change the shape of the original array; the result may be a view that shares its data. | Changes the original array directly. |
| Return Value | Returns the reshaped array (new object). | Returns None (operation done in-place). |
| Data Handling | Requires that the total number of elements match the new shape. | If new size is bigger → fills with zeros. If smaller → array is trimmed. |
| Memory | Often returns a view (shares data) if possible, else a copy. | Creates/reallocates memory if needed. |
| Use Case | When you want a reshaped version of an array without altering the original. | When you want to permanently change the shape of the array, even if padding or truncating is needed. |
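Here is a small sketch of the contrast, using the ndarray.resize() method. Passing refcheck=False is an assumption made so the example runs in interactive sessions, where extra references to the array can otherwise trigger an error:

```python
import numpy as np

# reshape(): returns a reshaped array, original shape is untouched
a = np.arange(6)
reshaped = a.reshape((2, 3))

# resize(): changes the array in place and returns None
b = np.arange(6)
result = b.resize((2, 4), refcheck=False)   # 8 slots: extra ones are zero-filled

print(a.shape, reshaped.shape)
print(result)
print(b)
```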
These functions are used for combining arrays in different dimensions and are widely used in various data processing and manipulation tasks.
| Feature | vstack() | hstack() |
|---|---|---|
| Definition | Stacks arrays vertically (row-wise). | Stacks arrays horizontally (column-wise). |
| Axis | Operates along axis=0. | Operates along axis=1 (axis=0 for 1-D arrays). |
| Requirement | Arrays must have the same number of columns. | Arrays must have the same number of rows. |
| Output Shape | Increases the number of rows. | Increases the number of columns. |
| Example | np.vstack(([1,2,3], [4,5,6])) → [[1,2,3],[4,5,6]] | np.hstack(([1,2,3], [4,5,6])) → [1,2,3,4,5,6] |
| Use Case | Useful when combining data points with same features. | Useful when combining features/variables for same data points. |
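The effect on the output shape is easiest to see with 2-D inputs. The sketch below assumes two 2x2 arrays with illustrative values:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

vertical = np.vstack((a, b))     # rows of b go below rows of a
horizontal = np.hstack((a, b))   # columns of b go to the right of a

print(vertical.shape)     # more rows
print(horizontal.shape)   # more columns
```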
Using the np.linalg.eigvals() function, we can compute the eigenvalues of a square matrix.
The determinant of a square matrix is a single number derived from the matrix's elements. NumPy computes it with the numpy.linalg.det() function.
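A minimal sketch of computing eigenvalues with np.linalg.eigvals(), assuming a diagonal matrix so the eigenvalues are easy to verify by eye (they are the diagonal entries):

```python
import numpy as np

matrix = np.array([[2, 0],
                   [0, 3]])   # illustrative diagonal matrix

eigenvalues = np.linalg.eigvals(matrix)

print("Eigenvalues:", eigenvalues)
```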
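As a quick illustration with an assumed 2x2 matrix, the determinant is 1*4 - 2*3 = -2. The result is a float, so it is compared with a tolerance rather than exact equality:

```python
import numpy as np

matrix = np.array([[1, 2],
                   [3, 4]])

determinant = np.linalg.det(matrix)

print("Determinant:", determinant)
print(np.isclose(determinant, -2.0))
```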
Method 1: Using == operator
Comparing two NumPy arrays with the == operator produces a new boolean array of element-wise results. Calling .all() on that result returns True only if every pair of elements is equal.
Output:
True
False
Method 2: Using array_equal()
This array_equal() function checks if two arrays have the same elements and same shape.
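Both methods can be sketched as follows. The arrays are illustrative and are not the ones behind the output shown above:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])
c = np.array([1, 2, 4])

# Method 1: element-wise comparison, then .all()
same_ab = (a == b).all()
same_ac = (a == c).all()

# Method 2: np.array_equal() also handles arrays of different shapes
equal_ab = np.array_equal(a, b)
equal_shapes = np.array_equal(a, np.array([1, 2]))

print(same_ab, same_ac, equal_ab, equal_shapes)
```

np.array_equal() is the safer choice when the shapes may differ, since == would then either broadcast or fail.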
QR factorization is the decomposition of a matrix A into the form A = QR, where Q is an orthogonal matrix and R is an upper-triangular matrix. We can compute the QR decomposition of a given matrix using numpy.linalg.qr().
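A minimal sketch with an assumed 2x2 matrix. The checks confirm that Q @ R reconstructs A, that Q is orthogonal, and that R is upper-triangular:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # illustrative matrix

Q, R = np.linalg.qr(A)

print("Q:\n", Q)
print("R:\n", R)

print(np.allclose(Q @ R, A))             # A is recovered
print(np.allclose(Q.T @ Q, np.eye(2)))   # Q is orthogonal
print(np.allclose(R, np.triu(R)))        # R is upper-triangular
```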
An ndarray also known as "N-dimensional array" is a fundamental data structure used in NumPy for effectively storing and manipulating data, particularly numerical data. It is:
Vectorization in NumPy means performing operations on entire arrays or vectors at once without using explicit loops. NumPy internally uses optimized C code, so vectorized operations are much faster than iterating through elements in Python.
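The sketch below squares the same numbers with an explicit loop and with a vectorized expression. The input values are assumed, chosen to be consistent with the output that follows:

```python
import numpy as np

numbers = [1, 2, 3, 4, 5]

# Explicit Python loop over a list
squares_loop = []
for n in numbers:
    squares_loop.append(n ** 2)

# Vectorized: one operation applied to the whole array
arr = np.array(numbers)
squares_vectorized = arr ** 2

print("Using loop:", squares_loop)
print("Using vectorization:", squares_vectorized)
```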
Output:
Using loop: [1, 4, 9, 16, 25]
Using vectorization: [ 1 4 9 16 25]
The table below compares = assignment, .view() and np.copy():
| Feature | = Assignment | .view() | .copy() |
|---|---|---|---|
| Definition | Just creates a new reference to the same array object. | Creates a shallow copy (new object but shares same data buffer). | Creates a deep copy (new object with its own data). |
| Memory | Same memory location (no duplication). | Different object, but data points to the same memory. | Completely independent memory allocation. |
| Object ID | Both variables have the same object ID. | Different object IDs, but share underlying data. | Different object IDs and separate memory. |
| Effect of Changes | Changes in one array reflect in the other. | Changes in data reflect in both arrays, but attributes like shape are independent. | No effect as arrays are independent. |
| Speed | Fastest (no copy at all). | Faster than deep copy (just metadata copy). | Slower (data duplication happens). |
| Example | b = a | b = a.view() | b = a.copy() |
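The object-identity rows of the table can be checked with the `is` operator and the `.base` attribute. This is a sketch on an assumed array; the names alias, viewed and copied are hypothetical:

```python
import numpy as np

a = np.arange(5)

alias = a            # = assignment: same object
viewed = a.view()    # new object, shared data
copied = a.copy()    # new object, own data

print(alias is a, viewed is a, copied is a)
print(viewed.base is a, copied.base is None)

# A change made through `a` is visible via alias and viewed, not copied
a[0] = 100
print(alias[0], viewed[0], copied[0])
```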
| Feature | shape | size |
|---|---|---|
| Definition | Returns a tuple representing the dimensions of the array. | Returns the total number of elements in the array. |
| Type of Output | Tuple of integers, one per dimension. | Integer (single value). |
| Information Provided | Gives details about array dimensions, like number of rows, columns, etc. | Gives overall element count, ignoring shape. |
| Calculation | Tuple shows the length along each axis. | Product of all dimensions in the shape tuple. |
| Example | arr.shape → (3, 4) | arr.size → 12 |
| Use Case | Useful to know the structure of the array. | Useful to know the total elements for operations like reshaping or flattening. |
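A short sketch using an assumed 3x4 array, matching the example row in the table:

```python
import numpy as np

arr = np.zeros((3, 4))   # 3 rows, 4 columns

print(arr.shape)   # tuple of lengths along each axis
print(arr.size)    # total number of elements

# size equals the product of the entries in shape
print(arr.size == arr.shape[0] * arr.shape[1])
```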
| Feature | Python Sequences (list, tuple) | NumPy Array (ndarray) | Pandas Series/Array |
|---|---|---|---|
| Data Type | Can hold mixed data types like [1, "a", 3.5]. | Holds homogeneous data (all elements of same dtype). | Mostly homogeneous (like NumPy), but can also hold mixed or object dtype. |
| Dimensionality | Mostly 1D (lists/tuples). Nested lists can simulate higher dimensions, but inefficiently. | Supports n-dimensional arrays. | Mainly 1D (Series) or 2D (DataFrame); built on top of NumPy. |
| Performance | Slower, not memory-efficient (pure Python objects). | Fast, memory-efficient (C-based implementation). | Slightly slower than NumPy due to extra features but optimized for labeled data. |
| Indexing | Zero-based indexing and supports slicing. | Supports advanced indexing, slicing, boolean masking, broadcasting. | Supports indexing + labels, powerful alignment, missing value handling. |
| Operations | Element-wise operations require loops or comprehensions. | Vectorized element-wise operations, linear algebra, broadcasting. | Vectorized operations (inherited from NumPy) + axis-aware operations. |
| Use Case | General-purpose container. | Numerical computation, scientific computing, machine learning. | Data analysis, tabular data handling, missing values, statistics. |
| Example | lst = [1, 2, 3] | np.array([1, 2, 3]) | pd.Series([1, 2, 3]) |
You can use the DataFrame's .values attribute to convert a Pandas DataFrame into a NumPy array; the pandas documentation recommends the to_numpy() method for the same purpose.
Output:
[[1 4]
[2 5]
[3 6]]
We can reverse a NumPy array using the [::-1] slicing technique.
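A minimal sketch, assuming an input array of 1 to 5 consistent with the output below. Note that the slice is a view, so the original array itself is not modified:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

reversed_arr = arr[::-1]   # step of -1 walks the array backwards

print(reversed_arr)
```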
Output:
[5 4 3 2 1]
NumPy arrays are much faster than Python lists because of the way they are implemented:
Output:
Python List Time: 0.05335259437561035
NumPy Array Time: 0.004484653472900391
The bincount() function counts the number of occurrences of each value in an array. Note that bincount() accepts only boolean values or non-negative integers; negative integers cannot be used.
Using the max() and min() functions, we can determine an array's maximum or minimum value in NumPy. Each takes an array as input and returns its largest or smallest element.
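Here is a small sketch on assumed data. The result's index i holds the number of times the value i appears:

```python
import numpy as np

values = np.array([0, 1, 1, 3, 2, 1])   # non-negative integers only

counts = np.bincount(values)   # counts[i] = occurrences of value i

print(counts)
print("Occurrences of 1:", counts[1])
```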
Output:
max value: 3
min value: 1
Both indexing and slicing are useful for cleaning data because they let you filter data based on particular criteria or target particular data points for modification. In this example, negative values are located and replaced with zeros using boolean indexing, and a new array containing only the elements greater than 2 is then selected.
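This cleaning step can be sketched as follows, assuming an input array consistent with the output below:

```python
import numpy as np

data = np.array([1, 2, -3, 4, 5, -6, 7])   # assumed raw data

# Replace negative values with zero, in place
data[data < 0] = 0

# Select the elements greater than 2 into a new array
subset = data[data > 2]

print("After replacing negatives with zeros:", data)
print("Subset with elements greater than 2:", subset)
```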
Output:
After replacing negatives with zeros: [1 2 0 4 5 0 7]
Subset with elements greater than 2: [4 5 7]
Use the unique() function from the NumPy module to find the unique elements in an array. It returns the array's unique elements in sorted order.
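A minimal sketch, assuming an unsorted input with duplicates that yields the output below:

```python
import numpy as np

arr = np.array([3, 1, 2, 2, 7, 5, 4, 6, 1, 3])   # illustrative data

unique_values = np.unique(arr)   # duplicates removed, sorted ascending

print(unique_values)
```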
Output:
[1 2 3 4 5 6 7]