If a DataFrame has many rows, you can randomly select a few of them instead of working with the whole dataset. For example, suppose a DataFrame has rows [A, B, C, D, E]. If you randomly pick 2 rows, one possible result is [C, E].
Here is the sample DataFrame used in this article:
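The construction code is not reproduced here, so the following is a minimal sketch that rebuilds the table from the values in the output. The variable names `data` and `df` are assumptions.

```python
import pandas as pd

# Values copied from the output table in this article.
data = {
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
}

# Build the DataFrame; pandas assigns the default integer index 0..4.
df = pd.DataFrame(data)
print(df)
```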
Output
Employee Department Age Salary
0 Emily HR 28 50000
1 Emma IT 34 60000
2 Jake Finance 25 45000
3 David Marketing 42 70000
4 Eva IT 30 52000
Let’s explore different methods to randomly select rows from a Pandas DataFrame.
The sample() method accepts parameters for the number of rows (n), a fraction of rows (frac), sampling with replacement (replace), per-row weights (weights), and a seed for reproducibility (random_state).
Example: Below, we randomly select one row using sample().
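A minimal sketch of this call, assuming the sample DataFrame is stored in a variable named `df` (an assumed name). The selected row changes from run to run, so it may not match the output shown.

```python
import pandas as pd

# Sample DataFrame rebuilt from the table in this article.
df = pd.DataFrame({
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
})

# With no arguments, sample() returns one randomly chosen row.
res = df.sample()
print(res)
```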
Output
Employee Department Age Salary
2 Jake Finance 25 45000
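Explanation: Called without arguments, sample() returns a single random row, because n defaults to 1 when frac is not given.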
The n parameter specifies the exact number of rows to select randomly.
Example: Here, we select three random rows from the DataFrame.
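A minimal sketch, assuming the sample DataFrame is stored in `df` (an assumed name). Which three rows come back varies between runs.

```python
import pandas as pd

# Sample DataFrame rebuilt from the table in this article.
df = pd.DataFrame({
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
})

# n=3 requests exactly three rows, drawn without replacement by default.
res = df.sample(n=3)
print(res)
```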
Output
Employee Department Age Salary
2 Jake Finance 25 45000
3 David Marketing 42 70000
4 Eva IT 30 52000
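Explanation: With n=3, sample() returns exactly three distinct rows. n cannot exceed the number of rows unless replace=True.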
The frac parameter selects a fraction of rows instead of a fixed number.
Example: In this example, we select 50% of rows randomly from the DataFrame.
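A minimal sketch, assuming the sample DataFrame is stored in `df` (an assumed name). The rows chosen differ between runs.

```python
import pandas as pd

# Sample DataFrame rebuilt from the table in this article.
df = pd.DataFrame({
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
})

# frac=0.5 asks for half of the 5 rows; pandas rounds 2.5 to a whole
# number of rows (2 here).
res = df.sample(frac=0.5)
print(res)
```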
Output
Employee Department Age Salary
2 Jake Finance 25 45000
3 David Marketing 42 70000
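Explanation: frac=0.5 requests half of the 5 rows. Since 2.5 is not a whole number, it is rounded, giving the 2 rows shown. n and frac cannot be used together.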
By default, sampling is without replacement. Setting replace=True allows the same row to be selected multiple times.
Example: This code selects 5 rows randomly, allowing duplicates.
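A minimal sketch, assuming the sample DataFrame is stored in `df` (an assumed name). Results vary between runs, and duplicates are possible but not guaranteed.

```python
import pandas as pd

# Sample DataFrame rebuilt from the table in this article.
df = pd.DataFrame({
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
})

# replace=True puts each row back after it is drawn, so the same
# index label can appear several times in the result.
res = df.sample(n=5, replace=True)
print(res)
```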
Output
Employee Department Age Salary
1 Emma IT 34 60000
2 Jake Finance 25 45000
0 Emily HR 28 50000
0 Emily HR 28 50000
0 Emily HR 28 50000
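Explanation: With replace=True, every draw is made from the full set of rows, so a row can appear more than once. Here Emily (index 0) appears three times.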
The weights parameter assigns probabilities to rows so that some rows are more likely to be selected.
Example: This program selects 3 rows with weighted probabilities.
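A minimal sketch, assuming the sample DataFrame is stored in `df`. The weight values below are hypothetical and chosen only for illustration; the original weights are not shown in this article.

```python
import pandas as pd

# Sample DataFrame rebuilt from the table in this article.
df = pd.DataFrame({
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
})

# Hypothetical weights, one per row, in row order. Larger values make
# a row more likely to be picked.
weights = [0.4, 0.2, 0.2, 0.1, 0.1]

res = df.sample(n=3, weights=weights)
print(res)
```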
Output
Employee Department Age Salary
0 Emily HR 28 50000
2 Jake Finance 25 45000
1 Emma IT 34 60000
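Explanation: Rows with larger weights are more likely to be drawn. Weights that do not sum to 1 are normalized automatically, and there must be one weight per row.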
sample() can also sample columns instead of rows by setting axis=1.
Example: Here, we select 2 random columns from the DataFrame.
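A minimal sketch, assuming the sample DataFrame is stored in `df` (an assumed name). The two columns chosen vary between runs.

```python
import pandas as pd

# Sample DataFrame rebuilt from the table in this article.
df = pd.DataFrame({
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
})

# axis=1 samples along the columns: all 5 rows are kept and
# 2 of the 4 columns are chosen at random.
res = df.sample(n=2, axis=1)
print(res)
```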
Output
Department Salary
0 HR 50000
1 IT 60000
2 Finance 45000
3 Marketing 70000
4 IT 52000
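Explanation: With axis=1, sample() draws column labels rather than row labels, so every row is kept and only the chosen columns (here Department and Salary) are returned.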
random_state ensures the same rows are selected every time the code runs.
Example: In this example, we select 2 reproducible random rows.
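A minimal sketch, assuming the sample DataFrame is stored in `df`. The seed value 42 is an assumption made for illustration; any fixed integer gives repeatable results.

```python
import pandas as pd

# Sample DataFrame rebuilt from the table in this article.
df = pd.DataFrame({
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
})

# A fixed random_state seeds the generator, so this call returns
# the same two rows on every run.
res = df.sample(n=2, random_state=42)
print(res)

# Repeating the call with the same seed reproduces the selection.
again = df.sample(n=2, random_state=42)
print(res.equals(again))
```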
Output
Employee Department Age Salary
1 Emma IT 34 60000
4 Eva IT 30 52000
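Explanation: Passing the same integer to random_state seeds the random number generator, so repeated runs return the same rows.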
NumPy offers an alternative: randomly choose row index labels, then pass them to loc to fetch the corresponding rows.
Example: Here we select 3 random rows using NumPy.
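A minimal sketch of this approach, assuming the sample DataFrame is stored in `df`. The use of `np.random.choice` and the variable name `idx` are assumptions, since the original code is not shown. The rows returned vary between runs.

```python
import numpy as np
import pandas as pd

# Sample DataFrame rebuilt from the table in this article.
df = pd.DataFrame({
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
})

# Draw 3 distinct index labels; replace=False prevents duplicates.
idx = np.random.choice(df.index, size=3, replace=False)

# loc selects rows by label, in the order the labels were drawn.
res = df.loc[idx]
print(res)
```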
Output
Employee Department Age Salary
4 Eva IT 30 52000
0 Emily HR 28 50000
3 David Marketing 42 70000
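Explanation: The index labels are drawn at random and loc looks up the matching rows. The result keeps the order in which the labels were drawn (here 4, 0, 3).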