URL: https://www.geeksforgeeks.org/python/how-to-randomly-select-rows-from-pandas-dataframe/

Randomly Select Rows from Pandas DataFrame

Last Updated : 3 Oct, 2025

If a DataFrame has multiple rows, you can randomly select a few of them instead of working with the whole dataset. For example, given a DataFrame with rows [A, B, C, D, E], randomly picking 2 rows could return [C, E].

Here is the sample DataFrame used in this article:
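
One way to build this DataFrame is sketched below. The variable names data and df are assumptions for illustration; the values are taken from the table in this section.

```python
import pandas as pd

# Column values copied from the sample table in this article.
data = {
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
}

df = pd.DataFrame(data)
print(df)
```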

Output

   Employee  Department  Age  Salary
0     Emily          HR   28   50000
1      Emma          IT   34   60000
2      Jake     Finance   25   45000
3     David   Marketing   42   70000
4       Eva          IT   30   52000

Let’s explore different methods to randomly select rows from a Pandas DataFrame.

Using sample()

The sample() method lets you specify the number of rows (n), a fraction of rows (frac), whether to sample with replacement (replace), per-row weights (weights), and a seed for reproducibility (random_state).

Example: Below, we randomly select one row using sample().
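
A minimal sketch of this call, assuming the sample DataFrame is rebuilt as df (the names df and row are illustrative). Because no seed is set, the row returned can differ on every run, so any printed output is only one possible result.

```python
import pandas as pd

# Rebuild the sample DataFrame from this article.
df = pd.DataFrame({
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
})

# With no arguments, sample() returns one randomly chosen row as a DataFrame.
row = df.sample()
print(row)
```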

Output

   Employee  Department  Age  Salary
2      Jake     Finance   25   45000

Explanation:

  • df.sample() selects one random row by default.
  • Returns a DataFrame with the sampled row.
  • Each execution may return a different row unless random_state is set.

Using n Parameter

The n parameter specifies the exact number of rows to select randomly.

Example: Here, we select three random rows from the DataFrame.
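
A minimal sketch of this call, assuming the sample DataFrame is rebuilt as df (the names df and subset are illustrative). The three rows returned vary between runs.

```python
import pandas as pd

# Rebuild the sample DataFrame from this article.
df = pd.DataFrame({
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
})

# n=3 asks for exactly three rows; without replacement, no row repeats.
subset = df.sample(n=3)
print(subset)
```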

Output

   Employee  Department  Age  Salary
2      Jake     Finance   25   45000
3     David   Marketing   42   70000
4       Eva          IT   30   52000

Explanation:

  • n=3 instructs Pandas to return 3 rows.
  • Rows are selected randomly without replacement by default.

Using frac Parameter

The frac parameter selects a fraction of rows instead of a fixed number.

Example: In this example, we select 50% of rows randomly from the DataFrame.
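
A minimal sketch of this call, assuming the sample DataFrame is rebuilt as df (the names df and half are illustrative). Which rows are returned varies between runs.

```python
import pandas as pd

# Rebuild the sample DataFrame from this article.
df = pd.DataFrame({
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
})

# frac=0.5 requests half of the rows; the count is rounded to a whole number.
half = df.sample(frac=0.5)
print(half)
```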

Output

   Employee  Department  Age  Salary
2      Jake     Finance   25   45000
3     David   Marketing   42   70000

Explanation:

  • frac=0.5 selects half of the DataFrame rows randomly. With 5 rows, the result is rounded to 2 rows.
  • Useful when you want a proportional random sample instead of a fixed number.

Using replace=True

By default, sampling is without replacement. Setting replace=True allows the same row to be selected multiple times.

Example: This code selects 5 rows randomly, allowing duplicates.
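
A minimal sketch of this call, assuming the sample DataFrame is rebuilt as df (the names df and resampled are illustrative). Duplicates are possible but not guaranteed, and the rows drawn vary between runs.

```python
import pandas as pd

# Rebuild the sample DataFrame from this article.
df = pd.DataFrame({
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
})

# replace=True draws each row independently, so a row can appear more than once.
resampled = df.sample(n=5, replace=True)
print(resampled)
```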

Output

   Employee  Department  Age  Salary
1      Emma          IT   34   60000
2      Jake     Finance   25   45000
0     Emily          HR   28   50000
0     Emily          HR   28   50000
0     Emily          HR   28   50000

Explanation:

  • replace=True allows the same row to appear multiple times.
  • Useful for bootstrapping or resampling methods.

Using weights

The weights parameter assigns probabilities to rows so that some rows are more likely to be selected.

Example: This program selects 3 rows with weighted probabilities.
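
A minimal sketch of weighted sampling, assuming the sample DataFrame is rebuilt as df. The weight values below are hypothetical and chosen only for illustration; the article does not state which weights were used. The rows drawn vary between runs.

```python
import pandas as pd

# Rebuild the sample DataFrame from this article.
df = pd.DataFrame({
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
})

# Hypothetical weights, one per row: earlier rows are more likely to be drawn.
weights = [0.4, 0.2, 0.2, 0.1, 0.1]

weighted = df.sample(n=3, weights=weights)
print(weighted)
```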

Output

   Employee  Department  Age  Salary
0     Emily          HR   28   50000
2      Jake     Finance   25   45000
1      Emma          IT   34   60000

Explanation:

  • weights gives one relative weight per row; if the values do not sum to 1, Pandas normalizes them.
  • Rows with higher weights have a higher chance of being selected.

Using axis Parameter

sample() can also sample columns instead of rows by setting axis=1.

Example: Here, we select 2 random columns from the DataFrame.
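
A minimal sketch of column sampling, assuming the sample DataFrame is rebuilt as df (the names df and cols are illustrative). Which two columns are returned varies between runs.

```python
import pandas as pd

# Rebuild the sample DataFrame from this article.
df = pd.DataFrame({
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
})

# axis=1 samples column labels instead of row labels; all rows are kept.
cols = df.sample(n=2, axis=1)
print(cols)
```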

Output

   Department  Salary
0          HR   50000
1          IT   60000
2     Finance   45000
3   Marketing   70000
4          IT   52000

Explanation:

  • axis=1 changes the sampling from rows to columns.
  • n=2 selects two columns randomly.

Using random_state for Reproducibility

random_state ensures the same rows are selected every time the code runs.

Example: In this example, we select 2 reproducible random rows.
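
A minimal sketch of seeded sampling, assuming the sample DataFrame is rebuilt as df. The seed value 42 is an assumption chosen for illustration; the article does not state which seed was used. The second call is included only to show that the same seed yields the same rows.

```python
import pandas as pd

# Rebuild the sample DataFrame from this article.
df = pd.DataFrame({
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
})

# Assumed seed (42): any fixed integer makes the draw repeatable.
first = df.sample(n=2, random_state=42)
second = df.sample(n=2, random_state=42)

print(first)
print(first.equals(second))  # same seed, same rows
```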

Output

   Employee  Department  Age  Salary
1      Emma          IT   34   60000
4       Eva          IT   30   52000

Explanation:

  • random_state seeds the random number generator.
  • Ensures the same random selection on each run.

Using NumPy

NumPy provides an alternative by selecting row indices randomly, then using loc to fetch rows.

Example: Here we select 3 random rows using NumPy.
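
A minimal sketch of the NumPy approach, assuming the sample DataFrame is rebuilt as df (the names df, indices and picked are illustrative). The rows drawn vary between runs.

```python
import numpy as np
import pandas as pd

# Rebuild the sample DataFrame from this article.
df = pd.DataFrame({
    "Employee": ["Emily", "Emma", "Jake", "David", "Eva"],
    "Department": ["HR", "IT", "Finance", "Marketing", "IT"],
    "Age": [28, 34, 25, 42, 30],
    "Salary": [50000, 60000, 45000, 70000, 52000],
})

# Draw three distinct index labels, then look the rows up by label.
indices = np.random.choice(df.index, size=3, replace=False)
picked = df.loc[indices]
print(picked)
```

Using df.index rather than a plain range keeps the lookup correct when the DataFrame has non-default index labels, provided those labels are unique.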

Output

   Employee  Department  Age  Salary
4       Eva          IT   30   52000
0     Emily          HR   28   50000
3     David   Marketing   42   70000

Explanation:

  • np.random.choice randomly selects row indices.
  • replace=False ensures no duplicates.
  • df.loc[indices] fetches the corresponding rows.