In this article, we will discuss how to randomly select columns from a pandas DataFrame.
The pandas df.sample() method randomly selects rows or columns from a DataFrame, depending on the axis argument, so it can be used to pick random columns as required.
Syntax of pandas sample() method:
Return a random selection of elements from an object's axis. For repeatability, you may use the random_state parameter.
DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)
Parameters:
- n: int, optional. Number of items (rows or columns, depending on axis) to return. Defaults to 1 when frac is not given either.
- frac: float, optional. Fraction of the axis items to return, i.e. about frac * (length of the axis) items. frac cannot be used together with n.
- replace: bool. If True, sampling is done with replacement, so the same row or column can be selected more than once.
- random_state: int or numpy.random.RandomState, optional. If set to a particular integer, the same rows or columns are returned on every run.
- axis: 0 or 'index' for rows and 1 or 'columns' for columns.
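As a minimal sketch of how random_state and axis work together, the snippet below samples columns twice with the same seed. The DataFrame and its column names are made up for illustration and are not the article's dataset.

```python
import pandas as pd

# Hypothetical stand-in data; the column names are illustrative only.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "c": [7, 8, 9],
    "d": [10, 11, 12],
})

# axis="columns" samples columns instead of rows.
# Reusing the same random_state yields the same selection on each call.
first = df.sample(n=2, axis="columns", random_state=42)
second = df.sample(n=2, axis="columns", random_state=42)

print(list(first.columns) == list(second.columns))
```

Leaving random_state unset gives a fresh selection on each call, which is usually what is wanted outside of tests and tutorials.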
In this approach, we first import the pandas package and read the given CSV file with the pd.read_csv() method. The df.sample() method is then used to randomly select columns: axis='columns' specifies that we are sampling columns rather than rows. When n is not specified, the method returns one random column by default.
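The single-column case might look like the following sketch. Because the CSV file is not available here, a small hand-built DataFrame with invented column names stands in for it; with a real file, df would come from pd.read_csv() instead.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data; with a real file this would be
# something like df = pd.read_csv("your_file.csv") (path is an assumption).
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "c": [7, 8, 9],
})

# No n and no frac: one column is picked at random.
one_col = df.sample(axis="columns")
print(one_col)
```

The result is still a DataFrame (with a single column), not a Series.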
If the user wants to select more than one column, the n parameter sets how many. In this example, n is 5, so 5 columns are randomly selected from the DataFrame.
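A minimal sketch of selecting five columns, assuming a ten-column DataFrame with placeholder names (col_0 to col_9) in place of the article's dataset:

```python
import pandas as pd

# Hypothetical ten-column DataFrame; names and values are placeholders.
df = pd.DataFrame({f"col_{i}": list(range(i, i + 3)) for i in range(10)})

# Pick five distinct columns at random (replace defaults to False).
five_cols = df.sample(n=5, axis="columns")
print(list(five_cols.columns))
```

Because sampling is without replacement by default, the five columns are always distinct.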
If the same column should be eligible for selection more than once, that is, if sampling with replacement is needed, set the replace parameter to True in the df.sample() method. In the result for this example, the column 'Bunkerfields' appears twice.
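The effect of replace=True can be sketched as follows. The three-column DataFrame is invented for illustration, and n is deliberately larger than the number of columns so that a repeat is guaranteed.

```python
import pandas as pd

# Hypothetical stand-in data with only three columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# With replace=True a column may be drawn more than once, so n may even
# exceed the number of columns; drawing 6 from 3 guarantees duplicates.
with_repeats = df.sample(n=6, replace=True, axis="columns")
print(list(with_repeats.columns))

# Without replacement the same request is rejected.
try:
    df.sample(n=6, axis="columns")
    rejected = False
except ValueError as err:
    rejected = True
    print("ValueError:", err)
```

Note that replace controls whether an item can be drawn repeatedly; reproducible results are a separate concern handled by random_state.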
If the user wants to select a fraction of the columns rather than a fixed number, the frac parameter should be used. In this example, the dataset has 10 columns and frac is 0.25. 0.25 of 10 is 2.5, which is rounded to 2, so two columns are returned: Year and GasFlaring.
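The rounding behaviour described above can be checked with a short sketch. It again assumes a placeholder ten-column DataFrame rather than the article's dataset.

```python
import pandas as pd

# Hypothetical ten-column DataFrame; names and values are placeholders.
df = pd.DataFrame({f"col_{i}": list(range(i, i + 3)) for i in range(10)})

# 0.25 * 10 = 2.5, which pandas rounds to 2 columns.
quarter = df.sample(frac=0.25, axis="columns")
print(quarter.shape)
```

When an exact column count matters, passing n directly avoids depending on how the fraction is rounded.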