The describe() method in Pandas generates descriptive statistics for DataFrame columns, including key metrics such as the mean, standard deviation and percentiles. It works with numeric data by default but can also handle categorical data, for which it reports the most frequent value and the number of unique entries. In this article, we'll see how to use describe() for both numeric and categorical data.
DataFrame.describe(percentiles=None, include=None, exclude=None)
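Parameters:
percentiles: list of numbers between 0 and 1 to include in the output. The default is [.25, .5, .75].
include: data types to include in the summary. The default (None) summarizes numeric columns only; 'all' summarizes every column.
exclude: data types to omit from the summary. The default (None) excludes nothing.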
The describe() method returns a statistical summary of the DataFrame or Series, which helps us understand the key characteristics of our data quickly. Let's look at some examples.
The examples use an NBA dataset, which is available for download.
Here we will see how the describe() method generates a statistical summary for numeric columns such as age and salary. This is the basic use of describe(): called with no arguments, it reports the count, mean, standard deviation, minimum, quartiles and maximum of every numeric column.
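As a minimal sketch of this basic call, the snippet below builds a small stand-in DataFrame rather than loading the NBA file. The column names (Name, Team, Age, Salary) and all of the values are assumptions made up for illustration; they are not rows from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the NBA dataset; these rows are invented for illustration.
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Team": ["Team X", "Team Y", "Team X", "Team Z"],
    "Age": [25, 30, 22, 27],
    "Salary": [1_000_000.0, 2_000_000.0, 3_000_000.0, 4_000_000.0],
})

# With no arguments, describe() summarizes only the numeric columns (Age and Salary here).
summary = df.describe()
print(summary)
```

The string columns (Name and Team) are left out of this summary because describe() selects numeric columns by default.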
[Figure: Descriptive statistics for numerical columns generated using the describe() method]
This summary gives us a quick overview of the numeric columns in the dataset and helps us understand the distribution of key variables like age and salary.
We can customize the describe() method by specifying custom percentiles. By passing a list of values between 0 and 1 to the percentiles parameter, we can obtain more detailed insight into our data's distribution beyond the default 25th, 50th and 75th percentiles.
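A short sketch of custom percentiles follows, again using an invented stand-in DataFrame (the columns and values are assumptions, not the real NBA data). The median (0.5) is listed explicitly so that the result does not depend on how a given pandas version treats it when it is omitted.

```python
import pandas as pd

# Hypothetical numeric data standing in for the NBA dataset.
df = pd.DataFrame({
    "Age": [25, 30, 22, 27],
    "Salary": [1_000_000.0, 2_000_000.0, 3_000_000.0, 4_000_000.0],
})

# Percentiles are given as fractions between 0 and 1, not as whole-number percentages.
summary = df.describe(percentiles=[0.1, 0.5, 0.9])
print(summary)
```

The resulting rows are labelled 10%, 50% and 90% in place of the default quartile rows.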
This customization is helpful when we need a closer look at the distribution of our data, for example to see where the tails begin by checking the 10th and 90th percentiles.
The describe() method also works with string data, i.e. columns of the object data type. When used on string data, it provides different statistics, such as the number of unique values and the most frequent value. This example shows how to apply describe() to a column containing categorical (string) data.
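The following sketch applies describe() to a single string column. The Team column and its values are hypothetical and chosen only so that one value repeats.

```python
import pandas as pd

# Hypothetical team names; "Team X" appears twice so it is the most frequent value.
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Team": ["Team X", "Team Y", "Team X", "Team Z"],
})

# Calling describe() on a string column returns count, unique, top and freq.
team_summary = df["Team"].describe()
print(team_summary)
```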
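For string (object) data, describe() provides count (the number of non-null entries), unique (the number of distinct values), top (the most frequent value) and freq (how often the top value occurs).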
This is useful for quickly understanding the distribution of categorical data or identifying the most frequent values.
We may sometimes want to generate a summary for a specific column in our DataFrame. For example, we may be interested in analyzing just the "Salary" column without summarizing the other columns. To do this, we select the column first and then call describe() on the resulting Series.
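A minimal sketch of summarizing one column is shown below. The Salary values are invented for illustration and do not come from the NBA dataset.

```python
import pandas as pd

# Hypothetical data; only the Salary column is summarized below.
df = pd.DataFrame({
    "Age": [25, 30, 22, 27],
    "Salary": [1_000_000.0, 2_000_000.0, 3_000_000.0, 4_000_000.0],
})

# Selecting a column yields a Series, and Series.describe() summarizes just that column.
salary_summary = df["Salary"].describe()
print(salary_summary)
```

The result is a Series named after the column, so individual statistics can be read by label, for example salary_summary["mean"].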
By using the include='all' parameter we can generate a summary for all columns in the DataFrame regardless of data type. This is helpful when we want to analyze both numeric and categorical data at the same time.
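The sketch below passes include='all' on a small mixed-type DataFrame. As in the earlier examples, the column names and values are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical mixed-type data: two string columns and two numeric columns.
df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Team": ["Team X", "Team Y", "Team X", "Team Z"],
    "Age": [25, 30, 22, 27],
    "Salary": [1_000_000.0, 2_000_000.0, 3_000_000.0, 4_000_000.0],
})

# include="all" summarizes every column, whatever its data type.
full_summary = df.describe(include="all")
print(full_summary)
```

Statistics that do not apply to a column's type are filled with NaN, so the mean row is empty for Team and the top row is empty for Age.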
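The result combines both sets of statistics in a single table. Entries that do not apply to a column's type, such as the mean of a string column or the top value of a numeric column, appear as NaN.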