Pandas is a powerful open-source data analysis and manipulation library for Python. The library is particularly well-suited for handling labeled data such as tables with rows and columns. Pandas also lets you create various graphs directly from your data using built-in plotting functions, which use Matplotlib under the hood.
This tutorial covers Pandas capabilities for visualizing data with line plots, area charts, bar plots, and more.
Pandas offers several features that make it a great choice for data visualization:
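To get started, install Pandas using pip. The plotting functions also need Matplotlib, the default plotting backend, which is not installed automatically with Pandas:
pip install pandas matplotlib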
Once Pandas is installed, import the required libraries and load your data. The sample CSV files df1 and df2 used in this tutorial can be downloaded from here.
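As a minimal sketch of this step, the snippet below imports the libraries and builds two DataFrames named df1 and df2. The sample CSV files are not available here, so the data is synthetic: the shapes, column names, and date index are assumptions chosen only so that the plotting calls have something to draw. The commented-out `read_csv` lines show how loading the real files might look, with hypothetical file names.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the tutorial's CSV files (shapes and column
# names are assumptions, not the contents of the real files).
rng = np.random.default_rng(seed=0)

# df1: 1000 rows of normally distributed values with a date index
df1 = pd.DataFrame(
    rng.standard_normal((1000, 4)),
    index=pd.date_range("2000-01-01", periods=1000, freq="D"),
    columns=["A", "B", "C", "D"],
)

# df2: 10 rows of values between 0 and 1
df2 = pd.DataFrame(rng.random((10, 4)), columns=["a", "b", "c", "d"])

# With the real files downloaded, loading might look like this instead
# (file names are hypothetical):
# df1 = pd.read_csv("df1.csv", index_col=0)
# df2 = pd.read_csv("df2.csv")

print(df1.head())
print(df2.head())
```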
Pandas provides several built-in plotting functions to create various types of charts mainly focused on statistical data. These plots help visualize trends, distributions, and relationships within the data. Let's go through them one by one:
A line plot connects data points with line segments, showing how values change across an ordered index. It is the best choice when the data is a time series. It can be created using the DataFrame.plot() function.
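A minimal sketch of a line plot, assuming a synthetic df1 with three numeric columns and a date index (the data is invented for illustration). In a notebook the figure renders automatically; in a plain script you would typically call `matplotlib.pyplot.show()` afterwards.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df1: three random walks over 200 days (assumed data)
rng = np.random.default_rng(0)
df1 = pd.DataFrame(
    rng.standard_normal((200, 3)).cumsum(axis=0),
    index=pd.date_range("2024-01-01", periods=200, freq="D"),
    columns=["A", "B", "C"],
)

# Draws one line per numeric column; the index supplies the x-axis
ax = df1.plot()
```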
Explanation: The plot() method by default creates a line plot for all numeric columns in the DataFrame, using the index for the x-axis.
An area plot shows data as a line and fills the space below the line with color. It helps you see how quantities change over time. We can plot it using the DataFrame.plot.area() function.
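A minimal sketch of an area plot, assuming a small synthetic df2 of non-negative values (the data and column names are made up for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df2: 10 rows of values in [0, 1) (assumed data)
rng = np.random.default_rng(1)
df2 = pd.DataFrame(rng.random((10, 4)), columns=["a", "b", "c", "d"])

# One filled region per column; alpha controls the fill transparency
ax = df2.plot.area(alpha=0.4)
```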
Explanation: plot.area() creates an area chart by filling the space under the line for each numeric column. Areas are stacked by default. alpha=0.4 sets the fill transparency, which matters most when stacked=False is passed and the areas overlap.
A bar chart presents categorical data with rectangular bars whose heights or lengths are proportional to the values they represent. DataFrame.plot.bar() draws vertical bars, and DataFrame.plot.barh() draws horizontal ones.
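A minimal sketch of a bar chart, assuming a small synthetic df2 whose index holds made-up category labels:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df2: 5 categories, 3 value columns (assumed data)
rng = np.random.default_rng(2)
df2 = pd.DataFrame(
    rng.random((5, 3)),
    index=["Q1", "Q2", "Q3", "Q4", "Q5"],
    columns=["a", "b", "c"],
)

# One group of vertical bars per index label, one bar per column
ax = df2.plot.bar()

# For horizontal bars, use df2.plot.barh() instead
```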
Explanation: plot.bar() creates a vertical bar chart showing values for each category or index.
Histograms help visualize the distribution of data by grouping values into bins. Pandas uses the DataFrame.plot.hist() function to plot a histogram.
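A minimal sketch of a histogram for a single column, assuming synthetic normally distributed data in a column named A (an assumed name):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df1: 1000 normally distributed values (assumed data)
rng = np.random.default_rng(3)
df1 = pd.DataFrame({"A": rng.standard_normal(1000)})

# Group the values of column A into 50 equal-width bins
ax = df1["A"].plot.hist(bins=50)
```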
Explanation: plot.hist() creates a histogram by grouping a column's values into intervals. bins=50 sets the number of bins to show the data distribution more clearly.
Scatter plots are used when you want to show the relationship between two variables. They are also called correlation plots and can be created using the DataFrame.plot.scatter() function.
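A minimal sketch of a scatter plot, assuming two synthetic columns A and B where B is loosely related to A (both the names and the relationship are invented for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df1: B depends on A plus noise (assumed data)
rng = np.random.default_rng(4)
a = rng.standard_normal(300)
df1 = pd.DataFrame({"A": a, "B": 0.5 * a + rng.standard_normal(300)})

# x and y name the columns placed on each axis
ax = df1.plot.scatter(x="A", y="B")
```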
Explanation: plot.scatter() creates a scatter plot to show the relationship between two numeric columns. x and y specify the columns for the x-axis and y-axis.
A box plot displays the distribution of data, showing the median, quartiles, and outliers. We can use the DataFrame.plot.box() or DataFrame.boxplot() function to create it.
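A minimal sketch of a box plot, assuming a synthetic df2 with four numeric columns (made-up data):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df2: 50 rows, 4 numeric columns (assumed data)
rng = np.random.default_rng(5)
df2 = pd.DataFrame(rng.random((50, 4)), columns=["a", "b", "c", "d"])

# One box-and-whisker per numeric column
ax = df2.plot.box()
```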
Explanation: plot.box() generates a box-and-whisker plot, visualizing the median, quartiles, and outliers.
Hexagonal binning helps manage dense datasets by aggregating points into hexagonal cells instead of drawing each point individually. It’s useful for visualizing large datasets where points may overlap. Let's create the hexagonal bin plot using the DataFrame.plot.hexbin() function.
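A minimal sketch of a hexagonal bin plot, assuming 1000 synthetic points in columns a and b. The specific values gridsize=25 and cmap="Oranges" are illustrative assumptions, not required settings.

```python
import numpy as np
import pandas as pd

# 1000 synthetic points, dense enough that a scatter plot would overlap
rng = np.random.default_rng(6)
df = pd.DataFrame(rng.standard_normal((1000, 2)), columns=["a", "b"])

# Each hexagon is colored by how many points fall inside it
ax = df.plot.hexbin(x="a", y="b", gridsize=25, cmap="Oranges")
```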
Explanation: plot.hexbin() creates a hexagonal bin plot for dense scatter data. x and y set the axes, gridsize controls the number of hexagons along the x-axis, and cmap sets the colormap used to color each hexagon by its point count.
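KDE (Kernel Density Estimation) estimates a smooth probability density curve from the data, and the df.plot.kde() function draws it. It's useful for visualizing the shape of a distribution, and the underlying estimate can also be used to simulate new data based on real examples. Note that df.plot.kde() requires SciPy to be installed.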
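A minimal sketch of a KDE plot for one column, assuming synthetic data in a column named a. This call relies on SciPy being installed alongside Pandas and Matplotlib.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df2: 200 normally distributed values (assumed data)
rng = np.random.default_rng(7)
df2 = pd.DataFrame({"a": rng.standard_normal(200)})

# Smooth density estimate of column a (needs SciPy)
ax = df2["a"].plot.kde()
```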
Explanation: plot.kde() creates a Kernel Density Estimation plot, showing a smooth probability density curve.
Pandas allows you to customize your plots in many ways. You can change things like colors, titles, labels, and more. Here are some common customizations.
You can customize the plot by adding a title and labels for the x and y axes. You can also enable gridlines to make the plot easier to read:
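A minimal sketch of these customizations, assuming a small synthetic DataFrame. The title text is an invented example, and the xlabel and ylabel keywords assume a reasonably recent Pandas release (1.1 or later).

```python
import numpy as np
import pandas as pd

# Small synthetic DataFrame for illustration (assumed data)
rng = np.random.default_rng(8)
df = pd.DataFrame(rng.random((20, 2)), columns=["a", "b"])

# Title, axis labels, and gridlines are all set through plot() keywords
ax = df.plot(
    title="Customized Line Plot",
    xlabel="Index",
    ylabel="Values",
    grid=True,
)
```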
Explanation: This code customizes a line plot with a title and labels for the x ('Index') and y ('Values') axes, and grid=True adds gridlines for easier data reading.
If you want to differentiate between two lines visually, you can change the line style (e.g., solid line, dashed line) with the style parameter of plot().
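A minimal sketch of per-column line styles, assuming a synthetic DataFrame with two columns. The style list is matched to the columns in order, and the title text is an invented example.

```python
import numpy as np
import pandas as pd

# Two synthetic random walks (assumed data)
rng = np.random.default_rng(9)
df = pd.DataFrame(
    rng.standard_normal((50, 2)).cumsum(axis=0),
    columns=["a", "b"],
)

# First column drawn solid, second dashed
ax = df.plot(
    style=["-", "--"],
    title="Line Plot with Different Line Styles",
    xlabel="Index",
    ylabel="Values",
    grid=True,
)
```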
Explanation: The style parameter sets line styles (e.g., '-', '--', '-.', ':') to visually distinguish multiple columns. title, xlabel, ylabel and grid further customize the plot.
You can change the size of the plot to better fit the presentation or analysis context by using the figsize parameter:
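A minimal sketch of setting the figure size, assuming a small synthetic DataFrame and an invented title. figsize takes a (width, height) tuple in inches.

```python
import numpy as np
import pandas as pd

# Small synthetic DataFrame for illustration (assumed data)
rng = np.random.default_rng(10)
df = pd.DataFrame(rng.random((30, 2)), columns=["a", "b"])

# A wide 12 x 6 inch figure
ax = df.plot(
    figsize=(12, 6),
    title="Line Plot with Adjusted Size",
    xlabel="Index",
    ylabel="Values",
    grid=True,
)
```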
Explanation: figsize=(12, 6) sets the plot size in inches, useful for presentations or detailed views. Other parameters improve labeling and readability.
A stacked bar plot can be created by setting stacked=True. It helps you visualize the cumulative value for each index.
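A minimal sketch of a stacked bar plot, assuming a small synthetic DataFrame of positive values. The figsize value and title text are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# 6 rows, 3 columns of positive values (assumed data)
rng = np.random.default_rng(11)
df = pd.DataFrame(rng.random((6, 3)), columns=["a", "b", "c"])

# stacked=True places each column's bar on top of the previous one,
# so the total bar height is the row sum
ax = df.plot.bar(
    stacked=True,
    figsize=(10, 6),
    title="Stacked Bar Plot",
    xlabel="Index",
    ylabel="Values",
    grid=True,
)
```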
Explanation: plot.bar() creates a bar chart and stacked=True stacks column values vertically for each index. figsize, title, xlabel, ylabel and grid customize the appearance.