URL: https://dzone.com/articles/data-analytics-using-python

Data Analytics Using Python

Python offers a wide range of libraries like NumPy, Pandas, and Matplotlib to enable data ingestion, exploration, manipulation, and visualization.

Business intelligence (BI) and analytics entail data collection, exploration, transformation, and analysis to help businesses make data-driven decisions. BI applies technologies, applications, and best practices to analyze transactional data and derive insights that inform those decisions.

Python has emerged as a powerful tool for data analytics, driven by its extensive libraries for data ingestion, analysis, manipulation, and visualization. It also offers broad data integration capabilities, enabling ingestion from heterogeneous data sources.

Key Data Analytics Libraries

  • statsmodels: Provides classes and functions for estimating statistical models, running statistical tests, and exploring data.
  • NumPy and SciPy: NumPy provides the multi-dimensional array object used for numerical computation; SciPy builds on it with routines for optimization, integration, interpolation, linear algebra, and statistics.
  • Pandas: Pandas is widely used by data scientists for analysis and manipulation of structured (tabular) data.
  • Matplotlib: Matplotlib offers a broad suite of plotting capabilities for creating static, animated, and interactive visualizations.

In this article, I will focus on using Pandas, NumPy, and Matplotlib to analyze sales inquiries (leads) coming into a demand center. I will use a CSV file as the input for this analysis.

Data Exploration

In this section, we will import the CSV file containing the input data and explore the structure of the file.

Python
import numpy as nmp
import pandas as pds
import matplotlib.pyplot as pplt
demand = pds.read_csv('DemandGen.csv')
demand.head()
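
The DemandGen.csv file itself is not included with the article, so the following is only a sketch of what such an input might look like. The column names are taken from the code used later in this article; every value in the sample rows is invented for illustration, and the name sample is hypothetical.

```python
import io
import pandas as pds

# Hypothetical sample rows: the column names match the ones used in this
# article, but all values below are made up for illustration.
SAMPLE_CSV = """lead_id,lead_date,lead_channel,customer_segment,MQL,SQL
L001,2024-01-01,Email,Enterprise,True,True
L002,2024-01-01,Webinar,SMB,True,False
L003,2024-01-02,Email,SMB,False,False
L004,2024-01-02,Paid Search,Enterprise,True,False
L005,2024-01-03,Webinar,Enterprise,False,False
"""

# parse_dates converts lead_date to a datetime column while reading
sample = pds.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=['lead_date'])

print(sample.head())
print(sample.dtypes)
```

Parsing the date column at read time is one option; the article instead converts it later with to_datetime, and both approaches should give a datetime column.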


The info() method summarizes the input DataFrame. It lists the column name, the number of non-null values, and the data type for every column in the DataFrame.

Python
demand.info()
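
Alongside info(), it can help to count missing values per column explicitly. The sketch below uses a small, invented DataFrame (the name sample and its values are assumptions, not the article's data) and the isna() method.

```python
import pandas as pds

# Hypothetical frame with one missing customer segment
sample = pds.DataFrame({
    'lead_id': ['L001', 'L002', 'L003'],
    'customer_segment': ['Enterprise', None, 'SMB'],
    'MQL': [True, False, True],
})

# isna() marks missing cells; sum() counts them column by column
missing_per_column = sample.isna().sum()
print(missing_per_column)
```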


The following code block prints the range of lead dates as well as the number of marketing-qualified leads (MQLs) and sales-qualified leads (SQLs).

Python
print('lead_date ranges from', demand.lead_date.min(), 'to', demand.lead_date.max())

print('number of marketing qualified leads',demand[demand['MQL'] == True]['lead_id'].nunique())

print('number of sales qualified leads',demand[demand['SQL'] == True]['lead_id'].nunique())
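
The code above counts leads with nunique() rather than counting rows. A minimal sketch of why that matters, assuming the same lead_id can appear on more than one row (the sample data here is invented):

```python
import pandas as pds

# Hypothetical data in which lead L001 appears twice
sample = pds.DataFrame({
    'lead_id': ['L001', 'L001', 'L002', 'L003'],
    'MQL': [True, True, False, True],
})

# A boolean column can be used directly as a row filter
mql_rows = sample[sample['MQL']]

row_count = len(mql_rows)                      # counts every matching row
unique_leads = mql_rows['lead_id'].nunique()   # counts each lead once

print(row_count, unique_leads)  # 3 2
```

If lead_id is already unique per row, both numbers agree and either approach works.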


Data Transformation and Visualization

In this section, let’s explore the Python capabilities surrounding data transformation, creating calculated columns, changing the data types, and data visualization.

Number of Leads by Lead Date

The code below converts the lead date to a datetime type, derives the day of the week, and creates a line chart that tracks the number of leads the demand center received on each date.

Python
#convert the lead date into datetime
demand.lead_date = pds.to_datetime(demand['lead_date'])
demand['DoW'] = demand['lead_date'].dt.dayofweek
daily_leads = demand.groupby(['lead_date'])['lead_id'].nunique()
# plot daily leads
daily_leads.plot()
pplt.title('Daily Leads')
pplt.xlabel('Lead Date')
pplt.ylabel('Number of Leads')
pplt.xticks(rotation=45)
pplt.show()
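
The DoW column created above is not used in the chart. One possible use, sketched here on invented data, is to count unique leads per day of the week; with dt.dayofweek, Monday is 0 and Sunday is 6.

```python
import pandas as pds

# Hypothetical leads; 2024-01-01 and 2024-01-08 are Mondays, 2024-01-02 is a Tuesday
sample = pds.DataFrame({
    'lead_id': ['L001', 'L002', 'L003', 'L004'],
    'lead_date': pds.to_datetime(
        ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-08']),
})

# Day of week as an integer: Monday=0 ... Sunday=6
sample['DoW'] = sample['lead_date'].dt.dayofweek

# Unique leads received on each day of the week
leads_by_dow = sample.groupby('DoW')['lead_id'].nunique()
print(leads_by_dow)
```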


Conversion Rates by Lead Channel

The following code can be used to calculate the conversion rate from a lead to a Marketing Qualified Lead. This code block will create a bar chart to visualize the conversion rate by the channel.

Python
total = demand.groupby(['lead_channel'])['lead_id'].nunique()
MarketingQualifiedLead = demand[demand['MQL'] == True].groupby(['lead_channel'])['lead_id'].nunique()

# Calculate the conversion rate as a percentage so that it matches the y-axis label
channel_conversion_rate = MarketingQualifiedLead/total*100
print(channel_conversion_rate.round(2))

channel_conversion_rate.plot(kind='bar')

pplt.title('Conversion rate by Channel\n', size = 16)
pplt.xlabel('Channel', size = 14)
pplt.ylabel('Conversion rate (%)', size = 14)
pplt.show()
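
One detail worth checking in this calculation: a channel with no marketing-qualified leads is absent from the filtered group, so the division yields NaN for it rather than 0. The sketch below shows this on invented data and fills the gap with fillna(0), as the user-defined function later in this article does.

```python
import pandas as pds

# Hypothetical data: the Events channel has no MQLs
sample = pds.DataFrame({
    'lead_id': ['L001', 'L002', 'L003', 'L004', 'L005'],
    'lead_channel': ['Email', 'Email', 'Webinar', 'Webinar', 'Events'],
    'MQL': [True, False, True, True, False],
})

total = sample.groupby('lead_channel')['lead_id'].nunique()
mql = sample[sample['MQL']].groupby('lead_channel')['lead_id'].nunique()

# Division aligns on channel name; Events is missing from mql, so it is NaN
# until fillna(0) replaces it
rate_pct = (mql / total).fillna(0) * 100
print(rate_pct.round(2))
```

Here Email converts at 50%, Webinar at 100%, and Events at 0%.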


Number of Leads by the Customer Segment for Each Lead Channel

The following code creates a grouped bar chart showing the number of leads each demand channel received, broken down by customer segment.

Python
channel_segment = demand.groupby(['lead_channel', 'customer_segment'])['lead_id'].count()

channel_segment_df = pds.DataFrame(channel_segment.unstack(level = 1))

channel_segment_df.plot(kind = 'bar')

pplt.title('Marketing demand channels by customer segment')
pplt.xlabel('Lead Channel')
pplt.ylabel('Leads')
pplt.legend(loc = 'upper right', labels = channel_segment_df.columns.values)
pplt.show()
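
To see what unstack(level = 1) does before plotting, the sketch below applies the same groupby to a few invented rows. The second grouping key (customer_segment) becomes the columns and lead_channel stays as the index, which is why the bars are grouped by channel. The fill_value=0 argument is an addition here, used so that missing combinations show as 0 instead of NaN.

```python
import pandas as pds

# Hypothetical leads
sample = pds.DataFrame({
    'lead_id': ['L001', 'L002', 'L003', 'L004'],
    'lead_channel': ['Email', 'Email', 'Webinar', 'Email'],
    'customer_segment': ['Enterprise', 'SMB', 'SMB', 'SMB'],
})

# Long format: one count per (channel, segment) pair
counts = sample.groupby(['lead_channel', 'customer_segment'])['lead_id'].count()

# Wide format: channels as rows, segments as columns
wide = counts.unstack(level=1, fill_value=0)
print(wide)
```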


User Defined Functions

Automating the Conversion Calculation

Python allows you to create a user-defined function that executes a set of statements based on its input parameters. The code below creates a function that calculates the MQL conversion rate for a DataFrame, grouped by the column names passed in.

Python
# MQL conversion rate
def mql_conversion_rate(df, column_names):
    # Unique marketing-qualified leads per group
    column_mql_conv = df[df['MQL'] == True] \
        .groupby(column_names)['lead_id'].nunique()

    # Total number of leads per group
    column_total = df.groupby(column_names)['lead_id'].nunique()

    # MQL conversion rate
    conversion_rate = column_mql_conv/column_total

    # Fill missing values with 0
    conversion_rate = conversion_rate.fillna(0)
    return conversion_rate

# Calculate conversion rate by lead date and customer segment
customer_segment_conv = mql_conversion_rate(demand, ['lead_date',
                                                     'customer_segment'])

customer_segment_df = pds.DataFrame(customer_segment_conv.unstack(level = 1))

# Visualize conversion by customer segment
customer_segment_df.plot()
pplt.title('Conversion rate by customer segment\n', size = 16)
pplt.ylabel('Conversion rate', size = 14)
pplt.xlabel('Lead Date', size = 14)
pplt.show()
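
The same pattern can be generalized so that one function covers both MQL and SQL conversion. The following is a sketch under stated assumptions, not the article's function: the name qualified_rate and the flag_column parameter are hypothetical, and the sample data is invented so the result can be checked by hand.

```python
import pandas as pds

def qualified_rate(df, column_names, flag_column='MQL'):
    """Share of unique leads per group whose flag_column is True.

    Hypothetical helper: flag_column selects which boolean column
    ('MQL' or 'SQL') counts as a conversion.
    """
    qualified = df[df[flag_column]].groupby(column_names)['lead_id'].nunique()
    total = df.groupby(column_names)['lead_id'].nunique()
    # Groups with no qualified leads come out as NaN; report them as 0
    return (qualified / total).fillna(0)

# Hypothetical leads
sample = pds.DataFrame({
    'lead_id': ['L001', 'L002', 'L003', 'L004'],
    'customer_segment': ['Enterprise', 'Enterprise', 'SMB', 'SMB'],
    'MQL': [True, False, True, True],
    'SQL': [True, False, False, False],
})

mql_rate = qualified_rate(sample, ['customer_segment'])
sql_rate = qualified_rate(sample, ['customer_segment'], flag_column='SQL')
print(mql_rate)
print(sql_rate)
```

With this data, the MQL rate is 0.5 for Enterprise and 1.0 for SMB, while the SQL rate is 0.5 for Enterprise and 0 for SMB.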


Visualization Function

The following code creates a function that plots a separate line chart for each column of the input DataFrame.

Python
# Plotting function
def plotting_conv(df):
    for column in df:
        pplt.plot(df.index, df[column])
        pplt.title('Daily ' + str(column) + ' conversion rate\n',
                   size = 16)
        pplt.ylabel('MQL Conversion rate', size = 14)
        pplt.xlabel('Date', size = 14)
        pplt.show()
        pplt.clf()


Below, the code invokes the plotting function to create individual line charts for each customer segment.

Python
# Jupyter/IPython magic; omit this line when running as a plain Python script
%matplotlib inline
# Calculate conversion rate by lead date and customer segment
customer_segment_conv = mql_conversion_rate(demand,
                                            ['lead_date', 'customer_segment'])

# Unstack customer_segment_conv and create a DataFrame
customer_segment_df = pds.DataFrame(customer_segment_conv.unstack(level=1))

# Plot the results
plotting_conv(customer_segment_df)
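
As an alternative to showing one figure per segment, the charts could be stacked in a single figure with a shared date axis. This is only a sketch: the function name plot_conv_grid and the sample rates are hypothetical, and the Agg backend is selected so the example runs without a display (omit that line in a notebook).

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; assumption for headless runs
import matplotlib.pyplot as pplt
import pandas as pds

def plot_conv_grid(df):
    """Draw one line chart per column of df, stacked on a shared date axis."""
    fig, axes = pplt.subplots(nrows=len(df.columns), ncols=1,
                              sharex=True, squeeze=False)
    for ax, column in zip(axes[:, 0], df.columns):
        ax.plot(df.index, df[column])
        ax.set_title('Daily ' + str(column) + ' conversion rate')
        ax.set_ylabel('MQL conversion rate')
    # Label the x-axis once, on the bottom chart
    axes[-1, 0].set_xlabel('Date')
    fig.tight_layout()
    return fig

# Hypothetical daily conversion rates for two segments
rates = pds.DataFrame(
    {'Enterprise': [0.5, 0.25, 0.75], 'SMB': [0.2, 0.4, 0.1]},
    index=pds.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03']),
)

fig = plot_conv_grid(rates)
```

Returning the figure instead of calling show() leaves it to the caller to display or save it.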



Conclusion

Python offers a robust data analytics toolkit through libraries like NumPy, Pandas, and Matplotlib. These libraries are widely used by the business intelligence and data analytics community for data exploration, manipulation, statistical analysis, and visualization. As the analytics landscape continues to evolve, I recommend continuing on to SciPy, statsmodels, and scikit-learn to build AI/ML applications.
