![]() |
VOOZH | about |
In Machine Learning we train our data to predict or classify things in such a manner that isn't hardcoded in the machine. So for the first, we have the Dataset or the input data to be pre-processed and manipulated for our desired outcomes. Any ML Model to be built follows the following procedure:
Feature Scaling is a method to standardize the features present in the data in a fixed range. It has to perform during the data pre-processing. It has two main ways: Standardization and Normalization.
The steps to be followed are :
Our data can be in various formats i.e., numbers (integers) & words (strings), for now, we'll consider only the numbers in our Dataset.
Assume our dataset has random numeric values in the range of 1 to 95,000 (in random order). Just for our understanding consider a small Dataset of barely 10 values with numbers in the given range and randomized order.
1) 99 2) 789 3) 1 4) 541 5) 5 6) 6589 7) 94142 8) 7 9) 50826 10) 35464
If we just look at these values, their range is so high, that while training the model with 10,000 such values will take lot of time. That's where the problem arises.
We have a solution to solve the problem arisen i.e. Standardization. It helps us solve this by :
So, how do we do that? we'll there's a mathematical formula for the same i.e., Z-Score = (Current_value - Mean) / Standard Deviation.
Using this formula we are replacing all the input values by the Z-Score for each and every value. Hence we get values ranging from -1 to +1, keeping the range intact.
Standardization performs the following:
It's pretty obvious for Mean = 0 and S.D = 1 as all the values will have such less difference and each value will nearly be equal 0, hence Mean = 0 and S.D. = 1.
NOTE : (Just for Better Understanding)
For Mean
When we Subtract a value Smaller than the Mean we get (-ve) Output
When we Subtract a value Larger than the Mean we get (+ve) OutputHence, when we get (-ve) & (+ve) Values for Subtraction of Value with Mean, while Summation of all these values,
We get the Final Mean as 0.And when we get the Mean as 0, it means that most or nearly all values are equal to highly close to 0 and have very low variance among them.
Therefore, the S.D also becomes 1 (as good as no difference).
Here we are doing the Following:
Now to calculate the summation, run below code :
Output :
n : 10 Summation (X) : 188733 Summation (X^2) : 12751664695
Then Calculate the Standard Deviation
Output :
1275166469.5
Then calculate the Mean by using this code
Output :
18873.3
To Calculate the Z-Score for each Value of dataset_1
Comparison between the Values of Original Dataset and Scaled Down Dataset
Output :
Original DataSet | Z-Score 1 | -0.00001479987158649171 99 | -0.00001472301887561513 789 | -0.00001418191305413712 5 | -0.00001479673474114981 6859 | -0.00000942175024780167 541 | -0.00001437639746533501 94142 | 0.00005902656774649453 7 | -0.00001479516631847886 50826 | 0.00002505766953904366 35464 | 0.00001301061500347112
Now we will compare and see the graph of the Original Values and the Standardized Values.
Output :
Comparison between the Values of Original Dataset and Scaled Down Dataset.
Output :
Original DataSet | Z-Score 1 | -0.00001479987158649171 99 | -0.00001472301887561513 789 | -0.00001418191305413712 5 | -0.00001479673474114981 6859 | -0.00000942175024780167 541 | -0.00001437639746533501 94142 | 0.00005902656774649453 7 | -0.00001479516631847886 50826 | 0.00002505766953904366 35464 | 0.00001301061500347112
Comparing the Range of Values using Graphs
1) Graph of the Original Values
Output :
2) Graph of the Standardized Values
Output :
Hence we have Reviewed, Understood the Concept, and Implemented as well the Concept of Standardization in Machine Learning