In this article, we will explain univariate linear regression. It is one of the simplest types of regression: the target value is predicted from only one independent variable.
Univariate linear regression is a type of regression in which the target variable depends on only one independent variable. For univariate regression, we use univariate data. For instance, a dataset of points on a line can be considered univariate data, where the abscissa (x-coordinate) is the input feature and the ordinate (y-coordinate) is the output/target.
For the line Y = 2X + 3, the input feature is X and the target is Y.
| X | Y |
|---|---|
| 1 | 5 |
| 2 | 7 |
| 3 | 9 |
| 4 | 11 |
| 5 | 13 |
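As a small illustration, the table above can be reproduced in Python. The variable names `xs` and `ys` are chosen here for illustration only.

```python
# Input feature X and target Y for the line Y = 2X + 3
xs = [1, 2, 3, 4, 5]
ys = [2 * x + 3 for x in xs]

# Print the rows of the table
for x, y in zip(xs, ys):
    print(x, y)
```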
Concept: For univariate linear regression, there is only one input feature vector. The regression line has the following form:
Y = b0 + b1 * X
where b0 and b1 are the coefficients of regression. We try to find the best b0 and b1 by training a model so that the predicted value of y differs as little as possible from the actual y.
A univariate linear regression model consists of several utility functions. We will define each function one by one, and at the end we will combine them in a class to form a working univariate linear regression model object.
The prediction function computes the value of y for a given value of x by multiplying x by the coefficient b1 and adding the coefficient b0.
The cost function computes the error for the current values of the regression coefficients. It quantifies how far the model's predicted values are from the actual values; the best regression coefficients are those that give the lowest error.
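A minimal sketch of such a prediction function, assuming the name `predict` and scalar inputs (both are illustrative choices, not taken from the original code):

```python
def predict(x, b0, b1):
    """Return the predicted y for input x under the line y = b0 + b1 * x."""
    return b0 + b1 * x

# With the coefficients of the line Y = 2X + 3, x = 4 gives 3 + 2 * 4
print(predict(4, 3, 2))  # 11
```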
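Mean-Squared Error (MSE) = (1/n) * sum of squares of the differences between predicted and actual values
We square the differences so that positive and negative errors do not cancel each other out.
Here, n is the number of data points, the actual value is the given y, and the predicted value is b0 + b1 * x for the corresponding x.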
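The cost computation can be sketched as follows. The function names `predict` and `cost` and the use of plain Python lists are assumptions made for this example.

```python
def predict(x, b0, b1):
    """Predicted y for input x under the line y = b0 + b1 * x."""
    return b0 + b1 * x

def cost(xs, ys, b0, b1):
    """Mean-squared error of the line y = b0 + b1 * x over the data."""
    errors = [(predict(x, b0, b1) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]
print(cost(xs, ys, 3, 2))  # 0.0, the true coefficients fit the data exactly
print(cost(xs, ys, 0, 0))  # 89.0, the error of the all-zero starting point
```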
We will use gradient descent to update the regression coefficients. It is an optimization algorithm that we use to train the model. In gradient descent, we take the partial derivative of the cost function with respect to a regression coefficient, multiply it by the learning rate alpha, and subtract the result from the coefficient to adjust it.
For the sake of simplicity, we will apply gradient descent to only one row element and estimate the univariate linear regression coefficients on the basis of this gradient descent.
Since the cost function has two parameters, b0 and b1, we take the derivative of the cost function with respect to b1 and then with respect to b0.
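For a single row (x, y), the derivatives can be worked out as below. This sketch assumes the single-row cost is the plain squared error with no extra constant factor such as 1/2; a different convention only rescales the gradient.

$$
\begin{aligned}
J(b_0, b_1) &= (b_0 + b_1 x - y)^2 \\
\frac{\partial J}{\partial b_0} &= 2\,(b_0 + b_1 x - y) \\
\frac{\partial J}{\partial b_1} &= 2\,(b_0 + b_1 x - y)\,x \\
b_0 &\leftarrow b_0 - \alpha \frac{\partial J}{\partial b_0}, \qquad b_1 \leftarrow b_1 - \alpha \frac{\partial J}{\partial b_1}
\end{aligned}
$$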
Python function for Gradient Descent.
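One possible version of this function is sketched below. The name `update_coeff` and its argument order follow the pseudocode later in this article; the helper `grad` and the choice to average the gradient over all rows are assumptions (with one-element lists it reduces to the single-row case).

```python
def predict(x, b0, b1):
    """Predicted y for input x under the line y = b0 + b1 * x."""
    return b0 + b1 * x

def grad(xs, ys, b0, b1, coeff):
    """Partial derivative of the MSE with respect to b0 (coeff=0) or b1 (coeff=1)."""
    total = 0.0
    for x, y in zip(xs, ys):
        error = predict(x, b0, b1) - y
        # d(error)/d(b0) is 1, d(error)/d(b1) is x
        total += error * (x if coeff == 1 else 1)
    return 2 * total / len(xs)

def update_coeff(xs, ys, b0, b1, coeff, alpha):
    """Return the chosen coefficient after one gradient descent step."""
    current = b1 if coeff == 1 else b0
    return current - alpha * grad(xs, ys, b0, b1, coeff)

xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]
print(update_coeff(xs, ys, 0, 0, 0, 0.01))  # new b0 after one step
print(update_coeff(xs, ys, 0, 0, 1, 0.01))  # new b1 after one step
```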
At each iteration (epoch), the regression coefficients are updated by an amount that depends on the error from the previous iteration. This update is the crux of the training procedure. Each coefficient is adjusted by subtracting a fraction of the gradient of the error that its previous value caused. This fraction is called the learning rate, and it determines how fast the model reaches the point of convergence (the point where the error stops decreasing, ideally close to 0).
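The effect of the learning rate can be seen in a small experiment. The sketch below runs 100 batch gradient descent steps on the Y = 2X + 3 data for two values of alpha; the helper names and the two alpha values are illustrative assumptions. On this data the smaller rate reduces the error, while the larger one overshoots and makes the error grow.

```python
def mse_and_grads(xs, ys, b0, b1):
    """Return the MSE and its partial derivatives with respect to b0 and b1."""
    n = len(xs)
    errors = [b0 + b1 * x - y for x, y in zip(xs, ys)]
    mse = sum(e * e for e in errors) / n
    d_b0 = 2 * sum(errors) / n
    d_b1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
    return mse, d_b0, d_b1

def final_cost(xs, ys, alpha, epochs):
    """Run `epochs` gradient steps from (0, 0) and return the resulting MSE."""
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        _, d_b0, d_b1 = mse_and_grads(xs, ys, b0, b1)
        # Both coefficients are updated from the same old values
        b0, b1 = b0 - alpha * d_b0, b1 - alpha * d_b1
    return mse_and_grads(xs, ys, b0, b1)[0]

xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]
for alpha in (0.01, 0.1):
    print(alpha, final_cost(xs, ys, alpha, epochs=100))
```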
This function specifies when the iterations should stop. The criterion is chosen by the user; typically, stop_iteration returns true when a maximum number of iterations has been reached or when the error has fallen below a chosen threshold.
Having defined all the utility functions, let's see the pseudocode followed by its implementation:
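A minimal sketch of such a stopping function is shown below. Only the name `stop_iteration` and its first argument `i` come from the pseudocode in this article; the parameters `max_epochs`, `error`, and `tolerance` and their defaults are hypothetical.

```python
def stop_iteration(i, max_epochs=1000, error=None, tolerance=1e-6):
    """Return True once the epoch limit is hit or the error is small enough."""
    if i >= max_epochs:
        return True
    if error is not None and error < tolerance:
        return True
    return False

print(stop_iteration(5))               # False, still below the epoch limit
print(stop_iteration(1000))            # True, epoch limit reached
print(stop_iteration(5, error=1e-9))   # True, error below the tolerance
```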
Pseudocode for linear regression:
x, y is the given data.
(b0, b1) <-- (0, 0)
i = 0
while True:
    if stop_iteration(i):
        break
    else:
        new_b0 = update_coeff(x, y, b0, b1, 0, alpha)
        new_b1 = update_coeff(x, y, b0, b1, 1, alpha)
        (b0, b1) <-- (new_b0, new_b1)
        i = i + 1
Output:
5.4596467843708425e+166
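The utility functions can be combined into a class along the following lines. This is a sketch under stated assumptions, not the original implementation: the class name, the default values of `alpha`, `max_epochs`, and `tolerance`, and the use of batch gradient descent over all rows are choices made for this example.

```python
class UnivariateLinearRegression:
    """Fits y = b0 + b1 * x by batch gradient descent on the mean-squared error."""

    def __init__(self, alpha=0.05, max_epochs=5000, tolerance=1e-10):
        self.alpha = alpha            # learning rate
        self.max_epochs = max_epochs  # upper bound on the number of iterations
        self.tolerance = tolerance    # stop once the MSE drops below this value
        self.b0 = 0.0
        self.b1 = 0.0

    def predict(self, x):
        return self.b0 + self.b1 * x

    def cost(self, xs, ys):
        return sum((self.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    def stop_iteration(self, i, error):
        return i >= self.max_epochs or error < self.tolerance

    def fit(self, xs, ys):
        n = len(xs)
        i = 0
        while not self.stop_iteration(i, self.cost(xs, ys)):
            errors = [self.predict(x) - y for x, y in zip(xs, ys)]
            d_b0 = 2 * sum(errors) / n
            d_b1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
            # Both gradients were computed from the old coefficients,
            # so the two updates are effectively simultaneous
            self.b0 -= self.alpha * d_b0
            self.b1 -= self.alpha * d_b1
            i += 1
        return self

model = UnivariateLinearRegression().fit([1, 2, 3, 4, 5], [5, 7, 9, 11, 13])
print(model.b0, model.b1, model.predict(6))
```

With these settings the fitted coefficients should come out close to b0 = 3 and b1 = 2 on the Y = 2X + 3 data. A learning rate that is too large for the data makes the coefficients grow without bound instead of converging.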