![]() |
VOOZH | about |
Polynomial Regression is a form of linear regression where the relationship between the independent variable (x) and the dependent variable (y) is modelled as an degree polynomial. It is useful when the data exhibits a non-linear relationship allowing the model to fit a curve to the data.
Polynomial regression is an extension of linear regression where higher-degree terms are added to model non-linear relationships. The general form of the equation for a polynomial regression of degree is:
where:
The goal of regression analysis is to model the expected value of a dependent variable in terms of an independent variable . In simple linear regression, this relationship is modeled as:
Here
However in cases where the relationship between the variables is nonlinear such as modelling chemical synthesis based on temperature, a linear model may not be sufficient. Instead, we use polynomial regression which introduces higher-degree terms such as to better capture the relationship.
For example, a quadratic model can be written as:
Here:
In general, polynomial regression can be extended to the nth degree:
While the regression function is linear in terms of the unknown coefficients , the model itself captures non-linear patterns in the data. The coefficients are estimated using techniques like Least Square technique to minimize the error between predicted and actual values.
Choosing the right polynomial degree is important: a higher degree may fit the data more closely but it can lead to overfitting. The degree should be selected based on the complexity of the data. Once the model is trained, it can be used to make predictions on new data, capturing non-linear relationships and providing a more accurate model for real-world applications.
Letβs consider an example in the field of finance where we analyze the relationship between an employee's years of experience and their corresponding salary. If we check that the relationship might not be linear, polynomial regression can be used to model it more accurately.
Example Data:
Years of Experience | Salary (in dollars) |
|---|---|
1 | 50,000 |
2 | 55,000 |
3 | 65,000 |
4 | 80,000 |
5 | 110,000 |
6 | 150,000 |
7 | 200,000 |
Now, let's apply polynomial regression to model the relationship between years of experience and salary. We'll use a quadratic polynomial (degree 2) which includes both linear and quadratic terms for better fit. The quadratic polynomial regression equation is:
To find the coefficients that minimize the difference between the predicted and actual salaries, we can use the Least Squares method. The objective is to minimize the sum of squared differences between the predicted salaries and the actual data points which allows us to fit a model that captures the non-linear progression of salary with respect to experience.
Here we will see how to implement polynomial regression using Python.
We'll using Pandas, NumPy, Matplotlib and Sckit-Learn libraries and a random dataset for the analysis of Polynomial Regression which you can download from here.
Here we will load the dataset and print it for our understanding.
Output:
Our feature variable that is X will contain the Column between 1st and the target variable that is y will contain the 2nd column.
We will first fit a simple linear regression model to the data.
Output:
Now we will apply polynomial regression by adding polynomial terms to the feature space. In this example, we use a polynomial of degree 4.
Visualize the results of the linear regression model by plotting the data points and the regression line.
Output:
A scatter plot of the feature and target variable with the linear regression line fitted to the data.
Now visualize the polynomial regression results by plotting the data points and the polynomial curve.
Output:
A scatter plot of the feature and target variable with the polynomial regression curve fitted to the data.
To predict new values using both linear and polynomial regression we need to ensure the input variable is in a 2D array format.
Output:
array([0.20675333])
Output:
array([0.43295877])
In polynomial regression, overfitting happens when the model is too complex and fits the training data too closely helps in making it perform poorly on new data. To avoid this, we use techniques like Lasso and Ridge regression which helps to simplify the model by limiting the size of the coefficients.
On the other hand, underfitting occurs when the model is too simple to capture the real patterns in the data. This usually happens with a low-degree polynomial. The key is to choose the right polynomial degree to ensure the model is neither too complex nor too simple which helps it work well on both the training data and new data.
Bias Vs Variance Tradeoff helps us avoid both overfitting and underfitting by selecting the appropriate polynomial degree. As we increase the polynomial degree, the model fits the training data better but after a certain point, it starts to overfit. This is visible when the gap between training and validation errors begins to widen. The goal is to choose a polynomial degree where the model captures the data patterns without becoming too complex which ensures a good generalization.
By mastering polynomial regression, we can better model complex data patterns which leads to more accurate predictions and valuable insights across various fields.