A studentized residual is the quotient obtained by dividing a residual by its estimated standard deviation. Studentized residuals are widely used to detect outliers. As a rule of thumb, any observation in a dataset whose studentized residual has an absolute value greater than 3 is treated as an outlier.
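The definition above can be written as a formula. The symbols are an assumption, since the text does not fix a notation: e_i is the residual of observation i, h_ii is its leverage (the i-th diagonal entry of the hat matrix), and sigma-hat is the estimated residual standard deviation.

$$
t_i = \frac{e_i}{\hat{\sigma}_{(i)} \sqrt{1 - h_{ii}}}
$$

Here the subscript (i) means that the standard deviation is estimated with observation i left out, which gives the externally studentized residual. Using the estimate from all observations instead gives the internally studentized residual. The outlier_test() method of statsmodels used later in this article reports the external version.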
The following Python libraries should be installed on your system: pandas, numpy, statsmodels, and matplotlib.
You can install these packages by running the command below in a terminal.
pip3 install pandas numpy statsmodels matplotlib
Step 1: Import the libraries.
We need to import the libraries that we installed above into the program.
Step 2: Create a data frame.
First, we create a data frame using the pandas package.
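As an illustration, the sketch below builds a small data frame. The column names (hours, score) and all values are hypothetical and chosen only for this example; they are not the article's original data.

```python
import pandas as pd

# Hypothetical data: hours studied and exam score for ten students.
dataframe = pd.DataFrame({
    "hours": [1, 2, 2, 3, 4, 5, 5, 6, 7, 8],
    "score": [52, 58, 55, 61, 68, 70, 74, 75, 83, 86],
})

# Show the first few rows to confirm the structure.
print(dataframe.head())
```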
Step 3: Build a simple linear regression model.
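Now we need to fit a simple linear regression model to the dataset we created. For fitting a linear regression model, the statsmodels package provides the OLS class. Note that OLS does not add an intercept automatically, so a constant column must be included in x (for example with statsmodels.api.add_constant).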
Syntax:
statsmodels.api.OLS(y, x)
Parameters:
- y : the dependent (response) variable
- x : the independent (predictor) variable
Example:
Step 4: Compute the studentized residuals.
To obtain a DataFrame containing the studentized residual of each observation in the dataset, we can call the outlier_test() method of the fitted model.
Syntax:
simple_regression_model.outlier_test()
Here simple_regression_model is the fitted results object, that is, the value returned by calling fit() on the OLS model. The method returns one row per observation.
Below is the complete implementation.
Output:
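The output is a data frame that contains, for each observation:
- student_resid : the studentized residual
- unadj_p : the unadjusted p-value of the studentized residual
- bonf(p) : the Bonferroni-corrected p-value of the studentized residual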
We can see that the studentized residual for the first observation in the dataset is -1.121201, the studentized residual for the second observation is 0.954871, and so on.
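To make clear what these numbers represent, the sketch below recomputes studentized residuals by hand with numpy alone and applies the "absolute value greater than 3" rule of thumb. It assumes a hypothetical dataset and uses the standard leave-one-out identity for the externally studentized residual. It is an illustration of the definition, not a reproduction of the statsmodels source.

```python
import numpy as np

# Hypothetical predictor and response values.
hours = np.array([1, 2, 2, 3, 4, 5, 5, 6, 7, 8], dtype=float)
y = np.array([52, 58, 55, 61, 68, 70, 74, 75, 83, 86], dtype=float)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(hours), hours])
n, p = X.shape

# Ordinary least squares fit and raw residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Leverage h_ii: diagonal of the hat matrix X (X'X)^-1 X'.
leverage = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

# Internally studentized: the variance estimate uses all observations.
s2 = resid @ resid / (n - p)
internal = resid / np.sqrt(s2 * (1 - leverage))

# Externally studentized: the variance is re-estimated without observation i.
s2_loo = ((n - p) * s2 - resid**2 / (1 - leverage)) / (n - p - 1)
external = resid / np.sqrt(s2_loo * (1 - leverage))

# Rule of thumb: flag observations with |studentized residual| > 3.
outliers = np.flatnonzero(np.abs(external) > 3)
print(external)
print("Flagged observations:", outliers)
```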
Now let us visualize the studentized residuals. With the help of matplotlib, we can plot the predictor variable values against the corresponding studentized residuals.
Example:
Output:
[Figure: Plot.png, predictor variable values plotted against the corresponding studentized residuals]