The Shapiro-Wilk test, or Shapiro test, is a normality test in frequentist statistics. Its null hypothesis is that the population from which the sample was drawn is normally distributed. It is among the three tests for normality designed for detecting all kinds of departure from normality. If the p-value is less than or equal to 0.05, the test rejects the hypothesis of normality at the 5% significance level; in other words, the data show a statistically significant departure from a normal distribution. If the p-value is greater than 0.05, the test finds no significant departure from normality, which is not the same as proving that the data are normal. This test can be done very easily in R programming.
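Suppose a sample, say $x_1, x_2, \ldots, x_n$, has come from a normally distributed population. The Shapiro-Wilk test assesses this null hypothesis with the test statistic

$$W = \frac{\left(\sum_{i=1}^{n} a_i \, x_{(i)}\right)^2}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}$$

where,
- $x_{(i)}$ : the ith smallest number in the given sample (the ith order statistic).
- $\bar{x}$ : $(x_1 + x_2 + \cdots + x_n) / n$, i.e. the sample mean.
- $a_i$ : coefficients that can be calculated as $(a_1, a_2, \ldots, a_n) = (m^T V^{-1}) / C$. Here $m = (m_1, m_2, \ldots, m_n)^T$ is the vector of expected values of the order statistics of a sample of size n from the standard normal distribution, V is the covariance matrix of those order statistics, and C is the vector norm $C = \lVert V^{-1} m \rVert$.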
To perform the Shapiro-Wilk test, R provides the shapiro.test() function.
Syntax:
shapiro.test(x)
Parameter:
x : a numeric vector containing the data values. Missing values are allowed, but the number of non-missing values must be between 3 and 5000.
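As a minimal sketch of the call and of how missing values are handled, the example below runs the test on simulated data. The vector `x`, the seed, and the name `too_small` are illustrative assumptions and are not part of the walkthrough that follows.

```r
# Simulated data: 50 draws from a standard normal plus two missing values.
set.seed(42)
x <- c(rnorm(50), NA, NA)

# shapiro.test() drops the NA entries before computing the statistic,
# so the test runs on the 50 non-missing values.
res <- shapiro.test(x)
print(res$statistic)  # the W statistic
print(res$p.value)    # the p-value

# With fewer than 3 non-missing values the function stops with an error
# instead of returning a result; capture the message to show this.
too_small <- tryCatch(
  shapiro.test(c(1.5, NA, 2.5)),
  error = function(e) conditionMessage(e)
)
print(too_small)
```

The returned object is a list of class "htest", so the statistic and p-value can be reused in later code rather than read off the printed output.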
Let us see how to perform the Shapiro-Wilk test step by step.
install.packages("dplyr")
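The output shown in this walkthrough has the columns len, supp and dose, which matches R's built-in ToothGrowth data set. Assuming that this is the data being used, a sketch of loading and previewing it could look like the following; the name `my_data` is taken from the output, and the guard around dplyr is an addition so the snippet also runs where the package is not installed.

```r
# ToothGrowth ships with R (package 'datasets'); it is assumed here to be
# the data behind the output shown in this walkthrough.
my_data <- ToothGrowth

# Inspect the structure: 60 observations of len, supp and dose.
str(my_data)

# Preview ten random rows. sample_n() comes from dplyr, so it is only
# called if the package is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  print(dplyr::sample_n(my_data, 10))
}
```

Because the rows are sampled at random, the preview will generally differ from run to run unless a seed is set first.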
One can also use their own data set. To do so, first prepare the data, save it to a file, and then import the file into the script. The file can be imported using the following syntax:
data <- read.delim(file.choose())   # if the format of the file is .txt
data <- read.csv(file.choose())     # if the format of the file is .csv
Output:
    len supp dose
1  11.2   VC  0.5
2   8.2   OJ  0.5
3  10.0   OJ  0.5
4  27.3   OJ  2.0
5  14.5   OJ  1.0
6  26.4   OJ  2.0
7   4.2   VC  0.5
8  15.2   VC  1.0
9  14.5   OJ  0.5
10 26.7   VC  2.0
Output:
> dplyr::sample_n(my_data, 10)
    len supp dose
1  11.2   VC  0.5
2   8.2   OJ  0.5
3  10.0   OJ  0.5
4  27.3   OJ  2.0
5  14.5   OJ  1.0
6  26.4   OJ  2.0
7   4.2   VC  0.5
8  15.2   VC  1.0
9  14.5   OJ  0.5
10 26.7   VC  2.0
> shapiro.test(my_data$len)

	Shapiro-Wilk normality test

data:  my_data$len
W = 0.96743, p-value = 0.1091
The p-value (0.1091) is greater than 0.05, so we fail to reject the null hypothesis of normality. In other words, the distribution of len does not differ significantly from a normal distribution, and we can proceed under the assumption of normality.
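The decision rule can also be applied in code instead of by reading the printed output. The sketch below assumes that `my_data` is the built-in ToothGrowth data set (its columns match the output above) and uses a significance level of 0.05; the names `alpha` and `decision` are illustrative.

```r
# Assumption: the walkthrough's data is the built-in ToothGrowth data set.
my_data <- ToothGrowth

# Significance level used throughout this article.
alpha <- 0.05

# Run the test on the len column and keep the result object.
res <- shapiro.test(my_data$len)

# Reject normality only when the p-value is at or below alpha.
if (res$p.value <= alpha) {
  decision <- "reject the null hypothesis of normality"
} else {
  decision <- "fail to reject the null hypothesis of normality"
}

cat("W =", round(unname(res$statistic), 5), "\n")
cat("p-value =", round(res$p.value, 4), "\n")
cat("Decision:", decision, "\n")
```

If the assumption about the data set holds, the statistic and p-value should agree with the values reported in the output above.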