Perform the one-sample Kolmogorov-Smirnov test by using kstest. Confirm the test decision by visually comparing the empirical cumulative distribution function (cdf) to the standard normal cdf.

Load the examgrades data set. Create a vector containing the first column of the exam grade data.

    load examgrades
    test1 = grades(:,1);

Test the null hypothesis that the data comes from a normal distribution with a mean of 75 and a standard deviation of 10. Use these parameters to center and scale each element of the data vector, because kstest tests for a standard normal distribution by default.

    x = (test1-75)/10;
    h = kstest(x)

The returned value of h = 0 indicates that kstest fails to reject the null hypothesis at the default 5% significance level.

Plot the empirical cdf and the standard normal cdf for a visual comparison.

    cdfplot(x)
    hold on
    x_values = linspace(min(x),max(x));
    plot(x_values,normcdf(x_values,0,1),'r-')
    legend('Empirical CDF','Standard Normal CDF','Location','best')

The figure shows the similarity between the empirical cdf of the centered and scaled data vector and the cdf of the standard normal distribution.

Specify the Hypothesized Distribution Using a Two-Column Matrix

Load the sample data. Create a vector containing the first column of the students’ exam grades data.

    load examgrades;
    x = grades(:,1);

Specify the hypothesized distribution as a two-column matrix. Column 1 contains the data vector x. Column 2 contains cdf values evaluated at each value in x for a hypothesized Student’s t distribution with a location parameter of 75, a scale parameter of 10, and one degree of freedom.

    test_cdf = [x,cdf('tlocationscale',x,75,10,1)];

Test if the data are from the hypothesized distribution.

    h = kstest(x,'CDF',test_cdf)

The returned value of h = 1 indicates that kstest rejects the null hypothesis at the default 5% significance level.

Specify the Hypothesized Distribution Using a Probability Distribution Object

Load the sample data. Create a vector containing the first column of the students’ exam grades data.

    load examgrades;
    x = grades(:,1);

Create a probability distribution object to test if the data comes from a Student’s t distribution with a location parameter of 75, a scale parameter of 10, and one degree of freedom.

    test_cdf = makedist('tlocationscale','mu',75,'sigma',10,'nu',1);

Test the null hypothesis that the data comes from the hypothesized distribution.

    h = kstest(x,'CDF',test_cdf)

The returned value of h = 1 indicates that kstest rejects the null hypothesis at the default 5% significance level.

Test the Hypothesis at Different Significance Levels

Load the sample data. Create a vector containing the first column of the students’ exam grades.

    load examgrades;
    x = grades(:,1);

Create a probability distribution object to test if the data comes from a Student’s t distribution with a location parameter of 75, a scale parameter of 10, and one degree of freedom.

    test_cdf = makedist('tlocationscale','mu',75,'sigma',10,'nu',1);

Test the null hypothesis that the data comes from the hypothesized distribution at the 1% significance level.

    h = kstest(x,'CDF',test_cdf,'Alpha',0.01)

The returned value of h = 1 indicates that kstest rejects the null hypothesis at the 1% significance level.

Conduct a One-Sided Hypothesis Test

Load the sample data. Create a vector containing the third column of the stock return data matrix.

    load stockreturns;
    x = stocks(:,3);

Test the null hypothesis that the data comes from a standard normal distribution, against the alternative hypothesis that the population cdf of the data is larger than the standard normal cdf.

    h = kstest(x,'Tail','larger')

The returned value of h = 1 indicates that kstest rejects the null hypothesis in favor of the alternative hypothesis at the default 5% significance level.

Plot the empirical cdf and the standard normal cdf for a visual comparison.

    cdfplot(x)
    hold on
    x_values = linspace(min(x),max(x));
    plot(x_values,normcdf(x_values,0,1),'r-')
    legend('Empirical CDF','Standard Normal CDF','Location','best')

The plot shows the difference between the empirical cdf of the data vector x and the cdf of the standard normal distribution.

Input Arguments

x — Sample data
vector

Sample data, specified as a vector.

Data Types: single | double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'Tail','larger','Alpha',0.01 specifies a test using the alternative hypothesis that the cdf of the population from which the sample data is drawn is greater than the cdf of the hypothesized distribution, conducted at the 1% significance level.

Alpha — Significance level
0.05 (default) | scalar value in the range (0,1)

Significance level of the hypothesis test, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range (0,1).

Example: 'Alpha',0.01

Data Types: single | double

CDF — cdf of hypothesized continuous distribution
matrix | probability distribution object

cdf of hypothesized continuous distribution, specified as the comma-separated pair consisting of 'CDF' and either a two-column matrix or a continuous probability distribution object. When CDF is a matrix, column 1 contains a set of possible x values, and column 2 contains the corresponding hypothesized cumulative distribution function values G(x). The calculation is most efficient if CDF is specified such that column 1 contains the values in the data vector x. If there are values in x not found in column 1 of CDF, kstest approximates G(x) by interpolation. All values in x must lie in the interval between the smallest and largest values in the first column of CDF. By default, kstest tests for a standard normal distribution.

The one-sample Kolmogorov-Smirnov test is only valid for continuous cumulative distribution functions, and requires CDF to be predetermined. The result is not accurate if CDF is estimated from the data. To test x against the normal, lognormal, extreme value, Weibull, or exponential distribution without specifying distribution parameters, use lillietest instead.
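
When CDF is supplied as a matrix on a grid rather than at the data values, the range requirement described above can be checked before calling kstest. The following is a minimal sketch, assuming a hypothetical sample x and a hypothetical grid xGrid (neither comes from the examples on this page). The standard normal cdf is written with erfc so that the setup needs only base MATLAB, and the kstest call is left as a comment because it requires Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical sample on a fixed grid, so the example is deterministic.
x = linspace(-1.5, 1.5, 50)';

% Column 1: evaluation points that span the data.
% Column 2: hypothesized cdf G(x), here the standard normal cdf via erfc.
xGrid = linspace(-4, 4, 401)';
G = 0.5 * erfc(-xGrid / sqrt(2));
cdfMatrix = [xGrid, G];

% Every value in x must lie between the smallest and largest grid values.
inRange = all(x >= cdfMatrix(1,1) & x <= cdfMatrix(end,1));

% With the toolbox available, values of x that are not on the grid are
% handled by interpolation:
% h = kstest(x, 'CDF', cdfMatrix);
```

If inRange is false, widening xGrid until it covers min(x) and max(x) is one way to satisfy the requirement.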

Data Types: single | double

Tail — Type of alternative hypothesis
'unequal' (default) | 'larger' | 'smaller'

Type of alternative hypothesis to evaluate, specified as the comma-separated pair consisting of 'Tail' and one of the following.

'unequal': Test the alternative hypothesis that the cdf of the population from which x is drawn is not equal to the cdf of the hypothesized distribution.
'larger': Test the alternative hypothesis that the cdf of the population from which x is drawn is greater than the cdf of the hypothesized distribution.
'smaller': Test the alternative hypothesis that the cdf of the population from which x is drawn is less than the cdf of the hypothesized distribution.

If the values in the data vector x tend to be larger than expected from the hypothesized distribution, the empirical distribution function of x tends to be smaller, and vice versa.

Example: 'Tail','larger'

Output Arguments

h — Hypothesis test result
1 | 0

Hypothesis test result, returned as a logical value. A value of 1 indicates that kstest rejects the null hypothesis at the Alpha significance level; a value of 0 indicates that it fails to reject the null hypothesis.
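
The note under Tail, that data larger than expected gives a smaller empirical cdf, can be checked by hand. The following is a minimal sketch in base MATLAB, assuming hypothetical data (evenly spaced points shifted upward by 1) and a standard normal hypothesized cdf written with erfc. The variable names dLarger and dSmaller are illustrative and are not outputs of kstest.

```matlab
% Hypothetical data: evenly spaced points shifted up by 1, so the values
% are larger than a standard normal distribution would typically produce.
n = 20;
x = sort(linspace(-1.5, 1.5, n)' + 1);

% Empirical cdf just after and just before each sorted data point.
FhatRight = (1:n)' / n;
FhatLeft  = (0:n-1)' / n;

% Hypothesized standard normal cdf G(x) at the data points.
G = 0.5 * erfc(-x / sqrt(2));

% One-sided distances between the empirical and hypothesized cdfs.
dLarger  = max(FhatRight - G);   % largest amount the empirical cdf exceeds G
dSmaller = max(G - FhatLeft);    % largest amount G exceeds the empirical cdf
```

For this data, dSmaller is far greater than dLarger, so the empirical cdf sits below the hypothesized cdf and 'Tail','smaller' is the one-sided alternative that matches an upward shift.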

p — p-value
scalar value in the range [0,1]

p-value of the test, returned as a scalar value in the range [0,1]. p is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. Small values of p cast doubt on the validity of the null hypothesis.

ksstat — Test statistic
nonnegative scalar value

Test statistic of the hypothesis test, returned as a nonnegative scalar value.

cv — Critical value
nonnegative scalar value

Critical value, returned as a nonnegative scalar value.

More About

One-Sample Kolmogorov-Smirnov Test

The one-sample Kolmogorov-Smirnov test is a nonparametric test of the null hypothesis that the population cdf of the data is equal to the hypothesized cdf.

The two-sided test for “unequal” cdf functions tests the null hypothesis against the alternative that the population cdf of the data is not equal to the hypothesized cdf. The test statistic is the maximum absolute difference between the empirical cdf calculated from x and the hypothesized cdf:

$$D^* = \max_x \left| \hat{F}(x) - G(x) \right|,$$

where $\hat{F}(x)$ is the empirical cdf and $G(x)$ is the cdf of the hypothesized distribution.

The one-sided test for a “larger” cdf function tests the null hypothesis against the alternative that the population cdf of the data is greater than the hypothesized cdf.
The test statistic is the maximum amount by which the empirical cdf calculated from x exceeds the hypothesized cdf:

$$D^* = \max_x \left( \hat{F}(x) - G(x) \right).$$

The one-sided test for a “smaller” cdf function tests the null hypothesis against the alternative that the population cdf of the data is less than the hypothesized cdf. The test statistic is the maximum amount by which the hypothesized cdf exceeds the empirical cdf calculated from x:

$$D^* = \max_x \left( G(x) - \hat{F}(x) \right).$$

kstest computes the critical value cv using an approximate formula or by interpolation in a table. The formula and table cover the range 0.01 ≤ alpha ≤ 0.2 for two-sided tests and 0.005 ≤ alpha ≤ 0.1 for one-sided tests. cv is returned as NaN if alpha is outside this range.

Algorithms

kstest decides to reject the null hypothesis by comparing the p-value p with the significance level Alpha, not by comparing the test statistic ksstat with the critical value cv. Since cv is approximate, comparing ksstat with cv occasionally leads to a different conclusion than comparing p with Alpha.

References

[1] Massey, F. J. “The Kolmogorov-Smirnov Test for Goodness of Fit.” Journal of the American Statistical Association. Vol. 46, No. 253, 1951, pp. 68–78.

[2] Miller, L. H. “Table of Percentage Points of Kolmogorov Statistics.” Journal of the American Statistical Association. Vol. 51, No. 273, 1956, pp. 111–121.

[3] Marsaglia, G., W. Tsang, and J. Wang. “Evaluating Kolmogorov’s Distribution.” Journal of Statistical Software. Vol. 8, Issue 18, 2003.

What is the Kolmogorov-Smirnov test used for?

The study performed the check with the Kolmogorov-Smirnov Test (K-S Test) because the sample was large, with more than 50 observations. The K-S Test is used to test whether the population distribution is normal. The principle of this test is to compare the value |