Upsala J Med Sci 85: 97-102, 1980

Significance, Importance and Equality - Three Basic Concepts in the Analysis of a Difference

Adam Taube

Department of Statistics, University of Uppsala, Uppsala, Sweden

ABSTRACT

By means of data from fictitious cross over trials, it is first demonstrated that a statistically significant difference is not necessarily of a practically important order of magnitude. This fact is of special interest when the number of observations is large. Second, a statistically non significant difference does not prove the hypothesis about equality between, say, treatment effects. This fact is of special interest when the number of observations is small. For investigating whether equality is plausible, confidence intervals are more useful than non significant results from tests of significance.

INTRODUCTION

The purpose of this little note is to illustrate how elementary statistical methods can be applied in order to answer two fundamentally different questions about a comparison between means or between proportions. The first question is whether a difference is statistically significant and the second whether it is of an important order of magnitude. If both these questions lead to negative answers, it is natural to ask whether it can be said that the two groups (treatments etc.) being compared are equal. The concepts "significance", "importance" and "equality" relate to three different aspects of a comparison. These aspects are all of interest both in observational studies and in experimental situations.

MATERIAL AND METHODS

As an example, we consider a cross over trial, where two drugs, A and B, are being compared. For simplicity, we assume that the results concerning each of them can be given in the form "improvement" or "no improvement". Thus, the data are of the kind illustrated in Table 1.

Table 1. Frequencies in a cross over trial.
                              Drug A
Drug B             No improvement   Improvement   Total
No improvement           a               b         a+b
Improvement              c               d         c+d
Total                   a+c             b+d         n

The standard analysis concerning the statistical significance of the effect difference is the well known McNemar test (2)

$$\chi^2 = (b - c)^2/(b + c) \qquad \text{(1 d.f.)}$$

which is based only on those individuals who give different judgements of the two drugs. If this test statistic gives a sufficiently large value, the conclusion is that the two treatments have different effects. The result of this test is unaffected by the magnitude of the two frequencies a and d.

It is obvious that if the frequencies b and c constitute a very minor fraction of the total number of observations n, the result of the above test might be of little or no interest. This can be so, even if the $\chi^2$-value is very large, indicating a strong statistical significance. Instead, one ought to study the rates of improvement

$$\hat P_A = (b + d)/n \quad \text{and} \quad \hat P_B = (c + d)/n$$

and we notice that their difference, $\hat D$, can be written as

$$\hat D = \hat P_A - \hat P_B = (b - c)/n$$

This is the appropriate measure of whether the difference is important or not.

Let us now define a variable $x_A$ such that

$$x_A = \begin{cases} 1 & \text{if improvement with drug A} \\ 0 & \text{if no improvement with drug A} \end{cases}$$

and similarly for $x_B$. For each of the n individuals, we can form the difference $z_i = x_{Ai} - x_{Bi}$, where i = 1, 2, ..., n. If, for a certain individual, there is no difference in effect between drug A and drug B we get z = 0. If A is better than B we get z = 1 and if drug B is better than drug A we get z = -1. Obviously, the variable z will have the values -1, 0 and +1 with the frequencies c, (a + d) and b respectively, as illustrated in Fig. 1.

Fig. 1.
Distribution of $z = x_A - x_B$ (horizontal axis: z, with the values -1, 0 and +1).

We notice that the average of the variable z is

$$\bar z = (b - c)/n = \hat D$$

and it is easily verified that the standard deviation $s_z$ is obtained from

$$s_z^2 = [n(b + c) - (b - c)^2]/[n(n - 1)]$$

If n is large, $\bar z$ can be assumed to be a normally distributed variable, irrespective of the shape of the distribution of the variable z. Therefore, we can give a confidence interval for the true difference D by means of

$$\bar z \pm k \cdot SE(\bar z)$$

where $SE(\bar z) = s_z/\sqrt{n}$ and the value of k is obtained from a table of the standard normal distribution. We denote the upper and lower confidence limits $\hat D_U$ and $\hat D_L$ and apply the 95% confidence level when calculating them for the six different data sets presented in Table 2.

Instead of using McNemar's test, it is possible to form the critical ratio

$$C.R. = (\bar z - 0)/SE(\bar z)$$

This will give a very similar result, since the two test statistics are strongly related. In fact, it can easily be shown that

$$(C.R.)^2 = \chi^2 (n - 1)/(n - \chi^2)$$

Table 2. Data from six fictitious trials.

                                  Data set
Frequencies              I       II      III      IV       V       VI
a                       10      120      320      10      120      320
b                       15       15       15      12       12       12
c                        5        5        5       8        8        8
d                       20      260      660      20      260      660
n                       50      400     1000      50      400     1000
$\chi^2$               5.0*     5.0*     5.0*      .8       .8       .8
C.R.                  2.33*    2.25*    2.24*     .89      .89      .89
$\hat D_L \cdot 100$   3.2%      .3%      .1%   -9.6%    -1.2%     -.5%
$\hat D_U \cdot 100$  36.8%     4.7%     1.9%   25.6%     3.2%     1.3%

*) Significant, P < .05

DISCUSSION

When comparing data sets I, II and III we notice that the $\chi^2$-values are exactly the same, indicating a statistically significant difference. However, the confidence interval for this difference is very wide in data set I, where n = 50, while it is quite narrow and not far from the point zero in data set III, where n = 1000. For this latter material, it can be argued that, in spite of the fact that the result is statistically significant, the difference might be without any practical importance.
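The entries of Table 2 follow directly from the formulas given under Material and Methods, and a short computation makes the contrast between the data sets easy to inspect. The Python sketch below is only an illustration under stated assumptions: the function name `paired_summary`, the dictionary layout and the use of `statistics.NormalDist` to obtain k (instead of a printed normal table) are choices made here, not part of the original note.

```python
from math import sqrt
from statistics import NormalDist


def paired_summary(a, b, c, d, level=0.95):
    """Summarise a cross over table with cells a, b, c, d laid out as in Table 1.

    Hypothetical helper for illustration; it applies the large-sample
    formulas of the text and is not meant for very small n.
    """
    n = a + b + c + d
    chi2 = (b - c) ** 2 / (b + c)                        # McNemar statistic, 1 d.f.
    d_hat = (b - c) / n                                  # mean of z = estimated difference
    s2 = (n * (b + c) - (b - c) ** 2) / (n * (n - 1))    # sample variance of z
    se = sqrt(s2 / n)                                    # standard error of the mean of z
    cr = d_hat / se                                      # critical ratio
    k = NormalDist().inv_cdf(0.5 + level / 2)            # about 1.96 for the 95% level
    return {
        "n": n,
        "chi2": chi2,
        "cr": cr,
        "d_hat": d_hat,
        "lower": d_hat - k * se,
        "upper": d_hat + k * se,
    }


# The six fictitious data sets of Table 2, as (a, b, c, d).
DATA_SETS = {
    "I": (10, 15, 5, 20),
    "II": (120, 15, 5, 260),
    "III": (320, 15, 5, 660),
    "IV": (10, 12, 8, 20),
    "V": (120, 12, 8, 260),
    "VI": (320, 12, 8, 660),
}

for name, cells in DATA_SETS.items():
    r = paired_summary(*cells)
    print(
        f"{name:>3}  n={r['n']:<5} chi2={r['chi2']:.1f}  C.R.={r['cr']:.2f}  "
        f"95% CI: {100 * r['lower']:.1f}% to {100 * r['upper']:.1f}%"
    )
```

Run as written, the printed values should agree with Table 2 to the rounding used there. For data set III the interval lies entirely above zero but is confined to differences of roughly 2 percentage points or less, which is the sense in which a significant result may still be practically unimportant.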
Indeed, there is no contradiction in this: the results from the statistical test procedure just tell us whether a difference is larger than what could be due to chance, if the so called null hypothesis is true. When the number of observations is large, even a very unimportant difference can be statistically significant.

Fig. 2a. Data sets I, II and III ($\chi^2 = 5.0$). 95% confidence intervals for the true difference $D \cdot 100\%$ (vertical axis: n; horizontal axis: $D \cdot 100\%$).

It is essential to stress that the judgement whether a difference should be considered important or not cannot be made merely by means of statistical techniques. The statistician can present a confidence interval, but it must be up to the subject matter specialist to decide the magnitude of what should be considered an important difference.

The data sets IV, V and VI all give small $\chi^2$-values. This means that no statistically significant differences have been found. The corresponding confidence intervals always contain the point zero, but this is certainly not enough to demonstrate that the two drugs have equal effects. We notice that in data set IV, where n = 50, the confidence interval is quite wide.

Fig. 2b. Data sets IV, V and VI ($\chi^2 = 0.8$). 95% confidence intervals for the true difference $D \cdot 100\%$ (vertical axis: n; horizontal axis: $D \cdot 100\%$).

An interesting interpretation of a confidence interval in a situation like this is that none of the possible values of the true difference D within the calculated limits is in contradiction with the data. This means that even if data set IV gives quite a low $\chi^2$-value, there is still a possibility that the true difference could be of the magnitude 25%. However, when we consider data set VI, we find a very narrow confidence interval, indicating that even values of the difference D as small as, say, 1.5% would be in contradiction with the data.

Until now, we have just used the confidence level 95%.
For data set VI, the 99% confidence limits for D would be $\hat D_L \cdot 100 = -.8\%$ and $\hat D_U \cdot 100 = +1.6\%$ respectively, thus giving an interval which includes the above mentioned value of 1.5%. Apparently, the conclusion about which values of D are possible depends not only upon the number of observations, n, but also upon the confidence level chosen.

If it is desired to establish whether there is equality between the two treatments, it must be recognized that it is not possible to prove that D = 0. For a finite number of observations, the confidence interval for D will always have a certain length, unless b = c = 0, which is the uninteresting case when there is no random variation involved. Therefore, the equality statement must be replaced by the condition that the true difference is smaller than a certain value, i.e. $|D| < D'$, where $D'$ is the smallest non-zero value of the difference which is of any interest to discover. If $D'$ is specified by the subject matter specialist, and the confidence level and the desired power of the test are also decided upon, it is possible to calculate the number of observations necessary to establish "equality" in accordance with this modified meaning of the word. One gets the impression that this procedure has seldom been applied when it is stated in medical articles that "there is no difference" or that "the treatments are equal".

Indeed, everything in this discussion is of a very basic, elementary character. As the references (1, 3, 4, 5, 6) to this little note demonstrate, the problems touched upon are mentioned now and then in various journals. It is the author's impression that they are not sufficiently stressed in statistical text books, and it is easy to find applied scientific articles where the results from tests of significance have been interpreted incorrectly.

REFERENCES

1. Lutz, W. & Nimmo, I.A.: The inadequacy of statistical significance. Europ J of Clin Investigation 7:77-78, 1977.
2. McNemar, Q.: Note on the Sampling Error of the Difference between Correlated Proportions or Percentages. Psychometrika 12:153-157, 1947.
3. Rennie, D.: Vive la différence (P < 0.05). New Eng J Med 299:828-829, 1978.
4. Spriet, A. & Bieler, D.: When can "non significantly different" treatments be considered as "equivalent"? Br J Clin Pharmacol 7:623-624, 1979.
5. Taube, A.: Blev det inte signifikant? Slutsatser vid statistisk hypotesprövning. Läkartidningen 69:65-68, 1972. (Swe)
6. Wade, C.L. & Waterhouse, J.A.N.: Significant or Important? Br J Clin Pharmacol 4:411-412, 1977.

Accepted January 15, 1980

Address for reprints:
Adam Taube
University of Uppsala
Department of Statistics
P.O. Box 513
S-751 20 Uppsala
Sweden