In many cases it is important to know whether a relationship exists between two variables, e.g. between personality type and the study field of students. Other examples are the relationship between gender and preference for or against a new medical scheme for workers, or between years of experience and the motivational score of personnel of a tertiary institution. Usually the statistical significance of such a relationship is determined, which means that the (null) hypothesis of no relationship is tested. Apart from the author's criticism of the sole use of statistical significance testing in this regard (Steyn, 2000), the value of knowing only that a relationship exists is also in question. What one actually wants to know is whether a relationship is large enough to be important.

Two situations have to be distinguished: (1) when dealing with a complete population and (2) when a random sample is drawn from a population. Only in the second situation is the statistical significance of a relationship appropriate, since the test result obtained from the sample is used to establish whether two variables are related within the population (with a small probability of concluding this erroneously). In the first situation another way has to be found to determine whether the relationship is “practically significant”. Here, as in the case where two population means are compared (cf. Steyn, 2000), an effect size, as a measure of practical significance, can be a useful aid. Such an effect size can also be estimated from a sample in order to determine the importance of a statistically significant relationship. Many different effect sizes exist and are discussed in the psychological literature (see Nickerson, 2000).
While the reporting of effect sizes is encouraged by the American Psychological Association (APA) in its Publication Manual (4th edition, APA, 1994), Kirk (1996) noted, on the basis of a survey of four APA journals, that most of these measures are seldom if ever found in published reports. The reporting of effect sizes has the added attraction to some analysts of facilitating the use of meta-analytic techniques (see Rosenthal, 1991).

Different kinds of relationships exist, depending on the scales on which the two variables are measured. In this paper the following cases are dealt with:
• Both variables on a nominal scale;
• Both variables dichotomous;
• One variable dichotomous, the other on an interval/ratio scale;
• Both variables on an interval/ratio scale.

In the following section, an overview is given of population effect sizes for each of the cases. The second section deals with the estimation of effect sizes by using random samples. Examples are given throughout the two sections, and the last section contains a discussion and conclusions on how to apply practical significance.

POPULATION EFFECT SIZES OF RELATIONSHIPS

Both Variables On A Nominal Scale

Consider the following example:

Example 1: In order to study the relationship between temperament type and grouping of faculty members and students at a tertiary education institution, the Myers-Briggs Type Indicator (MBTI) was administered to all the lecturers and students of an Economics and Management Faculty at a South African university (Rothmann et al., 2000a). Table 1 gives the numbers of lecturers, male students and female students within each of the four temperament types.
TABLE 1
CONTINGENCY TABLE OF TEMPERAMENT TYPE (X) BY GROUPS OF LECTURERS AND STUDENTS (Y)

Temperament Type        Lecturers   Male Students   Female Students   Total
Sensing – Judgement         20            57               79           156
Sensing – Reception          0            29               23            52
Intuition – Thinking         5            23               19            47
Intuition – Feeling          3            12               12            27
Total                       28           121              133           282

PRACTICALLY SIGNIFICANT RELATIONSHIPS BETWEEN TWO VARIABLES

HS STEYN (JR)
Statistical Consultation Service
Potchefstroom University for Christian Higher Education

ABSTRACT
It is shown how effect sizes can be used to establish whether relationships between two variables are practically significant (important). This is done for populations as well as for samples. Four cases are distinguished: when both variables are nominal, both dichotomous, one dichotomous and the other on an interval scale, and lastly both variables on an interval scale. Examples are given to illustrate the use of the suggested effect sizes.

OPSOMMING
It is shown how effect sizes can be used to determine whether relationships between two variables are practically meaningful (important). This is done for populations as well as for samples. Four cases are distinguished: when both variables are nominal, when both are dichotomous, when one is dichotomous and the other on an interval scale, and lastly when both variables are on an interval scale. Examples are given to illustrate the use of the proposed effect sizes.

Requests for copies should be addressed to: HS Steyn (jr), Statistical Consultation Service, PU for CHE, Private Bag X6001, Potchefstroom, 2520

SA Journal of Industrial Psychology, 2002, 28(3), 10-15
SA Tydskrif vir Bedryfsielkunde, 2002, 28(3), 10-15

Let the two nominal variables x and y be classified in a two-way frequency (contingency) table as in Table 1. Let x have r categories given as the different rows of the table, and y have c categories as the columns. Further, let $f_{ij}$ be the frequency of the population elements falling within the ith category of x and the jth category of y (i.e. the frequency in the ith row and the jth column of the table). Also, let $f_{i+}$ denote the ith row's total frequency and $f_{+j}$ that of the jth column. Let N be the population size, i.e. the total frequency. Cohen (1988) suggested the following effect size to measure the relationship between x and y:

$$ w = \sqrt{\sum_{i=1}^{r}\sum_{j=1}^{c}\frac{\left(f_{ij} - f_{i+}f_{+j}/N\right)^{2}}{f_{i+}\,f_{+j}}} \qquad (1) $$

Note that $w = \sqrt{X^{2}/N}$, where $X^{2}$ is the usual Chi-square statistic for this two-way frequency table. The following guidelines are given by Cohen (1988) in order to judge the importance of a relationship:
w = 0,1: small effect
w = 0,3: medium effect
w = 0,5: large effect

Cohen justifies his guidelines for w by giving the equivalent values of the contingency coefficient and Cramér's φ′. In the following section examples are given of 2x2 contingency tables for each of these guidelines.

Example 1: (continued) Consider Table 1. To calculate the effect size w, it is necessary to obtain the cell value of every cell in the contingency table. The cell value of the cell in the ith row and jth column is given by:

$$ \frac{\left(f_{ij} - f_{i+}f_{+j}/N\right)^{2}}{f_{i+}\,f_{+j}} $$

For the top-left cell (i.e. first row, first column):

$$ \frac{\left(f_{11} - f_{1+}f_{+1}/N\right)^{2}}{f_{1+}\,f_{+1}} = \frac{\left(20 - 156 \times 28/282\right)^{2}}{156 \times 28} = 0{,}0047 $$

Each cell's value can be calculated in the same way, resulting in:

$w^{2}$ = 0,0047 + 0,0052 + 0,0014 + 0,0183 + 0,0071 + 0,0003 + 0,0021 + 0,0014 + 0,0016 + 0,0001 + 0,0001 + 0,0002, and

$$ w = \sqrt{0{,}0411} = 0{,}203 $$

This value of w indicates that the effect is small to medium. Therefore there is some indication of a relationship between the temperament type and the grouping of faculty members and students into categories.
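The calculation of w for Table 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `cohen_w` and the list-of-rows layout are assumptions made here, not part of the original study. The function sums the cell values defined above and takes the square root.

```python
import math

# Table 1 frequencies: rows are the four temperament types,
# columns are lecturers, male students and female students.
table_1 = [
    [20, 57, 79],  # Sensing - Judgement
    [0, 29, 23],   # Sensing - Reception
    [5, 23, 19],   # Intuition - Thinking
    [3, 12, 12],   # Intuition - Feeling
]


def cohen_w(freq):
    """Cohen's w for a two-way frequency table given as a list of rows.

    Hypothetical helper for illustration: sums the cell values
    (f_ij - f_i+ * f_+j / N)^2 / (f_i+ * f_+j) and returns the square root.
    """
    n = sum(sum(row) for row in freq)
    row_totals = [sum(row) for row in freq]
    col_totals = [sum(col) for col in zip(*freq)]
    w_squared = 0.0
    for i, row in enumerate(freq):
        for j, f_ij in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            w_squared += (f_ij - expected) ** 2 / (row_totals[i] * col_totals[j])
    return math.sqrt(w_squared)


w = cohen_w(table_1)
print(f"w = {w:.3f}")
```

Run on the Table 1 frequencies, the sketch gives a value of about 0,20, which agrees with the reported 0,203 up to rounding of the individual cell values.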
Both Variables Dichotomous

Consider first the following example:

Example 2: A survey of an organisation's 60 employees regarding their preferences for a new medical scheme resulted in the frequency table given by Table 2.

TABLE 2
A 2X2 TABLE OF GENDER BY PREFERENCE FOR A MEDICAL SCHEME

                  Gender
Preference    Male   Female   Total
New            24      14       38
Old            16       6       22
Total          40      20       60

Is there a relationship between gender and preference? Cohen (1988) suggested as effect size the so-called phi coefficient $\varphi$, which is a special case of the effect size w, when r = 2 and c = 2. However, a simpler formula for the calculation of $\varphi$ can be used when the frequencies in the 2x2 table are given by a, b, c and d in the following way:

                      y
x             Category 1   Category 2   Total
Category 1        a            b         a + b
Category 2        c            d         c + d
Total           a + c        b + d         N

Now we have

$$ \varphi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}} \qquad (2) $$

This effect size can also be negative when bc > ad, implying that the frequencies b and c are more abundant than the other two cell frequencies. Therefore, in contrast to the case where more than two categories (levels) occur on one or both variables, the direction of the relationship can also be determined. Since the phi coefficient is a special case of w in (1), the same guidelines for this effect size can be used (without taking the sign of $\varphi$ into consideration).

Example 2: (continued) Considering the data in Table 2 and using (2) we have:

$$ \varphi = \frac{24 \times 6 - 16 \times 14}{\sqrt{38 \times 22 \times 40 \times 20}} = \frac{144 - 224}{\sqrt{668800}} = -0{,}098 $$

Since $\varphi$ is almost 0,1 in absolute value, it can be considered a small effect. No relationship really exists and the negative sign is therefore of little importance.

To get a feeling of what “small”, “medium” and “large” effects mean in terms of 2x2 tables, consider Table 3, in which a population of size 200 has been grouped. For a 2x2 table to describe a positive relationship, the frequencies in the cells where x and y have the same value (e.g. both 1 and both 2) have to be larger than those of the remaining two cells. In Table 3(a) these frequencies are both 55 in contrast to the 45 of the other cells, resulting in an effect size of 0,1.
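As a check on formula (2), the phi coefficient for Table 2 and for the Table 3(a) frequencies can be computed with a short Python sketch. The function name `phi_coefficient` and its argument order (a, b for the first row, c, d for the second) are illustrative assumptions that follow the a, b, c, d layout given above.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient of a 2x2 table with first row (a, b) and second row (c, d).

    Hypothetical helper for illustration, written directly from formula (2).
    """
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


# Table 2: rows are preference (new, old), columns are gender (male, female).
phi_table_2 = phi_coefficient(24, 14, 16, 6)

# Table 3(a): 55 in the two cells where x and y agree, 45 in the other two.
phi_table_3a = phi_coefficient(55, 45, 45, 55)

print(f"Table 2:    phi = {phi_table_2:.3f}")
print(f"Table 3(a): phi = {phi_table_3a:.3f}")
```

The sign of the result depends on how the categories are ordered in the table, so only the absolute value should be compared with the guideline values.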
In Table 3(b) and (c) these frequencies increase relative to those of the two remaining cells and therefore the value of $\varphi$ also increases. Analogous illustrations of negative relationships can be given by making the frequencies of cells where x and y are different, larger. In such cases the values of $\varphi$ will be negative. The value of $\varphi$ will be zero when ad = bc, e.g. when the frequencies in the two rows (or columns) are equal.

TABLE 3
EXAMPLES OF 2X2 CONTINGENCY TABLES FOR DIFFERENT VALUES OF $\varphi$

(a) $\varphi$ = 0,1
                      y
x             Category 1   Category 2   Total
Category 1        55           45        100
Category 2        45           55        100
Total            100          100        200

(b) $\varphi$ = 0,3
                      y
x             Category 1   Category 2   Total
Category 1        65           35        100
Category 2        35           65        100
Total            100          100        200

(c) $\varphi$ = 0,5
                      y
x             Category 1   Category 2   Total
Category 1        75           25        100
Category 2        25           75        100
Total            100          100        200

Also keep in mind that the maximum absolute value of $\varphi$ when dealing with 2x2 tables is 1 (which is the case when either b = c = 0, resulting in $\varphi$ = 1, or a = d = 0, resulting in $\varphi$ = –1).

One Variable Dichotomous, The Other On An Interval/Ratio Scale

Example 3: In the study described in Example 1 (Rothmann et al., 2000a) the means of the continuous personality type scores of the lecturers were compared with those of the students (see Table 4).
TABLE 4
THE MEANS AND STANDARD DEVIATIONS OF PERSONALITY TYPES PER SUB-POPULATION

                                     Lecturers (N = 25)    Students (N = 254)
Item                                  $\mu$     $\sigma$     $\mu$     $\sigma$
Extraversion – Introversion (E/I)    107,64    25,06       94,58    25,15
Sensing – Intuition (S/N)             84,57    27,60       86,65    20,58
Thinking – Feeling (T/F)              82,64    22,47       86,79    21,66
Judgement – Perception (J/P)          70,07    25,93       91,08    28,60
$\mu$: mean of population; $\sigma$: standard deviation of population

Here a dichotomous variable (x) can be considered an indicator of the membership of population members to two distinct groups or sub-populations (in Example 3 these are the lecturers and the students). The usual measure for a relationship between such a variable x and one on an interval or ratio scale (y) is the point-biserial correlation $\rho_{pb}$. It can be calculated by taking x as a variable with two distinct numerical values (e.g. 0 and 1) and obtaining the Pearson product moment correlation coefficient between x and y.

Take the effect size of the difference between two population means $\mu_1$ and $\mu_2$ to be (Cohen, 1988):

$$ \Delta = \frac{\mu_1 - \mu_2}{\sigma} \qquad (3) $$

where $\sigma$ is the common standard deviation of the two populations. The relationship between $\rho_{pb}$ and $\Delta$ is given by:

$$ \rho_{pb} = \frac{\Delta}{\sqrt{\Delta^{2} + 1/(pq)}} \qquad (4) $$

with p the proportion of the population members belonging to the first population and q = 1 – p the remaining proportion. Steyn (2000) suggested that, when dealing with populations with different standard deviations $\sigma_1$ and $\sigma_2$, the following effect size for a difference in population means should rather be used:

$$ \Delta_a = \frac{\mu_1 - \mu_2}{\sqrt{p\sigma_1^{2} + q\sigma_2^{2}}} \qquad (5) $$

It can be shown that the same relationship as in (4) exists between $\rho_{pb}$ and the newly defined $\Delta_a$. Using this relationship, the guideline values for $\Delta$ (Cohen, 1988) of 0,2 (small effect), 0,5 (medium effect) and 0,8 (large effect) transform to the values 0,1, 0,243 and 0,371 for $\rho_{pb}$.
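The transformation of the guideline values can be reproduced with the short Python sketch below. It assumes two equally sized sub-populations (p = q = 0,5); this assumption is made here because it reproduces the stated values and is not spelled out in the text. The function name `point_biserial_from_delta` is likewise illustrative. The function applies the relationship in (4), i.e. the effect size divided by the square root of its square plus 1/(pq).

```python
import math


def point_biserial_from_delta(delta, p):
    """Point-biserial correlation implied by a standardised mean difference.

    Hypothetical helper: delta is the effect size for the difference in means
    and p is the proportion of the population in the first group (q = 1 - p).
    """
    q = 1.0 - p
    return delta / math.sqrt(delta ** 2 + 1.0 / (p * q))


# Cohen's guideline values for the difference in means,
# transformed under the assumption p = q = 0.5.
guidelines = {"small": 0.2, "medium": 0.5, "large": 0.8}
transformed = {
    label: point_biserial_from_delta(delta, p=0.5)
    for label, delta in guidelines.items()
}

for label, rho_pb in transformed.items():
    print(f"{label:<6} effect: rho_pb = {rho_pb:.3f}")
```

For unequal group sizes the same effect size corresponds to a smaller point-biserial correlation, since 1/(pq) is smallest when p = q.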
For convenience the following guideline values are therefore suggested for $\rho_{pb}$:
• small effect: 0,1
• medium effect: 0,25
• large effect: 0,4

Example 3: (continued) From Table 4 the effect sizes in respect of the relationship between the personality type scores and the sub-population membership can be calculated as in Table 5.

TABLE 5
CALCULATIONS FROM TABLE 4 RESULTS

Item    $\mu_1 - \mu_2$    $p\sigma_1^{2} + q\sigma_2^{2}$    $\Delta_a$    $\rho_{pb}$    Effect
E/I         13,06              632,07              0,519       0,154      Small
S/N        – 2,08              457,12            – 0,097       0,029      Small
T/F        – 4,15              472,70            – 0,191       0,057      Small
J/P       – 21,01              803,50            – 0,741       0,216      Medium

Both Variables On An Interval/Ratio Scale

In this case the Pearson product moment correlation coefficient ($\rho$) is the appropriate measure of a relationship. However, it only measures linear relationships, i.e. relationships where, when the x and y values are displayed in a scatter plot, the spread of points can best be described by a straight line.

Let both variables x and y be assumed to be normally distributed. Also let the variable z be the dichotomised version of x, which takes as its two values the medians of the lower half and the upper half of the x values. Now the following relationship exists between the correlation of x and y and that of z and y (Cohen, 1988):

$$ \rho(x, y) = 1{,}253\,\rho_{pb}(z, y) \qquad (6) $$

From (6) it follows that the guideline values from the previous section for $\rho_{pb}$ (which were derived from those of $\Delta$) now transform to the following rounded values in respect of $\rho$ (Cohen, 1988):
• small effect: 0,1
• medium effect: 0,3
• large effect: 0,5

Note that $\rho$ can be negative, reflecting an inverse relationship between x and y, but to decide upon the effect, we use the absolute value of $\rho$.

Example 4: Rothmann et al. (2000b) conducted a study where the personality preferences of all the pharmacy students at a tertiary education institution were obtained to relate them to academic performance.
Table 6 contains the Pearson correlations between the academic performance and the continuous scores of the personality construct extraversion/introversion for a core group of the students per academic year and gender (i.e. students who passed all their subjects the previous year and who were registered for all the prescribed subjects of the current year).

TABLE 6
CORRELATION BETWEEN ACADEMIC PERFORMANCE AND EXTRAVERSION/INTROVERSION PERSONALITY CONSTRUCT

Year                 1        2        3        4
$\rho$ (males)      0,23*    0,13    –0,05    0,47**
$\rho$ (females)    0,24*    0,15     0,20    0,34*
* medium effect   ** large effect

Note that since the guidelines are somewhat arbitrary, the correlations 0,23 and 0,24 are viewed as medium effects, being nearer to 0,3 than to 0,1. According to Cohen (1988) “… many of the correlation coefficients encountered in behavioural science are of this order of magnitude, and, indeed, this degree of relationship would be perceptible to the naked eye of a reasonably sensitive observer.” Since 0,47 is near 0,5 it can be taken to be a large effect. It falls around the upper end of the range of r's one encounters in fields like differential, personality-social, personnel, educational, clinical and counselling psychology (Cohen, 1988).

THE ESTIMATION OF EFFECT SIZES OF RELATIONSHIPS FROM SAMPLES

In the previous section we gave the appropriate effect sizes for establishing the importance of a relationship between two variables for a complete population. When dealing with a random sample of size n from such a population, the effect sizes can no longer be determined exactly, but can be estimated from the results of the sample. In this section these estimates are given together with their statistical properties as far as unbiasedness is concerned.

Two Categorical Variables

By using the cell frequencies of a contingency table in respect of a sample, w can be estimated by $\hat{w}$, using formula (1). From Johnson et al. (1995, p.447) it follows that the expected value of $\hat{w}^{2}$ is approximately $w^{2} + (r-1)(c-1)/n$.
This means that $\hat{w}^{2}$ overestimates $w^{2}$ by approximately $(r-1)(c-1)/n$. Where n is large, this bias term can be neglected and it follows that $\hat{w}$ is virtually unbiased for w. Note that in order to establish this unbiasedness, the condition that every cell frequency must be above 5 must be met. This is the usual condition under which the Chi-square test on a contingency table is applicable.

Example 5: (Elifson et al., 1990, p.422). Interviews were conducted with 70 homosexual and 110 heterosexual males concerning their fear of contracting AIDS. Assume that these respondents were randomly chosen from some specified population. Table 7 gives a 3x2 contingency table of the results (with the expected frequencies when assuming no relationship in brackets).

TABLE 7
CONTINGENCY TABLE OF FEAR OF AIDS BY SEXUAL ORIENTATION

                      Sexual Orientation
Fear of AIDS     Homosexual    Heterosexual    Total
Great            40 (21,4)      15 (33,6)        55
Moderate         20 (21,4)      35 (33,6)        55
Low              10 (27,2)      60 (42,8)        70
Total               70            110           180

Firstly the hypothesis of no relationship was statistically tested and the relationship was found to be highly significant [$X^{2}(2) = 44{,}48$; $p < 0{,}001$]. Here $\hat{w}^{2}$ = 44,48/180 = 0,247 with approximate bias $(3-1)(2-1)/180 = 0{,}01$, which is negligible. The effect size is $\hat{w}$ = 0,497, which indicates an important relationship between sexual orientation and fear of AIDS.

Two Dichotomous Variables

As in the previous section, the population effect size $\varphi$ can be estimated by $\hat{\varphi}$, using the cell frequencies from a contingency table of a sample in formula (2). Also, since $\hat{\varphi}$ is a special case of $\hat{w}$, this estimator of $\varphi$ is virtually unbiased for large n. Even for n = 20, Monte Carlo simulations (with 10 000 replications) showed a bias of only about 0,02 when data were generated from Table 3(c), where $\varphi$ = 0,5 (see Steyn, 1999 for more details). For $\varphi$ = 0,3 (as in Table 3(b)) the bias was even smaller.

Example 6: (Larsen & Marx, 1981, p.337): Over the years studies have sought to characterise nightmare sufferers.
To investigate whether men fall into this pattern to the same extent as women, random samples of 160 men and 192 women were drawn, resulting in Table 8.

TABLE 8
CONTINGENCY TABLE OF NIGHTMARE PATTERN BY GENDER

                      Men    Women    Total
Nightmares often       55      60      115
Nightmares seldom     105     132      237
Total                 160     192      352

Let the null hypothesis be that no relationship exists between nightmare pattern and gender; the usual Chi-square test then yields no statistically significant result [$X^{2}(1) = 0{,}394$; $p = 0{,}53$]. Hence any relationship observed between nightmare pattern and gender in the sample can be attributed to chance. A small effect size can therefore be assumed. For the sake of completeness, the phi coefficient was calculated in this case ($\hat{\varphi}$ = 0,033).

One Variable Dichotomous, The Other On An Interval Scale

In order to estimate $\rho_{pb}$ from a random sample of the population, it suffices to estimate $\Delta_a$ in (5). Steyn (2000) suggested the estimator

$$ \hat{\Delta}_a = \frac{\bar{x}_1 - \bar{x}_2}{s_{\max}} \qquad (7) $$

where $\bar{x}_1$ and $\bar{x}_2$ are the two sample means and $s_{\max}$ is the maximum of the two sample standard deviations; $\hat{\Delta}_a$ slightly underestimates $\Delta_a$. The estimator for $\rho_{pb}$ follows from (4):

$$ \hat{\rho}_{pb} = \frac{\hat{\Delta}_a}{\sqrt{\hat{\Delta}_a^{2} + 1/(pq)}} \qquad (8) $$

Example 7: Rothmann (1999) conducted a study to test a programme intended to improve participants' knowledge of facilitation. He assigned half of a group of third-year volunteer students randomly to an experimental group who took the programme. The remainder of the volunteers were used as a control group. Before the programme a facilitation test was administered to all 48 students and after the intervention to 44 of them. The increase in scores between the pre- and post-tests gives the results in Table 9.
TABLE 9
MEAN INCREASE BETWEEN PRE- AND POST-TESTS AND STANDARD DEVIATIONS

Experimental (n = 24)      Control (n = 20)
$\bar{x}$       s            $\bar{x}$       s
18,95     6,26          2,18      3,45

Testing the null hypothesis of no difference between the experimental and control means resulted in a highly significant difference in means [t = 11,24; p < 0,0001]. Since the population studied can be viewed to be the 48 volunteers from which the two groups were randomly chosen, the proportions p and q can be taken to be equal, so that 1/(pq) = 4. First estimate $\Delta_a$ by:

$$ \hat{\Delta}_a = \frac{\bar{x}_1 - \bar{x}_2}{s_{\max}} = \frac{18{,}95 - 2{,}18}{6{,}26} = 2{,}68 $$

and then

$$ \hat{\rho}_{pb} = \frac{\hat{\Delta}_a}{\sqrt{\hat{\Delta}_a^{2} + 4}} = \frac{2{,}68}{3{,}34} = 0{,}80 $$

The effect size can therefore be estimated to be 0,80, which is very large and indicates an important relationship between group membership and increase in knowledge of facilitation. The programme was therefore highly successful.

Both Variables On An Interval Scale

The natural estimator for $\rho$ is the product moment correlation coefficient r, based on a random sample from the population. According to Johnson et al. (1995, p.55) r is a biased estimator for $\rho$, with bias approximately

$$ -\tfrac{1}{2}\,\rho\,(1 - \rho^{2})/n $$

which is always between –0,2/n and 0,2/n. This means that for large samples r is virtually unbiased, but for smaller samples it underestimates $\rho$ whenever $\rho$ is positive. When $\rho$ is negative it overestimates $\rho$. Keeping this in mind, it is suggested that r be used. Therefore, for small samples and a positive correlation the effect size estimator based on r will be conservative in the sense that a practically significant relationship will not always be detected in cases where it really exists. The opposite is true for negative correlations.

Example 8: (Adapted from Bartholomew and Knott, 1999, p.69). Pearson correlation coefficients were obtained for six ability variables from a random sample of 112 individuals (see Table 10).

TABLE 10
CORRELATION COEFFICIENTS OF ABILITY VARIABLES

Ability                          1          2          3          4          5
1. Non-verbal intelligence
2. Picture completion         0,466**
3. Block design               0,552**    0,572**
4. Mazes                      0,340*     0,193      0,445**
5. Reading comprehension      0,576**    0,263*     0,354*     0,184
6. Vocabulary                 0,510**    0,239*     0,356*     0,219*     0,794**
** large effect   * medium effect

With the exception of the correlation between reading comprehension and mazes, all the correlations are statistically significant at the 5% level of significance. This means that the null hypothesis of no correlation is rejected. Clearly, not all these correlations indicate important relationships when the correlations are viewed as estimates of effect sizes. The correlations between non-verbal intelligence on the one hand and block design, reading comprehension and vocabulary on the other hand, for example, are large effects.

DISCUSSION AND CONCLUSIONS

Measures of relationships like the phi coefficient and the Pearson correlation coefficient are well known. However, their usage as measures of effect size is less well known. In this paper it was shown how the effect sizes w and $\rho_{pb}$ also have their place in this regard.

Apart from Steyn (1999, 2000), a clear distinction between population and sample cases of effect sizes is rarely made. The author tried to make this distinction in the current paper and illustrated it with a number of examples.

While effect sizes are suggested for relationships in each of the four cases when dealing with a complete population, the estimates from random samples are not always unbiased. Especially with small samples, biased estimates can occur and care should be taken when drawing conclusions regarding the size of the effect.

While many other types of effect sizes exist (Nickerson, 2000; Cohen, 1988; Steyn, 2000), the focus in this paper was on effect sizes which arise from relationships. There are also effect sizes for comparing several means in respect of one or more variables (Steyn, 1999); these are topics for further research.

Acknowledgement: The author is indebted to the referees for their valuable comments and suggestions.

REFERENCES

American Psychological Association. (1994). Publication manual of the American Psychological Association (4th ed.). Washington, DC: Author.
Bartholomew, D.J. & Knott, M. (1999). Latent variables models and factor analysis. Second edition. London: Arnold.
Cohen, J. (1988). Statistical power analysis for the behavioural sciences. Second edition. Hillsdale, NJ: Lawrence Erlbaum Associates.
Johnson, N.L., Kotz, S. & Balakrishnan, N. (1995). Continuous univariate distributions. Volume 2. New York: John Wiley.
Kirk, R.E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746-759.
Larsen, R.J. & Marx, M.L. (1981). An introduction to mathematical statistics and its applications. Englewood Cliffs: Prentice-Hall.
Nickerson, R.S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241-301.
Rosenthal, R. (1991). Meta-analytic procedures for social research. Newbury Park, Calif.: Sage Publications.
Rothmann, S., Basson, W.D. & Rothmann, J.C. (2000b). The personality preferences of pharmacy students and lecturers at a tertiary education institution. International Journal of Pharmacy Practice, 8, 225-233.
Rothmann, S., Coetzee, S.C., Fouche, W. & Theron, N. (2000a). The personality preferences of business lecturers and students at a tertiary education institution. Management Dynamics, 9(1), 60-86.
Rothmann, S. (1999). The evaluation of training programmes in facilitation at a tertiary education institution. Accepted for publication: Management Dynamics, 8(4), 33-50.
Steyn, H.S. (jr) (1999). Praktiese beduidendheid: die gebruik van effekgroottes. Wetenskaplike bydraes, Reeks B: Natuurwetenskappe nr. 117, Potchefstroomse Universiteit vir CHO, Potchefstroom.
Steyn, H.S. (jr) (2000). Practical significance of the difference in means. Accepted for publication: Journal of Industrial Psychology, 26(3), 1-3.