The City-Country Rule: An Extension of the Rank-Size Rule

Rein Taagepera
Department of Political Science, University of California, Irvine, CA 92697
rtaagepe@uci.edu
http://hypatia.ss.uci.edu/ps/

Edgar Kaskla
Department of Political Science, California State University, Long Beach
EKaskla@aol.com
http://front.csulb.edu/politicalscience/

Journal of World-Systems Research, VII, 2, Fall 2001, 157–173
http://jwsr.ucr.edu
ISSN 1076-156X
© 2001 Rein Taagepera & Edgar Kaskla

Abstract

A “city-country rule” derived from the well-known rank-size rule for cities correctly predicts the average relationship between the population of a country’s Rth-ranking city ($P_R$) and the total population ($P$) of that country: $P = R P_R \ln(R P_R)$. The formula applies in principle to other systems of defined total size, such as revenues of firms within industries or citations of journals within a scholarly discipline. The important result is that we can thus predict the populations of all cities in a country, once its total population is given, and quickly tell whether the actual city populations are large or small, compared to the world average for similarly ranked cities in countries of comparable size. For countries of more than 100 million, a more elaborate formula holds. The largest city in the country needs a correction (primacy factor), which vanishes as the country population increases. Actual primacy may be relative (compared to other cities) or absolute (compared to country population).

Introduction

This study introduces a “city-country rule” to complement the well-known rank-size rule for cities, from which it is derived. The city-country rule enables us to make a rough estimate of the population of the largest cities when the population of the entire country is known. It quickly tells us whether the actual city populations are large or small, compared to the world average for similarly ranked cities in countries of comparable size.

The existing rank-size rule describes the empirical relationship between a city’s or town’s population and its ranking relative to other cities within an interacting geographical area. This regularity was first noted by Auerbach (1913) and later popularized by Stewart (1947) and Zipf (1949). Hence it is often called Zipf’s law. If $R$ is the rank in size of a given city and $P_R$ is its population, the simplest form of the rank-size rule is

$$R P_R = \text{constant} = P_1 \tag{1}$$

where $P_1$ is the population of the largest city. When graphed on doubly logarithmic paper, $P_R$ versus $R$, this equation corresponds to a straight line with slope $-1$. For many countries (or other interacting regions) a different slope ($-n$) yields a better data fit, and the corresponding generalized expression is

$$R^n P_R = \text{constant} = P_1. \tag{2}$$

As an empirical observation, the rule extends far beyond city sizes. It has been claimed to apply to the frequency of English words, the frequency of citation of scholarly journals, the size distribution of organisms in an ecosystem, the revenues of firms in an industry, and even the magnitude of earthquakes. We prefer to call it a “rule” rather than a “law,” because no widely accepted theoretical foundation has been supplied; thus it remains an empirical rule rather than a scientific law.

An extensive literature has attempted to explain the existence of such a rank-size relationship. Berry (1966) in particular has drawn upon the earlier work of Simon (1955) to suggest that the rank-size regularity is based on purely mathematical grounds through stochastic growth models.
But other works (Vining 1974, Okabe 1977) have challenged the mathematical assumptions that produce such a “steady state” model. Further approaches (Dacey 1966, Beckmann and McPherson 1970, Mulligan 1976, Beguin 1979) have focused on the assumptions that produce a continuous rank-size relationship rather than a stepped or hierarchical relationship that would be consistent with central place theory as developed by Christaller (1966).¹

Empirical testing has also continued on the level of individual countries and regions past and present (Berry and Kasarda 1977, Skinner 1977, Alden 1979, Asami 1986, Vining 1986) as well as for the world as a whole (Chase-Dunn 1985, Ettlinger and Archer 1987). Agreements and disagreements with the rank-size rule have been observed, as well as difficulties in testing it, since “legal” (administrative) and actual sizes of population centers differ (Asami 1986) and national units may not always be the most suitable units in which the test should be carried out. Further operational difficulties have been noted in defining (1) the limits of a city system and (2) the precise meaning of “primacy” (Walters 1985). The latter term expresses the observation that the rule at times works with all other cities except the largest one, which tends to be larger than expected. This “primacy factor” will be discussed in more detail below.

Now we come to the new idea investigated in this study. The rank-size rule has been a rule about sizes of components (such as cities). But whenever these components add up to a definable total, it also has implications regarding this total. Cities and other smaller settlements do add up to the total population of the country (or other territorial unit).² Our purpose here is to make this implicit relationship explicit and to use it in predicting population sizes of cities in a given state system. By so doing, we will expand the scope of the rank-size rule considerably.
This rule is all too often dismissed as being an interesting empirical regularity without much applicability as a predictive tool. The new “city-country rule” does predict the sizes of all cities in an average system, on the basis of total population. Deviations from the predictions indicate that the actual system is different from this average. Primacy, too, can be defined and measured in a more precise way when the total population of the country is taken as the base line.

The Model for the City-Country Rule

Assume that the rank-size rule (Eq. 2) applies to all settlements, down to the last isolated one-person dwelling.³ By summing up all these populations, one should get the population of the entire country. The details are worked out in Appendix A. If $n = 1$ (Eq. 1), the following equation expresses the population $P$ of the entire country (or other suitable territorial unit) in terms of the population $P_R$ of its $R$-th largest city or settlement:

$$P = R P_R \ln(R P_R). \tag{3}$$

For the largest city in the country ($R = 1$), this becomes

$$P = P_1 \ln P_1, \tag{4}$$

and for the next-largest city ($R = 2$), it is

$$P = 2 P_2 \ln(2 P_2). \tag{5}$$

The general equation (corresponding to Eq. 2, with any slope indicator $n$) is more complex:

$$P = \frac{R P_R \left(P_R^{-1+1/n} - R^{\,n-1}\right)}{1-n}. \tag{6}$$

In the case of the largest city ($R = 1$), this yields

$$P = \frac{P_1 \left(P_1^{-1+1/n} - 1\right)}{1-n}, \tag{7}$$

and for the next-largest city ($R = 2$), it becomes

$$P = \frac{2 P_2 \left(P_2^{-1+1/n} - 2^{\,n-1}\right)}{1-n}. \tag{8}$$

We will test the city-country rule with worldwide data for the largest and second-largest cities (Equations 4, 5, 7 and 8), as well as for the third-, fourth- and fifth-ranking cities (using the analogous equations derived from Eqs. 3 and 6).

The population of the largest city often deviates from the rank-size rule, which sets in with the next-largest city. Therefore, the relation between the populations of a country and its second-largest city (and the third-, fourth- and fifth-largest) is expected to give a better fit than is the case with the largest city. We will start the test with these next-largest cities. Only if this first test turns out fairly successful would there be any point in testing for the largest cities, so as to evaluate the extent of the primacy factor.

1. There are only three relatively simple patterns to choose from, when one wants to have a steady decrease in size: linear, power rule (Eq. 2) and exponential. Any actual distribution of components ranked by size is bound to approach one of these three, more or less perfectly. We observe, for instance, that the seat distribution of parties in representative assemblies tends to follow an exponential pattern rather than a power rule (Taagepera 2001). The broad theory to explain these differences remains to be constructed.
2. The same is the case for revenues of firms in an industry and many other phenomena. It is not the case, however, for others, such as magnitude of earthquakes.
3. Here another assumption enters: that a conceivable smallest possible size can be defined. This is the case for human settlements (one person), for words in a defined set (one single occurrence), and for the journals in a set of citations (one single occurrence). This is not the case for revenues of firms, where the cutoff between zero and some minimal revenue is arbitrary, depending on the currency used. Thus the city-country rule is based on two assumptions: that the total of the components can be defined, and that the components themselves consist of discrete particles of one type (persons, in the case of cities).

Testing the City-Country Rule

In order to test the implications of the rank-size rule for the total population of countries, we collected data for 203 formally independent or effectively autonomous countries as given in The World in Figures (1988).
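Before turning to the data, Eqs. 3 and 6 can be made concrete with a short numerical sketch. The Python function below is illustrative only and is not part of the original study; the name `country_population` and its arguments are assumptions introduced here. It evaluates Eq. 6 for a given rank $R$, city population $P_R$ and index $n$, and falls back on Eq. 3 when $n$ is effectively 1, where Eq. 6 becomes an indeterminate 0/0 form whose limit is Eq. 3.

```python
import math


def country_population(rank: int, city_pop: float, n: float = 1.0) -> float:
    """Country population P implied by the R-th largest city (Eqs. 3 and 6).

    rank     -- R, the size rank of the city (1 = largest); hypothetical name
    city_pop -- P_R, the population of that city; hypothetical name
    n        -- slope index of the generalized rank-size rule (Eq. 2)
    """
    if rank < 1 or city_pop < 1:
        raise ValueError("rank and city_pop must both be at least 1")
    if abs(n - 1.0) < 1e-9:
        # Eq. 3: P = R * P_R * ln(R * P_R)
        return rank * city_pop * math.log(rank * city_pop)
    # Eq. 6: P = R * P_R * (P_R**(-1 + 1/n) - R**(n - 1)) / (1 - n)
    bracket = city_pop ** (-1.0 + 1.0 / n) - rank ** (n - 1.0)
    return rank * city_pop * bracket / (1.0 - n)


# Illustrative input (not from the paper's data set): a second-largest
# city of 500,000 people, evaluated at three values of n.
for n in (0.9, 1.0, 1.1):
    print(f"n = {n:.1f}: P = {country_population(2, 500_000, n):,.0f}")
```

For a fixed second-city population, the implied country population falls as $n$ rises, so curves for $n = 0.9$ and $n = 1.1$ lie on opposite sides of the $n = 1.0$ curve.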
The data refer to the period around 1985 and include the populations of countries and of up to five largest cities, with increasing gaps for the lower-ranking cities, especially for countries of less than half a million people. For the fifth-largest cities the number of cases decreases to 124.

Before proceeding to the direct test, we should determine the median value of the power index $n$ (assuming that countries do follow the rank-size rule). This is done in the following way. Figure 1 shows the population ratio $P_4/P_2$ of the fourth- and second-largest cities graphed against the population of the entire country.⁴ According to Eq. 2, the median ratio should be $P_4/P_2 = (2/4)^n = (1/2)^n$. While the points are extremely scattered, the median values of $P_4/P_2$ (indicated as crosses in Figure 1) are strikingly close to 1/2, at any country population below 300 million. This suggests that, on the average, the same value of $n$ applies to countries of almost any size, and this value is $n = 1.0$, in line with the simplest form of the rank-size rule (Eq. 1). For the world as a whole (as also shown in Figure 1) a lower value of $n = 0.85$ would apply, and the same tends to be the case for continents.

In this light, what could be considered a successful test of the city-country model? If we graphed the population $P_2$ of the second-largest city versus the country population $P$, all points should fall reasonably close to the curve corresponding to $n = 1.0$ (as given by Eq. 5). In order to be consistent with the findings of Figure 1, we would wish for most data points to fall in the narrow zone delineated by the curves for $n = 0.9$ and $n = 1.1$ (as determined by using Eq. 8). Moreover, this should be especially the case for country populations below 300 million. For the larger countries, a deviation toward lower values of $n$ is expected in the light of Figure 1.

Figure 2 shows $P_2$ (the size of the second-ranking city) graphed against $P$ (the population of the country), both on logarithmic scales.
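A curve of this kind needs the city population as a function of the country population, whereas Eq. 3 gives $P$ in terms of $P_R$ and has no inverse in elementary functions. As a minimal sketch, assuming $n = 1$ and using Newton's method (a choice made here for illustration, not taken from the source), the hypothetical helper `predicted_city_population` solves $x \ln x = P$ for $x = R P_R$ and then divides by the rank.

```python
import math


def predicted_city_population(country_pop: float, rank: int = 1) -> float:
    """Solve Eq. 3, P = R * P_R * ln(R * P_R), for P_R (the n = 1 case).

    country_pop -- P, total population of the country; hypothetical name
    rank        -- R, size rank of the city (1 = largest); hypothetical name
    """
    if country_pop <= 1 or rank < 1:
        raise ValueError("country_pop must exceed 1 and rank must be >= 1")
    # Substitute x = R * P_R, so that x * ln(x) = P.
    x = country_pop / math.log(country_pop)  # rough starting guess
    for _ in range(100):
        residual = x * math.log(x) - country_pop
        step = residual / (math.log(x) + 1.0)  # derivative of x*ln(x) is ln(x) + 1
        x -= step
        if abs(step) <= 1e-12 * x:
            break
    return x / rank


# Illustrative input: a country of 10 million people.
for r in (1, 2, 3):
    print(f"R = {r}: P_R = {predicted_city_population(10_000_000, r):,.0f}")
```

For a country of 10 million, this sketch gives a largest city of roughly 0.74 million and a second-largest city of roughly 0.37 million under the $n = 1$ assumption.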
The theoretical curves for $n = 0.9$, $1.0$ and $1.1$ (based on Eqs. 5 and 8) are shown.⁵ The actual data points are visibly most crowded in the zone between the $n = 0.9$ and $n = 1.1$ curves, and $n = 1.0$ is close to the best-fit curve for these data. Countries with very large populations (India and China) are on the low side, as expected. Thus our simple model does agree, indeed, with the pattern observed.

Figure 1 – Population ratio of fourth- and second-largest cities as a function of the total population of the country. Crosses indicate median values for the given population range.

4. Why pick the fourth- and second-largest cities? We want to avoid the largest cities, where the primacy factor enters. Beyond the fourth-largest cities, gaps in our data set develop. Within this range we want to have sufficient contrast; hence $P_4/P_2$ is chosen. Countries at exactly 10 million (Belgium, Cameroon, Cuba, Greece, Ivory Coast and Madagascar) are labeled in Figure 1, because we will use them in a later section.
5. These curves are not straight lines on the log-log graph, but the deviation from straight lines is almost invisible throughout the range shown. For reference, the line $P_2 = P/2$ is also shown. This is the maximum conceivable size the second-ranking city could have; it would be reached when a country consists of just two cities with equal populations.

A similar test was carried out for the third-, fourth- and fifth-largest cities. The patterns of the corresponding graphs (not shown here) are quite similar to that in Figure 2 in that most country points fall between $n = 0.9$ and $n = 1.1$. As shown at the bottom of Table 1, more than 50% of the data points for the second-largest cities are located within the zone between $n = 0.9$ and $n = 1.1$. For the next-largest cities, this proportion gradually increases to 64%.

The main body of Table 1 lists the median values of power constants ($n$) at various ranks (from $R = 1$ to $R = 5$).
This is done separately in various country population brackets. The median $n$ for all cities in all countries is 0.99. The first-ranking cities will be discussed later. For all lower-ranking cities the values of $n$ are within plus or minus 0.07 of $n = 1.00$, as long as the total populations of countries remain below 100 million. For countries larger than 100 million, the values of $n$ are consistently below 1.0, with a mean of 0.94, and for the world as a whole the values of $n$ drop to around 0.85. The median $n$ for all second-ranking cities is 0.98, and for the next-ranking cities it slowly but consistently decreases to 0.96 for the fifth-ranking cities.

Thus, the average relationship between the populations of a country and of its second- to fifth-largest cities is well predicted by our model, despite its very simple starting point. Except for very large countries, the value of the parameter $n$ that yields the best overall fit (0.99) is extremely close to 1.00.

We may well wonder whether the simple relationship with $n = 1$ is some theoretical norm for city systems which are genuinely separate interacting systems rather than parts of a larger system or composites of several systems. In this light, the lower $n$ for very large countries could mean that they consist of several separate geographical systems. If so, then $n$ should be especially low if the entire world were tentatively considered a single system. This is, indeed, the case (see Table 1): for the world, $n$ is around 0.85. Going in the reverse direction, the tiniest countries still do not have $n$ appreciably larger than 1.0, suggesting that, on the average, they are separate systems rather than parts of larger regional systems.

Figure 2 – Population of the second-largest city as a function of the total population of the country.
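Table 1 reports values of $n$ for individual ranks, which presupposes recovering $n$ from an observed pair of country and city populations. The source does not describe the numerical procedure, so the following is only a sketch under stated assumptions: it solves Eq. 6 for $n$ by bisection, with hypothetical function names (`country_population`, `estimate_n`) and an assumed search bracket of 0.5 to 2.0.

```python
import math


def country_population(rank: int, city_pop: float, n: float) -> float:
    """Eq. 6 (Eq. 3 when n is effectively 1): P from R, P_R and n."""
    if abs(n - 1.0) < 1e-9:
        return rank * city_pop * math.log(rank * city_pop)
    bracket = city_pop ** (-1.0 + 1.0 / n) - rank ** (n - 1.0)
    return rank * city_pop * bracket / (1.0 - n)


def estimate_n(country_pop: float, rank: int, city_pop: float,
               lo: float = 0.5, hi: float = 2.0, tol: float = 1e-10) -> float:
    """Bisection for the n at which Eq. 6 reproduces country_pop.

    The bracket [lo, hi] is an assumption of this sketch; a ValueError is
    raised when the model does not cross the observed P inside it.
    """
    f_lo = country_population(rank, city_pop, lo) - country_pop
    f_hi = country_population(rank, city_pop, hi) - country_pop
    if f_lo * f_hi > 0:
        raise ValueError("n is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = country_population(rank, city_pop, mid) - country_pop
        if f_lo * f_mid <= 0:
            hi = mid            # sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # sign change lies in [mid, hi]
    return 0.5 * (lo + hi)


# Round trip on synthetic values (not data from the paper): generate P
# with a known n, then recover that n from (P, R, P_R).
synthetic_p = country_population(2, 200_000.0, 0.95)
print(f"recovered n = {estimate_n(synthetic_p, 2, 200_000.0):.4f}")
```

Bisection is used here because it needs only a sign change of the residual across the bracket; any bracketing root finder would serve equally well.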
Country population (million)   n(P1)   n(P2)   n(P3)   n(P4)   n(P5)
.01 to 1                       1.26    0.96    -       -       -
.1 to 1                        1.19    1.07    1.01    1.00    1.00
1 to 10                        1.08    0.98    0.94    0.96    0.98
10 to 100                      1.01    0.97    0.99    1.00    0.98
Over 100                       0.98    0.96    0.93    0.92    0.94
Median of All Countries        1.08    0.98    0.97    0.97    0.96
The World                      0.81    0.84    0.84    0.86    0.87
Percentage of Countries where 0.9