Ratio Mathematica Volume 38, 2020, pp. 261-285

Geometrical foundations of the sampling design with fixed sample size

Pierpaolo Angelini*

Abstract

We study the sampling design with fixed sample size from a geometric point of view. The first-order and second-order inclusion probabilities are chosen by the statistician. They are subjective probabilities. It is possible to study them inside linear spaces provided with a quadratic and linear metric. We define particular random quantities whose logically possible values are all logically possible samples of a given size. In particular, we define random quantities which are complementary to the Horvitz-Thompson estimator. We identify a quadratic and linear metric with regard to two univariate random quantities representing deviations. We use the α-criterion of concordance introduced by Gini in order to identify it. We apply this statistical criterion to probability in an innovative way.

Keywords: tensor product; linear map; bilinear map; quadratic and linear metric; α-product; α-norm

2010 AMS subject classifications: 62D05.

*Dipartimento di Scienze Statistiche, Università La Sapienza, Roma, Italia; pier.angelini@uniroma1.it

Received on May 12th, 2020. Accepted on June 3rd, 2020. Published on June 30th, 2020. doi: 10.23755/rm.v38i0.511. ISSN: 1592-7415. eISSN: 2282-8214. © P. Angelini. This paper is published under the CC-BY licence agreement.

1 Introduction

Given a finite population having $N$ elements, we are only interested in samples of units of this population in which no element can be selected more than once (Basu [1971]). We are not interested in ordered samples of a given size selected from a finite population (Basu [1958]).
On the other hand, when we consider unordered samples where repetitions are not allowed, we lose no information about a given parameter of the population under consideration (Conti and Marella [2015]). All logically possible samples of a given size belong to a given set. We suppose that we are always able to enumerate them. It is known that if the number of all logically possible samples of a given set is very large then enumerating them can be very hard or impossible (Godambe and Joshi [1965]). A sampling design is characterized by a pair of elements (Joshi [1971]). The first element of this pair is the set of all logically possible samples selected from a finite population. The second element of this pair consists of the probabilities assigned to the samples of the set of all logically possible samples of a given size. We consider a probability distribution in this way (Hartley and Rao [1962]). Each element of the set of all logically possible samples of a given size can be viewed as a logically possible event of a finite partition of incompatible and exhaustive elementary events. It is then possible to assign a subjective probability to each logically possible event of this partition (Good [1962]). A probability subjectively assigned to each logically possible event of a finite partition of events must only be coherent. It is inadmissible when it is not coherent. A probability is subjectively assigned to each logically possible event of a finite partition of events even when the same probability is assigned to each of them. An equal probability assigned to each logically possible event of a finite partition of events is always a subjective judgment. We have to note a very important point: when we say that it is possible to assign a subjective and coherent probability to every logically possible event of a given set of events, we mean that the choice of any value in the interval from 0 to 1 is allowed.
Such an interval includes both endpoints. It would therefore be possible to assign to every logically possible event of a given set of events a probability equal to 0. This choice is absolutely coherent. We will however introduce a restriction concerning this point. We have to note another very important point: we methodologically distinguish what is logically possible from what is subjectively probable. What is logically possible at a given instant is neither certainly true nor certainly false. One and only one of the elements belonging to the set of all logically possible elements at a given instant will be true a posteriori. A subjective probability is then assigned to each element of this set before knowing which one it is.

2 Events as points in the space of random quantities

We consider a finite set of vectors, denoted by $S$, over the field $\mathbb{R}$ of real numbers. We enumerate them. We consequently write
\[ s_1, \ldots, s_N, \tag{1} \]
where $s_i \in S$, $i = 1, \ldots, N$. We consider a linear space over $\mathbb{R}$ of all linear combinations of elements of $S$ expressed in the form
\[ c_1 s_1 + \ldots + c_N s_N, \tag{2} \]
where every $c_i$, $i = 1, \ldots, N$, is a real number. We observe that (2) is completely determined by the real numbers $c_1, \ldots, c_N$. Each number $c_i$ is associated with the element $s_i$ of the set $S$. It is known that an association is exactly a function. For each $s_i \in S$ and $c \in \mathbb{R}$ we then consider
\[ c s_i \tag{3} \]
to be the function that associates $c$ with $s_i$ and $0$ with $s_j$, where $j \neq i$. Given $a \in \mathbb{R}$, we obtain
\[ a(c s_i) = (ac) s_i. \tag{4} \]
Given $c' \in \mathbb{R}$, we obtain
\[ (c + c') s_i = c s_i + c' s_i. \tag{5} \]
Thus, it is possible to consider a linear space over $\mathbb{R}$. It is the set of all functions of $S$ into $\mathbb{R}$. These functions can be written in the form given by (2). The functions
\[ 1 s_1, \ldots, 1 s_N \tag{6} \]
are linearly independent, so they represent a basis of the linear space under consideration. To see this, suppose that $c_1, \ldots, c_N$ are elements of $\mathbb{R}$ such that we obtain the zero function given by
\[ c_1 s_1 + \ldots + c_N s_N = 0. \tag{7} \]
This means that we have $c_i = 0$ for every $i = 1, \ldots, N$, which proves the linear independence under consideration. Moreover, it is always possible to write $s_i$ instead of $1 s_i$. A sample belonging to the set of all logically possible samples of a given size is then expressed by the vector
\[ \delta(s') = \begin{pmatrix} \delta(1; s') \\ \delta(2; s') \\ \vdots \\ \delta(N; s') \end{pmatrix} \tag{8} \]
having $N$ components, where $s'$ is a sample of the set of all logically possible samples denoted by $S'$ (Godambe [1955]). We will always consider vectors viewed as ordered lists of real numbers within this context. A sample can be expressed by the real numbers of a linear combination of $N$-dimensional vectors by means of which another $N$-dimensional vector is obtained. If a sample is identified with an $N$-dimensional vector then its components are the real numbers of a linear combination of the elements of a basis of the linear space under consideration. This linear space is denoted by $\mathbb{R}^N$. Its basis is denoted by $S = \{e_j\}$, $j = 1, \ldots, N$. We always consider orthonormal bases within this context. We therefore write
\[ \delta(1; s') e_1 + \delta(2; s') e_2 + \ldots + \delta(N; s') e_N = y, \tag{9} \]
where we have $y \in \mathbb{R}^N$. We consider as many linear combinations of the elements of $S = \{e_j\}$, $j = 1, \ldots, N$, as there are logically possible samples in the set of all logically possible samples of a given size denoted by $S'$. We note that the real numbers of every linear combination of the elements of $S = \{e_j\}$, $j = 1, \ldots, N$, represent one of the logically possible samples of $S'$. We evidently have
\[ \delta(i; s') = \begin{cases} 1 & \text{if } i \in s' \\ 0 & \text{if } i \notin s' \end{cases} \tag{10} \]
for every $i = 1, \ldots, N$, where $N$ is the total number of elements of the population under consideration.
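The indicator vectors in (8) and (10) can be listed explicitly for a small population. The following is a minimal sketch, assuming a toy population with $N = 4$ units and samples of size $n = 2$; the function name `indicator_vectors` and the numerical values are illustrative choices and do not come from the paper.

```python
from itertools import combinations
from math import comb


def indicator_vectors(N, n):
    """List the 0/1 vector delta(s') of every size-n sample of units 1..N."""
    vectors = []
    for sample in combinations(range(1, N + 1), n):
        members = set(sample)
        # delta(i; s') is 1 when unit i belongs to the sample and 0 otherwise.
        vectors.append(tuple(1 if i in members else 0 for i in range(1, N + 1)))
    return vectors


# Hypothetical toy case: N = 4 units, samples of size n = 2.
vectors = indicator_vectors(4, 2)
for delta in vectors:
    print(delta)
# Number of vectors found, next to the number of 2-combinations of 4 units.
print(len(vectors), comb(4, 2))
```

Each vector printed is a vertex of the unit hypercube in $\mathbb{R}^4$ with exactly two components equal to 1, one for each unit contained in the sample.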
We consider all logically possible samples of $S'$ having the same size denoted by $n$. Since the population has $N$ elements, the number of $n$-combinations is equal to the binomial coefficient $\binom{N}{n}$. We observe that $S'$, whose elements are elementary events, is a subset of $\mathbb{R}^N$. We say that $S'$ is embedded in $\mathbb{R}^N$.

3 Finite partitions of logically possible elementary events

Given $N$, all logically possible samples whose size is equal to $n$ belong to the set denoted by $S'$. We have
\[ n = \sum_{i=1}^{N} \delta(i; s') \tag{11} \]
for every $s' \in S'$. Every sample of the set of all logically possible samples corresponds to a vertex, denoted by $\delta(s')$, of the $N$-dimensional unit hypercube $[0, 1]^N$. All logically possible samples of $S'$ can be viewed as possible events of a finite partition of incompatible and exhaustive elementary events (de Finetti [1982b]). We are consequently able to define a univariate random quantity whose logically possible values are all logically possible samples of $S'$. Its logically possible values are not real numbers but $N$-dimensional vectors of an $N$-dimensional linear space over $\mathbb{R}$. Every logically possible sample belonging to $S'$ has a subjective probability of being selected (de Finetti [1989]). It represents the degree of belief in the selection of a logically possible sample assigned by a given individual (the statistician) at a certain instant with a given set of information. An evaluation of probability over a set of possible and elementary events coinciding with all logically possible samples of $S'$ is admissible when it is coherent. This means that it must be
\[ \sum_{s' \in S'} p(s') = 1. \tag{12} \]
It is essential to note a very important point: we have to introduce an unusual restriction with regard to coherence because we exclude the choice of a subjective probability equal to 0 for any possible and elementary event.
This implies that any logically possible sample of $S'$ always has a probability greater than zero of being selected. We consequently have
\[ 0 < p(s') \leq 1 \tag{13} \]
for every $s' \in S'$ (Coletti et al. [2015]). Thus, conditions of coherence coincide with positivity of each probability of a random event and finite additivity of probabilities of incompatible and exhaustive events (Gilio and Sanfilippo [2014]). We will also consider bivariate random quantities whose components are two univariate random quantities (de Finetti [2011]). If the logically possible values of these univariate random quantities are the same vectors of the same $N$-dimensional linear space over $\mathbb{R}$ then these random quantities have the same marginal probability distributions. They represent the same finite partition of incompatible and exhaustive elementary events. Putting them into a two-way table, we observe that this table always has the same number of rows and columns.

4 First-order inclusion probabilities viewed as a coherent prevision of a univariate random quantity

We consider a univariate random quantity denoted by $S$ whose logically possible values are vectors of $\mathbb{R}^N$. Given $N$ and $n$, the number of the logically possible values of $S$ coincides with the binomial coefficient expressed by
\[ \binom{N}{n} = k. \tag{14} \]
The set of the logically possible values of $S$ is then given by $I(S) = \{s'_1, \ldots, s'_k\}$, with $s'_i \in S'$, $i = 1, \ldots, k$. A nonzero probability is assigned to each sample of the set of all logically possible samples. Let $p(s'_1), \ldots, p(s'_k)$ be these probabilities. It must therefore be
\[ \sum_{i=1}^{k} p(s'_i) = 1, \tag{15} \]
with
\[ 0 < p(s'_i) \leq 1 \tag{16} \]
for every $i = 1, \ldots, k$. It is possible to obtain an $N$-dimensional vector after assigning a nonzero probability to each sample of $S'$. We denote it by $\pi$. It represents the first-order inclusion probabilities of all units of the population under consideration. Thus, we write
\[ \pi = \begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix} = p(s'_1) \begin{pmatrix} \delta(1; s'_1) \\ \delta(2; s'_1) \\ \vdots \\ \delta(N; s'_1) \end{pmatrix} + \ldots + p(s'_k) \begin{pmatrix} \delta(1; s'_k) \\ \delta(2; s'_k) \\ \vdots \\ \delta(N; s'_k) \end{pmatrix}, \tag{17} \]
where we have $\pi_i > 0$ for every $i = 1, \ldots, N$. We have evidently written a convex combination of the vertices of the $N$-dimensional unit hypercube $[0, 1]^N$ corresponding to the samples of $S'$. Each vertex is a sample having a nonzero weight representing a subjective probability. It is essential to note that $\pi$ is a coherent prevision of $S$ denoted by $P(S)$. We therefore write
\[ \pi = \begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix} = P(S) = \sum_{i=1}^{k} \delta(s'_i) \, p(s'_i). \tag{18} \]
We observe that the logically possible values of $S$ are vectors having $N$ components, so its coherent prevision must also be a vector having $N$ components. The logically possible values of $S$ belong to the set denoted by $I(S)$. Each element of $I(S)$ contains first-order inclusion “a posteriori” probabilities. This implies that $\pi$ must contain first-order inclusion “a priori” probabilities based on the degree of belief in the selection of all logically possible samples attributed by the statistician at a certain instant with a given set of information. An “a posteriori” probability of a unit of the population of being included in a given sample is always predetermined. If a unit of the population is contained “a posteriori” in the sample that has been selected then its probability is equal to 1. If a unit of the population does not belong “a posteriori” to the sample that has been selected then its probability is equal to 0. A convex combination coinciding with $P(S)$ has conveniently been considered because the logically possible values of $S$ are incompatible and exhaustive elementary events of a finite partition of random events.
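The convex combination in (17) and (18) can be checked by direct enumeration. The sketch below is only an illustration under stated assumptions: the toy design ($N = 4$, $n = 2$, probabilities proportional to $1, \ldots, 6$ over the six samples in lexicographic order) and the function name `first_order_inclusion` are hypothetical and are not taken from the paper. Exact fractions are used so that coherence can be tested without rounding.

```python
from fractions import Fraction
from itertools import combinations


def first_order_inclusion(N, n, probs):
    """Return pi = sum_i p(s'_i) * delta(s'_i) for all size-n samples of 1..N."""
    samples = list(combinations(range(1, N + 1), n))
    if len(probs) != len(samples):
        raise ValueError("one probability per sample is required")
    # Coherence with the restriction used in the text: strictly positive
    # probabilities that add up to 1.
    if any(p <= 0 for p in probs) or sum(probs) != 1:
        raise ValueError("probabilities must be positive and sum to 1")
    pi = [Fraction(0)] * N
    for sample, p in zip(samples, probs):
        for unit in sample:
            # delta(unit; sample) = 1, so the weight p is added to pi_unit.
            pi[unit - 1] += p
    return pi


# Hypothetical subjective probabilities for the 6 samples of size 2 from 4 units.
probs = [Fraction(w, 21) for w in (1, 2, 3, 4, 5, 6)]
pi = first_order_inclusion(4, 2, probs)
print(pi)
print(sum(pi))
```

With these assumed probabilities every $\pi_i$ is strictly positive, and the components of $\pi$ add up to the sample size $n = 2$, as expected for a design with fixed sample size.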
In general, if we consider an event divided into two or more incompatible events then its coherent probability is the sum of two or more coherent probabilities. This sum is a linear combination of probabilities (de Finetti [1980]). Within this context we evidently consider a convex combination coinciding with $P(S)$, whose weights or coefficients are “a priori” subjective probabilities connected with the samples of $S'$ (de Finetti [1981]). This convex combination is characterized by $k$ column vectors viewed as $k$ matrices. Each row of every $N \times 1$ matrix is a first-order inclusion “a posteriori” probability. We therefore consider a linear combination of probabilities (de Finetti [1982a]).

5 First-order inclusion probabilities obtained by means of linear maps

We consider all logically possible samples belonging to the set $S'$. Given $N$ and $n$, let $k$ be the number of all elements of $S'$. We are consequently able to determine an $N \times k$ matrix in $\mathbb{R}$ whose columns are the vectors $\delta(s'_1), \ldots, \delta(s'_k)$. We denote it by $B$. It is therefore possible to define a linear map expressed by
\[ L_B : \mathbb{R}^k \to \mathbb{R}^N. \tag{19} \]
It depends on $B$. Moreover, it also depends on the choice of bases for $\mathbb{R}^k$ and $\mathbb{R}^N$. We choose standard bases for $\mathbb{R}^k$ and $\mathbb{R}^N$. We consider all probabilities assigned to the logically possible samples of $S'$ whose size is equal to $n$. They can be viewed as a column vector. We denote it by $Q$. We then have
\[ Q = \begin{pmatrix} p(s'_1) \\ p(s'_2) \\ \vdots \\ p(s'_k) \end{pmatrix}. \tag{20} \]
It therefore turns out to be
\[ L_B(Q) = BQ = \pi = \begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix}. \tag{21} \]
We note that if $k = N$ then we are able to define a linear map expressed by
\[ L_B : \mathbb{R}^N \to \mathbb{R}^N. \tag{22} \]
We observe that $B$ is a square matrix. This linear map is an endomorphism. It is also an isomorphism. It is then an automorphism, so we write
\[ B^{-1} \pi = \begin{pmatrix} p(s'_1) \\ p(s'_2) \\ \vdots \\ p(s'_k) \end{pmatrix}. \tag{23} \]
Given $B$, each row of $Q$ can subjectively vary because an evaluation of probability over a set of logically possible events must only be coherent.
This means that the sum of all probabilities of the samples of $S'$ must be equal to 1. We consequently observe that there are infinitely many ways of choosing the probabilities of the samples of $S'$. They are conveniently captured by $L_B$. It is hence possible to obtain $\pi$ as a multiplication of matrices according to a linear map depending on $B$ and the standard bases of the linear spaces under consideration. Also, we always obtain
\[ \sum_{i=1}^{N} \pi_i = n. \tag{24} \]

6 First-order and second-order inclusion probabilities obtained by means of tensor products

We consider a bivariate random quantity denoted by $S_{12}$ whose components are two univariate random quantities denoted by ${}_1S$ and ${}_2S$. We therefore write $S_{12} = \{{}_1S, {}_2S\}$. Given $N$ and $n$, the logically possible values of each univariate random quantity coincide with the $k$ samples belonging to the set $S'$. They are all logically possible samples of $S'$ whose size is equal to $n$. Each sample of $S'$ is a vector of $\mathbb{R}^N$. We have to note a very important point: we suppose that the logically possible values of ${}_1S$ and ${}_2S$ are the same $N$-dimensional vectors of the same $N$-dimensional linear space over $\mathbb{R}$. These univariate random quantities then have the same marginal probability distributions. Putting them into a two-way table, we observe that it is always a square table. We observe that all probabilities of the joint probability distribution outside the main diagonal of this table are always equal to 0. The nonzero probabilities of the joint probability distribution coincide with $p(s'_1), \ldots, p(s'_k)$. They are on the main diagonal of the table under consideration. A coherent prevision of $S_{12}$, denoted by $P(S_{12})$, is obtained by means of the sum of $k$ square matrices. The number of rows and columns of every square matrix of this sum is equal to $N$. Each square matrix of this sum derives from a tensor product belonging to the same linear space denoted by $\mathbb{R}^N \otimes \mathbb{R}^N$.
It is an $N^2$-dimensional linear space over $\mathbb{R}$. We always consider as many tensor products as there are joint probabilities associated with the samples of $S'$. We then have
\[ p(s'_i) \left( \begin{pmatrix} \delta(1; s'_i) \\ \delta(2; s'_i) \\ \vdots \\ \delta(N; s'_i) \end{pmatrix}, \begin{pmatrix} \delta(1; s'_i) \\ \delta(2; s'_i) \\ \vdots \\ \delta(N; s'_i) \end{pmatrix} \right) \mapsto p(s'_i) \left( \begin{pmatrix} \delta(1; s'_i) \\ \delta(2; s'_i) \\ \vdots \\ \delta(N; s'_i) \end{pmatrix} \otimes \begin{pmatrix} \delta(1; s'_i) \\ \delta(2; s'_i) \\ \vdots \\ \delta(N; s'_i) \end{pmatrix} \right) \tag{25} \]
for every $i = 1, \ldots, k$. We note that it turns out to be
\[ p(s'_i) \left( \begin{pmatrix} \delta(1; s'_i) \\ \delta(2; s'_i) \\ \vdots \\ \delta(N; s'_i) \end{pmatrix} \otimes \begin{pmatrix} \delta(1; s'_i) \\ \delta(2; s'_i) \\ \vdots \\ \delta(N; s'_i) \end{pmatrix} \right) = p(s'_i) \begin{pmatrix} \delta(1; s'_i) \\ \delta(2; s'_i) \\ \vdots \\ \delta(N; s'_i) \end{pmatrix} \begin{pmatrix} \delta(1; s'_i) & \delta(2; s'_i) & \ldots & \delta(N; s'_i) \end{pmatrix}. \tag{26} \]
If we consider a coherent prevision of $S_{12}$ then we deal with a bilinear map expressed by $\mathbb{R}^N \times \mathbb{R}^N \to M_{N,N}(\mathbb{R})$, where the linear space over $\mathbb{R}$ of the $N \times N$ matrices in $\mathbb{R}$ is denoted by $M_{N,N}(\mathbb{R})$. This linear space is isomorphic to $\mathbb{R}^{N^2}$. The matrix product resulting from this bilinear map is factorized by means of the tensor product of vectors of $\mathbb{R}^N$. It is also factorized by means of a unique linear map whose domain coincides with $\mathbb{R}^N \otimes \mathbb{R}^N$. This is because we know a basis of $\mathbb{R}^N \otimes \mathbb{R}^N$ as well as the value of the linear map under consideration on the basis elements. We suppose that a basis of $\mathbb{R}^N \otimes \mathbb{R}^N$ results from the standard basis of $\mathbb{R}^N$, where $\mathbb{R}^N$ is evidently considered twice. It is therefore possible to say that there exists a unique linear map given by $\mathbb{R}^N \otimes \mathbb{R}^N \to M_{N,N}(\mathbb{R})$. It coincides with the product of a joint probability viewed as a scalar and a square matrix. We consider $k$ products of a joint probability and a square matrix. We obtain $k$ square matrices in this way. We consider the sum of these $k$ square matrices in order to obtain a coherent prevision of $S_{12}$. We observe that $\mathbb{R}^N \times \mathbb{R}^N \to M_{N,N}(\mathbb{R})$ and $\mathbb{R}^N \otimes \mathbb{R}^N \to M_{N,N}(\mathbb{R})$ have the same codomain.
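The sum of the $k$ weighted products in (26) can be computed directly. What follows is a hedged sketch rather than the author's procedure: it reuses a hypothetical toy design ($N = 4$, $n = 2$, probabilities proportional to $1, \ldots, 6$), and the function name `second_order_inclusion` is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def second_order_inclusion(N, n, probs):
    """Sum over samples of p(s') * delta(s') delta(s')^T, an N x N matrix."""
    samples = list(combinations(range(1, N + 1), n))
    matrix = [[Fraction(0)] * N for _ in range(N)]
    for sample, p in zip(samples, probs):
        delta = [1 if unit in sample else 0 for unit in range(1, N + 1)]
        for r in range(N):
            for c in range(N):
                # Entry (r, c) of the product of the column delta by the row delta.
                matrix[r][c] += p * delta[r] * delta[c]
    return matrix


# Hypothetical design: 6 samples of size 2 from 4 units, in lexicographic order.
probs = [Fraction(w, 21) for w in (1, 2, 3, 4, 5, 6)]
Pi = second_order_inclusion(4, 2, probs)
for row in Pi:
    print(row)
trace = sum(Pi[i][i] for i in range(4))
print(trace)
```

In this toy case the diagonal of the resulting matrix holds the first-order inclusion probabilities, each off-diagonal entry is the probability that two distinct units are selected together, and the matrix is symmetric with trace equal to $n = 2$.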
A factorization of $\mathbb{R}^N \times \mathbb{R}^N \to M_{N,N}(\mathbb{R})$ is then realized by means of a bilinear map given by $\mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}^N \otimes \mathbb{R}^N$ and a linear map given by $\mathbb{R}^N \otimes \mathbb{R}^N \to M_{N,N}(\mathbb{R})$. These two maps are connected, so we obtain a composition of functions identified with $\mathbb{R}^N \times \mathbb{R}^N \to M_{N,N}(\mathbb{R})$. The following commutative diagram
\[ \begin{array}{ccc} \mathbb{R}^N \times \mathbb{R}^N & \longrightarrow & \mathbb{R}^N \otimes \mathbb{R}^N \\ & \searrow & \downarrow \\ & & M_{N,N}(\mathbb{R}) \end{array} \]
visualizes what we have said. A coherent prevision of $S_{12}$ is then bilinear and homogeneous. It is given by
\[ P(S_{12}) = \Pi = \begin{pmatrix} \pi_1 & \pi_{12} & \ldots & \pi_{1N} \\ \pi_{21} & \pi_2 & \ldots & \pi_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \pi_{N1} & \pi_{N2} & \ldots & \pi_N \end{pmatrix} = \begin{pmatrix} \pi_1 & \pi_{12} & \ldots & \pi_{1N} \\ \pi_{12} & \pi_2 & \ldots & \pi_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \pi_{1N} & \pi_{2N} & \ldots & \pi_N \end{pmatrix}. \tag{27} \]
It coincides with the symmetric matrix of the first-order and second-order inclusion probabilities. The trace of this matrix is evidently equal to $n$ (Angelini [2020]).

7 The covariance of two univariate random quantities obtained by considering two bilinear maps

Given $S_{12} = \{{}_1S, {}_2S\}$, the covariance of ${}_1S$ and ${}_2S$ is expressed by
\[ C({}_1S, {}_2S) = P(S_{12}) - P({}_1S) P({}_2S), \tag{28} \]
where $P(S_{12})$ represents the prevision or mathematical expectation or expected value of $S_{12}$, while $P({}_1S)$ and $P({}_2S)$ represent the prevision or mathematical expectation or expected value of ${}_1S$ and ${}_2S$. We note that a coherent prevision of $S_{12}$ derives from a bilinear map because we have
\[ P(S_{12}) = \begin{pmatrix} \pi_1 & \pi_{12} & \ldots & \pi_{1N} \\ \pi_{21} & \pi_2 & \ldots & \pi_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \pi_{N1} & \pi_{N2} & \ldots & \pi_N \end{pmatrix}. \tag{29} \]
Moreover, since we have
\[ P({}_1S) = \begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix} \tag{30} \]
as well as
\[ P({}_2S) = \begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix}, \tag{31} \]
we note that the product of these two linear maps is evidently bilinear. Such a product is expressed in the form
\[ \begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix} \begin{pmatrix} \pi_1 & \pi_2 & \ldots & \pi_N \end{pmatrix} = \begin{pmatrix} \pi_1\pi_1 & \pi_1\pi_2 & \ldots & \pi_1\pi_N \\ \pi_2\pi_1 & \pi_2\pi_2 & \ldots & \pi_2\pi_N \\ \vdots & \vdots & \ddots & \vdots \\ \pi_N\pi_1 & \pi_N\pi_2 & \ldots & \pi_N\pi_N \end{pmatrix}. \tag{32} \]
It is then evident that the covariance of ${}_1S$ and ${}_2S$ derives from two bilinear maps because we can write
\[ C({}_1S, {}_2S) = \begin{pmatrix} \pi_1 & \pi_{12} & \ldots & \pi_{1N} \\ \pi_{21} & \pi_2 & \ldots & \pi_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \pi_{N1} & \pi_{N2} & \ldots & \pi_N \end{pmatrix} - \begin{pmatrix} \pi_1\pi_1 & \pi_1\pi_2 & \ldots & \pi_1\pi_N \\ \pi_2\pi_1 & \pi_2\pi_2 & \ldots & \pi_2\pi_N \\ \vdots & \vdots & \ddots & \vdots \\ \pi_N\pi_1 & \pi_N\pi_2 & \ldots & \pi_N\pi_N \end{pmatrix}. \tag{33} \]
By writing
\[ C({}_1S, {}_2S) = \begin{pmatrix} (\pi_1 - \pi_1\pi_1) & (\pi_{12} - \pi_1\pi_2) & \ldots & (\pi_{1N} - \pi_1\pi_N) \\ (\pi_{21} - \pi_2\pi_1) & (\pi_2 - \pi_2\pi_2) & \ldots & (\pi_{2N} - \pi_2\pi_N) \\ \vdots & \vdots & \ddots & \vdots \\ (\pi_{N1} - \pi_N\pi_1) & (\pi_{N2} - \pi_N\pi_2) & \ldots & (\pi_N - \pi_N\pi_N) \end{pmatrix} \tag{34} \]
we note that it is possible to consider as many random components as there are inclusion probabilities under study. A unit of the population under consideration can be included, or not, in a given sample (Bondesson [2010]). This is uncertain until a given sample is selected (Hájek [1958]). Two different units of the population under consideration can be included, or not, in the same sample (Deville and Tillé [1998]). This is uncertain until a given sample is selected. A component associated with one or two different units of the population under consideration is evidently random for this reason (Connor [1966]). This means that each random component is characterized by a subjective probability. It is an “a priori” probability. It is also characterized by an “a posteriori” probability coinciding with one of the two logically possible values of a random event, 0 and 1. One and only one of these two logically possible values of a random event will be true “a posteriori”. On the other hand, it is known that the notion of probability basically deals with an aspect that lies between two extreme aspects. The first extreme aspect deals with situations of non-knowledge or ignorance or uncertainty determining the set of all logically possible samples of a given size viewed as elementary events. They are evidently all logically possible alternatives that can be considered. The second extreme aspect deals with definitive certainty expressed in the form of what is certainly true or certainly false. Thus, every logically possible sample of a given size definitively becomes true or false.
Probability is subjectively distributed by the statistician as a mass over the domain of all logically possible samples of a given size before knowing which sample will be the true one “a posteriori”. Having said that, the variance of every random component as well as the covariance of two random components are dealt with by means of the first-order and second-order inclusion probabilities. The variance of each random component is the corresponding element on the main diagonal of the symmetric matrix given by (34). The covariance of two random components is the corresponding element outside the main diagonal of the square matrix given by (34).

8 A univariate random quantity representing deviations

We define a univariate random quantity representing deviations. We denote it by $D$. We firstly consider $S$ whose values are all logically possible samples of a given size viewed as elementary events belonging to the set $S'$. Given $N$ and $n$, the number of the logically possible values of $S$ is equal to $\binom{N}{n} = k$. The set of the logically possible values of $S$ is then given by $I(S) = \{s'_1, \ldots, s'_k\}$, with $s'_i \in S'$, $i = 1, \ldots, k$. A nonzero probability denoted by $p(s'_i)$, $i = 1, \ldots, k$, is assigned to each sample of $S'$. We therefore obtain an $N$-dimensional vector denoted by $\pi$. It represents the first-order inclusion probabilities of all units of the population under consideration. They are all greater than zero. This vector is always independent of the origin of the coordinate system that we could consider. We note that the number of the logically possible values of $D$ is equal to $k$. It is the same as that of $S$. The set of the logically possible values of $D$ is given by $I(D) = \{d'_1, \ldots, d'_k\}$, with
\[ d'_i = \left( \begin{pmatrix} \delta(1; s'_i) \\ \delta(2; s'_i) \\ \vdots \\ \delta(N; s'_i) \end{pmatrix} - \begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix} \right), \tag{35} \]
where we have $i = 1, \ldots, k$. It follows that we have
\[ p(s'_1) d'_1 + \ldots + p(s'_k) d'_k = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \tag{36} \]
This means that $P(S)$ is an $N$-dimensional vector such that the deviations from it, each multiplied by the corresponding probability, are $N$-dimensional vectors whose sum coincides with the zero vector of $\mathbb{R}^N$. We are now able to calculate the variance of $S$ by using $D$. We refer to the α-criterion of concordance introduced by Gini. It is a statistical criterion that we apply in an innovative way to probability viewed as a mass. An absolute maximum of concordance is then realized when each $d'_i$, $i = 1, \ldots, k$, is multiplied by itself. If each $d'_i$, $i = 1, \ldots, k$, is multiplied by itself then we obtain $k$ square matrices. Every multiplication that we consider is a tensor product of two vectors of $\mathbb{R}^N$. These two vectors represent two deviations which are the same. The components of these two vectors are then the same. Hence, the variance of $S$ coincides with the sum of $k$ traces of $k$ square matrices. Each trace of the square matrix under consideration is an inner product viewed as an α-product. An α-product is a bilinear form. We consider each $p(s'_i)$, $i = 1, \ldots, k$, as a scalar. Each $p(s'_i)$, $i = 1, \ldots, k$, is firstly a subjective probability. Thus, it always characterizes a random quantity. It is nevertheless viewed as a scalar within this context. We can therefore multiply all components of $d'_i$ by $p(s'_i)$, $i = 1, \ldots, k$. We note that the components of each $d'_i$, $i = 1, \ldots, k$, are always independent of the origin of the coordinate system that we could consider. We therefore write
\[ \sigma^2_S = \operatorname{tr}\left( {d'_1}^{T} \left( p(s'_1) d'_1 \right) \right) + \ldots + \operatorname{tr}\left( {d'_k}^{T} \left( p(s'_k) d'_k \right) \right). \tag{37} \]
We have evidently introduced a quadratic and linear metric in this way. We therefore note that $\sigma^2_S$ is the sum of the squares of $k$ α-norms. It is possible to verify that every trace of a square matrix is an α-product which is an α-commutative product, an α-associative product, an α-distributive product and an α-orthogonal product.
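Equations (35) to (37) can be verified numerically on a small example. The sketch below is illustrative only: the design ($N = 4$, $n = 2$, probabilities proportional to $1, \ldots, 6$) and the function name `deviations_and_variance` are assumptions, not part of the paper. It also compares $\sigma^2_S$ with the trace of the matrix in (34), that is, with the sum of the diagonal terms $\pi_i - \pi_i\pi_i$.

```python
from fractions import Fraction
from itertools import combinations


def deviations_and_variance(N, n, probs):
    """Return pi, the deviations d'_i = delta(s'_i) - pi, and sigma^2_S."""
    samples = list(combinations(range(1, N + 1), n))
    deltas = [[1 if u in s else 0 for u in range(1, N + 1)] for s in samples]
    pi = [sum(p * d[i] for p, d in zip(probs, deltas)) for i in range(N)]
    devs = [[d[i] - pi[i] for i in range(N)] for d in deltas]
    # Each term p * (d^T d) is the trace appearing in (37).
    variance = sum(p * sum(c * c for c in dev) for p, dev in zip(probs, devs))
    return pi, devs, variance


# Hypothetical design: 6 samples of size 2 from 4 units, in lexicographic order.
probs = [Fraction(w, 21) for w in (1, 2, 3, 4, 5, 6)]
pi, devs, variance = deviations_and_variance(4, 2, probs)

# Probability-weighted sum of the deviations, compare with (36).
mean_dev = [sum(p * dev[i] for p, dev in zip(probs, devs)) for i in range(4)]
# Trace of the covariance matrix (34): sum of pi_i - pi_i * pi_i.
trace_cov = sum(x - x * x for x in pi)

print(mean_dev)
print(variance, trace_cov)
```

For this assumed design the weighted deviations sum to the zero vector, and $\sigma^2_S$ equals both the trace of (34) and $n - \sum_i \pi_i^2$.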
We have to note a very important point: $S$ and $D$ are two different quantities from a geometric point of view because they are represented by different sets of $N$-dimensional vectors. They are nevertheless the same quantity from a randomness point of view. They are characterized by the same probabilities. We therefore observe the same events because we consider only a change of origin.

9 Intrinsic properties of a univariate random quantity representing deviations

Translations and rotations of vectors identifying a given univariate random quantity representing deviations are intrinsic properties of it. They do not depend on the choice of a basis of a given linear space. We say that all vectors of $S'$ are subjected to the same translation when we consider $k$ sums of two vectors. We consider $k$ sums of two vectors because the number of the elements of $S'$ is equal to $k$. The first vector of each sum is given by $s'_i$, $i = 1, \ldots, k$. The second vector of each sum is an arbitrary $N$-dimensional vector which is always the same. We then say that all vectors of $S'$ are subjected to the same change of origin. It follows that $\sigma^2_S$ is invariant with respect to a translation of all vectors of $S'$. We say that a quadratic and linear metric is invariant with respect to a translation of all vectors of $S'$. Concerning a rotation, let $A = (a^{i'}_{j})$ be an $N \times N$ orthogonal matrix. Each element of this matrix is denoted by two indices. We use contravariant and covariant indices without loss of generality. The contravariant indices represent the rows of the matrix. We have $i' = 1, \ldots, N$. The covariant indices represent the columns of the matrix. We have $j = 1, \ldots, N$. We observe that rotations of all vectors contained in $I(D) = \{d'_1, \ldots, d'_k\}$ are characterized by $A$. We write
\[ R_A(d'_i) : d'_i \Rightarrow A d'_i = (d'_i)^*, \tag{38} \]
where we have $i = 1, \ldots, k$.
We evidently denote by $(d'_i)^*$ the result of the rotation of the vector $d'_i$ obtained by means of the orthogonal matrix denoted by $A$. The vector $(d'_i)^*$ is an $N$-dimensional vector. Its components are given by $N$ linear and homogeneous relationships. We have to note a very important point: $P(S)$ is an $N$-dimensional vector such that the rotated deviations from it, each multiplied by the corresponding probability, are $N$-dimensional vectors whose sum coincides with the zero vector of $\mathbb{R}^N$. We then have
\[ p(s'_1)(d'_1)^* + \ldots + p(s'_k)(d'_k)^* = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \tag{39} \]
If we consider rotated deviations then we write
\[ \sigma^2_{S^*} = \operatorname{tr}\left( {(d'_1)^*}^{T} \left( p(s'_1)(d'_1)^* \right) \right) + \ldots + \operatorname{tr}\left( {(d'_k)^*}^{T} \left( p(s'_k)(d'_k)^* \right) \right), \tag{40} \]
where $S^*$ represents a univariate random quantity connected with rotated deviations. Since it turns out to be
\[ \sigma^2_S = \sigma^2_{S^*}, \tag{41} \]
we say that the variance of $S$ is invariant with respect to all rotated deviations obtained by means of the same orthogonal matrix denoted by $A$. We have therefore introduced a quadratic and linear metric which is invariant with respect to translations and rotations of vectors identifying a univariate random quantity representing deviations.

10 A univariate random quantity representing variations and its intrinsic properties

We define a univariate random quantity representing variations. We denote it by $V$. Given $D$, the set of the logically possible values of $V$ is expressed by $I(V) = \{v'_1, \ldots, v'_k\}$, with
\[ v'_i = d'_i \, \frac{1}{\sqrt{\sigma^2_S}}, \tag{42} \]
where we have $i = 1, \ldots, k$. We therefore note that $S$, $D$ and $V$ are different quantities from a geometric point of view. They are, on the other hand, the same quantity from a randomness point of view. It is possible to verify that it turns out to be
\[ \sigma^2_V = 1. \tag{43} \]
This index is always equal to 1 independently of the components of $d'_i$, $i = 1, \ldots, k$.
It is evident that these components identify $\sigma^2_S$, so we say that $\sigma^2_V = 1$ is also independent of $\sigma^2_S$. We observe that rotations of all vectors belonging to $I(V) = \{v'_1, \ldots, v'_k\}$ are always characterized by an $N \times N$ orthogonal matrix. We write
\[ (v'_i)^* = (d'_i)^* \, \frac{1}{\sqrt{\sigma^2_S}}, \tag{44} \]
where we have $i = 1, \ldots, k$. If we consider translations and rotations of vectors identifying a univariate random quantity representing variations then we observe intrinsic properties that we have already considered. We note that $V$ can be subjected to an affine transformation. If $V$ is subjected to an affine transformation then we write
\[ V \Rightarrow aV + b, \tag{45} \]
where we have $a \neq 0$. We therefore observe that each vector of $I(aV + b)$ is equal to the corresponding vector of $I(V)$. This means that the components of each vector of $I(aV + b)$ are the same as those of the corresponding vector of $I(V)$. Hence, we say that univariate random quantities representing variations are invariant with respect to an affine transformation. Given $S_{12} = \{{}_1S, {}_2S\}$, we note that we have ${}_1V = {}_2V = V$ if and only if it turns out to be ${}_1S = {}_2S = S$. It is possible to verify that the covariance of ${}_1V$ and ${}_2V$ is an α-product. It is always equal to 1. On the other hand, it coincides with the Bravais-Pearson correlation coefficient in the case of a perfect direct linear relationship between two quantities. It is possible to verify that the Bravais-Pearson correlation coefficient is invariant with respect to rotations of vectors belonging to $I(V)$. It is also invariant with respect to an affine transformation of $V$. We have to note a very important point: the intrinsic properties that we have considered can be related to the random quantities themselves or to specific metric indices based on these quantities. Specific metric indices are evidently based on random quantities representing deviations or variations because we calculate them after taking such random quantities into account.
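The invariance stated in (41) and the unit variance in (43) can be checked on a toy example. This is a sketch under explicit assumptions: the design ($N = 4$, $n = 2$, probabilities proportional to $1, \ldots, 6$), the helper names `rotate` and `sq_norm`, and the particular orthogonal matrix `A` (a rotation in the plane of the first two coordinates with cosine $3/5$ and sine $4/5$) are all chosen for illustration and do not appear in the paper. To keep the arithmetic exact, the squared norm of each $v'_i$ is obtained by dividing the squared norm of $d'_i$ by $\sigma^2_S$ instead of taking a square root.

```python
from fractions import Fraction
from itertools import combinations


def rotate(A, v):
    """Matrix-vector product A v for a square matrix A given as a list of rows."""
    return [sum(A[r][c] * v[c] for c in range(len(v))) for r in range(len(A))]


def sq_norm(v):
    """Squared Euclidean norm, i.e. the trace of v^T v."""
    return sum(c * c for c in v)


N, n = 4, 2
samples = list(combinations(range(1, N + 1), n))
probs = [Fraction(w, 21) for w in (1, 2, 3, 4, 5, 6)]  # hypothetical design
deltas = [[1 if u in s else 0 for u in range(1, N + 1)] for s in samples]
pi = [sum(p * d[i] for p, d in zip(probs, deltas)) for i in range(N)]
devs = [[d[i] - pi[i] for i in range(N)] for d in deltas]

# Assumed orthogonal matrix with rational entries (3/5, 4/5 is a Pythagorean pair).
c, s = Fraction(3, 5), Fraction(4, 5)
A = [[c, -s, 0, 0],
     [s, c, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]

variance = sum(p * sq_norm(d) for p, d in zip(probs, devs))
rotated = [rotate(A, d) for d in devs]
rotated_variance = sum(p * sq_norm(d) for p, d in zip(probs, rotated))

# Variance of V: squared norms of v'_i = d'_i / sqrt(sigma^2_S), weighted by p.
unit_variance = sum(p * sq_norm(d) / variance for p, d in zip(probs, devs))

print(variance, rotated_variance, unit_variance)
```

In this example the variance computed from the rotated deviations coincides with $\sigma^2_S$, and the variance of the standardized vectors is exactly 1.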
We have to note another very important point: we are not interested in translating or rotating a geometric object in real terms, but in studying its intrinsic properties, because these properties are a fundamental consequence of its geometric representation.

11 Metric aspects of an estimate of the population mean

We now ask what happens from a metric point of view when we study one or more attributes of each element of the population under consideration. We suppose that we observe three different and independent characteristics of each element of the population under consideration. We assume this without loss of generality. We therefore consider three different and independent variables denoted by $X$, $Y$ and $Z$. We note that $X$ is the variable concerning the first attribute of each element of the population under consideration. The variable concerning the second attribute is denoted by $Y$. The variable concerning the third attribute is denoted by $Z$. If we study only one attribute of each element of the population under consideration then we estimate the population mean by using the univariate Horvitz-Thompson estimator. It is defined by

\[ t^{(x)}_{HT} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\pi_i}\, \delta(i; s')\, x_i, \tag{46} \]

where $s' \in S'$. It is linear and homogeneous (Horvitz and Thompson [1952]). We note that $s'$ is one of the logically possible samples of $S'$. Also, the weight of the generic unit $i$ of the population under consideration never depends on $s'$. It is obtained beginning from (17). Conversely, we have considered all logically possible samples of $S'$ when we have defined $S$, $D$ and $V$. We did not consider only one of them. These random quantities are complementary to the univariate Horvitz-Thompson estimator for this reason.
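The estimator (46) can be sketched on a toy population. This is an illustration under assumptions, not the author's computation: the attribute values `x` and the probabilities `p` are hypothetical, and the first-order inclusion probabilities are obtained by summing $p(s')$ over the samples containing each unit. The last lines check the standard design-unbiasedness of the Horvitz-Thompson estimator by averaging it over all samples of size $n$.

```python
import itertools
import numpy as np

def ht_mean(x, pi, sample):
    """Horvitz-Thompson estimate of the population mean for one sample, as in (46)."""
    idx = list(sample)
    return float(np.sum(x[idx] / pi[idx]) / len(x))

x = np.array([3.0, 5.0, 8.0, 12.0])           # hypothetical attribute values
N, n = len(x), 2
samples = list(itertools.combinations(range(N), n))

# Assumption: an arbitrary coherent choice of p(s'), positive and summing to 1.
p = np.arange(1, len(samples) + 1, dtype=float)
p /= p.sum()

# First-order inclusion probabilities: sum of p(s') over samples containing unit i.
pi = np.zeros(N)
for prob, s in zip(p, samples):
    pi[list(s)] += prob

estimates = np.array([ht_mean(x, pi, s) for s in samples])
design_expectation = float(p @ estimates)     # expectation over the sampling design
```

The weight $1/\pi_i$ of each unit is the same in every sample that contains it, which is the sense in which the weights do not depend on $s'$.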
Also, we have always taken $\mathbf{P}(S) = \boldsymbol{\pi}$ into account when we have defined $S$, $D$ and $V$. On the other hand, a coherent prevision of $S$ is itself linear and homogeneous. The expected value of the univariate Horvitz-Thompson estimator is given by

\[ \mathrm{E}[t^{(x)}_{HT}] = \mu_x. \tag{47} \]

It is equal to the population mean $\mu_x$ for any vector $(x_1\ x_2\ \ldots\ x_N)^T \in \mathbb{R}^N$. We have

\[ \mu_x = \frac{1}{N} \sum_{i=1}^{N} x_i. \tag{48} \]

The variance of the univariate Horvitz-Thompson estimator is given by

\[ \mathrm{V}(t^{(x)}_{HT}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{x_i}{\pi_i} \frac{x_j}{\pi_j} \Delta_{ij}, \tag{49} \]

where $\Delta_{ij} = \pi_{ij} - \pi_i \pi_j$, with $i, j = 1, \ldots, N$. We note that $\Delta_{ij}$, $i, j = 1, \ldots, N$, is obtained by means of (34). Since we consider all logically possible samples whose size is equal to $n$ we can also write

\[ \mathrm{V}(t^{(x)}_{HT}) = -\frac{1}{2N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \frac{x_i}{\pi_i} - \frac{x_j}{\pi_j} \right)^2 \Delta_{ij}, \tag{50} \]

where again $\Delta_{ij} = \pi_{ij} - \pi_i \pi_j$, with $i, j = 1, \ldots, N$ (Yates and Grundy [1953]). This variance is estimated by the univariate Yates-Grundy estimator given by

\[ \hat{\mathrm{V}}_{YG}(t^{(x)}_{HT}) = \frac{1}{2N^2} \sum_{i \in s'} \sum_{j \in s'} \left( \frac{x_i}{\pi_i} - \frac{x_j}{\pi_j} \right)^2 \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}}, \tag{51} \]

where $\pi_{ij} > 0$ because we assume that the sampling design is measurable, and $\pi_{ij} \leq \pi_i \pi_j$, with $i, j = 1, \ldots, N$. The same holds when we consider $Y$ and $Z$. We have to note a very important point: the variance of $S$ denoted by $\sigma^2_S$ coincides with the variance of the univariate Horvitz-Thompson estimator given by (50) when the absolute values of each deviation of $x_i$ from $x_j$, with $i \neq j = 1, \ldots, N$, are multiples of $N$. In addition, the variance of $S$ coincides with the variance of the univariate Horvitz-Thompson estimator given by (50) when the entropy $H$ of the sampling design with fixed sample size is maximum (Tillé and Wilhelm [2017]), where

\[ H = -\sum_{s' \in S'} p(s') \log p(s'). \tag{52} \]

We note that $H$ is maximum when we have

\[ p(s'_1) = p(s'_2) = \ldots = p(s'_k), \tag{53} \]

with $\sum_{i=1}^{k} p(s'_i) = 1$. It does not turn out to be $p(s') = 0$ within this context.
However, if we observe $p(s') = 0$ with regard to (52) then we set $0 \log 0 = 0$ by convention. We therefore say that the weights of the univariate Horvitz-Thompson estimator are based on a coherent prevision of $S$. We have obtained a linear and quadratic metric by considering two univariate random quantities representing deviations. We have obtained the variance of $S$ by using this metric. The same holds when we consider $Y$ and $Z$. We have to note another very important point: by studying three different and independent attributes of each element of the population under consideration we do not jointly consider three variables, but we jointly consider two variables at a time. This is because it is not appropriate to use a trilinear form when we deal with metric relationships. If we jointly study two attributes of each element of the population under consideration then we estimate the bivariate population mean by using the bivariate Horvitz-Thompson estimator. We write

\[ t^{(xy)}_{HT} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{1}{\pi_i}\, \delta(i; s')\, x_i\, \frac{1}{\pi_j}\, \delta(j; s')\, y_j \tag{54} \]

when we jointly consider $X$ and $Y$, where all first-order inclusion probabilities are greater than zero. They are obtained by means of (17). We write

\[ t^{(xz)}_{HT} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{1}{\pi_i}\, \delta(i; s')\, x_i\, \frac{1}{\pi_j}\, \delta(j; s')\, z_j \tag{55} \]

when we jointly consider $X$ and $Z$, where all first-order inclusion probabilities are greater than zero. They are obtained by means of (17). We write

\[ t^{(yz)}_{HT} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{1}{\pi_i}\, \delta(i; s')\, y_i\, \frac{1}{\pi_j}\, \delta(j; s')\, z_j \tag{56} \]

when we jointly consider $Y$ and $Z$, where all first-order inclusion probabilities are greater than zero. They are obtained by means of (17). The bivariate Horvitz-Thompson estimator is obtained by multiplying two linear and homogeneous expressions. This means that what we have said concerning the weights of the univariate Horvitz-Thompson estimator does not change.
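Two computational points of the preceding passage can be sketched in a few lines: the entropy (52) with the convention $0 \log 0 = 0$, and the fact that the double sum in (54) factors into the product of two univariate expressions of the form (46). The numerical values and the function names below are hypothetical and serve only as an illustration.

```python
import math

def design_entropy(p):
    """Entropy (52) of a sampling design; zero-probability samples contribute 0."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def ht_mean(values, pi, sample):
    """Univariate expression (46) for one sample."""
    return sum(values[i] / pi[i] for i in sample) / len(values)

def bivariate_ht(x, y, pi, sample):
    """Double sum (54); delta(i; s') restricts both indices to units in the sample."""
    total = 0.0
    for i in sample:
        for j in sample:
            total += (x[i] / pi[i]) * (y[j] / pi[j])
    return total / len(x) ** 2

# Entropy: the uniform design on k samples against a design with some zero probabilities.
k = 6
h_uniform = design_entropy([1.0 / k] * k)
h_skewed = design_entropy([0.5, 0.3, 0.2, 0.0, 0.0, 0.0])

# Bivariate estimator on hypothetical data with positive inclusion probabilities.
x = [3.0, 5.0, 8.0, 12.0]
y = [1.0, 4.0, 2.0, 7.0]
pi = [0.3, 0.4, 0.6, 0.7]
sample = (0, 2)
t_xy = bivariate_ht(x, y, pi, sample)
t_x_times_t_y = ht_mean(x, pi, sample) * ht_mean(y, pi, sample)
```

In this sketch the uniform design attains the entropy $\log k$, and the bivariate value agrees with the product of the two univariate values for the same sample.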
The expected value of the bivariate Horvitz-Thompson estimator concerning $X$ and $Y$ is given by

\[ \mathrm{E}[t^{(xy)}_{HT}] = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{1}{\pi_i}\, \mathrm{E}[\delta(i; s')]\, x_i\, \frac{1}{\pi_j}\, \mathrm{E}[\delta(j; s')]\, y_j. \tag{57} \]

We observe that $\mathrm{E}[\delta(i; s')] = \pi_i$ as well as $\mathrm{E}[\delta(j; s')] = \pi_j$ for every $s' \in S'$, $i, j = 1, \ldots, N$. It is therefore evident that (57) is equal to the population mean denoted by $\mu_{(xy)}$ for any vectors $(x_1\ x_2\ \ldots\ x_N)^T \in \mathbb{R}^N$ and $(y_1\ y_2\ \ldots\ y_N)^T \in \mathbb{R}^N$, where

\[ \mu_{(xy)} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} x_i\, y_j. \tag{58} \]

The same holds when we consider the expected value of the bivariate Horvitz-Thompson estimator concerning $X$ and $Z$ as well as the expected value of the bivariate Horvitz-Thompson estimator concerning $Y$ and $Z$. We consider an auxiliary variable denoted by $X'$ related to $X$ when the values of $X$ given by $x_i$, $i = 1, \ldots, N$, are unknown. We consider an auxiliary variable denoted by $Y'$ related to $Y$ when the values of $Y$ given by $y_i$, $i = 1, \ldots, N$, are unknown. We consider an auxiliary variable denoted by $Z'$ related to $Z$ when the values of $Z$ given by $z_i$, $i = 1, \ldots, N$, are unknown. The known values of $X'$ are given by $x'_i$, $i = 1, \ldots, N$. We write

\[ \mu_{x'} = \frac{1}{N} \sum_{i=1}^{N} x'_i. \tag{59} \]

If $X$ and $X'$ are approximately proportional then it turns out that

\[ \frac{x_i}{x'_i} \approx \text{constant}, \tag{60} \]

where $i = 1, \ldots, N$. The first-order inclusion probabilities chosen by the statistician are then given by

\[ \pi_i = \frac{n x'_i}{N \mu_{x'}}, \tag{61} \]

where $i = 1, \ldots, N$. We note that such probabilities are used in (23) in order to obtain $p(s'_i)$, $i = 1, \ldots, k$, when $k = N$. We observe that $p(s'_i)$, $i = 1, \ldots, k$, are used in order to obtain a coherent prevision of $S$. If $k \neq N$ then we consider a system of $N$ linear equations with $k$ unknowns, where $\pi_1, \ldots, \pi_N$ are constant terms. We refer to (21). We therefore observe that $\pi_1, \ldots, \pi_N$ represent a coherent prevision of $S$ obtained beginning from $p(s'_i)$, $i = 1, \ldots, k$.
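The passage from (61) to the sample probabilities in the case $k = N$ can be sketched as follows. This is an illustration under assumptions: the auxiliary values are hypothetical, and the matrix `B` is taken to have the sample indicator vectors as its columns, so that the square system links $p(s'_1), \ldots, p(s'_k)$ to $\pi_1, \ldots, \pi_N$. With $N = 3$ and $n = 2$ there are exactly $k = 3$ samples.

```python
import itertools
import numpy as np

x_aux = np.array([2.0, 3.0, 4.0])        # hypothetical known values x'_i
N, n = len(x_aux), 2

# First-order inclusion probabilities as in (61).
pi = n * x_aux / (N * x_aux.mean())

samples = list(itertools.combinations(range(N), n))
k = len(samples)                          # here k == N == 3

# Assumed form of B: column c is the 0/1 indicator vector of sample c.
B = np.zeros((N, k))
for col, s in enumerate(samples):
    B[list(s), col] = 1.0

# Square, invertible system B p = pi, solved for the sample probabilities.
p = np.linalg.solve(B, pi)
```

For these values the solution is $p = (1/9,\ 3/9,\ 5/9)$ for the samples $\{1,2\}$, $\{1,3\}$, $\{2,3\}$. When $k \neq N$ the system is not square and this direct solve does not apply.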
We observe that α-products and α-norms use $p(s'_i)$, $i = 1, \ldots, k$, as scalars. The second-order inclusion probabilities also characterize our metric structure. They are obtained by means of tensor products having $p(s'_i)$, $i = 1, \ldots, k$, as scalars. They are chosen by the statistician because he subjectively chooses $p(s'_i)$, $i = 1, \ldots, k$. He is consequently able to observe $\pi_{ij} > 0$, $i, j = 1, \ldots, N$. We have established them in (27). The same holds when we consider $Y'$ and $Z'$.

12 A metric homoscedasticity of different variables identifying different and independent attributes of the units of the population

We have to consider two variables jointly at a time for a metric reason. When we jointly consider $X$ and $Y$ we first have to disaggregate $t^{(xy)}_{HT}$. Given

\[ t^{(x)}_{HT} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\pi_i}\, \delta(i; s')\, x_i \tag{62} \]

and

\[ t^{(y)}_{HT} = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{\pi_j}\, \delta(j; s')\, y_j, \tag{63} \]

the covariance of these two univariate Horvitz-Thompson estimators is expressed by

\[ \mathrm{C}(t^{(x)}_{HT}, t^{(y)}_{HT}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{x_i}{\pi_i} \frac{y_j}{\pi_j} \Delta_{ij}, \tag{64} \]

where $\Delta_{ij} = \pi_{ij} - \pi_i \pi_j$, with $i, j = 1, \ldots, N$. We note that $\Delta_{ij}$, $i, j = 1, \ldots, N$, is obtained by means of (34). The same holds when we jointly consider $X$ and $Z$ as well as $Y$ and $Z$. We note that

\[ \mathrm{C}(t^{(x)}_{HT}, t^{(x)}_{HT}) = \mathrm{V}(t^{(x)}_{HT}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{x_i}{\pi_i} \frac{x_j}{\pi_j} \Delta_{ij}, \tag{65} \]

\[ \mathrm{C}(t^{(y)}_{HT}, t^{(y)}_{HT}) = \mathrm{V}(t^{(y)}_{HT}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{y_i}{\pi_i} \frac{y_j}{\pi_j} \Delta_{ij}, \tag{66} \]

and

\[ \mathrm{C}(t^{(z)}_{HT}, t^{(z)}_{HT}) = \mathrm{V}(t^{(z)}_{HT}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{z_i}{\pi_i} \frac{z_j}{\pi_j} \Delta_{ij}, \tag{67} \]

where in each case $\Delta_{ij} = \pi_{ij} - \pi_i \pi_j$, with $i, j = 1, \ldots, N$, is obtained by means of (34).
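The covariance (64) and the variance forms (65) and (50) can be compared with a direct enumeration over all samples of a small fixed-size design. The sketch below rests on assumptions: the data and the probabilities $p(s')$ are hypothetical, and the second-order inclusion probabilities are computed by summing $p(s')$ over the samples containing both units, with $\pi_{ii} = \pi_i$ on the diagonal.

```python
import itertools
import numpy as np

x = np.array([3.0, 5.0, 8.0, 12.0])       # hypothetical values of X
y = np.array([1.0, 4.0, 2.0, 7.0])        # hypothetical values of Y
N, n = len(x), 2
samples = list(itertools.combinations(range(N), n))

# Assumption: an arbitrary coherent choice of p(s'), positive and summing to 1.
p = np.arange(1, len(samples) + 1, dtype=float)
p /= p.sum()

# ind[r, i] = 1 when unit i belongs to sample r.
ind = np.zeros((len(samples), N))
for r, s in enumerate(samples):
    ind[r, list(s)] = 1.0

pi = p @ ind                               # first-order inclusion probabilities
pi2 = ind.T @ (p[:, None] * ind)           # second-order, with pi_ii = pi_i
delta = pi2 - np.outer(pi, pi)             # Delta_ij = pi_ij - pi_i pi_j

def cov_formula(u, v):
    """Closed form (64); with u == v it reduces to (65)."""
    return float((u / pi) @ delta @ (v / pi) / N**2)

# Direct enumeration: estimator values on each sample, then design covariance.
t_x = ind @ (x / pi) / N
t_y = ind @ (y / pi) / N
cov_enum = float(p @ (t_x * t_y) - (p @ t_x) * (p @ t_y))

# Form (50), valid because every sample has the same size n.
diff = (x / pi)[:, None] - (x / pi)[None, :]
var_form_50 = float(-np.sum(diff**2 * delta) / (2 * N**2))
```

Here `cov_formula(x, y)` agrees with the enumerated covariance, and `cov_formula(x, x)` agrees with the form (50), which relies on the rows of $\Delta$ summing to zero for a fixed sample size.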
We are interested in knowing what happens from a metric point of view when we study three different and independent attributes of each element of the population under consideration. We have defined $S$, $D$ and $V$. In particular, we consider a bivariate random quantity representing deviations. It is expressed by $D_{12} = \{{}_1D, {}_2D\}$. Its components are two univariate random quantities, ${}_1D$ and ${}_2D$, identifying two sets of $N$-dimensional vectors. Each vector of one set is equal to the corresponding vector of the other set. We consequently have $I({}_1D) = I({}_2D) = \{\mathbf{d}'_1, \ldots, \mathbf{d}'_k\}$. Given $p(s'_i)$, $i = 1, \ldots, k$, we observe that ${}_1D$ is equal to ${}_2D$, so the covariance of ${}_1D$ and ${}_2D$ is equal to the variance of $S$ denoted by $\sigma^2_S$. We observe this regardless of the pair of variables that we consider. We could equally consider $X$ and $Y$, $X$ and $Z$, or $Y$ and $Z$. On the other hand, if we take ${}_1V$ and ${}_2V$ into account then we note that their covariance is equal to 1. Since ${}_1V = {}_2V = V$ we say that the variance of $V$ is equal to 1. We observe this regardless of the pair of variables that we consider. We therefore say that $X$, $Y$ and $Z$ are homoscedastic from a metric point of view. We say this after considering all logically possible samples of a given size belonging to $S'$, and after defining $S$ with respect to $X$, $Y$, $Z$. We say this because, given $p(s'_i)$, $i = 1, \ldots, k$, the variance of $S$ is always the same. It is obtained by virtue of the metric structure that we have introduced.

13 What is all this for?

All the first-order inclusion probabilities derive from a coherent prevision of $S$. A coherent prevision of $S$ always depends on $p(s'_i)$, $i = 1, \ldots, k$, where these probabilities are coherently chosen by the statistician.
All the second-order inclusion probabilities derive from a coherent prevision of $S_{12}$. A coherent prevision of $S_{12}$ always depends on $p(s'_i)$, $i = 1, \ldots, k$. A coherent prevision of $S$ is linear and homogeneous. A coherent prevision of $S_{12}$ is bilinear and homogeneous. The bivariate Horvitz-Thompson estimator is obtained by multiplying two linear and homogeneous expressions. This means that what we are going to say concerning the weights of the univariate Horvitz-Thompson estimator continues to be valid even when we refer to the bivariate Horvitz-Thompson estimator. We therefore refer to the first-order inclusion probabilities. If there exists a direct linear relationship between $X'$ and $X$ then the statistician chooses high inclusion probabilities $\pi_i$ for the units of the population under consideration having high values of $X'$ denoted by $x'_i$, $i = 1, \ldots, N$. This is because they are likely associated with high values of $X$ denoted by $x_i$, $i = 1, \ldots, N$. The same holds when we consider a direct linear relationship between $Y'$ and $Y$ as well as between $Z'$ and $Z$. If $X$ and $X'$ are approximately proportional then the first-order inclusion probabilities chosen by the statistician are given by

\[ \pi_i = \frac{n x'_i}{\sum_{j=1}^{N} x'_j}, \tag{68} \]

where $i = 1, \ldots, N$. If (68) gives $\pi_i > 1$ for some unit of the population under consideration then we set $\pi_i = 1$ for all units with label $i$ such that $n x'_i \geq \sum_{j=1}^{N} x'_j$ because $x'_i$ is high. We consider $n > 1$ within this context. The statistician consequently chooses

\[ \pi_i = (n - n_A) \frac{x'_i}{\sum_{j=1,\, j \notin A}^{N} x'_j}, \tag{69} \]

where $i = 1, \ldots, N$, $i \notin A$, for the remaining units of the population under consideration. The set of the units of the population under consideration such that $n x'_i \geq \sum_{j=1}^{N} x'_j$ is denoted by $A$, while their number is denoted by $n_A$.
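The rule in (68) and (69) can be sketched as a short function. Two points are assumptions of this illustration and are not stated in the text: the function name is hypothetical, and the rule is re-applied in a loop in case (69) itself produces a value of 1 or more among the remaining units, which is a common practice for this kind of computation. The sketch also assumes that fewer than $n$ units end up with probability 1.

```python
import numpy as np

def inclusion_probabilities(x_aux, n):
    """First-order inclusion probabilities proportional to x', capped at 1.

    Units whose proportional value reaches 1 form the set A and get pi_i = 1;
    the remaining n - n_A expected selections are shared as in (69).
    """
    x_aux = np.asarray(x_aux, dtype=float)
    pi = np.zeros_like(x_aux)
    in_A = np.zeros(len(x_aux), dtype=bool)          # membership in the set A
    while True:
        n_rest = n - in_A.sum()                      # n - n_A
        rest_total = x_aux[~in_A].sum()              # sum of x'_j over j not in A
        pi[~in_A] = n_rest * x_aux[~in_A] / rest_total
        pi[in_A] = 1.0
        newly_capped = (~in_A) & (pi >= 1.0)
        if not newly_capped.any():
            return pi
        in_A |= newly_capped                         # repetition is an assumption

# Hypothetical auxiliary values: the last unit is large enough to be selected with certainty.
pi = inclusion_probabilities([1.0, 2.0, 3.0, 30.0], n=2)
```

For these values (68) gives $2 \cdot 30 / 36 > 1$ for the last unit, so $A$ contains that unit only, $n_A = 1$, and (69) yields $1/6$, $1/3$ and $1/2$ for the other three units. The resulting probabilities sum to $n = 2$.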
The same holds when we consider $Y'$ and $Y$ as well as $Z'$ and $Z$. Having said that, we establish a linear relationship between $p(s'_i)$, $i = 1, \ldots, k$, and $\pi_i$, $i = 1, \ldots, N$. If the statistician chooses $p(s'_i)$, $i = 1, \ldots, k$, with $\sum_{i=1}^{k} p(s'_i) = 1$, then it is possible to get $\pi_i$, $i = 1, \ldots, N$, with $\sum_{i=1}^{N} \pi_i = n$. We write

\[ \begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix} = \sum_{i=1}^{k} \boldsymbol{\delta}(s'_i)\, p(s'_i). \tag{70} \]

He is consequently able to obtain $\pi_i > 0$ for every $i = 1, \ldots, N$. Conversely, if the statistician chooses $\pi_i$, $i = 1, \ldots, N$, then it is possible to get $p(s'_i)$, $i = 1, \ldots, k$. We observe that α-products and α-norms use $p(s'_i)$, $i = 1, \ldots, k$, as scalars. We obtain different metric relationships by using α-norms whose scalars are $p(s'_i)$, $i = 1, \ldots, k$. We note that $\pi_1, \ldots, \pi_N$ are used in

\[ B^{-1}\mathbf{P}(S) = \begin{pmatrix} p(s'_1) \\ p(s'_2) \\ \vdots \\ p(s'_k) \end{pmatrix} \tag{71} \]

in order to obtain $p(s'_i)$, $i = 1, \ldots, k$, when $k = N$. We note that $B$ is a square matrix, while $B^{-1}$ is its inverse. If $k \neq N$ then we consider a system of $N$ linear equations with $k$ unknowns, where $\pi_1, \ldots, \pi_N$ are constant terms. We refer to

\[ L_B(Q) = B \begin{pmatrix} p(s'_1) \\ p(s'_2) \\ \vdots \\ p(s'_k) \end{pmatrix} = \begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix} = \mathbf{P}(S). \tag{72} \]

It is known that if the statistician chooses appropriate inclusion probabilities then he is able to obtain a more efficient estimator of the population mean.

14 Conclusions

We have considered random quantities whose logically possible values are all logically possible samples of a given size belonging to a given set. Every logically possible sample belonging to a given set has a subjective probability of being selected. We have obtained the first-order inclusion probabilities by means of coherent previsions of univariate random quantities.
We have defined bivariate random quantities whose components are two univariate random quantities having all logically possible samples of a given size as their logically possible values. All univariate random quantities which we have defined are complementary to the univariate Horvitz-Thompson estimator. It is linear and homogeneous like a coherent prevision of a univariate random quantity whose logically possible values are all logically possible samples of a given size belonging to a given set. A univariate random quantity representing deviations as well as a univariate random quantity representing variations are defined on the basis of a coherent prevision of a given univariate random quantity. These random quantities are the same quantity from a randomness point of view. We have identified a quadratic and linear metric with regard to two univariate random quantities representing deviations. We have used the α-criterion of concordance introduced by Gini in order to identify it.

References

Pierpaolo Angelini. A quadratic and linear metric characterizing the sampling design with fixed sample size considered from a geometric viewpoint. European Scientific Journal, 16(15):1–19, 2020.

D. Basu. On sampling with and without replacement. Sankhya: The Indian Journal of Statistics, 20(3-4):287–294, 1958.

D. Basu. An essay on the logical foundations of survey sampling, part one. In V. P. Godambe and D. A. Sprott, editors, Foundations of Statistical Inference. Holt, Rinehart & Winston, Toronto, 1971.

L. Bondesson. Recursion formulas for inclusion probabilities of all orders for conditional Poisson, Sampford, Pareto, and more general sampling designs. In M. Carlson, H. Nyquist, and M. Villani, editors, Official statistics, methodology and applications in honour of Daniel Thorburn. Brommatryck & Brolins AB, Stoccolma, 2010.

G. Coletti, R. Scozzafava, and B. Vantaggi. Possibilistic and probabilistic logic under coherence: default reasoning and system P. Mathematica Slovaca, 65(4):863–890, 2015.

W. S. Connor. An exact formula for the probability that two specified sampling units will occur in a sample drawn with unequal probabilities and without replacement. Journal of the American Statistical Association, 61:384–390, 1966.

P. L. Conti and D. Marella. Inference for quantiles of a finite population: asymptotic versus resampling results. Scandinavian Journal of Statistics, 42:545–561, 2015.

B. de Finetti. Probability: beware of falsifications! In H. E. Kyburg Jr. and H. E. Smokler, editors, Studies in subjective probability. R. E. Krieger Publishing Company, Huntington, New York, 1980.

B. de Finetti. The role of “Dutch books” and of “proper scoring rules”. The British Journal of Psychology of Sciences, 32:55–56, 1981.

B. de Finetti. Probability: the different views and terminologies in a critical analysis. In L. J. Cohen, J. Łoś, H. Pfeiffer, and K.-P. Podewski, editors, Logic, Methodology and Philosophy of Science VI, pages 391–394. North-Holland Publishing Company, Amsterdam, 1982a.

B. de Finetti. The proper approach to probability. In G. Koch and F. Spizzichino, editors, Exchangeability in Probability and Statistics. North-Holland Publishing Company, Amsterdam, 1982b.

B. de Finetti. Probabilism: A critical essay on the theory of probability and on the value of science. Erkenntnis, 31(2-3):169–223, 1989.

B. de Finetti. La probabilità e la statistica nei rapporti con l’induzione, secondo i diversi punti di vista. In B. de Finetti, editor, Induzione e statistica, pages 5–115. Springer, Heidelberg, 2011.

J.-C. Deville and Y. Tillé. Unequal probability sampling without replacement through a splitting method. Biometrika, 85:89–101, 1998.

A. Gilio and G. Sanfilippo. Conditional random quantities and compounds of conditionals. Studia Logica, 102(4):709–729, 2014.

V. P. Godambe. A unified theory of sampling from finite populations. Journal of the Royal Statistical Society, B17(2):269–278, 1955.

V. P. Godambe and V. M. Joshi. Admissibility and Bayes estimation in sampling finite populations. I. The Annals of Mathematical Statistics, 36(6):1707–1722, 1965.

I. J. Good. Subjective probability as the measure of a non-measurable set. In E. Nagel, P. Suppes, and A. Tarski, editors, Logic, Methodology and Philosophy of Science. Stanford University Press, Stanford, 1962.

J. Hájek. Some contributions to the theory of probability sampling. Bulletin of the International Statistical Institute, 36(3):127–134, 1958.

H. O. Hartley and J. N. K. Rao. Sampling with unequal probabilities and without replacement. The Annals of Mathematical Statistics, 33(2):350–374, 1962.

D. G. Horvitz and D. J. Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663–685, 1952.

V. M. Joshi. A note on admissible sampling designs for a finite population. The Annals of Mathematical Statistics, 42(4):1425–1428, 1971.

Y. Tillé and M. Wilhelm. Probability sampling designs: principles for choice of design and balancing. Statistical Science, 32(2):176–189, 2017.

F. Yates and P. M. Grundy. Selection without replacement from within strata with probability proportional to size. Journal of the Royal Statistical Society B, 15(2):253–261, 1953.