Ratio Mathematica Volume 38, 2020, pp. 261-285

Geometrical foundations of the sampling
design with fixed sample size

Pierpaolo Angelini ∗

Abstract

We study the sampling design with fixed sample size from a geomet-
ric point of view. The first-order and second-order inclusion proba-
bilities are chosen by the statistician. They are subjective probabili-
ties. It is possible to study them inside of linear spaces provided with
a quadratic and linear metric. We define particular random quanti-
ties whose logically possible values are all logically possible sam-
ples of a given size. In particular, we define random quantities which
are complementary to the Horvitz-Thompson estimator. We identify
a quadratic and linear metric with regard to two univariate random
quantities representing deviations. We use the α-criterion of concordance
introduced by Gini in order to identify it. We apply this statistical criterion
to probability in an innovative way.
Keywords: tensor product; linear map; bilinear map; quadratic and
linear metric; α-product; α-norm
2010 AMS subject classifications: 62D05. 1

∗Dipartimento di Scienze Statistiche, Università La Sapienza, Roma, Italia;
pier.angelini@uniroma1.it

1Received on May 12th, 2020. Accepted on June 3rd, 2020. Published on June 30th, 2020.
doi: 10.23755/rm.v38i0.511. ISSN: 1592-7415. eISSN: 2282-8214. © P. Angelini
This paper is published under the CC-BY licence agreement.


1 Introduction

Given a finite population having N elements, we are only interested in con-
sidering samples containing units of this population where no element of the pop-
ulation under consideration can be selected more than once in the same sample
(Basu [1971]). We are not interested in considering ordered samples of a given
size selected from a finite population (Basu [1958]). On the other hand, when we
consider unordered samples in which repetitions are not allowed, we lose no
information about a given parameter of the population under consideration (Conti
and Marella [2015]). All logically possible samples of a given size belong to a
given set. We suppose that we are always able to enumerate them. It is known that
if the number of all logically possible samples in a given set is very large then
it may be very hard, or even impossible, to enumerate them (Godambe
and Joshi [1965]). A sampling design is characterized by a pair of elements (Joshi
[1971]). The first element of this pair represents the set of all logically possi-
ble samples selected from a finite population. The second element of this pair
represents all probabilities assigned to the samples of the set of all logically pos-
sible samples of a given size. We consider a distribution of probability in this way
(Hartley and Rao [1962]). Each element of the set of all logically possible samples
of a given size can be viewed as a logically possible event of a finite partition of
incompatible and exhaustive elementary events. It is then possible to assign a sub-
jective probability to each logically possible event of this partition (Good [1962]).
A probability subjectively assigned to each logically possible event of a finite par-
tition of events must only be coherent. It is inadmissible when it is not coherent.
A probability is subjectively assigned to each logically possible event of a finite
partition of events even when it is an equal probability assigned to each of them.
An equal probability assigned to each logically possible event of a finite partition
of events is always a subjective judgment. We have to note a very important point:
when we say that it is possible to assign a subjective and coherent probability to
every logically possible event of a given set of events we mean that the choice of
any value in the interval from 0 to 1 is allowed. Such an interval includes both end-
points. It would therefore be possible to assign to every logically possible event of
a given set of events a probability equal to 0. This choice is absolutely coherent.
We will, however, introduce a restriction concerning this point. We have
to note another very important point: we methodologically distinguish what is
logically possible from what is subjectively probable. What is logically possible
at a given instant is neither certainly true nor certainly false. One and only
one of the elements belonging to the set of all logically possible
elements at a given instant will turn out to be true a posteriori. A subjective
probability is therefore assigned to each element of this set before it is known
which element is the true one.



The α-criterion of concordance applied to probability

2 Events as points in the space of random quantities
We consider a finite set of vectors, denoted by S, over the field R of real
numbers. We enumerate them. We consequently write

\[
s_1, \ldots, s_N, \tag{1}
\]

where $s_i \in S$, $i = 1, \ldots, N$. We consider the linear space over R
of all linear combinations of elements of S expressed in the form

\[
c_1 s_1 + \ldots + c_N s_N, \tag{2}
\]

where every $c_i$, $i = 1, \ldots, N$, is a real number. We observe that (2) is completely
determined by the real numbers $c_1, \ldots, c_N$. Each number $c_i$ is associated with the
element $s_i$ of the set S. Such an association is exactly a function. For
each $s_i \in S$ and $c \in \mathbb{R}$ we then consider

\[
c s_i \tag{3}
\]

to be the function that associates $c$ with $s_i$ and 0 with $s_j$, where $j \neq i$.
Given $a \in \mathbb{R}$, we obtain

\[
a(c s_i) = (ac) s_i. \tag{4}
\]

Given $c' \in \mathbb{R}$, we obtain

\[
(c + c') s_i = c s_i + c' s_i. \tag{5}
\]

Thus, it is possible to consider a linear space over R. It is the set of all functions of
S into R. These functions can be written in the form given by (2). The functions

\[
1 s_1, \ldots, 1 s_N \tag{6}
\]

are linearly independent, so they form a basis of the linear space under
consideration. To see this, suppose that $c_1, \ldots, c_N$ are elements of R such that
we obtain the zero function, that is,

\[
c_1 s_1 + \ldots + c_N s_N = 0. \tag{7}
\]

Evaluating both sides at $s_i$ gives $c_i = 0$ for every $i = 1, \ldots, N$, which
proves the linear independence under consideration. Moreover, it is
always possible to write $s_i$ instead of $1 s_i$. A sample belonging to the set of all
logically possible samples of a given size is then expressed by the vector

\[
\delta(s') =
\begin{pmatrix}
\delta(1; s') \\
\delta(2; s') \\
\vdots \\
\delta(N; s')
\end{pmatrix} \tag{8}
\]


having N components, where $s'$ is a sample of the set of all logically possible
samples, denoted by $S'$ (Godambe [1955]). We will always consider vectors viewed
as ordered lists of real numbers within this context. A sample can be expressed
by the coefficients of a linear combination of N-dimensional vectors by means
of which another N-dimensional vector is obtained. If a sample is identified with
an N-dimensional vector then its components are the coefficients of a linear
combination of the elements of a basis of the linear space under consideration.
This linear space is denoted by $\mathbb{R}^N$. Its basis is denoted by $S = \{e_j\}$,
$j = 1, \ldots, N$. We always consider orthonormal bases within this context. We
therefore write

\[
\delta(1; s') e_1 + \delta(2; s') e_2 + \ldots + \delta(N; s') e_N = y, \tag{9}
\]

where $y \in \mathbb{R}^N$. We consider as many linear combinations of the elements
of $S = \{e_j\}$, $j = 1, \ldots, N$, as there are logically possible samples in the set of
all logically possible samples of a given size, denoted by $S'$. We note that the
coefficients of every linear combination of the elements of $S = \{e_j\}$, $j = 1, \ldots, N$,
represent one of the logically possible samples of $S'$. We evidently have

\[
\delta(i; s') =
\begin{cases}
1 & \text{if } i \in s' \\
0 & \text{if } i \notin s'
\end{cases} \tag{10}
\]

for every $i = 1, \ldots, N$, where N is the total number of elements of the population
under consideration. We consider all logically possible samples of $S'$ having the same
size, denoted by n. Since the population has N elements, we observe that the
number of n-combinations is equal to the binomial coefficient

\[
\binom{N}{n}.
\]

We observe that $S'$, whose elements are elementary events, is a subset of $\mathbb{R}^N$. We
say that $S'$ is embedded in $\mathbb{R}^N$.
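The construction of this section can be illustrated numerically. The following Python sketch is ours and is not part of the original treatment: the function name `indicator_vectors` and the values N = 4 and n = 2 are assumptions chosen only for illustration. It enumerates all logically possible samples of size n as vertices of the unit hypercube according to (8) and (10).

```python
from itertools import combinations
from math import comb


def indicator_vectors(N, n):
    """All samples of size n from units 1..N as 0/1 vectors, see (8) and (10)."""
    units = range(1, N + 1)
    return [tuple(1 if i in s else 0 for i in units)
            for s in combinations(units, n)]


# Illustrative values only: a population of 4 units and samples of size 2.
N, n = 4, 2
vectors = indicator_vectors(N, n)

for v in vectors:
    print(v)

# The number of vertices obtained equals the binomial coefficient.
print(len(vectors) == comb(N, n))
```

Each printed tuple plays the role of one vector δ(s′); the final check compares the number of vectors with the binomial coefficient.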

3 Finite partitions of logically possible elementary
events

Given N, all logically possible samples whose size is equal to n belong to the
set denoted by S′. We have

\[
n = \sum_{i=1}^{N} \delta(i; s') \tag{11}
\]

for every $s' \in S'$. Every sample of the set of all logically possible samples
corresponds to a vertex, denoted by $\delta(s')$, of the N-dimensional unit hypercube
$[0, 1]^N$. All logically possible samples of $S'$ can be viewed as possible events
of a finite partition of incompatible and exhaustive elementary events (de Finetti
[1982b]). We are consequently able to define a univariate random quantity whose
logically possible values are represented by all logically possible samples of $S'$.
Its logically possible values are not real numbers but N-dimensional
vectors of an N-dimensional linear space over R. Every logically possible sample
belonging to $S'$ has a subjective probability of being selected (de Finetti [1989]).
It represents the degree of belief in the selection of a logically possible sample
assigned by a given individual (the statistician) at a certain instant with a given set
of information. An evaluation of probability over a set of possible
elementary events coinciding with all logically possible samples of $S'$ is admissible
when it is coherent. This means that it must be

\[
\sum_{s' \in S'} p(s') = 1. \tag{12}
\]

It is essential to note a very important point: we introduce an unusual
restriction with regard to coherence because we exclude the choice of a subjective
probability equal to 0 for any possible elementary event. This
implies that every logically possible sample of $S'$ always has a probability greater
than zero of being selected. We consequently have

\[
0 < p(s') \leq 1 \tag{13}
\]

for every s′ ∈ S′ (Coletti et al. [2015]). Thus, conditions of coherence coincide
with positivity of each probability of a random event and finite additivity of prob-
abilities of incompatible and exhaustive events (Gilio and Sanfilippo [2014]). We
will also consider bivariate random quantities whose components are two univari-
ate random quantities (de Finetti [2011]). If the logically possible values of these
univariate random quantities are the same vectors of the same N-dimensional lin-
ear space over R then these random quantities have the same marginal distribu-
tions of probability. They represent the same finite partition of incompatible and
exhaustive elementary events. Putting them into a two-way table we observe that
it is always a table having the same number of rows and columns.
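Conditions (12) and (13) can be checked mechanically. The short Python sketch below is only an illustration under our own assumptions: the function name `is_admissible` and the two lists of probabilities are hypothetical and do not come from the original text. Exact fractions are used so that the sum in (12) is tested without rounding.

```python
from fractions import Fraction


def is_admissible(probabilities):
    """True when every mass is in (0, 1], see (13), and the masses sum to 1, see (12)."""
    strictly_positive = all(0 < p <= 1 for p in probabilities)
    return strictly_positive and sum(probabilities) == 1


# Hypothetical evaluations over six logically possible samples.
uniform = [Fraction(1, 6)] * 6
with_zero = [Fraction(1, 5)] * 5 + [Fraction(0)]

print(is_admissible(uniform))
print(is_admissible(with_zero))
```

The second evaluation sums to 1 but assigns probability 0 to one sample, so it is rejected by the restriction expressed by (13).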

4 First-order inclusion probabilities viewed as a co-
herent prevision of a univariate random quantity

We consider a univariate random quantity denoted by S whose logically possible
values are vectors of $\mathbb{R}^N$. Given N and n, the number of the logically possible
values of S coincides with the binomial coefficient expressed by

\[
\binom{N}{n} = k. \tag{14}
\]


The set of the logically possible values of S is then given by $I(S) = \{s'_1, \ldots, s'_k\}$,
with $s'_i \in S'$, $i = 1, \ldots, k$. A nonzero probability is assigned to each sample of the
set of all logically possible samples. Let $p(s'_1), \ldots, p(s'_k)$ be these probabilities. It
must therefore be

\[
\sum_{i=1}^{k} p(s'_i) = 1, \tag{15}
\]

with

\[
0 < p(s'_i) \leq 1 \tag{16}
\]

for every i = 1, . . . ,k. It is possible to obtain an N-dimensional vector after
assigning a nonzero probability to each sample of S′. We denote it with π. It
represents the first-order inclusion probabilities of all units of the population under
consideration. Thus, we write

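\[
\pi =
\begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix}
= p(s'_1)
\begin{pmatrix} \delta(1; s'_1) \\ \delta(2; s'_1) \\ \vdots \\ \delta(N; s'_1) \end{pmatrix}
+ \ldots + p(s'_k)
\begin{pmatrix} \delta(1; s'_k) \\ \delta(2; s'_k) \\ \vdots \\ \delta(N; s'_k) \end{pmatrix}, \tag{17}
\]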

where $\pi_i > 0$ for every $i = 1, \ldots, N$. We have evidently written a
convex combination of the vertices of the N-dimensional unit hypercube $[0, 1]^N$
corresponding to the samples of $S'$. Each vertex is a sample having a nonzero
weight representing a subjective probability. It is essential to note that $\pi$ is a
coherent prevision of S, denoted by $P(S)$. We therefore write

\[
\pi =
\begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix}
= P(S) = \sum_{i=1}^{k} \delta(s'_i)\, p(s'_i). \tag{18}
\]

We observe that the logically possible values of S are represented by vectors hav-
ing N components so its coherent prevision must also be represented by a vector
having N components. The logically possible values of S belong to the set de-
noted by I(S). Each element of I(S) contains first-order inclusion “a posteriori”
probabilities. This implies that π must contain first-order inclusion “a priori”
probabilities based on the degree of belief in the selection of all logically possible
samples attributed by the statistician at a certain instant with a given set of
information. An “a posteriori” probability of a unit of the population being included
in a given sample is always predetermined. If a unit of the population is contained
“a posteriori” in the sample that has been selected then its probability is equal to
1. If a unit of the population does not belong “a posteriori” to the sample that
has been selected then its probability is equal to 0. A convex combination
coinciding with $P(S)$ has conveniently been considered because the
logically possible values of S are incompatible and exhaustive elementary events
of a finite partition of random events. In general, if we consider an event divided
into two or more incompatible events then its coherent
probability is the sum of the coherent probabilities of these events. This sum is
a linear combination of probabilities (de Finetti [1980]). We evidently consider a
convex combination coinciding with P(S) within this context, where its weights
or coefficients are “a priori” subjective probabilities connected with the samples
of S′ (de Finetti [1981]). This convex combination is characterized by k column
vectors viewed as k matrices. Each row of every N × 1 matrix is a first-order
inclusion “a posteriori” probability. We therefore consider a linear combination
of probabilities (de Finetti [1982a]).
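The convex combination (18) can be made concrete with a small computation. The Python sketch below is ours: the helper names, the choice N = 4 and n = 2, and the probabilities proportional to 1, 2, ..., 6 are all hypothetical values introduced only to illustrate the formula. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import combinations


def indicator_vectors(N, n):
    """All samples of size n from units 1..N as 0/1 vectors, see (8) and (10)."""
    units = range(1, N + 1)
    return [tuple(1 if i in s else 0 for i in units)
            for s in combinations(units, n)]


def first_order_probabilities(vectors, probs):
    """Convex combination (18): pi = sum over i of delta(s'_i) p(s'_i)."""
    N = len(vectors[0])
    return [sum(p * v[i] for v, p in zip(vectors, probs)) for i in range(N)]


N, n = 4, 2
vectors = indicator_vectors(N, n)

# Hypothetical subjective probabilities: strictly positive and summing to 1.
probs = [Fraction(w, 21) for w in (1, 2, 3, 4, 5, 6)]
assert sum(probs) == 1 and all(p > 0 for p in probs)

pi = first_order_probabilities(vectors, probs)
print(pi)
```

Every component of the resulting vector lies strictly between 0 and 1 because each unit belongs to some, but not all, of the samples and every sample has a nonzero weight.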

5 First-order inclusion probabilities obtained by means
of linear maps

We consider all logically possible samples belonging to the set $S'$. Given N
and n, let k be the number of all elements of $S'$. We are consequently able to
determine an $N \times k$ matrix with real entries. We denote it by B. Its columns
are the vectors $\delta(s'_1), \ldots, \delta(s'_k)$, so that the product BQ considered in (21)
reproduces the convex combination (17). It is therefore possible to
define a linear map expressed by

\[
L_B : \mathbb{R}^k \to \mathbb{R}^N. \tag{19}
\]

It depends on B. Moreover, it also depends on the choice of bases for $\mathbb{R}^k$ and $\mathbb{R}^N$.
We choose the standard bases for $\mathbb{R}^k$ and $\mathbb{R}^N$. We consider all probabilities assigned
to the logically possible samples of $S'$ whose size is equal to n. They can be
viewed as a column vector. We denote it by Q. We then have

\[
Q =
\begin{pmatrix} p(s'_1) \\ p(s'_2) \\ \vdots \\ p(s'_k) \end{pmatrix}. \tag{20}
\]

It therefore turns out to be

\[
L_B(Q) = BQ = \pi =
\begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix}. \tag{21}
\]


We note that if k = N, which happens, for instance, when n = 1 or n = N − 1, then
we are able to define a linear map expressed by

\[
L_B : \mathbb{R}^N \to \mathbb{R}^N. \tag{22}
\]

We observe that B is a square matrix. This linear map is an endomorphism. It is
also an isomorphism. It is then an automorphism, so we write

\[
B^{-1} \pi =
\begin{pmatrix} p(s'_1) \\ p(s'_2) \\ \vdots \\ p(s'_k) \end{pmatrix}. \tag{23}
\]

Given B, each row of Q can subjectively vary because an evaluation of probability
over a set of logically possible events need only be coherent. This means
that the sum of all probabilities of the samples of $S'$ must be equal to 1. We
consequently observe that there are infinitely many ways of choosing the probabilities of
the samples of $S'$. They are all conveniently captured by $L_B$. It is hence possible to
obtain $\pi$ as a product of matrices according to a linear map depending on B
and the standard bases of the linear spaces under consideration. Also, since every
sample contains exactly n units by (11), we always obtain

\[
\sum_{i=1}^{N} \pi_i = n. \tag{24}
\]
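The linear map (21) and the identity (24) can be checked on a small case. In the following Python sketch, which is ours, we assume that the columns of B are the indicator vectors of the samples, as suggested by comparing (17) with (21); the function names and the two vectors Q are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from itertools import combinations


def design_matrix(N, n):
    """N x k matrix B whose columns are assumed to be the vectors delta(s'_i)."""
    units = range(1, N + 1)
    samples = list(combinations(units, n))
    return [[1 if i in s else 0 for s in samples] for i in units]


def apply_linear_map(B, Q):
    """Matrix-vector product BQ of (21), written with plain lists."""
    return [sum(b * q for b, q in zip(row, Q)) for row in B]


N, n = 4, 2
B = design_matrix(N, n)

# Two hypothetical coherent evaluations of the same six samples.
Q_uneven = [Fraction(w, 21) for w in (1, 2, 3, 4, 5, 6)]
Q_uniform = [Fraction(1, 6)] * 6

pi_uneven = apply_linear_map(B, Q_uneven)
pi_uniform = apply_linear_map(B, Q_uniform)

# Identity (24): the components of pi always add up to the sample size n.
print(sum(pi_uneven), sum(pi_uniform))
```

Changing Q changes the vector of first-order inclusion probabilities, whereas the sum of its components stays equal to n, because every column of B contains exactly n ones.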

6 First-order and second-order inclusion probabili-
ties obtained by means of tensor products

We consider a bivariate random quantity denoted by $S_{12}$ whose components
are two univariate random quantities denoted by ${}_1S$ and ${}_2S$. We therefore write
$S_{12} = \{{}_1S, {}_2S\}$. Given N and n, the logically possible values of each univariate
random quantity coincide with the k samples belonging to the set $S'$. They are
all logically possible samples of $S'$ whose size is equal to n. Each sample of $S'$
is a vector of $\mathbb{R}^N$. We have to note a very important point: we suppose that the
logically possible values of ${}_1S$ and ${}_2S$ are the same N-dimensional vectors of
the same N-dimensional linear space over R. These univariate random quantities
then have the same marginal distributions of probability. Putting them into
a two-way table we observe that it is always a square table. We observe that all
probabilities of the joint distribution of probability outside of the main diagonal
of this table are always equal to 0. The nonzero probabilities of the joint
distribution of probability coincide with $p(s'_1), \ldots, p(s'_k)$. They are on the main diagonal
of the table under consideration. A coherent prevision of $S_{12}$, denoted by $P(S_{12})$,
is obtained by means of the sum of k square matrices. The number of rows and
columns of every square matrix of this sum is equal to N. Each square matrix of
this sum derives from a tensor product belonging to the same linear space, denoted
by $\mathbb{R}^N \otimes \mathbb{R}^N$. It is an $N^2$-dimensional linear space over R. We always consider as
many tensor products as there are joint probabilities associated with the samples of $S'$.
We then have

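\[
p(s'_i) \left(
\begin{pmatrix} \delta(1; s'_i) \\ \delta(2; s'_i) \\ \vdots \\ \delta(N; s'_i) \end{pmatrix},
\begin{pmatrix} \delta(1; s'_i) \\ \delta(2; s'_i) \\ \vdots \\ \delta(N; s'_i) \end{pmatrix}
\right)
\mapsto
p(s'_i) \left(
\begin{pmatrix} \delta(1; s'_i) \\ \delta(2; s'_i) \\ \vdots \\ \delta(N; s'_i) \end{pmatrix}
\otimes
\begin{pmatrix} \delta(1; s'_i) \\ \delta(2; s'_i) \\ \vdots \\ \delta(N; s'_i) \end{pmatrix}
\right) \tag{25}
\]

for every $i = 1, \ldots, k$. We note that it turns out to be

\[
p(s'_i) \left(
\begin{pmatrix} \delta(1; s'_i) \\ \delta(2; s'_i) \\ \vdots \\ \delta(N; s'_i) \end{pmatrix}
\otimes
\begin{pmatrix} \delta(1; s'_i) \\ \delta(2; s'_i) \\ \vdots \\ \delta(N; s'_i) \end{pmatrix}
\right)
= p(s'_i)
\begin{pmatrix} \delta(1; s'_i) \\ \delta(2; s'_i) \\ \vdots \\ \delta(N; s'_i) \end{pmatrix}
\begin{pmatrix} \delta(1; s'_i) & \delta(2; s'_i) & \ldots & \delta(N; s'_i) \end{pmatrix}. \tag{26}
\]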
If we consider a coherent prevision of $S_{12}$ then we deal with a bilinear map
expressed by $\mathbb{R}^N \times \mathbb{R}^N \to M_{N,N}(\mathbb{R})$, where $M_{N,N}(\mathbb{R})$ denotes the linear
space over R of the $N \times N$ matrices with real entries. This linear space is
isomorphic to $\mathbb{R}^{N^2}$. The matrix product resulting from this bilinear map is factorized
by means of the tensor product of vectors of $\mathbb{R}^N$. It is also factorized by means
of a unique linear map whose domain coincides with $\mathbb{R}^N \otimes \mathbb{R}^N$. This is because
we know a basis of $\mathbb{R}^N \otimes \mathbb{R}^N$ as well as the value of the linear map
under consideration on the basis elements. We suppose that a basis of $\mathbb{R}^N \otimes \mathbb{R}^N$
results from the standard basis of $\mathbb{R}^N$, where $\mathbb{R}^N$ is evidently considered
twice. It is therefore possible to say that there exists a unique linear map given
by $\mathbb{R}^N \otimes \mathbb{R}^N \to M_{N,N}(\mathbb{R})$. It coincides with the product of a joint probability
viewed as a scalar and a square matrix. We consider k products of a joint
probability and a square matrix. We obtain k square matrices in this way. We consider
the sum of these k square matrices in order to obtain a coherent prevision of $S_{12}$.
We observe that $\mathbb{R}^N \times \mathbb{R}^N \to M_{N,N}(\mathbb{R})$ and $\mathbb{R}^N \otimes \mathbb{R}^N \to M_{N,N}(\mathbb{R})$ have
the same codomain. A factorization of $\mathbb{R}^N \times \mathbb{R}^N \to M_{N,N}(\mathbb{R})$ is then realized
by means of a bilinear map given by $\mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}^N \otimes \mathbb{R}^N$ and a linear map
given by $\mathbb{R}^N \otimes \mathbb{R}^N \to M_{N,N}(\mathbb{R})$. These two maps are composed, so we obtain a
composition of functions identified with $\mathbb{R}^N \times \mathbb{R}^N \to M_{N,N}(\mathbb{R})$. The following
commutative diagram

\[
\begin{array}{ccc}
\mathbb{R}^N \times \mathbb{R}^N & \longrightarrow & \mathbb{R}^N \otimes \mathbb{R}^N \\
 & \searrow & \downarrow \\
 & & M_{N,N}(\mathbb{R})
\end{array}
\]

visualizes what we have said. A coherent prevision of $S_{12}$ is then
bilinear and homogeneous. It is given by

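\[
P(S_{12}) = \Pi =
\begin{pmatrix}
\pi_1 & \pi_{12} & \ldots & \pi_{1N} \\
\pi_{21} & \pi_2 & \ldots & \pi_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
\pi_{N1} & \pi_{N2} & \ldots & \pi_N
\end{pmatrix}
=
\begin{pmatrix}
\pi_1 & \pi_{12} & \ldots & \pi_{1N} \\
\pi_{12} & \pi_2 & \ldots & \pi_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
\pi_{1N} & \pi_{2N} & \ldots & \pi_N
\end{pmatrix}. \tag{27}
\]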

It coincides with the symmetric matrix of the first-order and second-order in-
clusion probabilities. The trace of this matrix is evidently equal to n (Angelini
[2020]).
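The sum of the k matrices described by (26) can be computed directly. The Python sketch below is only an illustration under our own assumptions: the helper names and the probabilities proportional to 1, 2, ..., 6 for N = 4 and n = 2 are hypothetical. It builds the matrix of (27) entry by entry and checks its symmetry and its trace.

```python
from fractions import Fraction
from itertools import combinations


def indicator_vectors(N, n):
    """All samples of size n from units 1..N as 0/1 vectors, see (8) and (10)."""
    units = range(1, N + 1)
    return [tuple(1 if i in s else 0 for i in units)
            for s in combinations(units, n)]


def second_order_matrix(vectors, probs):
    """Sum over i of p(s'_i) * delta(s'_i) delta(s'_i)^T, see (26) and (27)."""
    N = len(vectors[0])
    Pi = [[Fraction(0)] * N for _ in range(N)]
    for v, p in zip(vectors, probs):
        for i in range(N):
            for j in range(N):
                Pi[i][j] += p * v[i] * v[j]
    return Pi


N, n = 4, 2
vectors = indicator_vectors(N, n)
probs = [Fraction(w, 21) for w in (1, 2, 3, 4, 5, 6)]  # hypothetical values

Pi = second_order_matrix(vectors, probs)
trace = sum(Pi[i][i] for i in range(N))
symmetric = all(Pi[i][j] == Pi[j][i] for i in range(N) for j in range(N))

for row in Pi:
    print([str(x) for x in row])
print(trace, symmetric)
```

The diagonal carries the first-order inclusion probabilities, and each off-diagonal entry is the total probability of the samples containing both units; with n = 2 this is the probability of the single sample formed by that pair.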

7 The covariance of two univariate random quanti-
ties obtained by considering two bilinear maps

Given $S_{12} = \{{}_1S, {}_2S\}$, the covariance of ${}_1S$ and ${}_2S$ is expressed by

\[
C({}_1S, {}_2S) = P(S_{12}) - P({}_1S) P({}_2S), \tag{28}
\]

where $P(S_{12})$ represents the prevision or mathematical expectation or expected
value of $S_{12}$, while $P({}_1S)$ and $P({}_2S)$ represent the prevision or mathematical
expectation or expected value of ${}_1S$ and ${}_2S$. We note that a coherent prevision of
$S_{12}$ derives from a bilinear map because we have

\[
P(S_{12}) =
\begin{pmatrix}
\pi_1 & \pi_{12} & \ldots & \pi_{1N} \\
\pi_{21} & \pi_2 & \ldots & \pi_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
\pi_{N1} & \pi_{N2} & \ldots & \pi_N
\end{pmatrix}. \tag{29}
\]

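Moreover, since we have

\[
P({}_1S) =
\begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix} \tag{30}
\]

as well as

\[
P({}_2S) =
\begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix}, \tag{31}
\]

we note that the product of these two linear maps is evidently bilinear. Such a
product is expressed in the form

\[
\begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix}
\begin{pmatrix} \pi_1 & \pi_2 & \ldots & \pi_N \end{pmatrix}
=
\begin{pmatrix}
\pi_1 \pi_1 & \pi_1 \pi_2 & \ldots & \pi_1 \pi_N \\
\pi_2 \pi_1 & \pi_2 \pi_2 & \ldots & \pi_2 \pi_N \\
\vdots & \vdots & \ddots & \vdots \\
\pi_N \pi_1 & \pi_N \pi_2 & \ldots & \pi_N \pi_N
\end{pmatrix}. \tag{32}
\]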

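It is then evident that the covariance of ${}_1S$ and ${}_2S$ derives from two bilinear maps
because we can write

\[
C({}_1S, {}_2S) =
\begin{pmatrix}
\pi_1 & \pi_{12} & \ldots & \pi_{1N} \\
\pi_{21} & \pi_2 & \ldots & \pi_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
\pi_{N1} & \pi_{N2} & \ldots & \pi_N
\end{pmatrix}
-
\begin{pmatrix}
\pi_1 \pi_1 & \pi_1 \pi_2 & \ldots & \pi_1 \pi_N \\
\pi_2 \pi_1 & \pi_2 \pi_2 & \ldots & \pi_2 \pi_N \\
\vdots & \vdots & \ddots & \vdots \\
\pi_N \pi_1 & \pi_N \pi_2 & \ldots & \pi_N \pi_N
\end{pmatrix}. \tag{33}
\]

By writing

\[
C({}_1S, {}_2S) =
\begin{pmatrix}
(\pi_1 - \pi_1 \pi_1) & (\pi_{12} - \pi_1 \pi_2) & \ldots & (\pi_{1N} - \pi_1 \pi_N) \\
(\pi_{21} - \pi_2 \pi_1) & (\pi_2 - \pi_2 \pi_2) & \ldots & (\pi_{2N} - \pi_2 \pi_N) \\
\vdots & \vdots & \ddots & \vdots \\
(\pi_{N1} - \pi_N \pi_1) & (\pi_{N2} - \pi_N \pi_2) & \ldots & (\pi_N - \pi_N \pi_N)
\end{pmatrix} \tag{34}
\]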

we note that it is possible to consider as many random components as there are
inclusion probabilities under study. A unit of the population under consideration can be
included, or not, in a given sample (Bondesson [2010]). This is uncertain until
a given sample is selected (Hájek [1958]). Two different units of the population
under consideration can be included, or not, in the same sample (Deville and Tillé
[1998]). This is also uncertain until a given sample is selected. A component
associated with one or two different units of the population under consideration is
evidently random for this reason (Connor [1966]). This means that each random
component is characterized by a subjective probability. It is an “a priori”
probability. It is also characterized by an “a posteriori” probability coinciding with
one of the two logically possible values of a random event, 0 and 1. One and
only one of these two logically possible values of a random event will be true “a
posteriori”. On the other hand, it is known that the notion of probability basically
deals with an aspect that lies between two extreme aspects. The first
extreme aspect deals with situations of non-knowledge or ignorance or uncertainty
determining the set of all logically possible samples of a given size viewed as
elementary events. They are evidently all logically possible alternatives that can be
considered. The second extreme aspect deals with definitive certainty expressed
in the form of what is certainly true or certainly false. Thus, every logically
possible sample of a given size definitively becomes true or false. Probability is
subjectively distributed by the statistician as a mass over the domain of all
logically possible samples of a given size before knowing which sample will actually
be selected “a posteriori”. Having said that, the variance of every random
component as well as the covariance of two random components are expressed by means
of the first-order and second-order inclusion probabilities. The variance of each
random component is represented by the corresponding element on the main diagonal of the
symmetric matrix given by (34). The covariance of two random components is
represented by the corresponding element outside of the main diagonal of the same
matrix.
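The matrix (34) can be obtained numerically from the first-order and second-order inclusion probabilities. The following Python sketch is ours and relies on hypothetical choices (the helper names, N = 4, n = 2 and probabilities proportional to 1, 2, ..., 6); it is meant only to show how the entries of (34) arise from (28).

```python
from fractions import Fraction
from itertools import combinations


def indicator_vectors(N, n):
    """All samples of size n from units 1..N as 0/1 vectors, see (8) and (10)."""
    units = range(1, N + 1)
    return [tuple(1 if i in s else 0 for i in units)
            for s in combinations(units, n)]


def covariance_matrix(vectors, probs):
    """Entries pi_ij - pi_i * pi_j of (34), with pi_ii = pi_i on the diagonal."""
    N = len(vectors[0])
    pi = [sum(p * v[i] for v, p in zip(vectors, probs)) for i in range(N)]
    Pi = [[sum(p * v[i] * v[j] for v, p in zip(vectors, probs))
           for j in range(N)] for i in range(N)]
    return [[Pi[i][j] - pi[i] * pi[j] for j in range(N)] for i in range(N)]


N, n = 4, 2
vectors = indicator_vectors(N, n)
probs = [Fraction(w, 21) for w in (1, 2, 3, 4, 5, 6)]  # hypothetical values

C = covariance_matrix(vectors, probs)
for row in C:
    print([str(x) for x in row])
```

Each diagonal entry has the form π_i − π_i π_i, that is, the variance of the corresponding random component, while each off-diagonal entry is the covariance of two random components.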

8 A univariate random quantity representing devi-
ations

We define a univariate random quantity representing deviations. We denote it
by D. We first consider S, whose values are all logically possible samples of
a given size viewed as elementary events belonging to the set $S'$. Given N and
n, the number of the logically possible values of S is equal to $\binom{N}{n} = k$. The set
of the logically possible values of S is then given by $I(S) = \{s'_1, \ldots, s'_k\}$, with
$s'_i \in S'$, $i = 1, \ldots, k$. A nonzero probability denoted by $p(s'_i)$, $i = 1, \ldots, k$,
is assigned to each sample of $S'$. We therefore obtain an N-dimensional vector
denoted by $\pi$. It represents the first-order inclusion probabilities of all units of
the population under consideration. They are all greater than zero. This vector is
always independent of the origin of the coordinate system that we could consider.
We note that the number of the logically possible values of D is equal to k. It is
the same as that of S. The set of the logically possible values of D is given by
$I(D) = \{d'_1, \ldots, d'_k\}$, with

\[
d'_i =
\begin{pmatrix} \delta(1; s'_i) \\ \delta(2; s'_i) \\ \vdots \\ \delta(N; s'_i) \end{pmatrix}
-
\begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix}, \tag{35}
\]

where $i = 1, \ldots, k$. It follows that we have

\[
p(s'_1) d'_1 + \ldots + p(s'_k) d'_k =
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \tag{36}
\]

This means that $P(S)$ is an N-dimensional vector such that all deviations from
it, multiplied by the corresponding probabilities, are N-dimensional
vectors whose sum coincides with the zero vector of $\mathbb{R}^N$. We are now able to
calculate the variance of S by using D. We refer to the α-criterion of concordance
introduced by Gini. It is a statistical criterion that we apply in an innovative way
to probability viewed as a mass. An absolute maximum of concordance is then realized
when each $d'_i$, $i = 1, \ldots, k$, is multiplied by itself. If each $d'_i$, $i = 1, \ldots, k$, is
multiplied by itself then we obtain k square matrices. Every multiplication that
we consider is a tensor product of two vectors of $\mathbb{R}^N$. These two vectors represent
two deviations which are the same. The components of these two vectors are then
the same. Hence, the variance of S coincides with the sum of k traces of k square
matrices. Each trace of the square matrix under consideration is an inner product
viewed as an α-product. An α-product is a bilinear form. We consider each $p(s'_i)$,
$i = 1, \ldots, k$, as a scalar. Each $p(s'_i)$, $i = 1, \ldots, k$, is first of all a subjective
probability. Thus, it always characterizes a random quantity. It is nevertheless viewed
as a scalar within this context. We can therefore multiply all components of $d'_i$ by
$p(s'_i)$, $i = 1, \ldots, k$. We note that the components of each $d'_i$, $i = 1, \ldots, k$, are
always independent of the origin of the coordinate system that we could consider.
We therefore write

\[
\sigma^2_S = \operatorname{tr}\left( {d'_1}^{T} \left( p(s'_1)\, d'_1 \right) \right)
+ \ldots +
\operatorname{tr}\left( {d'_k}^{T} \left( p(s'_k)\, d'_k \right) \right). \tag{37}
\]

We have evidently introduced a quadratic and linear metric in this way. We
therefore note that $\sigma^2_S$ is the sum of the squares of k α-norms. It is possible to verify
that every trace of a square matrix is an α-product which is an α-commutative
product, an α-associative product, an α-distributive product and an α-orthogonal
product. We have to note a very important point: S and D are two different
quantities from a geometric point of view because they are represented by different
sets of N-dimensional vectors. They are nevertheless the same quantity from a
randomness point of view. They are characterized by the same probabilities. We
therefore observe the same events because we consider only a change of origin.
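Relations (35), (36) and (37) lend themselves to a direct numerical check. The Python sketch below is ours: the helper names and the probabilities proportional to 1, 2, ..., 6 for N = 4 and n = 2 are hypothetical and are used only to illustrate the formulas. Each trace in (37) is computed as the probability times the squared Euclidean length of the corresponding deviation.

```python
from fractions import Fraction
from itertools import combinations


def indicator_vectors(N, n):
    """All samples of size n from units 1..N as 0/1 vectors, see (8) and (10)."""
    units = range(1, N + 1)
    return [tuple(1 if i in s else 0 for i in units)
            for s in combinations(units, n)]


def deviations(vectors, probs):
    """Vectors d'_i = delta(s'_i) - pi of (35)."""
    N = len(vectors[0])
    pi = [sum(p * v[i] for v, p in zip(vectors, probs)) for i in range(N)]
    return [[v[i] - pi[i] for i in range(N)] for v in vectors]


def variance(devs, probs):
    """Sum (37): each term is p(s'_i) times the inner product of d'_i with itself."""
    return sum(p * sum(c * c for c in d) for d, p in zip(devs, probs))


N, n = 4, 2
vectors = indicator_vectors(N, n)
probs = [Fraction(w, 21) for w in (1, 2, 3, 4, 5, 6)]  # hypothetical values

devs = deviations(vectors, probs)

# Relation (36): the probability-weighted deviations add up to the zero vector.
weighted_sum = [sum(p * d[i] for d, p in zip(devs, probs)) for i in range(N)]
var_S = variance(devs, probs)
print(weighted_sum, var_S)
```

For these hypothetical probabilities the value obtained also equals the trace of the matrix (34), since both reduce to n minus the sum of the squared first-order inclusion probabilities.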


9 Intrinsic properties of a univariate random quan-
tity representing deviations

Translations and rotations of vectors identifying a given univariate random
quantity representing deviations are intrinsic properties of it. They do not depend
on the choice of a basis of a given linear space. We say that all vectors of $S'$ are
subjected to the same translation when we consider k sums of two vectors. We
consider k sums of two vectors because the number of the elements of $S'$ is equal
to k. The first vector of each sum is given by $s'_i$, $i = 1, \ldots, k$. The second
vector of each sum is an arbitrary N-dimensional vector which
is always the same. We say that all vectors of $S'$ are then subjected to the same
change of origin. It follows that $\sigma^2_S$ is invariant with respect to a translation of all
vectors of $S'$, because the same vector is added to $P(S)$ and the deviations are
left unchanged. We say that a quadratic and linear metric is invariant with respect to
a translation of all vectors of $S'$. Concerning a rotation, let $A = (a^{i'}_{j})$ be an $N \times N$
orthogonal matrix. Each element of this matrix is denoted by two indices. We use
contravariant and covariant indices without loss of generality. The contravariant
indices represent the rows of the matrix. We have $i' = 1, \ldots, N$. The covariant
indices represent the columns of the matrix. We have $j = 1, \ldots, N$. We observe
that rotations of all vectors contained in $I(D) = \{d'_1, \ldots, d'_k\}$ are characterized
by A. We write

\[
R_A(d'_i) : d'_i \Rightarrow A d'_i = (d'_i)^*, \tag{38}
\]

where $i = 1, \ldots, k$. We evidently denote by $(d'_i)^*$ the result of the rotation
of the vector $d'_i$ obtained by means of the orthogonal matrix denoted by A. The
vector $(d'_i)^*$ is an N-dimensional vector. Its components are originated by N
linear and homogeneous relationships. We have to note a very important point:
$P(S)$ is an N-dimensional vector such that all rotated deviations from it that are
multiplied by the corresponding probabilities represent N-dimensional vectors
whose sum coincides with the zero vector of $\mathbb{R}^N$. We then have

\[
p(s'_1)(d'_1)^* + \ldots + p(s'_k)(d'_k)^* =
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \tag{39}
\]

If we consider rotated deviations then we write

\[
\sigma^2_{S^*} = \operatorname{tr}\left( {(d'_1)^*}^{T} \left( p(s'_1)\, (d'_1)^* \right) \right)
+ \ldots +
\operatorname{tr}\left( {(d'_k)^*}^{T} \left( p(s'_k)\, (d'_k)^* \right) \right), \tag{40}
\]

where $S^*$ represents a univariate random quantity connected with rotated
deviations. Since A is orthogonal, we have
${(d'_i)^*}^{T} (d'_i)^* = {d'_i}^{T} A^{T} A\, d'_i = {d'_i}^{T} d'_i$ for every $i = 1, \ldots, k$.
Since it therefore turns out to be

\[
\sigma^2_S = \sigma^2_{S^*}, \tag{41}
\]

we say that the variance of S is invariant with respect to all rotated deviations
obtained by means of the same orthogonal matrix denoted by A. We have therefore
introduced a quadratic and linear metric which is invariant with respect to
translations and rotations of vectors identifying a univariate random quantity
representing deviations.
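The two invariance properties of this section can be verified on a small case. In the Python sketch below, which is ours, the orthogonal matrix A is a hypothetical rotation of the plane spanned by the first two basis vectors, with cosine 3/5 and sine 4/5, so that all computations remain exact; the translation vector t, the helper names and the probabilities are likewise assumptions made only for illustration.

```python
from fractions import Fraction
from itertools import combinations


def indicator_vectors(N, n):
    """All samples of size n from units 1..N as 0/1 vectors, see (8) and (10)."""
    units = range(1, N + 1)
    return [tuple(1 if i in s else 0 for i in units)
            for s in combinations(units, n)]


def deviations(points, probs):
    """Deviations of the points from their prevision, as in (35)."""
    N = len(points[0])
    mean = [sum(p * x[i] for x, p in zip(points, probs)) for i in range(N)]
    return [[x[i] - mean[i] for i in range(N)] for x in points]


def variance(devs, probs):
    """Probability-weighted sum of squared lengths, as in (37) and (40)."""
    return sum(p * sum(c * c for c in d) for d, p in zip(devs, probs))


def mat_vec(A, x):
    """Product of a square matrix with a vector, as in (38)."""
    return [sum(a * c for a, c in zip(row, x)) for row in A]


N, n = 4, 2
vectors = indicator_vectors(N, n)
probs = [Fraction(w, 21) for w in (1, 2, 3, 4, 5, 6)]  # hypothetical values

# Hypothetical orthogonal matrix with exact rational entries.
c, s = Fraction(3, 5), Fraction(4, 5)
A = [[c, -s, 0, 0],
     [s, c, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]

devs = deviations(vectors, probs)
rotated = [mat_vec(A, d) for d in devs]

# Hypothetical translation applied to every vector of the set of samples.
t = [5, -3, 2, 7]
translated = [[x[i] + t[i] for i in range(N)] for x in vectors]

var_S = variance(devs, probs)
var_rotated = variance(rotated, probs)
var_translated = variance(deviations(translated, probs), probs)
print(var_S, var_rotated, var_translated)
```

The three printed values coincide, which is consistent with (41) and with the invariance with respect to a change of origin.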

10 A univariate random quantity representing vari-
ations and its intrinsic properties

We define a univariate random quantity representing variations. We denote
it by V. Given D, the set of the logically possible values of V is expressed by
$I(V) = \{v'_1, \ldots, v'_k\}$, with

\[
v'_i = d'_i \frac{1}{\sqrt{\sigma^2_S}}, \tag{42}
\]

where $i = 1, \ldots, k$. We therefore note that S, D and V are different
quantities from a geometric point of view. They are nevertheless the same quantity
from a randomness point of view. It is possible to verify that it turns out to be

\[
\sigma^2_V = 1. \tag{43}
\]

This index is always equal to 1 independently of the components of $d'_i$,
$i = 1, \ldots, k$. It is evident that these components identify $\sigma^2_S$, so we say that
$\sigma^2_V = 1$ is also independent of $\sigma^2_S$. We observe that rotations of all vectors
belonging to $I(V) = \{v'_1, \ldots, v'_k\}$ are always characterized by an $N \times N$ orthogonal
matrix. We write

\[
(v'_i)^* = (d'_i)^* \frac{1}{\sqrt{\sigma^2_S}}, \tag{44}
\]

where we have i = 1, . . . ,k. If we consider translations and rotations of vectors
identifying a univariate random quantity representing variations then we observe
intrinsic properties that we have already considered. We note that V can be sub-
jected to an affine transformation. If V is subjected to an affine transformation
then we write

V ⇒ aV + b, (45)

where we have a ≠ 0. We therefore observe that each vector of I(aV + b) is
equal to the corresponding vector of I(V ). This means that the components of
each vector of I(aV + b) are the same as those of the corresponding vector of
I(V ). Hence, we say that univariate random quantities representing variations are
invariant with respect to an affine transformation. Given S12 = {1S, 2S}, we note
that we have 1V = 2V = V if and only if it turns out to be 1S = 2S = S. It is pos-
sible to verify that the covariance of 1V and 2V is an α-product. It is always equal
to 1. On the other hand, it coincides with the Bravais-Pearson correlation coeffi-
cient in the case of a perfect direct linear relationship between two quantities. It is
possible to verify that the Bravais-Pearson correlation coefficient is invariant with
respect to rotations of vectors belonging to I(V ). It is therefore invariant with
respect to an affine transformation of V . We have to note a very important point:
intrinsic properties that we have considered can be related to the random quantities
themselves or to specific metric indices based on these quantities. Specific metric
indices are evidently based on random quantities representing deviations or varia-
tions because we calculate them after taking such random quantities into account.
We have to note another very important point: we are not interested in translating
or rotating a geometric object in real terms but we are interested in studying its
intrinsic properties because these properties are a fundamental consequence of its
geometric representation.
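
Property (43) can be illustrated with a short Python sketch. It is not the author's construction: it assumes, as a working hypothesis, that d′i = s′i − P(S) and that σ2S is the sum of the terms p(s′i)(d′i)T d′i, and it then applies (42). The two designs used below are hypothetical.

```python
from itertools import combinations
from math import sqrt, isclose

def variances(p, N, n):
    """Return (variance of S, variance of V) for the design p on all samples of size n."""
    samples = list(combinations(range(N), n))
    indicators = [[1.0 if u in s else 0.0 for u in range(N)] for s in samples]
    pi = [sum(w * v[u] for w, v in zip(p, indicators)) for u in range(N)]
    deviations = [[v[u] - pi[u] for u in range(N)] for v in indicators]   # assumed d'_i
    var_S = sum(w * sum(c * c for c in d) for d, w in zip(deviations, p))
    variations = [[c / sqrt(var_S) for c in d] for d in deviations]       # v'_i as in (42)
    var_V = sum(w * sum(c * c for c in v) for v, w in zip(variations, p))
    return var_S, var_V

design_a = [0.10, 0.20, 0.30, 0.10, 0.20, 0.10]   # hypothetical, unequal p(s'_i)
design_b = [1 / 6] * 6                            # hypothetical, equal p(s'_i)
for design in (design_a, design_b):
    print(variances(design, N=4, n=2))
```

The two designs give different values of σ2S, while the variance of V is equal to 1 in both cases up to rounding.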

11 Metric aspects of an estimate of the population
mean

We now ask what happens from a metric point of view when we study one or more
attributes of each element of the population under consideration. Without loss
of generality, we suppose that we observe three different and independent
characteristics of each element of the population under consideration. We
therefore consider three different and independent variables denoted by X, Y
and Z. They concern the first, the second and the third attribute of each
element of the population under consideration, respectively. If we study only
one attribute of each element of the population under consideration then we
estimate the population mean by using the univariate Horvitz-Thompson
estimator. It is defined by

\[
t^{(x)}_{HT} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\pi_i} \, \delta(i; s') \, x_i, \tag{46}
\]

where we have s′ ∈ S′. It is linear and homogeneous (Horvitz and Thompson
[1952]). We note that s′ is one of the logically possible samples of S′. Also, the
weight of the generic unit i of the population under consideration never depends
on s′. It is obtained starting from (17). Conversely, we have considered all
logically possible samples of S′ when we have defined S, D and V . We did not
consider only one of them. These random quantities are complementary to the
univariate Horvitz-Thompson estimator for this reason. Also, we have always
taken P(S) = π into account when we have defined S, D and V . On the other
hand, a coherent prevision of S is itself linear and homogeneous. The expected
value of the univariate Horvitz-Thompson estimator is given by

\[
\mathrm{E}[t^{(x)}_{HT}] = \mu_x. \tag{47}
\]

It is equal to the population mean denoted by µx for any vector
$(x_1 \; x_2 \; \ldots \; x_N)^T \in \mathbb{R}^N$. We have

\[
\mu_x = \frac{1}{N} \sum_{i=1}^{N} x_i. \tag{48}
\]
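
Property (47) can be verified by enumerating all logically possible samples. The Python sketch below uses exact rational arithmetic. The values of X, the population size and the probabilities p(s′i) are hypothetical. The first-order inclusion probabilities are computed as the sum of p(s′) over the samples containing the unit, which is assumed to agree with (17).

```python
from fractions import Fraction as F
from itertools import combinations

N, n = 4, 2                                   # hypothetical population and sample sizes
x = [F(3), F(7), F(1), F(9)]                  # hypothetical values x_1, ..., x_N
samples = list(combinations(range(N), n))     # all samples of size n
p = [F(1, 10), F(2, 10), F(3, 10), F(1, 10), F(2, 10), F(1, 10)]   # hypothetical p(s')

# First-order inclusion probabilities: total probability of the samples containing i.
pi = [sum(w for w, s in zip(p, samples) if i in s) for i in range(N)]

def t_ht(s):
    """Univariate Horvitz-Thompson estimate (46) computed on the sample s."""
    return sum(x[i] / pi[i] for i in s) / N

expected = sum(w * t_ht(s) for w, s in zip(p, samples))   # E[t_HT] over all samples
mu_x = sum(x) / N                                         # population mean (48)
print(expected, mu_x)
```

Because every πi is greater than zero, each xi/πi is weighted by πi when the expectation is taken, so the two printed values coincide.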

The variance of the univariate Horvitz-Thompson estimator is given by

\[
\mathrm{V}(t^{(x)}_{HT}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{x_i}{\pi_i} \, \frac{x_j}{\pi_j} \, \Delta_{ij}, \tag{49}
\]

where we have ∆ij = πij −πiπj, with i,j = 1, . . . ,N. We note that ∆ij, i,j =
1, . . . ,N, is obtained by means of (34). Since we consider all logically possible
samples whose size is equal to n we can also write

\[
\mathrm{V}(t^{(x)}_{HT}) = -\frac{1}{2N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \frac{x_i}{\pi_i} - \frac{x_j}{\pi_j} \right)^2 \Delta_{ij}, \tag{50}
\]

where we have again ∆ij = πij − πiπj, with i,j = 1, . . . ,N (Yates and Grundy
[1953]). This variance is estimated by the univariate Yates-Grundy estimator
given by

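\[
\hat{V}_{YG}(t^{(x)}_{HT}) = \frac{1}{2N^2} \sum_{i \in s'} \sum_{j \in s'} \left( \frac{x_i}{\pi_i} - \frac{x_j}{\pi_j} \right)^2 \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}}, \tag{51}
\]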

where we have πij > 0 because we assume that the sampling design is measurable,
and πij ≤ πiπj, with i,j = 1, . . . ,N. The same holds when we consider
Y and Z. We have to note a very important point: the variance of S denoted
by σ2S coincides with the variance of the univariate Horvitz-Thompson estimator
given by (50) when the absolute value of each deviation of xi from xj, with
i ≠ j = 1, . . . ,N, is a multiple of N. In addition, the variance
of S coincides with the variance of the univariate Horvitz-Thompson estimator
given by (50) when the entropy H of the sampling design with fixed sample size
is maximum (Tillé and Wilhelm [2017]), where we have

\[
H = -\sum_{s' \in S'} p(s') \log p(s'). \tag{52}
\]
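
Formulas (49), (50) and (51) can be checked on a small example by enumerating all samples of a fixed size. The Python sketch below is only illustrative and uses exact rational arithmetic. The data and the probabilities p(s′i) are hypothetical, and πij is computed as the total probability of the samples containing both i and j, with πii = πi, which is assumed to agree with the definitions used in (34).

```python
from fractions import Fraction as F
from itertools import combinations

N, n = 4, 2
x = [F(3), F(7), F(1), F(9)]                  # hypothetical values of X
samples = list(combinations(range(N), n))
p = [F(1, 10), F(2, 10), F(3, 10), F(1, 10), F(2, 10), F(1, 10)]   # hypothetical p(s')

def incl(*units):
    """Probability that all the given units belong to the selected sample."""
    return sum(w for w, s in zip(p, samples) if all(u in s for u in units))

pi = [incl(i) for i in range(N)]
pij = [[incl(i, j) for j in range(N)] for i in range(N)]    # pij[i][i] equals pi[i]
delta = [[pij[i][j] - pi[i] * pi[j] for j in range(N)] for i in range(N)]
a = [x[i] / pi[i] for i in range(N)]                        # expanded values x_i / pi_i

# Variance obtained directly from the distribution of the estimator.
t = [sum(a[i] for i in s) / N for s in samples]
mu = sum(x) / N
var_direct = sum(w * (ts - mu) ** 2 for w, ts in zip(p, t))

# Variance according to (49) and to (50).
pairs = [(i, j) for i in range(N) for j in range(N)]
var_49 = sum(a[i] * a[j] * delta[i][j] for i, j in pairs) / N**2
var_50 = -sum((a[i] - a[j]) ** 2 * delta[i][j] for i, j in pairs) / (2 * N**2)

def v_yg(s):
    """Yates-Grundy estimate (51) computed on the sample s."""
    return sum((a[i] - a[j]) ** 2 * (pi[i] * pi[j] - pij[i][j]) / pij[i][j]
               for i in s for j in s) / (2 * N**2)

mean_v_yg = sum(w * v_yg(s) for w, s in zip(p, samples))    # expectation of (51)
print(var_direct, var_49, var_50, mean_v_yg)
```

The four printed values coincide. The equality between (49) and (50) relies on the fixed sample size, because in that case each row of the matrix of the ∆ij sums to zero.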

We note that H is maximum when we have

\[
p(s'_1) = p(s'_2) = \ldots = p(s'_k), \tag{53}
\]

with $\sum_{i=1}^{k} p(s'_i) = 1$. It does not turn out to be p(s′) = 0 within
this context. However, if we observe p(s′) = 0 with regard to (52) then it turns
out to be 0 log 0 = 0 by convention. We therefore say that the weights of the
univariate Horvitz-Thompson estimator are based on a coherent prevision of S. We have
obtained a linear and quadratic metric by considering two univariate random quan-
tities representing deviations. We have obtained the variance of S by using this
metric. The same thing goes when we consider Y and Z. We have to note an-
other very important point: by studying three different and independent attributes
of each element of the population under consideration we do not jointly consider
three variables but we jointly consider two variables at a time. This is because it
is not appropriate to use a trilinear form when we deal with metric relationships.
If we jointly study two attributes of each element of the population under con-
sideration then we estimate the bivariate population mean by using the bivariate
Horvitz-Thompson estimator. We write

\[
t^{(xy)}_{HT} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{1}{\pi_i} \, \delta(i; s') \, x_i \, \frac{1}{\pi_j} \, \delta(j; s') \, y_j \tag{54}
\]

when we jointly consider X and Y , where all first-order inclusion probabilities
are greater than zero. They are obtained by means of (17). We write

\[
t^{(xz)}_{HT} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{1}{\pi_i} \, \delta(i; s') \, x_i \, \frac{1}{\pi_j} \, \delta(j; s') \, z_j \tag{55}
\]

when we jointly consider X and Z, where all first-order inclusion probabilities
are greater than zero. They are obtained by means of (17). We write

\[
t^{(yz)}_{HT} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{1}{\pi_i} \, \delta(i; s') \, y_i \, \frac{1}{\pi_j} \, \delta(j; s') \, z_j \tag{56}
\]

when we jointly consider Y and Z, where all first-order inclusion probabilities
are greater than zero. They are obtained by means of (17). The bivariate Horvitz-
Thompson estimator is obtained by multiplying two linear and homogeneous ex-
pressions. This means that what we have said concerning the weights of the uni-
variate Horvitz-Thompson estimator does not change. The expected value of the
bivariate Horvitz-Thompson estimator concerning X and Y is given by

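\[
\mathrm{E}[t^{(xy)}_{HT}] = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{1}{\pi_i} \, \mathrm{E}[\delta(i; s')] \, x_i \, \frac{1}{\pi_j} \, \mathrm{E}[\delta(j; s')] \, y_j. \tag{57}
\]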

We observe that it turns out to be E[δ(i; s′)] = πi as well as E[δ(j; s′)] = πj for
every s′ ∈ S′, i,j = 1, . . . ,N. It is therefore evident that (57) is equal to the
population mean denoted by µ(xy) for any vector
$(x_1 \; x_2 \; \ldots \; x_N)^T \in \mathbb{R}^N$ and
$(y_1 \; y_2 \; \ldots \; y_N)^T \in \mathbb{R}^N$, where we have

\[
\mu_{(xy)} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} x_i \, y_j. \tag{58}
\]
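
The statement that the bivariate Horvitz-Thompson estimator is obtained by multiplying two linear and homogeneous expressions can be made concrete. The Python sketch below checks only this algebraic factorization of (54) and of (58), sample by sample. The data and the probabilities p(s′i) are hypothetical, and the function names are introduced here for illustration.

```python
from fractions import Fraction as F
from itertools import combinations

N, n = 4, 2
x = [F(3), F(7), F(1), F(9)]                  # hypothetical values of X
y = [F(2), F(4), F(6), F(8)]                  # hypothetical values of Y
samples = list(combinations(range(N), n))
p = [F(1, 10), F(2, 10), F(3, 10), F(1, 10), F(2, 10), F(1, 10)]   # hypothetical p(s')
pi = [sum(w for w, s in zip(p, samples) if i in s) for i in range(N)]

def t_uni(values, s):
    """Univariate estimator (46): only the units of the sample s contribute."""
    return sum(values[i] / pi[i] for i in s) / N

def t_biv(u, v, s):
    """Bivariate estimator (54): double sum over the units of the sample s."""
    return sum((u[i] / pi[i]) * (v[j] / pi[j]) for i in s for j in s) / N**2

mu_x, mu_y = sum(x) / N, sum(y) / N
mu_xy = sum(x[i] * y[j] for i in range(N) for j in range(N)) / N**2   # (58)

for s in samples:
    print(s, t_biv(x, y, s), t_uni(x, s) * t_uni(y, s))
print(mu_xy, mu_x * mu_y)
```

For every sample the double sum equals the product of the two univariate estimates, and µ(xy) equals the product of µx and µy.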

The same holds for the expected value of the bivariate Horvitz-Thompson
estimator concerning X and Z as well as for the one concerning Y and Z. When
the values xi, yi and zi, i = 1, . . . ,N, of X, Y and Z are unknown, we
consider auxiliary variables denoted by X′, Y ′ and Z′ which are related to X,
Y and Z, respectively. The known values of X′ are given by x′i,
i = 1, . . . ,N. We write

\[
\mu_{x'} = \frac{1}{N} \sum_{i=1}^{N} x'_i. \tag{59}
\]

If X and X′ are approximately proportional then it turns out to be

\[
\frac{x_i}{x'_i} \approx \text{constant}, \tag{60}
\]

where we have i = 1, . . . ,N. The first-order inclusion probabilities chosen by the
statistician are then given by

\[
\pi_i = \frac{n \, x'_i}{N \mu_{x'}}, \tag{61}
\]

where we have i = 1, . . . ,N. We note that such probabilities are used in (23)
in order to obtain p(s′i), i = 1, . . . ,k, when we have k = N. We observe that
p(s′i), i = 1, . . . ,k, are used in order to obtain a coherent prevision of S. If we
have k ≠ N then we consider a system of N linear equations with k unknowns,
where π1, . . . ,πN are constant terms. We evidently refer to (21). We therefore
observe that π1, . . . ,πN represent a coherent prevision of S obtained starting
from p(s′i), i = 1, . . . ,k. We observe that α-products and α-norms use p(s′i),
i = 1, . . . ,k, as scalars. The second-order inclusion probabilities also
characterize our metric structure. They are obtained by means of tensor products
having p(s′i), i = 1, . . . ,k, as scalars. They are chosen by the statistician
because he subjectively chooses p(s′i), i = 1, . . . ,k. He is consequently able to
observe πij > 0, i,j = 1, . . . ,N. We have established them in (27). The same
holds when we consider Y ′ and Z′.
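
The role of p(s′i), i = 1, . . . ,k, as scalars of tensor products can be sketched in Python. The sketch assumes that the matrix of the second-order inclusion probabilities is the sum of the tensor products of each sample indicator vector with itself, each multiplied by p(s′i). This is the standard definition of πij and is assumed here to agree with (27). The probabilities p(s′i) are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

N, n = 4, 2
samples = list(combinations(range(N), n))
p = [F(1, 10), F(2, 10), F(3, 10), F(1, 10), F(2, 10), F(1, 10)]   # hypothetical p(s')

# Indicator vector of each sample: component u is 1 if unit u belongs to the sample.
indicators = [[F(int(u in s)) for u in range(N)] for s in samples]

# First-order inclusion probabilities: linear combination of the indicator vectors.
pi = [sum(w * d[i] for w, d in zip(p, indicators)) for i in range(N)]

# Second-order inclusion probabilities: linear combination of the tensor
# (outer) products of each indicator vector with itself.
pij = [[sum(w * d[i] * d[j] for w, d in zip(p, indicators)) for j in range(N)]
       for i in range(N)]

for row in pij:
    print([str(value) for value in row])
```

The diagonal of the resulting matrix contains the πi, the matrix is symmetric, and each row sums to nπi because every sample has size n.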

12 A metric homoscedasticity of different variables
identifying different and independent attributes
of the units of the population

We have to consider two variables jointly at a time for a metric reason. When
we jointly consider X and Y we first have to disaggregate $t^{(xy)}_{HT}$. Given

\[
t^{(x)}_{HT} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\pi_i} \, \delta(i; s') \, x_i \tag{62}
\]

and

\[
t^{(y)}_{HT} = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{\pi_j} \, \delta(j; s') \, y_j, \tag{63}
\]

the covariance of these two univariate Horvitz-Thompson estimators is therefore
expressed by

\[
\mathrm{C}(t^{(x)}_{HT}, t^{(y)}_{HT}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{x_i}{\pi_i} \, \frac{y_j}{\pi_j} \, \Delta_{ij}, \tag{64}
\]

where we have ∆ij = πij −πiπj, with i,j = 1, . . . ,N. We note that ∆ij, i,j =
1, . . . ,N, is obtained by means of (34). The same thing goes when we jointly
consider X and Z as well as Y and Z. We note that

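\[
\mathrm{C}(t^{(x)}_{HT}, t^{(x)}_{HT}) = \mathrm{V}(t^{(x)}_{HT}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{x_i}{\pi_i} \, \frac{x_j}{\pi_j} \, \Delta_{ij}, \tag{65}
\]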

where we have ∆ij = πij − πiπj, i,j = 1, . . . ,N. We observe that ∆ij, i,j =
1, . . . ,N, is obtained by means of (34). We note that

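\[
\mathrm{C}(t^{(y)}_{HT}, t^{(y)}_{HT}) = \mathrm{V}(t^{(y)}_{HT}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{y_i}{\pi_i} \, \frac{y_j}{\pi_j} \, \Delta_{ij}, \tag{66}
\]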

where we have ∆ij = πij − πiπj, with i,j = 1, . . . ,N. We observe that ∆ij,
i,j = 1, . . . ,N, is obtained by means of (34). It is also possible to write

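\[
\mathrm{C}(t^{(z)}_{HT}, t^{(z)}_{HT}) = \mathrm{V}(t^{(z)}_{HT}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{z_i}{\pi_i} \, \frac{z_j}{\pi_j} \, \Delta_{ij}, \tag{67}
\]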

where we have ∆ij = πij − πiπj, with i,j = 1, . . . ,N. We observe that ∆ij,
i,j = 1, . . . ,N, is obtained by means of (34). We are interested in knowing
what happens from a metric point of view when we study three different and
independent attributes with respect to each element of the population under
consideration. We have defined S, D and V . In particular, we consider a bivariate
random quantity representing deviations. It is expressed by D12 = {1D, 2D}.
Its components are two univariate random quantities, 1D and 2D, identifying two
sets of N-dimensional vectors. Each vector of one set is equal to the
corresponding vector of the other set. We have consequently
I(1D) = I(2D) = {d′1, . . . ,d′k}. Given p(s′i), i = 1, . . . ,k, we observe that
1D is equal to 2D, so the covariance of 1D and 2D is equal to the variance of S
denoted by σ2S. This holds regardless of the pair of variables that we consider:
we could indifferently consider X and Y , X and Z, or Y and Z. On the other
hand, if we take 1V and 2V into account then we note that their covariance is
equal to 1. Since it turns out to be 1V = 2V = V we say that the variance of V is
equal to 1. This also holds regardless of the pair of variables that we consider.
We therefore say that X, Y and Z are homoscedastic from a metric point of view.
We say this after considering all logically possible samples having a given size
belonging to S′ and after defining S with respect to X, Y and Z. We say this
because, given p(s′i), i = 1, . . . ,k, the variance of S is always the same. It is
obtained by virtue of the metric structure that we have introduced.
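
The covariance (64) and the fact that the variance of S does not involve the values of X, Y or Z can both be illustrated with a Python sketch. The data and the probabilities p(s′i) are hypothetical. The variance of S is computed under the assumption that d′i = s′i − P(S) and that σ2S is the sum of the terms p(s′i)(d′i)T d′i.

```python
from fractions import Fraction as F
from itertools import combinations

N, n = 4, 2
x = [F(3), F(7), F(1), F(9)]                  # hypothetical values of X
y = [F(2), F(4), F(6), F(8)]                  # hypothetical values of Y
samples = list(combinations(range(N), n))
p = [F(1, 10), F(2, 10), F(3, 10), F(1, 10), F(2, 10), F(1, 10)]   # hypothetical p(s')

def incl(*units):
    """Probability that all the given units belong to the selected sample."""
    return sum(w for w, s in zip(p, samples) if all(u in s for u in units))

pi = [incl(i) for i in range(N)]
pij = [[incl(i, j) for j in range(N)] for i in range(N)]

def t_ht(values, s):
    """Univariate Horvitz-Thompson estimate computed on the sample s."""
    return sum(values[i] / pi[i] for i in s) / N

# Covariance obtained directly from the joint distribution of the two estimators.
mu_x, mu_y = sum(x) / N, sum(y) / N
cov_direct = sum(w * (t_ht(x, s) - mu_x) * (t_ht(y, s) - mu_y)
                 for w, s in zip(p, samples))

# Covariance according to (64).
cov_64 = sum((x[i] / pi[i]) * (y[j] / pi[j]) * (pij[i][j] - pi[i] * pi[j])
             for i in range(N) for j in range(N)) / N**2

# Variance of S (assumed form): it is computed from p(s') only, not from x or y.
sigma2_S = sum(w * sum((F(int(u in s)) - pi[u]) ** 2 for u in range(N))
               for w, s in zip(p, samples))
print(cov_direct, cov_64, sigma2_S)
```

The first two printed values coincide. The third one would be the same for any other pair of variables, since the data never enter its computation.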

13 What is all this for?

All the first-order inclusion probabilities derive from a coherent prevision of
S. A coherent prevision of S always depends on p(s′i), i = 1, . . . ,k, where these
probabilities are coherently chosen by the statistician. All the second-order inclu-
sion probabilities derive from a coherent prevision of S12. A coherent prevision
of S12 always depends on p(s′i), i = 1, . . . ,k. A coherent prevision of S is linear
and homogeneous. A coherent prevision of S12 is bilinear and homogeneous. The
bivariate Horvitz-Thompson estimator is obtained by multiplying two linear and
homogeneous expressions. This means that what we are going to say concerning
the weights of the univariate Horvitz-Thompson estimator continues to be valid
even when we make reference to the bivariate Horvitz-Thompson estimator. We
therefore make reference to the first-order inclusion probabilities. If there exists
a direct linear relationship between X′ and X then the statistician chooses high
inclusion probabilities denoted by πi for the units of the population under
consideration having high values of X′ denoted by x′i, i = 1, . . . ,N.
This is because they are likely to be associated with high values of X denoted
by xi, i = 1, . . . ,N. The same holds when we consider a direct linear
relationship between Y ′ and Y as well as between Z′ and Z. If X and X′ are
approximately proportional then the first-order inclusion probabilities chosen
by the statistician are given by

\[
\pi_i = \frac{n \, x'_i}{\sum_{j=1}^{N} x'_j}, \tag{68}
\]

where we have i = 1, . . . ,N. If it turns out to be πi > 1 for some unit of the
population under consideration then we have πi = 1 for all units of the
population under consideration having i as a label and such that it turns out to be
$n \, x'_i \geq \sum_{j=1}^{N} x'_j$ because x′i is high. We consider n > 1 within
this context. The statistician consequently chooses

\[
\pi_i = (n - n_A) \, \frac{x'_i}{\sum_{j=1,\, j \notin A}^{N} x'_j}, \tag{69}
\]

where we have i = 1, . . . ,N, i ∉ A, concerning the remaining units of the
population under consideration. The set of the units of the population under
consideration such that it turns out to be $n \, x'_i \geq \sum_{j=1}^{N} x'_j$ is
denoted by A, while their number is denoted by nA. The same holds when we
consider Y ′ and Y as well as Z′ and Z. Having said that, we evidently establish
a linear relationship between p(s′i), i = 1, . . . ,k, and πi, i = 1, . . . ,N. If the
statistician chooses p(s′i), i = 1, . . . ,k, with $\sum_{i=1}^{k} p(s'_i) = 1$, then
it is possible to get πi, i = 1, . . . ,N, with $\sum_{i=1}^{N} \pi_i = n$. We write

\[
\begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix} = \sum_{i=1}^{k} \delta(s'_i) \, p(s'_i). \tag{70}
\]
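
The linear relationship (70) can be sketched in Python as follows. The sketch assumes that δ(s′i) is the indicator vector of the i-th sample, and the two designs are hypothetical. It also illustrates that, when k is greater than N, different choices of p(s′i), i = 1, . . . ,k, may lead to the same first-order inclusion probabilities.

```python
from fractions import Fraction as F
from itertools import combinations

N, n = 4, 2
samples = list(combinations(range(N), n))     # k = 6 samples, so k > N

def inclusion(p):
    """First-order inclusion probabilities as in (70): sum of p(s') times the indicator of s'."""
    pi = [F(0)] * N
    for weight, s in zip(p, samples):
        for unit in s:
            pi[unit] += weight
    return pi

p_equal = [F(1, 6)] * 6                                              # hypothetical design
p_other = [F(1, 4), F(1, 8), F(1, 8), F(1, 8), F(1, 8), F(1, 4)]     # another hypothetical design

print(inclusion(p_equal))
print(inclusion(p_other))
```

Both designs give πi = 1/2 for every unit, and in both cases the πi sum to n.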

The statistician is consequently able to obtain πi > 0 for every i = 1, . . . ,N.
Conversely, if the statistician chooses πi, i = 1, . . . ,N, then it is possible to
get p(s′i), i = 1, . . . ,k. We observe that α-products and α-norms use p(s′i),
i = 1, . . . ,k, as scalars. We obtain different metric relationships by using
α-norms whose scalars are p(s′i), i = 1, . . . ,k. We note that π1, . . . ,πN are
used in

\[
B^{-1} P(S) = \begin{pmatrix} p(s'_1) \\ p(s'_2) \\ \vdots \\ p(s'_k) \end{pmatrix} \tag{71}
\]

in order to obtain p(s′i), i = 1, . . . ,k, when we have k = N. We note that B is
a square matrix, while B−1 is its inverse. If we have k ≠ N then we consider
a system of N linear equations with k unknowns, where π1, . . . ,πN are constant
terms. We evidently refer to

\[
L_B(Q) = B \begin{pmatrix} p(s'_1) \\ p(s'_2) \\ \vdots \\ p(s'_k) \end{pmatrix} = \begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix} = P(S). \tag{72}
\]

It is known that if the statistician chooses appropriate inclusion probabilities then
he is able to obtain a more efficient estimator of the population mean.
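
The choice of the first-order inclusion probabilities described by (68) and (69) can be sketched in Python. The function name and the auxiliary values are hypothetical. The text describes a single correction step, so repeating the step until no recomputed probability reaches 1 is an assumption of this sketch. The auxiliary values are assumed to be positive, with n smaller than N.

```python
from fractions import Fraction as F

def proportional_inclusion(x_aux, n):
    """First-order inclusion probabilities proportional to x_aux, capped at 1.

    Units with n * x'_i >= sum of the x'_j form the set A and receive probability 1;
    the remaining units share n - n_A proportionally, as in (69).
    """
    N = len(x_aux)
    capped = set()                                   # the set A
    while True:
        rest = [i for i in range(N) if i not in capped]
        total = sum(x_aux[i] for i in rest)
        m = n - len(capped)                          # n - n_A
        newly_capped = {i for i in rest if m * x_aux[i] >= total}
        if not newly_capped:
            break
        capped |= newly_capped                       # assumption: repeat the step if needed
    pi = [F(1)] * N                                  # units of A keep probability 1
    for i in rest:
        pi[i] = F(m) * x_aux[i] / total
    return pi

print(proportional_inclusion([1, 2, 3, 4], n=2))     # no unit is capped
print(proportional_inclusion([1, 2, 3, 30], n=2))    # the last unit is capped at 1
```

In the second call the unit with auxiliary value 30 would receive 5/3 under (68), so it enters A and the other three units share the remaining n − nA = 1 proportionally. In both calls the probabilities sum to n.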

14 Conclusions

We have considered random quantities whose logically possible values are
all logically possible samples of a given size belonging to a given set. Every
logically possible sample belonging to a given set has a subjective probability of
being selected. We have obtained the first-order inclusion probabilities by means
of coherent previsions of univariate random quantities. We have defined bivariate
random quantities whose components are two univariate random quantities having
all logically possible samples of a given size as their logically possible values.
All univariate random quantities which we have defined are complementary to
the univariate Horvitz-Thompson estimator. It is linear and homogeneous like a
coherent prevision of a univariate random quantity whose logically possible values
are all logically possible samples of a given size belonging to a given set. A
univariate random quantity representing deviations as well as a univariate random
quantity representing variations are defined on the basis of a coherent prevision of
a given univariate random quantity. These random quantities are the same quantity
from a randomness point of view. We have identified a quadratic and linear metric
with regard to two univariate random quantities representing deviations. We have
used the α-criterion of concordance introduced by Gini in order to identify it.

References
Pierpaolo Angelini. A quadratic and linear metric characterizing the sampling
design with fixed sample size considered from a geometric viewpoint. European
Scientific Journal, 16(15):1–19, 2020.

D. Basu. On sampling with and without replacement. Sankhya: The Indian Jour-
nal of Statistics, 20(3-4):287–294, 1958.

D. Basu. An essay on the logical foundations of survey sampling, part one. In
V. P. Godambe and D. A. Sprott, editors, Foundations of Statistical Inference.
Holt, Rinehart & Winston, Toronto, 1971.

L. Bondesson. Recursion formulas for inclusion probabilities of all orders for
conditional Poisson, Sampford, Pareto, and more general sampling designs. In
M. Carlson, H. Nyquist, and M. Villani, editors, Official statistics, methodology
and applications in honour of Daniel Thorburn. Brommatryck & Brolins AB,
Stockholm, 2010.

G. Coletti, R. Scozzafava, and B. Vantaggi. Possibilistic and probabilistic logic
under coherence: default reasoning and system P. Mathematica Slovaca, 65(4):
863–890, 2015.

W. S. Connor. An exact formula for the probability that two specified sampling
units will occur in a sample drawn with unequal probabilities and without re-
placement. Journal of the American Statistical Association, 61:384–390, 1966.

P. L. Conti and D. Marella. Inference for quantiles of a finite population: asymp-
totic versus resampling results. Scandinavian Journal of Statistics, 42:545–561,
2015.

B. de Finetti. Probability: beware of falsifications! In H. E. Kyburg jr. and H. E.
Smokler, editors, Studies in subjective probability. R. E. Krieger Publishing
Company, Huntington, New York, 1980.

B. de Finetti. The role of “Dutch books” and of “proper scoring rules”. The British
Journal for the Philosophy of Science, 32:55–56, 1981.

B. de Finetti. Probability: the different views and terminologies in a critical anal-
ysis. In L. J. Cohen, J. Łoś, H. Pfeiffer, and K.-P. Podewski, editors, Logic,
Methodology and Philosophy of Science VI, pages 391–394. North-Holland
Publishing Company, Amsterdam, 1982a.

B. de Finetti. The proper approach to probability. In G. Koch and F. Spizzichino,
editors, Exchangeability in Probability and Statistics. North-Holland Publish-
ing Company, Amsterdam, 1982b.

B. de Finetti. Probabilism: A critical essay on the theory of probability and on the
value of science. Erkenntnis, 31(2-3):169–223, 1989.

B. de Finetti. La probabilità e la statistica nei rapporti con l’induzione, secondo
i diversi punti di vista. In B. de Finetti, editor, Induzione e statistica, pages
5–115. Springer, Heidelberg, 2011.

J.-C. Deville and Y. Tillé. Unequal probability sampling without replacement
through a splitting method. Biometrika, 85:89–101, 1998.

A. Gilio and G. Sanfilippo. Conditional random quantities and compounds of
conditionals. Studia logica, 102(4):709–729, 2014.

V. P. Godambe. A unified theory of sampling from finite populations. Journal of
the Royal Statistical Society, B17(2):269–278, 1955.

V. P. Godambe and V. M. Joshi. Admissibility and Bayes estimation in sampling
finite populations. I. The Annals of Mathematical Statistics, 36(6):1707–1722,
1965.

I. J. Good. Subjective probability as the measure of a non-measurable set. In
E. Nagel, P. Suppes, and A. Tarski, editors, Logic, Methodology and Philosophy
of Science. Stanford University Press, Stanford, 1962.

J. Hájek. Some contributions to the theory of probability sampling. Bulletin of the
International Statistical Institute, 36(3):127–134, 1958.

H. O. Hartley and J. N. K. Rao. Sampling with unequal probabilities and without
replacement. The Annals of Mathematical Statistics, 33(2):350–374, 1962.

D. G. Horvitz and D. J. Thompson. A generalization of sampling without replace-
ment from a finite universe. Journal of the American Statistical Association, 47
(260):663–685, 1952.

V. M. Joshi. A note on admissible sampling designs for a finite population. The
Annals of Mathematical Statistics, 42(4):1425–1428, 1971.

Y. Tillé and M. Wilhelm. Probability sampling designs: principles for choice of
design and balancing. Statistical Science, 32(2):176–189, 2017.

F. Yates and P. M. Grundy. Selection without replacement from within strata with
probability proportional to size. Journal of the Royal Statistical Society B, 15
(2):253–261, 1953.
