ISSN 2148-838X
http://dx.doi.org/10.13069/jacodesmath.466634

J. Algebra Comb. Discrete Appl. 5(3) • 153–165
Received: 24 October 2017; Accepted: 5 September 2018

Journal of Algebra Combinatorics Discrete Structures and Applications

Batch codes from Hamming and Reed-Muller codes*

Research Article

Travis Baumbaugh, Yariana Diaz, Sophia Friesenhahn, Felice Manganiello, Alexander Vetter

Abstract: Batch codes, introduced by Ishai et al., encode a string x ∈ Σ^k into an m-tuple of strings, called buckets. In this paper we consider multiset batch codes, in which a set of t users each wish to access one bit of information from the original string. We introduce a notion of optimal batch codes. We first show that binary Hamming codes are optimal batch codes. The main body of this work establishes batch properties of Reed-Muller codes. We examine the locality and availability properties of first order Reed-Muller codes over any finite field. We then show that binary first order Reed-Muller codes are optimal batch codes when the number of users is 4, and we generalize our study to the family of binary Reed-Muller codes whose order is less than half their length.

2010 MSC: 14G50

Keywords: Batch codes, Hamming codes, Reed-Muller codes

1. Introduction

Consider the situation in which a certain amount of data, such as information to be downloaded, is distributed over a number of devices, and multiple users wish to download this data. In order to reduce wait time, we look at locally repairable codes with availability, as noted in [4]. A locally repairable code with locality r and availability δ provides the opportunity to reconstruct a particular bit of data using any one of δ disjoint sets of size at most r [12]. When every user wants to reconstruct the same bit of information, this is the setting of Private Information Retrieval (PIR) codes. Here, however, we wish to examine a scenario in which the reconstructed bits of information are not necessarily distinct.
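To make the locality and availability terminology concrete, here is a minimal sketch (our own toy example, not taken from the paper) of local repair in a [4, 3] binary single parity-check code. Every codeword satisfies c_1 + c_2 + c_3 + c_4 = 0 over F_2, so any single entry can be rebuilt from the other three: locality r = 3 with a single recovery set, i.e. availability δ = 1.

```python
# Toy [4, 3] binary single parity-check code (illustrative example only):
# the last bit is the XOR of the first three, so every entry is the XOR
# of the remaining three entries (locality r = 3, availability delta = 1).

def encode(bits):
    """Append a parity bit to a list of 3 bits in {0, 1}."""
    return bits + [sum(bits) % 2]

def repair(codeword, i):
    """Recover entry i from the other entries (its one recovery set)."""
    return sum(b for j, b in enumerate(codeword) if j != i) % 2

c = encode([1, 0, 1])
assert all(repair(c, i) == c[i] for i in range(4))
```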
A possible answer to this more complex scenario is a family of codes called batch codes, introduced by Ishai et al. in [6]. Batch codes were originally studied as a scheme for distributing data across multiple devices while minimizing the load on each device and the total amount of storage. We study (n, k, t, m, τ) batch codes, where n is the code length, k is the dimension of the code, t is the number of bits we wish to retrieve, m is the number of buckets, and τ is the maximum number of bits used from each bucket in any reconstruction of t bits. In this paper we seek to minimize the number of devices in the system and the load on each device while maximizing the amount of reconstructed data; in other words, we want to minimize mτ while maximizing t.

In Section 2, we formally introduce batch codes and summarize results from previous work on batch codes. We then introduce the concepts of locality and availability of a code, and conclude the section by introducing a notion of optimal batch codes. After this background, we study the batch properties of binary Hamming codes and Reed-Muller codes. Section 3 focuses on batch properties of binary Hamming codes.

* This work was supported by NSF Grant DMS #1547399.
Travis Baumbaugh (Corresponding Author), Felice Manganiello: Clemson University, United States (email: tbaumba@g.clemson.edu, manganm@clemson.edu). Yariana Diaz: The University of Iowa, United States (email: yariana-diaz@uiowa.edu). Sophia Friesenhahn: Willamette University, United States (email: sophmf@gmail.com). Alexander Vetter: Villanova University, United States (email: avetter@villanova.edu).
We show that Hamming codes are optimal (2^s − 1, 2^s − 1 − s, 2, m, τ) batch codes for all m, τ ∈ N such that mτ = 2^{s−1}. Section 4 is the main body of this work and provides batch properties of Reed-Muller codes. We first study the batch properties induced on a code C when C⊥ arises from a (u | u + v)-code construction with known batch properties. In Section 4.1 we study the locality and availability properties of first order Reed-Muller codes over any finite field. We find that the locality of RM_q(1, µ) is 2 when q ≠ 2 and 3 when q = 2. Furthermore, we show that its availability is ⌊(q^µ − 1)/2⌋ when q ≠ 2, whereas when q = 2 the availability is (2^µ − 1)/3 if µ is even and at least (2^µ − 4)/4 otherwise. In Section 4.2 we show that binary first order Reed-Muller codes are optimal batch codes for t = 4. We first treat the specific case RM(1, 4), achieving parameters (16, 5, 4, m, τ) with mτ = 10. We then prove the general result that any Reed-Muller code with ρ = 1 and µ ≥ 4 has batch properties (2^µ, µ + 1, 4, m, τ) for any m, τ ∈ N such that mτ = 10. Finally, in Section 4.3 we generalize our study of Reed-Muller codes, look at properties of RM(ρ, µ) for all values of ρ, and conclude by presenting batch properties (2^µ, k, 4, m, τ) with mτ = 10 · 2^{2ρ−2} for RM(ρ′, µ), where µ ∈ {2ρ + 2, 2ρ + 3} and ρ′ ≤ ρ.

2. Background

In 2004, Ishai et al. [6] introduced the following definition of batch codes.

Definition 2.1. An (n, k, t, m, τ) batch code over an alphabet Σ encodes a string x ∈ Σ^k into an m-tuple of strings, called buckets, of total length n such that for each t-tuple of distinct indices i_1, …, i_t ∈ [k] = {1, …, k}, the entries x_{i_1}, …, x_{i_t} can be decoded by reading at most τ symbols from each bucket.

We can view the buckets as servers and the number of symbols used from each bucket as the maximal load on each server. In the above scenario, a single user is trying to reconstruct t bits of information.
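For intuition, the trivial replication scheme already satisfies the definition above (a sketch with our own helper names): copying x into t buckets gives an (n = tk, k, t, m = t, τ = 1) batch code, at the cost of a large total length n. It even serves repeated indices, previewing the multiset setting discussed next.

```python
# Hedged sketch of the definition of a batch code: replicating x into
# t buckets yields a trivial (tk, k, t, t, 1) batch code, since user j
# reads x_{i_j} from the j-th copy and each bucket serves at most one
# symbol per round of queries (tau = 1).

def encode_replication(x, t):
    """Encode x as t buckets, each a full copy of x."""
    return [list(x) for _ in range(t)]

def answer_queries(buckets, queries):
    """Serve one (0-based) index per user; queries may repeat."""
    reads = [0] * len(buckets)          # symbols read from each bucket
    answers = []
    for j, i in enumerate(queries):
        answers.append(buckets[j][i])   # user j reads only bucket j
        reads[j] += 1
    assert max(reads) <= 1              # load tau = 1 per bucket
    return answers

x = [1, 0, 1, 1, 0]
assert answer_queries(encode_replication(x, 4), [2, 2, 0, 4]) == [1, 1, 1, 0]
```

Batch codes aim to retain this query behavior while making n much smaller than tk.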
This definition naturally leads to the concept of multiset batch codes, which have nearly the same definition as above, except that the indices i_1, …, i_t ∈ [k] are not necessarily distinct. This means we have t users who each wish to reconstruct a single element. This definition in turn relates to private information retrieval (PIR) codes, which are similar to batch codes but instead aim to reconstruct the same bit of information t times. Another notable type of batch code defined in [6] is a primitive multiset batch code, in which the number of buckets is m = n. In this work, the queries are assumed to happen at the same time; the asynchronous case is considered in [10].

The following useful lemmas are proven in [6].

Lemma 2.2. An (n, k, t, m, τ) batch code for any τ implies an (nτ, k, t, mτ, 1) batch code.

Lemma 2.3. An (n, k, t, m, 1) batch code implies an (n, k, t, ⌈m/τ⌉, τ) batch code.

Much of the related research involves primitive multiset batch codes with a systematic generator matrix. In [6], the authors give results for some multiset batch codes using subcube codes and Reed-Muller codes. They use a systematic generator matrix, which often allows for better parameters. Their goal was to maximize the efficiency of the code for a fixed number of queries t. The focus of research on batch codes then shifted to combinatorial batch codes, first introduced in [9]. These are replication-based codes using various combinatorial objects that allow for efficient decoding procedures. We do not consider combinatorial batch codes, but some relevant results can be found in [9], [2], [3], and [11]. Next, the focus of research turned to linear batch codes, which use classical error-correcting codes. The following useful results are proven in [7]:

Theorem 2.4. Let C be an [n, k, t, n, 1] linear batch code over F_2 with generator G.
Then G is a generator matrix of a classical error-correcting [n, k, d]_2 linear code with d ≥ t.

Theorem 2.5. Let C_1 be an [n_1, k, t_1, n_1, 1]_q linear batch code and C_2 be an [n_2, k, t_2, n_2, 1]_q linear batch code. Then there exists an [n_1 + n_2, k, t_1 + t_2, n_1 + n_2, 1]_q linear batch code.

Theorem 2.6. Let C_1 be an [n_1, k_1, t_1, n_1, 1]_q linear batch code and C_2 be an [n_2, k_2, t_2, n_2, 1]_q linear batch code. Then there exists an [n_1 + n_2, k_1 + k_2, min(t_1, t_2), n_1 + n_2, 1]_q linear batch code.

Because of the vast amount of information available on classical error-correcting codes, we make these our central focus in this paper. As is often the case, studying the properties of the dual codes helps us find efficient batch codes. Next, [13] considers restricting the size of reconstruction sets. Such restrictions are similar to codes with locality and availability:

Definition 2.7. A code C has locality r ≥ 1 if for any c ∈ C, any entry of c can be recovered by using at most r other entries of c.

Definition 2.8. A code C with locality r ≥ 1 has availability δ ≥ 1 if for any c ∈ C, any entry of c can be recovered by using any one of δ disjoint sets of at most r other entries of c.

The restriction on the size of reconstruction sets can be viewed as an attempt to minimize total data distribution. We restrict the size of our reconstruction sets to the locality of the code. Using this restriction, we find multiset batch codes with optimal data distribution. For cyclic codes, the locality can be derived from the following result in [5]:

Lemma 2.9. Let C be an [n, k, d] cyclic code, and let d′ be the minimum distance of its dual code C⊥. Then the code C has all symbol locality d′ − 1.

This relies on each entry being in the support of a minimum weight dual codeword. We generalize this lemma as follows.

Lemma 2.10. Let C ⊆ F_q^n be a linear code and let d′ be the minimum distance of C⊥. If C⊥ is generated by its minimum weight codewords and

∪_{λ∈C⊥} supp(λ) = [n],   (1)

then C has all symbol locality d′ − 1.

Proof.
Condition (1) implies that no coordinate of C is independent of the others. If the minimum weight codewords generate C⊥, then each coordinate of C is in the support of at least one minimum weight codeword of C⊥. This implies the all symbol locality d′ − 1 of C.

Condition (1) is a reasonable condition for a code: without it, C would have non-recoverable coordinates.

We now give a bound that relates the locality property of a linear code to its batch properties.

Lemma 2.11. Let C be an [n, k, t, m, τ] linear batch code with minimal locality r. Then

mτ ≥ (t − 1)r + 1.   (2)

Proof. Consider such a code C. If every entry had at least one reconstruction set with fewer than r elements, then by the definition of locality C would have locality less than r, contradicting r being the minimal locality. Therefore, there exists at least one entry for which all recovery sets have size at least r. If we wish to recover this entry t times, then we may read the entry itself and then make use of t − 1 disjoint recovery sets, each of size at least r. This means reading at least (t − 1)r + 1 entries, and since we may read only τ entries from each of the m buckets, we must have mτ ≥ (t − 1)r + 1.

From the perspective of individual devices storing bits of data, mτ represents the total amount of data read to provide t pieces of the original data. To minimize bandwidth usage in the case where the entries of a codeword represent nodes on a network, we must minimize mτ.

Definition 2.12. An [n, k, t, m, τ] linear batch code C with minimal locality r is optimal if it satisfies Condition (2) with equality.

We now show that binary Hamming codes are optimal linear batch codes.

3. Hamming codes

Hamming codes were first introduced in 1950 by Richard Hamming. In what follows, we consider binary Hamming codes over F_2. The parameters of binary Hamming codes are given in [8].

Definition 3.1.
For some s ≥ 2, let H ∈ F_2^{s×(2^s−1)} be a matrix whose columns are all of the nonzero vectors of F_2^s, and let n = 2^s − 1. We use H as our parity check matrix and define the binary Hamming code

H_s := {c ∈ F_2^n | cH^T = 0}.

It is well known that H_s is a [2^s − 1, 2^s − 1 − s, 3] cyclic code. Its dual code, the simplex code, is a [2^s − 1, s, 2^{s−1}] cyclic code. Thus, by Lemma 2.9, the locality of H_s is 2^{s−1} − 1. We now present the batch properties of binary Hamming codes.

Theorem 3.2. A binary [n = 2^s − 1, k = 2^s − 1 − s] Hamming code is an optimal batch code with properties [2^s − 1, 2^s − 1 − s, 2, m, τ] for any m, τ ∈ N such that mτ = 2^{s−1}.

Proof. First, note that mτ ≥ (2 − 1)(2^{s−1} − 1) + 1 = 2^{s−1}. The buckets for m = 2^{s−1}, τ = 1 are constructed as follows. Let H be the parity check matrix of a binary Hamming code H, with columns h_j ∈ F_2^s for 1 ≤ j ≤ n. If h_a + h_b = 1 (the all ones column), then we place a and b into the same bucket. Note that h_ℓ = 1 for exactly one index ℓ, and that ℓ is placed into its own bucket. Let r_d ∈ F_2^n, for 1 ≤ d ≤ n − k = s, be the rows of H. For any c ∈ H we have r_d · c^T = 0, and thus Σ_{i∈supp(r_d)} c_i = 0. By this construction, if a, b ∈ supp(r_d), then entry d of h_a + h_b is 0, so h_a + h_b ≠ 1 and a and b cannot be in the same bucket. Therefore, the equation Σ_{i∈supp(r_d)} c_i = 0 only involves bits from separate buckets. Hence, any bit of a codeword can be written as a linear combination of bits from separate buckets.

We now show how to reconstruct any two bits of information.

• Case 1: If a and b are in separate buckets, then use c_a and c_b directly.

• Case 2: Suppose a and b are in the same bucket. We can take c_a itself. To reconstruct c_b, we choose an r_d such that b ∈ supp(r_d). Then we can write c_b = Σ_{i∈supp(r_d)\{b}} c_i, which we know only involves bits from pairwise distinct buckets.

Every bucket has cardinality 2 apart from the bucket corresponding to the all ones column of H, so this construction gives exactly 2^{s−1} buckets.
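The pairing of columns described in this proof can be carried out mechanically. The following sketch (our own code; columns are ordered lexicographically, and bucket labels are 1-based column indices) builds the buckets for any s and confirms that there are exactly 2^{s−1} of them:

```python
from itertools import product

# Bucket construction from the proof above: indices a, b of columns of the
# Hamming parity check matrix share a bucket iff h_a + h_b is the all-ones
# vector; the all-ones column gets a bucket of its own.

def hamming_buckets(s):
    cols = [v for v in product([0, 1], repeat=s) if any(v)]  # h_1, ..., h_n
    ones = tuple([1] * s)
    buckets, seen = [], set()
    for j, h in enumerate(cols, start=1):
        if j in seen:
            continue                       # already paired with an earlier index
        if h == ones:
            buckets.append({j})            # the all-ones column sits alone
        else:
            partner_col = tuple((x + y) % 2 for x, y in zip(h, ones))
            partner = cols.index(partner_col) + 1
            buckets.append({j, partner})
            seen.add(partner)
    return buckets

# exactly 2^(s-1) buckets, as in the proof
for s in range(2, 7):
    assert len(hamming_buckets(s)) == 2 ** (s - 1)
```

With this column ordering, s = 3 yields the partition {1, 6}, {2, 5}, {3, 4}, {7}; a different choice of H permutes the labels.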
Thus we have shown that the batch properties hold for m = 2^{s−1} and τ = 1, and Lemma 2.3 implies that they hold for any m, τ with mτ = 2^{s−1}.

Note that the locality of H is 2^{s−1} − 1, and therefore t = 2 is also maximal. Suppose instead that we could have t ≥ 3. Then, in particular, each entry must be reconstructible at least 3 times. We may take the entry itself, but then there must be at least 2 other disjoint reconstruction sets of size 2^{s−1} − 1. These would correspond to two codewords of weight 2^{s−1} in the dual code whose supports intersect only in the given entry. The sum of these codewords thus has weight 2^{s−1} + 2^{s−1} − 2 = 2^s − 2. However, the all ones vector is also in the dual code, and adding it to this sum produces a codeword of weight one, a contradiction. Thus t = 2 is maximal.

Example 3.3. We give an example for s = 3. This Hamming code is a [7, 4] linear code, and its dual is a [7, 3] linear code. The parity check matrix H is

H = ( 1 0 1 0 1 0 1 )
    ( 0 1 1 0 0 1 1 )
    ( 0 0 0 1 1 1 1 )

Thus, the buckets are {1, 6}, {2, 5}, {3, 4}, {7}.

We note that for general batch codes we are only interested in reconstructing information bits; here, however, we are able to obtain any pair of bits of the codeword. Additionally, although mτ is optimized, we wish to find batch codes with t > 2. We desire a larger value because we are concerned with practical applications, where the goal is to distribute data quickly. Thus we move on to Reed-Muller codes, where larger values of t are attainable.

4. Reed-Muller codes

Reed-Muller codes are well-known linear codes. We give some basic properties of these codes; an interested reader can find more information in [1].

Definition 4.1. Let F_q[X_1, …, X_µ] be the ring of polynomials in µ variables with coefficients in F_q, and let F_q^µ = {P_1, …, P_n} (so n = q^µ). The q-ary Reed-Muller code RM_q(ρ, µ) is defined as

RM_q(ρ, µ) := {(f(P_1), …, f(P_n)) | f ∈ F_q[X_1, …, X_µ]_ρ},

where F_q[X_1, …, X_µ]_ρ is the set of all multivariate polynomials over F_q of total degree at most ρ.

It is known that if ρ < µ(q − 1), then the dual of the Reed-Muller code RM_q(ρ, µ) is the Reed-Muller code RM_q(µ(q − 1) − 1 − ρ, µ) [1].

In the binary case, Reed-Muller codes can be equivalently defined using the (u | u + v)-code construction. For completeness, we first give a description of the (u | u + v)-code construction and the related generator matrix, which can be found in [8].

Definition 4.2. Given two linear codes C_1, C_2 with identical alphabets and block lengths, we construct a new code C as follows:

C := {(u | u + v) | u ∈ C_1, v ∈ C_2}.

Let G, G_1, and G_2 be the generator matrices of the codes C, C_1, and C_2, respectively, where C is obtained from C_1 and C_2 via the (u | u + v)-construction. Then we have

G := ( G_1  G_1 )
     (  0   G_2 )

From this, we have the following proposition.

Proposition 4.3. Let C_1 be an [n, k_1, d_1] linear code, C_2 an [n, k_2, d_2] linear code, and C the code obtained from C_1, C_2 via the (u | u + v)-construction. Then C is a [2n, k, d] code with k = k_1 + k_2 and d = min{2d_1, d_2}.

Later, our focus will be on q = 2, so when referring to RM_2(ρ, µ) we omit the subscript 2 for convenience. We obtain the following equivalent definition of a binary Reed-Muller code.

Definition 4.4. Let ρ < µ. A binary Reed-Muller code RM(ρ, µ) is defined as

RM(ρ, µ) := {(u | u + v) | u ∈ RM(ρ, µ − 1), v ∈ RM(ρ − 1, µ − 1)},

where RM(0, µ) is the length-2^µ repetition code generated by the all ones vector 1, and RM(µ, µ) := F_2^{2^µ}, with generator matrix I_{2^µ}. As a consequence, if G_{ρ,µ} is the generator matrix of the code RM(ρ, µ), then

G_{ρ,µ} := ( G_{ρ,µ−1}   G_{ρ,µ−1}  )
           (     0       G_{ρ−1,µ−1} )

We now examine the batch properties of the (u | u + v)-code construction. The first notable result concerns codes that are contained in other codes:

Theorem 4.5. Let C_1, C_2 be codes of length n and dimensions k_1 and k_2, respectively, such that C_2 ⊆ C_1.
If C_1 is an (n, k_1, t, m, τ) batch code, then C_2 is an (n, k_2, t, m, τ) batch code.

Proof. Note that C_1^⊥ ⊆ C_2^⊥ because C_2 ⊆ C_1. Therefore the parity check equations of C_1 also apply to C_2. Thus, to recover information in C_2, we may use the same parity check equations we would use in C_1, which implies that C_2 is at least an (n, k_2, t, m, τ) batch code.

We now give a result for the (u | u + v)-code construction.

Theorem 4.6. Let n, k_1, k_2 ∈ N with n ≥ k_1 ≥ k_2, and let C be a [2n, k_1 + k_2] linear code such that C⊥ is the (u | u + v)-code construction of C_1^⊥ and C_2^⊥, where

• C_1^⊥ is an [n, n − k_1] linear code and C_2^⊥ is an [n, n − k_2] linear code, and

• C_2^⊥ ⊆ C_1^⊥.

If C_2 is an (n, k_2, t, m, τ) batch code, then C is a (2n, k_1 + k_2, t, m, τ) batch code.

Proof. The first two parameters of C follow from the definition of the (u | u + v) construction. Let C be constructed as described, and let C_2 be an (n, k_2, t, m, τ) batch code. Consider any t-tuple of indices i_1, …, i_t ∈ [2n], and let i′_j = i_j if i_j ∈ [n] and i′_j = i_j − n otherwise. Then i′_1, …, i′_t ∈ [n], and so by the batch properties of C_2 there exist t disjoint recovery sets for the entries at these indices, whose union contains at most τ entries from each of the m buckets. If, for all i ∈ {n + 1, …, 2n}, we place i in the same bucket as i − n, then this yields m buckets for C.

Consider the recovery sets from above. If R_{i′_j} is the recovery set for i′_j and i_j ∈ [n], then we claim that R′_{i′_j} = R_{i′_j} is a recovery set for i_j in C. This is because the recovery set comes from a vector v ∈ C_2^⊥, and by construction, since C_2^⊥ ⊆ C_1^⊥, we have (v|0), (0|v) ∈ C⊥. Likewise, if i_j ∉ [n], then by using (0|v) we find that R′_{i′_j} = {i + n | i ∈ R_{i′_j}} is a recovery set for i_j.
Since the original R_{i′_j} are all disjoint, so are the R′_{i′_j}, and so we have t disjoint recovery sets whose union contains at most τ elements from each of the m buckets. Hence C is a (2n, k_1 + k_2, t, m, τ) batch code.

Because binary Reed-Muller codes have parity check matrices that satisfy the above properties, we turn to that family of codes.

4.1. Locality and availability properties of RM_q(1, µ)

Reed-Muller codes with ρ = 1 are known as first order Reed-Muller codes. We study their properties using the polynomial evaluation definition of Reed-Muller codes. We begin with a result in the q-ary case.

Theorem 4.7. Let F_q^µ = {P_i | 1 ≤ i ≤ q^µ = n} be the set of evaluation points of RM_q(1, µ). Then (λ_1, …, λ_n) ∈ RM_q(1, µ)⊥ if and only if

Σ_{i=1}^n λ_i P_i = 0 and Σ_{i=1}^n λ_i = 0.   (3)

Proof. First, if (λ_1, …, λ_n) is in the dual code, then by definition

Σ_{i=1}^n λ_i f(P_i) = 0   (4)

for every polynomial f ∈ F_q[x_1, …, x_µ]_1. In particular, for any 1 ≤ k ≤ µ, taking f_k(x_1, …, x_µ) = x_k gives

Σ_{i=1}^n λ_i f_k(P_i) = Σ_{i=1}^n λ_i p_{i,k} = 0,

where p_{i,k} is the kth entry of the point P_i. Gathering these equations for 1 ≤ k ≤ µ, we may write the linear combination Σ_{i=1}^n λ_i P_i = 0. Considering f_0 = 1, Equation (4) becomes Σ_{i=1}^n λ_i = 0, and so we have this direction.

For the other direction, assume that Σ_{i=1}^n λ_i P_i = 0 and Σ_{i=1}^n λ_i = 0. Then, in particular, for any 1 ≤ k ≤ µ we have Σ_{i=1}^n λ_i p_{i,k} = 0, and so for any f = f_0 + Σ_{k=1}^µ f_k x_k ∈ F_q[x_1, …, x_µ]_1, by linearity,

Σ_{i=1}^n λ_i f(P_i) = Σ_{i=1}^n λ_i [ f_0 + Σ_{k=1}^µ f_k p_{i,k} ] = f_0 Σ_{i=1}^n λ_i + Σ_{k=1}^µ f_k Σ_{i=1}^n λ_i p_{i,k} = 0,

and thus (λ_1, …, λ_n) ∈ RM_q(1, µ)⊥.

From Theorem 4.7 we obtain the following corollaries.

Corollary 4.8. The minimum distance of RM_q(1, µ)⊥ is 4 if q = 2 and 3 otherwise.

Proof. Let q = 2 and suppose, by way of contradiction, that the minimum weight is 2. Then there exist two distinct points that sum to zero.
This is impossible, and thus the minimum weight must be greater than 2. Note that the only nonzero choice for each λ_i is 1, and thus Σλ_i = 0 if and only if the support contains an even number of points. Therefore the weight of every codeword is even, and in particular the minimum weight is not 3. For µ ≥ 2, the points

P_0 = (0, 0, 0, …, 0)^T, P_1 = (1, 0, 0, …, 0)^T, P_2 = (0, 1, 0, …, 0)^T, P_3 = P_1 + P_2

lie in F_2^µ and satisfy the conditions of Theorem 4.7, so the minimum distance in characteristic 2 is 4.

If q ≠ 2, let P_1 = (0, 0, …, 0)^T, P_2 = (1, 0, …, 0)^T, P_3 = (−a, 0, …, 0)^T ∈ F_q^µ, and let the entries of λ be −(a + 1), a, and 1 in the positions of P_1, P_2, and P_3, respectively, and 0 otherwise. If a ≠ −1, 0, then λ satisfies Equations (3), so a codeword of weight 3 exists. Now suppose there exists a λ ∈ RM_q(1, µ)⊥ of weight 2. Then there are two distinct points P_i, P_j ∈ F_q^µ and λ_i, λ_j ∈ F_q with λ_j = −λ_i and λ_k = 0 for all k ≠ i, j. Our two conditions imply

λ_i P_i − λ_i P_j = 0 ⟹ P_i = P_j,

a contradiction to the two points being distinct.

Corollary 4.9. When q = 2, every codeword of RM(ρ, µ) with ρ ≤ µ − 2 satisfies Equations (3).

Proof. The dual code of RM(1, µ) is RM(µ − 2, µ), and RM(ρ_1, µ) ⊂ RM(ρ_2, µ) if ρ_1 < ρ_2. Thus any codeword of RM(ρ, µ) with ρ ≤ µ − 2 is also in RM(µ − 2, µ), and Theorem 4.7 implies the claim.

Theorem 4.10. Let q ≠ 2. Then RM_q(1, µ) has locality 2 and availability δ = ⌊(q^µ − 1)/2⌋.

Proof. Let P_a ∈ F_q^µ be an evaluation point and consider any α ∈ F_q with α ≠ 0, −1. We have 1 + α + (−α − 1) = 0, and we will find corresponding points to use in the reconstruction of the entry indexed by P_a. For any choice of P_b ∈ F_q^µ with P_b ≠ P_a, take P_c = (α + 1)^{−1}(P_a + αP_b). Upon rearrangement, we have P_a + αP_b + (−α − 1)P_c = 0. Furthermore, P_c ≠ P_a, P_b. If P_c = P_a, then our equation becomes P_a + αP_b + (−α − 1)P_a = 0, which simplifies to αP_b − αP_a = 0, contradicting P_b ≠ P_a.
Likewise, P_c = P_b would imply P_a + αP_b + (−α − 1)P_b = 0, which becomes P_a − P_b = 0, another contradiction. A simple counting argument shows that there are ⌊(q^µ − 1)/2⌋ such pairs P_b, P_c for P_a, and each of them corresponds to a unique λ ∈ RM_q(1, µ)⊥ of weight 3 that can be used to recover c_a, with the supports of these λ intersecting only in {a}. Thus the locality is 2 and the availability is ⌊(q^µ − 1)/2⌋.

As proven in [1], the minimum weight codewords generate the Reed-Muller code RM_{p^s}(ρ, µ), where p is a prime and 0 ≤ ρ ≤ µ(p^s − 1), if and only if s = 1 or µ = 1 or ρ < p or ρ > (µ − 1)(p^s − 1) + p^{s−1} − 2. Together with Lemma 2.10, it follows that these Reed-Muller codes have all symbol availability. Thus, in the following theorems we only consider the availability of the entry indexed by P_1, as this implies all symbol availability.

Theorem 4.11. RM(1, µ) has availability δ = (2^µ − 1)/3 when µ is even.

Proof. We argue by induction on µ, looking at sums of evaluation points: we seek (2^µ − 1)/3 disjoint sets of three points of F_2^µ that sum to (0, …, 0)^T ∈ F_2^µ. It is easy to verify the claim for µ = 2, since there is only one such equation:

(0, 1)^T + (1, 0)^T + (1, 1)^T = (0, 0)^T.

Now assume the claim is true for µ = 2k; we show that it also holds for µ = 2k + 2. We have (2^{2k} − 1)/3 disjoint sets of three points that all sum to P̄_1 = (0, …, 0)^T ∈ F_2^{2k}. Let P_1 = (0, …, 0)^T ∈ F_2^{2k+2}. For any set of points {S_1, S_2, S_3} ⊂ F_2^{2k} that sums to P̄_1 in F_2^{2k}, it holds that

(S_1^T | 0, 0)^T + (S_2^T | 0, 0)^T + (S_3^T | 0, 0)^T = P_1
(S_1^T | 1, 0)^T + (S_2^T | 0, 1)^T + (S_3^T | 1, 1)^T = P_1
(S_1^T | 0, 1)^T + (S_2^T | 1, 1)^T + (S_3^T | 1, 0)^T = P_1
(S_1^T | 1, 1)^T + (S_2^T | 1, 0)^T + (S_3^T | 0, 1)^T = P_1.   (5)

Additionally, it also holds that

(P̄_1^T | 1, 1)^T + (P̄_1^T | 1, 0)^T + (P̄_1^T | 0, 1)^T = P_1^T.   (6)

The four equations in (5) all use distinct sets of points because S_1, S_2, and S_3 are distinct, and there are (2^{2k} − 1)/3 disjoint sets of points like {S_1, S_2, S_3}. Thus we have a total of

4 · (2^{2k} − 1)/3 + 1 = (2^{2k+2} − 1)/3

recovery sets, where the extra one comes from Equation (6). Note that this total is an integer, as 2^{2k+2} ≡ 4^{k+1} ≡ 1^{k+1} ≡ 1 (mod 3). Every single coordinate is used, and thus we have maximal availability.

Theorem 4.12. RM(1, µ) has availability δ at least (2^µ − 4)/4 when µ is odd.

Proof. We prove this by induction on µ. For µ = 3, let S_1 = (1, 0, 0)^T, S_2 = (0, 1, 0)^T, and S_3 = S_1 + S_2; then S_1 + S_2 + S_3 = P̄_1 = (0, 0, 0)^T. No combination of the four remaining points of F_2^3 sums to P̄_1. Here we have availability 1 = (2^3 − 4)/4, which gives the base case.

Now assume that for µ = 2k + 1, with k ≥ 1, the availability of RM(1, µ) is at least (2^µ − 4)/4 and that at least 3 points are not used in any recovery set for P_1. Let S_1, S_2, S_3 be the three points of a recovery set of P_1 ∈ F_2^µ. Then, for µ + 2, disjoint sets of three points that sum to P̃_1 = (0, …, 0)^T ∈ F_2^{µ+2} can be defined by the equations

(S_1^T | 0, 0)^T + (S_2^T | 0, 0)^T + (S_3^T | 0, 0)^T = P̃_1
(S_1^T | 1, 0)^T + (S_2^T | 0, 1)^T + (S_3^T | 1, 1)^T = P̃_1
(S_1^T | 0, 1)^T + (S_2^T | 1, 1)^T + (S_3^T | 1, 0)^T = P̃_1
(S_1^T | 1, 1)^T + (S_2^T | 1, 0)^T + (S_3^T | 0, 1)^T = P̃_1

This yields at least 4 · (2^µ − 4)/4 = 2^µ − 4 possible recovery sets. However, we also have the unused points T_1, T_2, T_3 in F_2^µ, and so we may also define the equations

(P̄_1^T | 1, 0)^T + (T_1^T | 0, 1)^T + (T_1^T | 1, 1)^T = P̃_1
(P̄_1^T | 0, 1)^T + (T_2^T | 1, 1)^T + (T_2^T | 1, 0)^T = P̃_1
(P̄_1^T | 1, 1)^T + (T_3^T | 1, 0)^T + (T_3^T | 0, 1)^T = P̃_1

Adding these, we have availability at least 2^µ − 4 + 3 = 2^µ − 1 = (2^{µ+2} − 4)/4, and so our property holds for µ + 2 as well.
We also note that 3(2^µ − 1) + 3 = 3 · 2^µ < 4 · 2^µ = 2^{µ+2}, and so at least 3 points remain unused. Thus, by induction, for every k ≥ 1, when µ = 2k + 1, the availability of RM(1, µ) is at least (2^µ − 4)/4.

This establishes a lower bound on δ. Note, however, that we have not shown that this construction is necessarily optimal. We now study the batch properties of Reed-Muller codes.

4.2. Batch properties of RM(1, µ)

Theorem 4.13. The linear code RM(1, 4) is a (16, 5, 4, m, τ) batch code for any m, τ ∈ N such that mτ = 10.

Proof. First, note that the dual code of RM(1, 4) is RM(2, 4), which tells us how to reconstruct entries of codewords. The generator matrix of RM(1, 4) can be constructed recursively as

G_{1,4} = ( G_{1,3}  G_{1,3} )
          (   0      G_{0,3} )

It can be verified that any query of 4 coordinates of a codeword of RM(1, 4) can be answered with the following partition into buckets:

{1}, {2}, {3}, {4}, {5, 6}, {7, 8}, {9, 11}, {10, 12}, {13, 16}, {14, 15}.

In this case, m = 10 and τ = 1. By Lemma 2.3, the result holds for any m, τ ∈ N such that mτ = 10.

We now extend this construction to RM(1, µ) for any µ ≥ 4.

Theorem 4.14. Any first order Reed-Muller code RM(1, µ) with µ ≥ 4 has batch properties (n, k, 4, m, τ) for any m, τ ∈ N such that mτ = 10.

Proof. We proceed by induction. We have just shown the base case µ = 4. Now assume that for some µ − 1 ≥ 4 the code RM(1, µ − 1) has batch properties (n, k, 4, m, τ). Recall that the dual code of C = RM(1, µ) is C⊥ = RM(µ − 2, µ). Since Reed-Muller codes follow the (u | u + v)-construction, C⊥ is the (u | u + v)-code construction of C_1^⊥ = RM(µ − 2, µ − 1) and C_2^⊥ = RM(µ − 3, µ − 1). Since RM(µ − 3, µ − 1) ⊆ RM(µ − 2, µ − 1), we have C_2^⊥ ⊆ C_1^⊥, and we may apply Theorem 4.6. Since C_2 = RM(1, µ − 1), we conclude that C is also an (n, k, 4, m, τ) batch code. By induction, this holds for any µ ≥ 4.
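The recursive generator matrices and the locality facts used in this subsection can be checked directly. The following sketch (our own helper names, plain Python) builds generator matrices via the (u | u + v) recursion and verifies, for RM(1, 4), the facts used above: the [16, 5] parameters, duality with RM(2, 4), and that weight-4 dual codewords cover every coordinate, giving locality r = 3 and hence the bound mτ ≥ (4 − 1) · 3 + 1 = 10.

```python
from itertools import product

def rm_generator(rho, mu):
    """Rows of a generator matrix of binary RM(rho, mu), via the
    (u | u + v) recursion; RM(0, mu) is the repetition code and
    RM(mu, mu) is all of F_2^(2^mu)."""
    if rho == 0:
        return [[1] * (2 ** mu)]
    if rho >= mu:
        return [[int(i == j) for j in range(2 ** mu)] for i in range(2 ** mu)]
    top = rm_generator(rho, mu - 1)
    bot = rm_generator(rho - 1, mu - 1)
    return [r + r for r in top] + [[0] * len(r) + r for r in bot]

def codewords(G):
    """All codewords spanned by the rows of G (feasible for small k)."""
    words = set()
    for coeffs in product([0, 1], repeat=len(G)):
        w = [0] * len(G[0])
        for c, row in zip(coeffs, G):
            if c:
                w = [(a + b) % 2 for a, b in zip(w, row)]
        words.add(tuple(w))
    return words

G14, G24 = rm_generator(1, 4), rm_generator(2, 4)
assert len(G14) == 5 and len(G14[0]) == 16        # RM(1, 4) is a [16, 5] code
# RM(2, 4) is the dual of RM(1, 4): all pairwise inner products vanish
assert all(sum(a * b for a, b in zip(u, v)) % 2 == 0 for u in G14 for v in G24)
# every coordinate lies in the support of a weight-4 codeword of the dual,
# so each entry of RM(1, 4) has a recovery set of 3 other entries
covered = set()
for w in codewords(G24):
    if sum(w) == 4:
        covered.update(i for i, b in enumerate(w) if b)
assert covered == set(range(16))
```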
Since the locality of these codes is r = 3, for t = 4 we have mτ = 10 = (4 − 1) · 3 + 1 = (t − 1)r + 1, and thus the batch properties are optimal. We now extend this even further, to most RM(ρ, µ).

4.3. Batch properties of RM(ρ, µ)

We begin with a preliminary result that uses the recursive construction of binary Reed-Muller codes.

Lemma 4.15. Let a ∈ RM(ρ − 1, µ − 2). Then

(a|a|0|0), (a|0|a|0), (a|0|0|a), (0|a|a|0), (0|a|0|a), (0|0|a|a) ∈ RM(ρ, µ),

where 0 denotes the zero vector of length 2^{µ−2}.

Proof. Let

G = ( G_{ρ,µ−1}   G_{ρ,µ−1}  )
    (     0       G_{ρ−1,µ−1} )

be the generator matrix of RM(ρ, µ). Recursively, we obtain

G = ( G_{ρ,µ−2}   G_{ρ,µ−2}    G_{ρ,µ−2}    G_{ρ,µ−2}   )
    (     0       G_{ρ−1,µ−2}      0        G_{ρ−1,µ−2} )
    (     0           0        G_{ρ−1,µ−2}  G_{ρ−1,µ−2} )
    (     0           0            0        G_{ρ−2,µ−2} )   (7)

From the second and third block rows of matrix (7) we see that, for any a ∈ RM(ρ − 1, µ − 2), the second block row implies (0|a|0|a) ∈ RM(ρ, µ) and the third block row implies (0|0|a|a) ∈ RM(ρ, µ). Since the code is linear, (0|a|0|a) + (0|0|a|a) = (0|a|a|0) ∈ RM(ρ, µ). Finally, since RM(ρ − 1, µ − 2) ⊆ RM(ρ, µ − 2), the first block row implies (a|a|a|a) ∈ RM(ρ, µ), and combining this with the previous vectors we find that (a|a|0|0), (a|0|a|0), (a|0|0|a) ∈ RM(ρ, µ).

We now show the batch properties of RM(ρ, µ) for ρ ≥ 1 and µ ≥ 2ρ + 2.

Theorem 4.16. Let ρ ≥ 1 and µ ∈ {2ρ + 2, 2ρ + 3}. Then, for ρ′ ≤ ρ, RM(ρ′, µ) is an (n, k, 4, m, τ) linear batch code for any m, τ ∈ N such that mτ = 10 · 2^{2ρ−2}.

Proof. We focus on the case µ = 2ρ + 2, as the case µ = 2ρ + 3 proceeds by similar steps. If RM(ρ, 2ρ + 2) is an (n, k, 4, m, τ) linear batch code for any m, τ ∈ N such that mτ = 10 · 2^{2ρ−2}, then it follows from Theorem 4.5 that for any ρ′ ≤ ρ, the code RM(ρ′, 2ρ + 2) is an (n, k, 4, m, τ) batch code for any m, τ ∈ N such that mτ = 10 · 2^{2ρ−2}. Thus we only need to prove that the property holds for RM(ρ, 2ρ + 2). We proceed by induction on ρ. By Section 4.2, the claim is true for ρ = 1, the base cases being µ = 4, 5. Assume that ρ ≥ 1.
We show that the properties hold for RM(ρ + 1, 2(ρ + 1) + 2) = RM(ρ + 1, 2ρ + 4), assuming that RM(ρ, 2ρ + 2) is an (n, k, 4, m, τ) linear batch code for any m, τ ∈ N such that mτ = 10 · 2^{2ρ−2}. In particular, we may choose τ = 1 and obtain m = 10 · 2^{2ρ−2} buckets.

We now examine RM(ρ + 1, 2ρ + 4), whose dual code is RM(ρ + 2, 2ρ + 4). By Lemma 4.15, for any a ∈ RM(ρ + 1, 2ρ + 2) = RM(ρ, 2ρ + 2)⊥, we have

(a|a|0|0), (a|0|a|0), (a|0|0|a), (0|a|a|0), (0|a|0|a), (0|0|a|a) ∈ RM(ρ + 2, 2ρ + 4).

This provides a way to produce parity check equations for RM(ρ + 1, 2ρ + 4) from those for RM(ρ, 2ρ + 2), which in turn provides a way to build recovery sets for the former from those for the latter, as each such vector corresponds to a recovery set for every index in its support.

Each bucket B = {i_1, …, i_ℓ} for RM(ρ, 2ρ + 2) can be made into 4 buckets for RM(ρ + 1, 2ρ + 4): B_1 = B, B_2 = B + n = {i_1 + n, …, i_ℓ + n}, B_3 = B + 2n, and B_4 = B + 3n. This results in 4 · 10 · 2^{2ρ−2} = 10 · 2^{2ρ} = 10 · 2^{2(ρ+1)−2} buckets, and we must show that any set of 4 indices may be recovered by reading at most 1 entry from each bucket.

Consider any tuple of 4 indices i_1, i_2, i_3, i_4 ∈ [4n] = ∪_{s=0}^{3} ([n] + sn), and let s_k = ⌊(i_k − 1)/n⌋ for k ∈ [4]. Then let i′_k = i_k − s_k n, so that i′_1, i′_2, i′_3, i′_4 ∈ [n]. By the induction hypothesis, we have recovery sets R′_1, R′_2, R′_3, R′_4 ⊆ [n] for these indices in RM(ρ, 2ρ + 2). These recovery sets satisfy i′_k ∈ R′_k for all k ∈ [4], together with:

1. (R′_k \ {i′_k}) ∩ (R′_j \ {i′_j}) = ∅ for k, j ∈ [4] with k ≠ j;

2. ∪_{k=1}^{4} (R′_k \ {i′_k}) contains at most 1 index from each bucket.

Each R′_k is either {i′_k} or the support of some vector a ∈ RM(ρ + 1, 2ρ + 2). If |R′_k| = 1, let R_k = R′_k + s_k n. Otherwise, let R_k = (R′_k + s_k n) ∪ (R′_k + s′_k n).
By Lemma 4.15, we know that if s′_k ≠ s_k, then R_k is the support of some vector in RM(ρ + 2, 2ρ + 4), and so it is a valid recovery set. We must now show that these recovery sets have the desired properties given the correct choice of the values s′_k.

Since the indices being recovered come from d = |{s_1, s_2, s_3, s_4}| different quarters of [4n], we can take at least d of the recovery sets to be singletons. Further, assume that we take as many recovery sets to be singletons as possible. In particular, this means that no recovery set will contain more than one index in each bucket. Indeed, the only way R_k could contain two indices in a bucket would be if R′_k did. Since R′_k is a proper recovery set, it could only contain two indices in one bucket if one of those indices was i′_k. In that case the bucket is not used in any other recovery set, and so we could instead take R′_k = {i′_k}, as per our assumption.

This leaves at most 4 − d recovery sets which are not singletons and require a subset in a second quarter. Assume without loss of generality that these are R_1, . . . , R_{4−d}. Then we may let s′_1, . . . , s′_{4−d} be elements of {0, 1, 2, 3} \ {s_1, s_2, s_3, s_4}. Since these are distinct, the only way some R_k \ {i_k} and R_j \ {i_j} could have a nonempty intersection would be if condition 1 were violated for the R′_k. Thus, condition 1 also holds for the R_k. We have already covered the fact that none of the R′_k + s′_k n contains more than one index in any bucket, and since these lie in separate quarters, the only way ⋃_{k=1}^{4}(R_k \ {i_k}) could contain more than one index in a bucket would be if some elements being recovered in the same quarter had (R′_k \ {i′_k}) ∪ (R′_j \ {i′_j}) containing more than 1 index in some bucket. This would violate condition 2, and so the R_k also satisfy condition 2.
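The bookkeeping of the induction step, splitting each bucket into four shifted copies, decomposing an index into its quarter s_k and offset i′_k, and lifting a recovery set, can be sketched as follows. This is an illustrative aside, not part of the proof; it uses 0-based indices, whereas the proof uses 1-based indices, and the function names are our own.

```python
def split_bucket(B, n):
    """Lift a bucket B of the length-n code RM(rho, 2*rho+2) to the four
    buckets B, B+n, B+2n, B+3n of the length-4n code RM(rho+1, 2*rho+4)."""
    return [frozenset(i + s * n for i in B) for s in range(4)]

def quarter_and_offset(i, n):
    """Decompose a 0-based index i of the length-4n code as i = s*n + i',
    where s is its quarter and i' its offset (the proof's s_k and i'_k)."""
    s, i_prime = divmod(i, n)
    return s, i_prime

def lift_recovery_set(R_prime, s, s_alt, n):
    """Lift a recovery set R' of the short code into the long code:
    a singleton moves into its index's quarter s; otherwise two shifted
    copies are taken, one in quarter s and one in a spare quarter s_alt."""
    if len(R_prime) == 1:
        return {i + s * n for i in R_prime}
    return {i + s * n for i in R_prime} | {i + s_alt * n for i in R_prime}
```

For example, with n = 16 the index 35 lies in quarter s = 2 with offset i′ = 3, and a non-singleton recovery set {1, 2, 3} lifted with s = 0 and spare quarter s_alt = 3 becomes {1, 2, 3, 49, 50, 51}.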
Thus, this code is a (4n, k′, 4, 10 · 2^{2(ρ+1)−2}, 1) batch code, and by Lemma 2.3, we know that RM(ρ + 1, 2(ρ + 1) + 2) is a (4n, k′, 4, m, τ) batch code for any m, τ ∈ N such that mτ = 10 · 2^{2(ρ+1)−2}. This completes the induction step, and so for any ρ ≥ 1, RM(ρ, 2ρ + 2) is a (n, k, 4, m, τ) batch code for any m, τ ∈ N such that mτ = 10 · 2^{2ρ−2}.

5. Conclusion

This work focuses on batch properties of binary Hamming and Reed-Muller codes. The high locality of binary Hamming codes implies that their availability is at most 1. Binary Hamming codes can thus be viewed as linear batch codes retrieving queries of at most 2 indices, the trivial case. Nonetheless, we prove that for t = 2, binary Hamming codes are actually optimal (2^s − 1, 2^s − 1 − s, 2, m, τ) batch codes for m, τ ∈ N such that mτ = 2^{s−1}.

We turn to binary Reed-Muller codes for optimal batch codes that allow larger queries, meaning t-tuples with t larger than 2. This research direction is motivated by the large availability of first order Reed-Muller codes, as shown in this paper. We prove the optimality of first order Reed-Muller codes for t = 4. Finally, we generalize our study to Reed-Muller codes RM(ρ,µ) of order less than half their length by proving that RM(ρ′,µ) has batch properties (2^µ, k, 4, m, τ) with mτ = 10 · 2^{2ρ−2}, where µ ∈ {2ρ + 2, 2ρ + 3} and ρ′ ≤ ρ.

Acknowledgment: The authors are grateful to Clemson University for hosting the REU at which this work was completed. The REU was made possible by an NSF Research Training Group (RTG) grant (DMS #1547399) promoting Coding Theory, Cryptography, and Number Theory at Clemson.

References

[1] E. F. Assmus, J. D. Key, Designs and Their Codes, volume 103 of Cambridge Tracts in Mathematics, Cambridge University Press, Cambridge, 1992.
[2] S. Bhattacharya, S. Ruj, B. Roy, Combinatorial batch codes: A lower bound and optimal constructions, Adv. Math. Commun. 6(2) (2012) 165–174.
[3] C.
Bujtás, Z. Tuza, Optimal combinatorial batch codes derived from dual systems, Miskolc Math. Notes 12(1) (2011) 11–23.
[4] A. G. Dimakis, K. Ramchandran, Y. Wu, C. Suh, A survey on network codes for distributed storage, Proc. IEEE 99(3) (2011) 476–489.
[5] P. Huang, E. Yaakobi, H. Uchikawa, P. H. Siegel, Binary linear locally repairable codes, IEEE Trans. Inform. Theory 62(11) (2016) 6268–6283.
[6] Y. Ishai, E. Kushilevitz, R. Ostrovsky, A. Sahai, Batch codes and their applications, Proc. 36th Annu. ACM Symp. Theory Comput. (2004) 262–271.
[7] H. Lipmaa, V. Skachek, Linear batch codes, in Coding Theory and Applications, CIM Series in Mathematical Sciences 3 (2015) 245–253.
[8] F. J. MacWilliams, N. J. A. Sloane, The Theory of Error-Correcting Codes, North-Holland Publishing Co., Amsterdam–New York–Oxford, 1977.
[9] M. B. Paterson, D. R. Stinson, R. Wei, Combinatorial batch codes, Adv. Math. Commun. 3(1) (2009) 13–27.
[10] A.-E. Riet, V. Skachek, E. K. Thomas, Asynchronous batch and PIR codes from hypergraphs, preprint, 2018, arXiv:1806.00592.
[11] N. Silberstein, A. Gál, Optimal combinatorial batch codes based on block designs, Des. Codes Cryptogr. 78(2) (2016) 409–424.
[12] V. Skachek, Batch and PIR codes and their connections to locally repairable codes, in Network Coding and Subspace Designs, Signals Commun. Technol., Springer, Cham (2018) 427–442.
[13] E. K. Thomas, V. Skachek, Constructions and bounds for batch codes with small parameters, in Coding Theory and Applications, Springer, Cham (2017) 283–295.