Arweave: The Permanent Information Storage Protocol

Sam Williams (sam@arweave.org) · Abhav Kedia (abhavkedia@gmail.com) · Lev Berman (lev@arweave.org) · Sebastian Campos-Groth (sebastian@arweave.org)

DRAFT 17, December 26, 2023

Abstract

In this paper we describe Arweave: a protocol for the disintermediated transmission of information across space and time. Arweave aims to serve as an institution offering an arbitrarily scalable permanent ledger of data and speech in cyberspace, without centralized authority and control. The network achieves this by utilizing a novel blockchain consensus mechanism powered by succinct proofs of replicated storage. This proof system has low verification overhead, minimizing compute and bandwidth requirements while maximizing decentralization. This construction is paired with a storage endowment that incentivizes data replication, allowing the network to operate in a fully autonomous, transparent and predictable fashion. Finally, we describe a new system of fork-based evolution that allows for protocol flexibility while preserving user rights over time.

1 Introduction

The rate of information flow in society has increased precipitously since the invention of the printing press, culminating with the internet in the late 20th century. As human society has progressed along this information distribution efficiency curve, the level to which individuals are informed about the world around them has increased. This increase has been productive for societies as it has fostered greater transparency and awareness within communities. In its current form, however, the internet is still, in practice, subject to subversion by centralized points of control, resulting in information distribution inefficiencies: it can be "forgetful", losing a certain proportion of its useful content every year, and these centralized points of control and aggregation have the power to arbitrarily censor and manipulate the flow of information. Arweave is built to address these problems by allowing for the disintermediated transfer of information across extremely long distances of time and space. While further discourse on the uses and potential implications of the Arweave network has taken place in numerous other forums, this paper aims to provide a precise and succinct description of the system's technical functions and the techniques used to accomplish them.

Arweave is a permanent information storage system. The word "permanent" has a number of definitions: the Oxford Dictionary of English defines it as "lasting or intended to last or remain unchanged indefinitely", while the Merriam-Webster Dictionary defines the word as "continuing or enduring without fundamental or marked change". In this paper, we use a definition of permanence that encompasses both of these ideas: the Arweave network is engineered to store data for the maximum possible term, without change.

In order to achieve its goals, the Arweave protocol is constructed of three core components:

• Cryptographic Proofs of Storage: A succinct cryptographic proof system for verifying even replication and accessibility of data.

• Storage Endowment: A predictable, self-executing endowment that pays for storage, utilizing the deflationary effect of technology improvements over time.

• Incentivized Evolution: A mechanism for allowing protocol adaptability by generating and rewarding non-coercive upgrades to the network.

The Arweave protocol is precisely codified, immutable, and entirely transparent.
As a result, the network is able to offer its users predictable rights and guarantees that would not be possible in a comparable system operated by centralized institutions. This paper precisely documents each component of the fully implemented protocol as of the current version at the time of writing (v2.7.0). Users are encouraged to read this paper and audit the protocol themselves in order to gain a first-hand understanding of the services and guarantees that the protocol offers. In order to facilitate verification of the protocol, we provide references to the codebase at various points in this paper.

2 Protocol Design

A core component of Arweave's construction is a decentralized consensus system for appending to and verifying data in the network. Decentralized consensus is a subfield of distributed computing that encompasses significant research on protocols that enable participants in a network to reach agreement on a state, even in the presence of adversarial nodes [12, 13, 15]. The field gained prominence with Bitcoin's innovative Nakamoto consensus, which allowed, for the first time, agreement in an adversarial and permissionless environment [24]. As a consequence of this innovation, Bitcoin was able to create the first digital currency that did not depend on centralized human actors for administering monetary policy. Arweave draws inspiration from Bitcoin's proof-of-work structure for achieving consensus and adapts its implementation to incentivize the permanent storage of information within its network.

Arweave is a global network of computers, referred to as "nodes", which collectively store multiple copies of all data uploaded to the system. Users who wish to store information on Arweave provide a one-time upfront contribution to the network's storage endowment and upload the corresponding data by transmitting it to nodes within the network. Nodes periodically reach consensus on new data to be added to the network's globally distributed repository each time one of the nodes successfully mines (confirms) a block. A block is a list of transactions, each containing either new data to be added to the network, or transfers of its currency AR, or both. Mining is a process run by every participating node that serves the dual purpose of accepting new data into the network and validating the storage of previously uploaded data. Nodes "pull" data that they would like to replicate for mining from other peers in the network after a block containing the corresponding transaction has been confirmed.

2.1 Design principles

The two primary principles that underpin the design of the protocol are as follows:

• Minimalism: The protocol aims to be straightforward and minimally opinionated in its design, in order to foster the widest possible social consensus. Arweave only uses well-tested cryptographic primitives in composing its data structures and algorithms.

• Optimization through incentives: Rather than explicitly prescribing desired behaviour, the protocol incentivizes participants to achieve desirable outcomes. The specific mechanisms for achieving these outcomes arise organically and will evolve over time.

By nature of its conservative design, the Arweave protocol is focused on concisely solving one problem – permanent and scalable data storage. This leaves the application layer on top of the protocol extensible and composable, enabling wide and varied uses of the network [35].
This has led to the emergence of many different decentralized smart contracting platforms, databases and applications built on top of Arweave [31, 19, 25, 38]. Further, Arweave's highly efficient proof system imposes very low hardware and bandwidth requirements – comparable with Bitcoin, regardless of dataset size – in order to maximize participation and decentralization in the network.

To illustrate the effectiveness of incentives in protocol design, we can examine their effects in the Bitcoin network. Bitcoin rewards miners for discovering a nonce that, taken together with a candidate block, produces a hash that is lower than a certain value. The uniformly random distribution of hashes obtained from a {nonce, block} pair incentivizes miners to find ways to compute the largest number of hashes at the lowest possible cost. This has led to the development of application-specific mining hardware that has increased the number of hashes that can be computed per second by a factor of $10^{13}$ since 2011. Notably, this increase has outperformed Moore's law [28] by a factor of nearly $10^{10}$ in the same period, up to the time of writing. The same incentives have also led to a decrease in the cost per hash in the Bitcoin network by a factor of 1,000,000 over the same period [9]. In Arweave, we incentivize participants to solve a different set of problems, modifying Bitcoin's mechanism to incentivize the optimization of useful proofs of storage and the routing of data.

Figure 1: Incentives for efficient mining in Bitcoin have contributed to significant improvements in the speed and cost of computing hashes (Y-axis is log scale) [9].

3 Cryptographic Proofs of Storage

The mining mechanism of Arweave composes many distinct algorithms and data structures into a non-interactive and succinct proof of replicated storage of data in the network. In this section, we will initially examine these primitives in the abstract and then describe their use in the process of Arweave mining.

3.1 Block Index

The top-level data structure in Arweave is called the Block Index [2]. It is a merkelized list of 3-tuples, each containing a block hash, a weave size, and a transaction root. This list is represented as a hashed merkle tree [23] with the top-level merkle root representing the most recent state of the network. This root is composed of the hash of two elements – the most recent tuple (representing the latest block) and the previous block index's merkle root. While creating a block, miners embed the previous merkle root of the block index within the new block.

By constructing and embedding a merkle root of the block index in each block, a node in the Arweave network that has performed an SPV proof [24] of the latest blocks is capable of fully validating any prior block. This stands in contrast to traditional blockchains, in which individual validation of transactions in older blocks requires full verification of the chain back to that block. After performing an SPV proof on the tip of the network, the Block Index verification mechanism allows the user to validate older blocks in the weave without needing to download or validate intermediate block headers.

Figure 2: The block index represents a merkle tree of blocks in the Arweave network.

3.2 Merkelization of data

When information is to be added to Arweave, the user splits the data into regularized 256 KiB chunks. After the chunks are prepared, a merkle tree is constructed and its root (called the data root) is committed into a transaction. This transaction is signed and dispatched to the network by the uploader. Each transaction so constructed is represented within the transaction root [3] of the block it was accepted in. A block's transaction root is therefore a merkle root of the data roots of transactions.

Figure 3: The transaction root is a merkle root of data roots present in each transaction inside a block.

Figure 4: The data root is a merkle root of all hashed chunks in a single transaction.

Miners in the Arweave network combine such transactions into a block, compute the transaction root using the data roots inside the transactions, and include this transaction root in the new block. This transaction root eventually ends up in the top-level block index as described above. The output of this entire process is an expanding merkle tree with all of the data in the protocol sequentially ordered into chunks.
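To make the merkelization flow concrete, the following Python sketch derives a data root from raw data. It is a minimal illustration only: the real tree is the offset-labelled unbalanced merkle tree of [8] described in §3.3, and the details here (plain SHA-256 leaves, pairwise combination, promotion of odd nodes) are simplifying assumptions rather than the node's wire format.

import hashlib

CHUNK_SIZE = 256 * 1024  # the regularized 256 KiB chunk size

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def chunk_data(data: bytes) -> list:
    # Split the upload into 256 KiB chunks (the last chunk may be shorter).
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def data_root(chunks: list) -> bytes:
    # Hash every chunk to a 32-byte leaf, then fold the leaves pairwise
    # until a single root remains; an odd node is promoted unchanged.
    level = [sha256(c) for c in chunks]
    while len(level) > 1:
        nxt = [sha256(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]

root = data_root(chunk_data(b"data destined for the weave" * 50000))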
3.3 Succinct Proofs of Access

In the Arweave network, nodes in a merkle tree are labelled with the offset of the data that can be found to the "left" or "right" of the node [8]. This labelling of nodes enables the creation of succinct merkle proofs that allow verification of the existence of chunks at specific locations in the dataset. Such a merkle tree is called "unbalanced" as the number of leaves on either side of a node may not be equal. This structure can be used to construct a succinct, non-interactive proving game about the availability of data on a miner's hard drives.

The Succinct Proof of Access (SPoA) game begins when Bob wants to verify that Alice is storing data at a specific location of a merkelized dataset. In order to play the game, Bob and Alice must hold the same merkle root for the dataset. The game works in three phases: challenge, proof construction, and verification.

Challenge: To start the game, Bob sends Alice a challenge offset – the index of the byte that Alice should prove access for.

Proof Construction: Upon receiving this challenge, Alice begins to construct a proof. She searches her merkle tree to find the chunk corresponding to the offset.

Figure 5: Alice can search for the requested chunk by traversing the merkle tree labelled with offsets.

In the first stage of proof construction, Alice retrieves the entire chunk from storage. She then constructs a merkle proof using the chunk and the merkle tree. She hashes the entire chunk to obtain a 32-byte identifier, and locates the parent of this leaf node in the tree. The parent is a node containing its offset and the two hashes of its child nodes, each of which is 32 bytes in length. This parent is added to the proof. Each parent is then recursively added to the proof until Alice reaches the merkle root. This series of nodes forms the merkle proof that, along with the chunk, Alice sends to Bob [8].

Figure 6: The proof for a challenge offset includes the chunk stored at the offset and its merkle path.

Verification: Bob receives the merkle proof and the chunk from Alice. He verifies the nodes in the merkle proof by recursively validating the hashes in the tree from the root down to the two leaf nodes. He then hashes the chunk included in Alice's proof and verifies that this hash matches the leaf with the correct offset. If the hashes match, the proof is successful. If at any stage, however, the hashes that Bob is verifying are incongruent, Bob stops the computation and rejects Alice's proof without needing to compute any remaining hashes. The output of this verification function is a boolean value, indicating the success or failure of the proof.
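The verification phase can be sketched as follows. This is an illustrative model rather than the ar_unbalanced_merkle.erl implementation: the node encoding (two child hashes concatenated with a 32-byte big-endian offset) and the function names are assumptions made for readability; the short-circuit rejection on the first mismatched hash mirrors the behaviour described above.

import hashlib

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def node_id(left: bytes, right: bytes, offset: int) -> bytes:
    # Assumed node encoding: the two 32-byte child hashes followed by the
    # node's offset label as a 32-byte big-endian integer.
    return sha256(left + right + offset.to_bytes(32, "big"))

def verify_spoa(root: bytes, challenge: int, path: list, chunk: bytes) -> bool:
    # path is the list of (left_hash, right_hash, offset) nodes from the
    # root down to the chunk's leaf, as sent by Alice.
    expected = root
    for left, right, offset in path:
        if node_id(left, right, offset) != expected:
            return False  # mismatched hash: reject without further work
        # The offset label routes the challenge byte left or right.
        expected = left if challenge < offset else right
    # Finally, the chunk itself must hash to the leaf we arrived at.
    return sha256(chunk) == expected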
3.3.1 SPoA Complexity

The complexity of constructing, transmitting and validating SPoA proofs is $O(\lceil \log n \rceil)$, where n represents the size, in bytes, of the dataset. In practice, this means that the proof of the location and correctness of a single byte in a dataset of size $2^{256}$ bytes can be transmitted in just $256 \times 96$ B (maximum path size) + 256 KiB (maximum chunk size) ≈ 280 KiB.

3.4 Replica Uniqueness

Mining power in the Arweave network is engineered to be proportional to the number of replicas of the weave accessible by a miner (or a group of cooperating miners). In the absence of a replica uniqueness scheme, copies of a piece of data are identical to each other in their content. In order to incentivize and measure multiple unique replicas of a single piece of data, Arweave deploys a system of packing.

3.4.1 Packing

Packing ensures that miners cannot forge SPoA proofs claiming to represent multiple unique replicas from the same stored chunk. Arweave's packing scheme uses RandomX [34] to convert regular data chunks into "packed" chunks in a process that incurs a compute cost for the miner. Packing in Arweave works as follows [4, 5, 6]:

• The chunk offset, transaction root and mining address are combined in a SHA-256 hash to generate a packing key.

• The packing key is used as entropy for an algorithm that generates a 2 MiB scratchpad over several successive rounds of RandomX execution. This component of the packing mechanism incurs significant cost for non-compliant miners.

• The first 256 KiB of the resulting entropy is used to symmetrically encrypt the chunk using a Feistel cypher [22]. This computation results in a packed chunk.

If the overall cost of packing chunks is greater than the cost of storing packed chunks for the time period between consecutive reads, the miner is incentivized to store unique chunks as opposed to packing chunks "on-demand" in order to maximize mining efficiency. In order for the correct behaviour to be incentivized, the following invariant must hold:

$$c_s(\text{chunk}) \cdot t < c_p(\text{chunk})$$

where:

$c_s$ = cost of storing a chunk for unit time.
$c_p$ = cost of packing a chunk.
$t$ = mean time between chunk reads.

Total packing costs incorporate the price of electricity for computation, as well as the price of specialized hardware and its mean lifetime. The cost of storage can similarly be computed from the cost of hard disks, the mean time to hard disk failure, the electricity costs of operating the disk, and other costs associated with operating storage hardware. We can compute the ratio of the costs of a high-efficiency on-demand miner versus an honest miner that stores packed chunks to estimate the safety margin for incentivizing packed chunks. In practice, Arweave uses a safety ratio – the ratio of the cost of creating chunks on-demand to the cost of storing the chunk for the mean time between accesses – of approximately 19 at the time of writing, benchmarked on commercially available hardware.

3.4.2 RandomX

In order to ensure that packing remains resistant to hardware acceleration, Arweave runs a carefully chosen hashing algorithm called RandomX over the packing key. RandomX is optimized for general-purpose CPUs as it uses randomized code execution during the hashing process, along with several memory-hard techniques that minimize the efficiency of specialized hardware [34]. RandomX's use of randomly generated programs that closely match the hardware of modern CPUs implies that the only way to accelerate RandomX hashing is to create faster general-purpose CPUs. While the Arweave protocol theoretically could incentivize the creation of faster CPUs, we note that the effect of this incentive relative to the existing incentives for improving CPU speeds would be negligible.
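A schematic of the packing pipeline is sketched below. RandomX is treated as a black box: expand_entropy is a SHA-256 stand-in for the successive RandomX rounds that make packing costly, and the XOR step stands in for the Feistel cypher of [22]. The field encodings and function names are illustrative assumptions, not the protocol's [4, 5, 6].

import hashlib

CHUNK_SIZE = 256 * 1024
SCRATCHPAD_SIZE = 2 * 1024 * 1024  # the 2 MiB scratchpad

def packing_key(chunk_offset: int, tx_root: bytes, mining_addr: bytes) -> bytes:
    # SHA-256 over the chunk offset, transaction root and mining address.
    return hashlib.sha256(
        chunk_offset.to_bytes(32, "big") + tx_root + mining_addr
    ).digest()

def expand_entropy(key: bytes, size: int = SCRATCHPAD_SIZE) -> bytes:
    # Stand-in for the successive RandomX rounds: a simple SHA-256 chain.
    out, block = bytearray(), key
    while len(out) < size:
        block = hashlib.sha256(block).digest()
        out.extend(block)
    return bytes(out[:size])

def pack_chunk(chunk: bytes, offset: int, tx_root: bytes, addr: bytes) -> bytes:
    entropy = expand_entropy(packing_key(offset, tx_root, addr))[:CHUNK_SIZE]
    # Stand-in for the Feistel encryption: XOR the (padded) chunk with the
    # first 256 KiB of entropy.
    padded = chunk.ljust(CHUNK_SIZE, b"\x00")
    return bytes(c ^ e for c, e in zip(padded, entropy))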
3.5 Verifiable Delay Function

A verifiable delay function (VDF) allows computational verification of the passage of time between events. A VDF requires a specified number of sequential steps to evaluate, yet produces a unique output that can be efficiently and publicly verified [10, 41]. There are several techniques for constructing VDFs. Arweave uses a chained hashing technique, where the VDF is defined recursively as follows:

$$V(n, \text{seed}) = \text{hash}(V(n - 1, \text{seed})) \quad \text{for } n > 1$$

and:

$$V(1, \text{seed}) = \text{hash}(\text{seed})$$

The hash function used in Arweave's VDF is SHA2-256 [7]. Using this construction, if the fastest available processor can produce only k sequential SHA2-256 hashes every second, then the correct computation of V(k, seed) using just the seed requires at least 1 second. This is because the hash of a hash is impossible to compute without first computing the inner hash. A correctly generated V(k, seed), called a checkpoint, therefore implies the passage of at least 1 second between the prover receiving the seed and generating the checkpoint.

VDF construction takes O(n) time for n steps. However, if intermediate checkpoints are transferred to the verifier, the verification of correctly generated VDF outputs can be completed in $O(n/p)$ time using p parallel threads. We use the chained-hash VDF rather than other methods of VDF construction due to the simplicity of its assumptions and its singular reliance on the robustness of the hash function.
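The chained-hash construction translates directly into code. The sketch below shows both the sequential prover and the per-segment check that lets a verifier holding intermediate checkpoints validate the chain with p parallel workers; it is a minimal model of [7], not the node's implementation.

import hashlib

def vdf(n: int, seed: bytes) -> bytes:
    # V(n, seed): n strictly sequential SHA2-256 invocations.
    out = seed
    for _ in range(n):
        out = hashlib.sha256(out).digest()
    return out

def verify_segment(start: bytes, end: bytes, steps: int) -> bool:
    # Re-check one checkpoint-to-checkpoint segment; distinct segments are
    # independent, so p workers can verify the whole chain in O(n/p) time.
    return vdf(steps, start) == end

checkpoint = vdf(1_000_000, b"seed")
assert verify_segment(b"seed", checkpoint, 1_000_000)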
3.6 Succinct Proofs of Replications

The SPoA game described earlier can be used by any participant to prove that they are storing some data at a particular location in the dataset. This system can be used to create a second game, called Succinct Proofs of Replications (SPoRes), that allows a prover to convince a verifier that the prover stores a certain number of data replicas with extremely minimal data transfer and verification overhead, regardless of dataset size.

3.6.1 SPoRes Game

The SPoRes game works as follows. Alice claims that she stores n copies of a merkelized dataset. Bob wants to verify Alice's claim. In order to do this, he provides Alice with a difficulty parameter d, and a random seed. Alice uses the seed to generate a VDF chain that emits a checkpoint every second, which can be used to unlock a maximum of k SPoA challenges within the dataset. Whenever she has packed chunks corresponding to any of these challenges, Alice can construct the corresponding SPoA proofs. Each of these proofs is then hashed and compared to Bob's difficulty parameter d. If the proof hash is greater than d, then Alice has found a valid solution and sends the corresponding proof to Bob. Bob records the time it takes between delivering the random seed and receiving a valid proof from Alice.

Based on the difficulty d, the probability of finding a valid solution on a single trial is given by:

$$p = \frac{2^{256} - d}{2^{256}}$$

If Alice has N replicas and can perform k trials per replica every second, her probability of finding a solution in any given one-second time period is:

$$p_2 = 1 - (1 - p)^{kN}$$

The time taken for Alice to deliver the proof can be modeled with a geometric random variable $X \sim \mathrm{geom}(p_2)$, with probability $p_2$ of success. This random variable depends on d, k and N. In order to verify Alice's claim, Bob sends over a difficulty d such that she can deliver proofs to him once every 120 seconds, given that she has N replicas. This is equivalent to sending a difficulty parameter such that:

$$p_2 = \frac{1}{120}$$

or, equivalently, since $1 - p = \frac{d}{2^{256}}$:

$$p_2 = 1 - \left(\frac{d}{2^{256}}\right)^{kN}$$

If Alice can deliver the proof in the required time frame, she is likely to have the right number of replicas. A single proof, however, will not be sufficient for Bob to be convinced that Alice isn't lying – after all, she might get lucky and find a proof quickly even with fewer stored replicas. But if Alice can consistently deliver proofs on average every 120 seconds over a very long time period, Bob can be reasonably assured that Alice is storing the desired amount of data.

We shall attempt to quantify the certainty that Bob has about Alice's storage. Let us say that Alice claims to store 20 replicas, and has consistently delivered proofs at an average of 120 seconds over a 2-week time period (for a total of 10,080 proofs). Bob is interested in knowing the probability that Alice managed these proofs despite storing only 19 (or fewer) replicas. This represents a different one-second proof probability:

$$p^*_2 = 1 - (1 - p)^{19k}$$

which we can simplify to:

$$p^*_2 = 1 - \left(\frac{d}{2^{256}}\right)^{19k} \quad (1)$$

This probability can be calculated using the value of d that Bob generates for a truth-telling Alice. Her one-second proof probability if she stores 19 or fewer replicas is given by the series of inferences below. Here, $p_2$ represents the probability of producing a proof in one second when Alice is truthfully storing 20 replicas, and $p^*_2$ represents the one-second proof probability if she is lying (storing ≤ 19 replicas):

$$p_2 = 1 - \left(\frac{d}{2^{256}}\right)^{20k} = \frac{1}{120}$$

$$\Rightarrow \left(\frac{d}{2^{256}}\right)^{20k} = \frac{119}{120}$$

$$\Rightarrow \left(\frac{d}{2^{256}}\right)^{19k} = \left(\left(\frac{d}{2^{256}}\right)^{20k}\right)^{\frac{19}{20}} = \left(\frac{119}{120}\right)^{\frac{19}{20}} = 0.99208167919$$

$$\Rightarrow p^*_2 = 1 - \left(\frac{d}{2^{256}}\right)^{19k} = 0.00791832081$$

Therefore $p^*_2 = 0.00791832081$, corresponding to an expected proof production time of 126.2894 seconds. Let $X^*$ be the random variable obtained from $p^*_2$, i.e. $X^* \sim \mathrm{geom}(p^*_2)$. This represents Alice's distribution if she is lying – storing only 19 replicas. The expected value (mean) of $X^*$ is:

$$E[X^*] = \frac{1}{p^*_2} = 126.289402008$$

We can use the Central Limit Theorem [17] to estimate the probability that the sample mean is at most 120, i.e., below the mean $E[X^*]$ obtained above. This is represented by the probability:

$$P\left(\frac{\sum X^*_i}{10080} \le 120\right) = P\left(\frac{\sum X^*_i}{10080} \le 126.2894 - 6.2894\right) \quad (2)$$

For a large number of samples, the sample mean on the left hand side of this inequality tends to a normal distribution with mean $E[X^*]$ and standard deviation $\frac{\sigma_{X^*}}{\sqrt{n}}$, where $\sigma_{X^*}$ is the standard deviation of $X^*$ and n (= 10,080 here) is the sample size [17]. Therefore (2) yields:

$$P\left(N\!\left(E[X^*], \frac{\sigma_{X^*}}{\sqrt{10080}}\right) \le E[X^*] - 6.2894\right)$$

In the expression above, N represents a normal distribution.
We can convert this distribution to its standard normal form to obtain the equivalent probability:

$$P\left(N(0, 1) \le -6.2894 \cdot \frac{\sqrt{10080}}{\sigma_{X^*}}\right)$$

Finally, using standard normal distribution tables, we can obtain the probability that Alice has lied over the 2-week time period:

$$P(N(0, 1) \le -5.019943) = 2.584 \times 10^{-7}$$

Therefore, in this example, Bob can ascertain with ≈ 99.99997416% certainty that during the 2-week period Alice has stored more than 19 replicas of the dataset. We note that this certainty would be even higher if Alice were storing fewer than 19 replicas, owing to an expected proof-production time longer than 126.3 seconds. We also note that this entire proof system requires the transfer of just one proof every 120 seconds. This is an average data transfer rate between Alice and Bob of approximately 2.33 KiB per second (280 KiB / 120 seconds), comparable with the synchronization overhead of the Bitcoin [24] network.

Finally, we note that these probabilities do not change with arbitrary increases in the size of the dataset, while increasing the sampling period continues to improve the accuracy of Bob's certainty of Alice's replica count at a superlinear rate (Fig 7).

Figure 7: Certainty increases superlinearly with sample duration in the SPoRes game. After two weeks of sampling, the probability that 19 or fewer replicas are maintained is negligible (P < 0.0000002584).
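The two-week example above can be reproduced with a short calculation, shown below in Python. Using only the standard library, it recovers $p^*_2$, $E[X^*]$, the standardized deviation, and the final tail probability; the normal CDF is evaluated via the complementary error function.

import math

p_true = 1 / 120                    # honest one-second proof probability
base = 1 - p_true                   # (d / 2^256)^(20k) = 119/120
p_lie = 1 - base ** (19 / 20)       # one-second probability with 19 replicas

mean = 1 / p_lie                          # E[X*] ~ 126.2894 seconds
sigma = math.sqrt(1 - p_lie) / p_lie      # std. dev. of geom(p*)
n = 10_080                                # proofs observed over two weeks

z = (120 - mean) * math.sqrt(n) / sigma   # standardized sample-mean deviation
prob = 0.5 * math.erfc(-z / math.sqrt(2)) # Phi(z) via the error function
print(f"p* = {p_lie:.11f}  E[X*] = {mean:.4f}s  z = {z:.4f}  P = {prob:.3e}")
# -> p* = 0.00791832081  E[X*] = 126.2894s  z = -5.0199  P = 2.58e-07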
Formally, a SPoRes game is defined by its parameters:

SPoRes(r, k, d)

where:

r = the merkle root of the dataset to be used for the game.
k = the maximum number of challenges unlocked per second per replica.
d = the difficulty parameter that determines the probability of success on each try.

4 Protocol Construction

In Arweave mining, a modification of the SPoRes game outlined in the previous section is deployed. During mining, the protocol acts as Bob, and all miners in the network together play the role of Alice. Each proof of a SPoRes game is used to create the next block in Arweave. Specifically, Arweave block construction uses the following parameters:

SPoRes(BI, 800 · np, d)

where:

BI = the block index of the Arweave network.
np = the number of 3.6 TiB partitions of the weave stored by miners.
d = the difficulty of the network.

These parameters allow a maximum of 800 hashes per partition per checkpoint. A successful proof is one that is greater than the difficulty, which is adjusted over time to ensure that blocks are mined every 120 seconds on average. If the time difference between blocks i and (i + 10) is t, the difficulty adjustment from the old difficulty $d_i$ to the new difficulty $d_{i+10}$ is computed as follows:

$$d_{i+10} = 2^{256} - (2^{256} - d_i)\,r, \quad \text{where } r = \frac{t}{120 \times 10}$$

The newly computed difficulty implies a probability of block-mining success on each generated SPoA proof given by:

$$p_{i+10} = \frac{(2^{256} - d_i)\,r}{2^{256}} = p_i\, r$$

The VDF difficulty is similarly recomputed to maintain one-second checkpoint periods over time.

4.1 Incentives for Replica Completeness

Arweave's block construction using SPoRes includes an assumption of rational agent behavior under incentives for maintaining full replicas, either by a lone miner or by cooperation between miners. This section explores how these incentives are provided, and how their effectiveness can be validated in the Arweave network. In the abstract SPoRes game as previously presented, the net effect of storing two copies of the same half of the dataset would be the same number of SPoA hashes as one full replica spanning the entire dataset. The deployed version of the game in Arweave includes a modification that incentivizes miners to store and maintain all parts of the dataset – either cooperatively or independently.

The protocol provides incentives for accessing full replicas by splitting the number of SPoA challenges unlocked per second into two halves – one half requires just one partition of the dataset, and the other spans all partitions. To understand this, we will examine how the VDF construction described in the previous section is used to unlock SPoA challenges [27].

Algorithm 1 Unlocking SPoA Challenges

inputs: seed, M
k ← 0
while true do
    check ← V(k + M, seed)
    for p ∈ partitions do
        H0 ← RandomX(check, addr, index(p), seed)
        C1 ← H0 mod size(p)
        C2 ← H0 mod (Σ_{q ∈ partitions} size(q))
        i ← 0
        while i < 400 do
            chunk1 ← chunkAtOffset(C1)
            H1 ← spoa_first(chunk1, C1)
            chunk2 ← chunkAtOffset(C2)
            H2 ← spoa_second(chunk2, C2, H1)
            C1 ← C1 + 262144        ▷ advance by one 256 KiB chunk
            C2 ← C2 + 262144
            i ← i + 1
        end while
    end for
    k ← k + 1                        ▷ move to the next VDF checkpoint
end while

Approximately once per second, the VDF chain outputs a checkpoint [7]. For every partition of the weave that the miner stores, this checkpoint is mixed with the mining address, the partition index, and the original VDF seed to obtain an intermediate RandomX hash. This 256-bit hash is treated as a number and reduced modulo the size of the partition, yielding a remainder that is used as a recall offset. This offset unlocks 400 256-KiB challenges over a contiguous 100 MiB recall range starting from the offset. The mechanism that incentivizes full replicas is that each such recall range within a partition also unlocks another, global, set of 400 challenges in a second recall range, with the constraint that a solution in the second range requires a solution at the corresponding location in the first range.

4.1.1 Performance Per Packed Partition

The performance per packed partition is the number of SPoA challenges that each partition yields for every VDF checkpoint. This number is greater when the miner stores unique partitions than when the miner stores multiple copies of the same data.

If the miner owns only unique data, each packed partition will yield all first-range challenges along with the few second-range challenges that fall within this partition. With n unique partitions stored out of a total m in the weave, this yields a performance per packed partition of:

$$\mathrm{perf}_{\mathrm{unique}}(n, m) = \frac{1}{2}\left(1 + \frac{n}{m}\right)$$

When a miner stores partitions that are copies of the same data, each packed partition still yields all first-range challenges. Only on $\frac{1}{m}$ of occasions, however, will the second recall range fall inside the same partition. This leads to a highly significant performance penalty, yielding a rate of just:

$$\mathrm{perf}_{\mathrm{copied}}(n, m) = \frac{1}{2}\left(1 + \frac{1}{m}\right)$$

Figure 8: The performance for a given partition increases as a miner (or group of collaborating miners) completes their dataset.

The blue line in Figure 8, displaying $\mathrm{perf}_{\mathrm{unique}}(n, m)$, shows the efficiency per partition when storing n partitions out of a maximum m. It demonstrates that the mining efficiency per partition is only 50% when the miner stores very few partitions from a replica. The mining efficiency is maximized when storing and maintaining all parts of the dataset, i.e., when n = m.
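The two yield functions translate directly into code; the small table printed below illustrates how per-partition performance climbs toward 1 for unique data while remaining pinned near 1/2 for copies (the values chosen for n and m are arbitrary examples).

def perf_unique(n: int, m: int) -> float:
    # All first-range challenges plus the n/m share of second-range ones.
    return 0.5 * (1 + n / m)

def perf_copied(n: int, m: int) -> float:
    # Copies only match the second range on 1/m of occasions.
    return 0.5 * (1 + 1 / m)

m = 1_000
for n in (1, 250, 500, 1_000):
    print(n, round(perf_unique(n, m), 3), round(perf_copied(n, m), 3))
# perf_unique climbs from ~0.5 towards 1.0 as the replica completes;
# perf_copied stays pinned at ~0.5 regardless of n.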
4.1.2 Total Hashrate

The total hashrate (shown in Figure 9) is given by the following equations, obtained by multiplying the per-partition values by n:

$$\mathrm{tperf}_{\mathrm{unique}} = \frac{n}{2}\left(1 + \frac{n}{m}\right)$$

$$\mathrm{tperf}_{\mathrm{copied}} = \frac{n}{2}\left(1 + \frac{1}{m}\right)$$

The equations above show that as the size of the weave grows, the penalty for not storing unique data increases quadratically with the number of stored partitions.

Figure 9: The total mining hashrate for unique and copied datasets.

4.1.3 Marginal Partition Efficiency

Given this structure, consider the decision problem a miner faces when adding a new partition: should they replicate a partition that they already own, or fetch new data from other miners? When they already store n unique partitions out of a maximum m, their mining hash rate is proportional to:

$$\mathrm{tperf}_{\mathrm{unique}} \propto n + \frac{n^2}{m}$$

The additional benefit of adding one new unique partition is:

$$(n + 1) + \frac{(n + 1)^2}{m} - n - \frac{n^2}{m} = 1 + \frac{2n + 1}{m}$$

The (smaller) benefit of replicating an already packed partition is:

$$(n + 1) + \frac{n^2}{m} + \frac{1}{m} - n - \frac{n^2}{m} = 1 + \frac{1}{m}$$

Dividing the first quantity by the second, we get the miner's marginal efficiency from acquiring a new partition relative to the additional efficiency of duplicating an existing partition. We call this the relative marginal partition efficiency:

$$\mathrm{rmpe}(n, m) = \frac{2n + m + 1}{m + 1} = 1 + \frac{2n}{m + 1}$$

Figure 10: Miners are incentivized to organize into complete replicas (option 1), rather than making additional replicas of data that they already have (option 2).

The rmpe value can be seen as a penalty for copying existing partitions when adding new capacity as a miner. Consider the efficiency trade-off at different values of n when m is large:

• The reward for completing a replica is highest when a miner has close to a full copy of the dataset. Indeed, if n → m and m → ∞, we get an rmpe value of 3. This means that close to a full replica, the efficiency of seeking out new data is 3 times the efficiency of repacking existing data.

• When the miner stores half the weave, i.e. when n = m/2, rmpe approaches 2. This means that the benefit to the miner of seeking out new data is twice the benefit of copying existing data.

• For lower values of n, the rmpe value tends towards, but is always greater than, 1. This configuration imposes the greatest performance penalty upon the miner.

As the network grows (m → ∞), there will be a strong incentive for miners to organize into full replicas. This promotes the creation of cooperative mining groups that together store at least one full replica of the dataset.

4.2 Network Metrics

Using the equations described in the previous section and some information from the network, we can determine the number of replicas that the network is storing. For each block that miners create, we can identify whether the solution hash is generated from a first- or second-range SPoA proof. In a network that only stores full replicas, we would expect this ratio to be 1:1. However, when miners mine with less than a full replica or with duplicated partitions (and therefore incur an efficiency penalty), the ratio will be lower than 1.

We can calculate the average hashes per partition from the ratio of the observed SPoAs. Suppose that, over the last 1,000 blocks, there are n1 first-range SPoAs and n2 second-range SPoAs.
This means that the average replica completeness is $\frac{n_2}{n_1}$ and, as a result, the mining efficiency per partition is:

$$e_m = \frac{1}{2}\left(1 + \frac{n_2}{n_1}\right)$$

Using the above expression, we can accurately estimate the total number of partitions in the network. The expected number of hashes tried when the difficulty parameter is d is given by:

$$E[\text{trials}] = \frac{1}{p} = \frac{2^{256}}{2^{256} - d}$$

The expected number of partitions needed to generate this many trials in a 120-second time period, when each partition is only $e_m$ efficient, is:

$$E[\text{partitions}] = \frac{E[\text{trials}]}{800 \cdot e_m \cdot 120} \quad (3)$$

With a partition size of 3.6 TiB, we can derive the deployed storage capacity of the network:

$$\text{Storage} = E[\text{partitions}] \times 3.6\,\text{TiB}$$

All of these metrics regarding the stored dataset and average replica completeness can be computed from the observed values in the network without additional coordination overhead.
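The whole estimation pipeline of this section fits in a few lines. The sketch below combines the observed proof-range ratio and block difficulty into a storage estimate via equation (3); the input values are placeholders, not network measurements.

PARTITION_TIB = 3.6

def estimated_storage_tib(d: int, n1: int, n2: int) -> float:
    e_m = 0.5 * (1 + n2 / n1)                # per-partition mining efficiency
    trials = 2**256 / (2**256 - d)           # expected hashes per block
    partitions = trials / (800 * e_m * 120)  # equation (3)
    return partitions * PARTITION_TIB

# e.g. a difficulty leaving one solution in ~4.8 billion hashes, with a
# 0.9 second- to first-range proof ratio over recent blocks:
d = 2**256 - 2**256 // 4_800_000_000
print(estimated_storage_tib(d, n1=1_000, n2=900))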
4.3 Incentives to Optimize Data Routing

Encouraging miners to organize into full replicas for efficient mining yields a number of useful downstream incentives. One notable incentive is the impetus for miners to create optimized solutions for fast data routing within a peer-to-peer network, thereby addressing an otherwise complex and critical distributed systems challenge [21, 18, 29]. This necessity arises because nodes must be able to swiftly transmit any data chunk within the network to the location of any other chunk as part of mining, which in turn demands the maintenance of routing capabilities that can be reused to facilitate data access for users and other miners.

The introduction of this new incentive for miners to optimize data routing is likely to stimulate a competitive landscape reminiscent of the race for optimized hashing hardware among Bitcoin miners. Such competition will drive innovation in routing infrastructure, ultimately fostering a more efficient and robust distributed network.

4.4 Bandwidth Sharing Incentives

Another downstream effect of Arweave's mining incentives for storage replication is the inherent imperative for miners to gain access to data in the network. This creates a wide variety of market structures for data access, including:

• Karma and Optimistic Tit-for-Tat: Some nodes in the Arweave network engage in a BitTorrent-like game [11] for bandwidth sharing. In this game, nodes share data reciprocally with other miners that share data with them. Additionally, nodes occasionally share data at random, optimistically expecting future reciprocation. Each node maintains its own rankings of peers, with no obligation to report how or why these rankings have been determined. Such a mechanism has been very successful and highly adaptive in BitTorrent, a data sharing platform that was responsible at one point for approximately 27% of the world's internet traffic [20].

• Payments for physical distribution of disks: Node operators may directly buy or sell disks loaded with data from the weave, in exchange for monetary or other forms of payment. This may be a preferable option for bandwidth-limited miners because of the amount of data required in order to operate Arweave nodes. This mode of transmission bypasses traditional packet filters and firewalls.

• Payment protocols: Nodes may also participate in protocols and markets that allow them to pay for data upon access. One such implementation is provided by the Permaweb Payment Protocol (P3) [40], which uses payment channels to incentivize a variety of services (including simple data access) within Arweave.

4.5 Scalability

Arweave blocks are created on average every 2 minutes, each containing a maximum of 1,000 transactions. This limit ensures that block validation and synchronization stay extremely lightweight, allowing the network to remain widely decentralized. However, this transaction limit does not impose any restriction on the size or quantity of data items that can be stored in a given block, due to the network's "bundling" mechanism [1]. Bundling is a network-wide standard built on top of the core protocol for consolidating many different data entries into a single transaction. These data entries are functionally equivalent to top-level data storage transactions on the network as, upon retrieval, bundled transactions can be "unbundled" into their constituent items.

Arweave's maximum transaction size is $2^{256} - 1$ bytes, which can be subdivided into arbitrary numbers of individual data entries inside potentially recursive bundles. This allows the throughput of the network to be scaled without practically applicable limits. This optimization is possible because data uploads on Arweave are not parameterized – every byte on the network is part of the same global merkelized dataset, and is funded by a shared endowment.

Figure 11: Bundling allows recursively stacking data uploads into a single top-level transaction.

One of the components of this design is the aggregation of payments from individual data entries to the uploaded bundle. Users either aggregate payments for their data items within a single bundled transaction, or move their payments entirely off-chain to a bundling service which groups their data entries with those of other users.

In Arweave, transactions are selected for inclusion inside the 1,000 slots in each block according to their total value, as miners earn an inclusion fee proportional to the transaction fee. In the presence of block space scarcity, this incentivizes bundling services to recursively group transactions, adding to the scalability of the network. As a consequence, any number of bundlers and users can write to the network at any given time without leading to the typical block space auction mechanics of other blockchains. Further, competition between bundlers to build larger transactions creates downward pressure on the fees that they will impose on end users. This is in contrast to other blockchain networks, where competition for limited block space increases the fees imposed upon users until some users are priced out of utilizing the network.

Figure 12: Preference for larger bundles creates incentives to recursively bundle data, minimizing fees.

One additional consequence of moving transaction negotiation for data uploads to off-chain bundling services is that users can pay for Arweave storage in any payment mode supported by the providers, with grouped data entries settled in AR by the bundlers. At the time of writing, the Arweave network supports at least 18 different payment modes through bundling services [14, 37].
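The recursive shape of bundles can be illustrated with a toy structure. This is loosely modelled on the idea behind ANS-104 [1] and deliberately ignores its binary serialization, signatures, and tags; the class and field names are inventions for illustration.

from dataclasses import dataclass
from typing import Iterator, List, Union

@dataclass
class DataItem:
    owner: str
    data: bytes

@dataclass
class Bundle:
    items: List[Union["DataItem", "Bundle"]]

    def flatten(self) -> Iterator[DataItem]:
        # "Unbundling": recursively recover every constituent data entry.
        for item in self.items:
            if isinstance(item, Bundle):
                yield from item.flatten()
            else:
                yield item

# One top-level transaction carrying three data entries, one level nested:
tx_payload = Bundle([
    DataItem("alice", b"page.html"),
    Bundle([DataItem("bob", b"img.png"), DataItem("carol", b"post.md")]),
])
assert len(list(tx_payload.flatten())) == 3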
5 Storage Endowment

Arweave's incentive model requires uploaders of data to pay a small transaction placement fee and provide an upfront contribution, denominated in AR, to the network's storage endowment. This endowment serves as a faucet through which miners are paid out over time, as they collectively provide proofs of replication of the dataset. The necessary payout from the endowment to maintain a piece of data decreases as the cost of storage declines.

5.1 Storage Pricing

Users pay upfront for the storage of 20 replicas for 200 years, at present costs. The trustless mechanism that the protocol employs to determine the price of storage acquisition from miners is described below. In this section, we sequentially construct the complete calculation of the storage price used by the protocol.

During the period of a single block B with difficulty $d_B$, the estimated number of partitions in the network is given by equation (3) from §4.2 (taking $e_m = 1$):

$$E[\text{partitions}] = \frac{2^{256}}{800 \times 120 \times (2^{256} - d_B)}$$

Multiplying this expression by the partition size gives the total amount of storage in the network at the time of block B:

$$E[\text{storage}] = \frac{2^{256} \times 3.6\,\text{TiB}}{800 \times 120 \times (2^{256} - d_B)}$$

The amount of AR given out as rewards to miners and the difficulty for the block can then be used to estimate the storage acquisition cost – the price of servicing 1 GiB for 1 minute at the time of block B:

$$P^*_m(B) = \frac{r_B}{2 \times 1024 \times E_{d_B}[\text{storage}]}$$

where:

$P^*_m(B)$ = the estimated cost of storing 1 GiB for one minute at the time of block B.
$r_B$ = total reward for block B.

Over a single block period, this estimate of the storage price has high variance, owing to differences in the collected transaction placement fees and the difficulty adjustment algorithm. Therefore, in practice, the network records the difficulty and released rewards over a large number of blocks. These records are used by the network to accurately calculate the storage acquisition cost from miners over a 6-week period preceding a block:

$$P_m(B) = \frac{\sum_{i = h_B - n}^{h_B} P^*_m(B_i)}{n}$$

where:

$P_m(B)$ = the average storage acquisition cost for 1 GiB-minute, taken over a 6-week period.
$h_B$ = height of block B.
$n$ = number of blocks in a 6-week period ($30 \times 24 \times 7 \times 6 = 30{,}240$).

Using these calculations, the network can accurately estimate the acquisition cost of storing one GiB for a 1-block period (∼2 minutes):

$$P_b(B) = 2 \cdot P_m(B)$$

Given this formula, the protocol calculates the present price of 20 replicas of any piece of data D for 200 years as follows:

$$P(D) = 20 \times 200 \times 365 \times 30 \times 24 \times \mathrm{size}(D) \times P_b(B)$$

This is the price charged to the user as an upfront contribution to the storage endowment. Miners are paid out from the endowment over time as they prove storage of the network's dataset, according to the following calculation:

$$r_e(B) = 20 \cdot P_b(B) \cdot \mathrm{size}(W) - (r_i(B) + r_f(B))$$

where:

$r_e(B)$ = the withdrawal from the endowment in block B.
$r_i(B)$ = the inflation reward released in block B.
$r_f(B)$ = the transaction placement fees for transactions accepted in block B.
$P_b(B)$ = the estimated cost of storing 1 GiB for one block at the time of block B.
$W$ = the total dataset stored on Arweave at the time of block B.
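The fee calculation reduces to a single expression once $P_b(B)$ is known. The sketch below mirrors the formulas above; the function names and the assumption that sizes are given in GiB are ours.

BLOCKS_PER_YEAR = 365 * 30 * 24   # one block every ~2 minutes

def upload_price(size_gib: float, p_b: float) -> float:
    # P(D) = 20 * 200 * 365 * 30 * 24 * size(D) * Pb(B)
    return 20 * 200 * BLOCKS_PER_YEAR * size_gib * p_b

def endowment_withdrawal(p_b: float, weave_gib: float,
                         inflation: float, fees: float) -> float:
    # r_e(B) = 20 * Pb(B) * size(W) - (r_i(B) + r_f(B))
    return 20 * p_b * weave_gib - (inflation + fees)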
5.2 Competition for Storage Efficiency

As a result of the storage pricing mechanism described above, Arweave miners are incentivized to compete to increase the efficiency of their mining operations. This has effects similar to the competition observed in the Bitcoin network, where miners have competed to minimize their cost of computing hashes by building optimized software and hardware. As a consequence of Arweave's storage pricing mechanism, an efficient market is created for storage acquisition without the need for parameterized "on-chain order books" that would hinder protocol scalability.

5.3 Deflation and Endowment Value

The Arweave network's endowment removes tokens from circulation every time data is uploaded, creating a reserve to pay for data storage over time. The storage purchasing power of the endowment is elastic, changing with the volume of data committed, the cost of data storage, and token value over time. One of the main drivers of change in the value of the endowment is that a decreasing cost of storage creates a corresponding proportional increase in storage purchasing power, leading to fewer tokens needing to be released from the endowment in the future. We call the rate of decline in the overall cost of storing a unit of data for a fixed period of time the Kryder+ rate. This rate incorporates the change in the price of hardware, electricity, and operational costs surrounding data storage.

Figure 13: Endowment health is affected by the difference between the protocolized Kryder+ rate (0.5%) and actual Kryder+ rates, as well as token price changes.

Users pay for 200 years' worth of replicated storage at present prices, such that a Kryder+ rate of only 0.5% is sufficient to sustain the endowment for an indefinite term, in the absence of token price changes. Under these conditions, the storage purchasing power of the endowment at the end of each year would be equal to that at the beginning. The actual decline in the cost of storage over the last 50 years has averaged ≈38.5% per year [16]. We note that this pattern is a continuation of the diminishing real cost of information encoding and preservation that can be observed throughout history. Further, this trend appears likely to continue given the strong incentives and significant scope for improvement. The delta between the observed and protocol-specified Kryder+ rates is tuned to provide a wide safety margin against token price volatility, with the excess leading to deflation in the token supply over time.
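A toy projection illustrates why a 0.5% Kryder+ rate sustains the endowment: with 200 years prepaid at present prices, annual payouts shrink geometrically, and at 0.5% the series of payouts sums to exactly the prepaid amount. The model below is a deliberate simplification (constant rate, no token price changes, a single data item).

def endowment_years(kryder_rate: float, funded_years: float = 200.0,
                    horizon: int = 10_000) -> int:
    # Pay out one year of storage per step while the price of a year's
    # storage declines by kryder_rate; stop when the reserve runs dry.
    balance, price, years = funded_years, 1.0, 0
    while balance >= price and years < horizon:
        balance -= price
        price *= 1 - kryder_rate
        years += 1
    return years

print(endowment_years(0.000))  # 200: with no decline, the prepayment runs out
print(endowment_years(0.005))  # 10000 (horizon): 0.5% sustains indefinitely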
6 Decentralized Content Policies

The Arweave network employs a disintermediated, layered system of content policies without centralized points of control or moderation. The underlying principle of this system is voluntarism: every participant is able to choose what data they would like to store and serve, without any mandates from the protocol. This system allows each participant in the network to create and operate their own content policy without requiring consensus from others. The consequence is a diverse selection of content policies, applied at various stages of the infrastructure supporting Arweave-based applications:

• Miners: Miners in the network can run arbitrary compute (various forms of text, image, video, etc. analysis) upon the data they store to filter content they deem unlawful or inappropriate. As miners store and publicly serve their content, they are subject to the local laws and regulations of their state. This disincentivizes them from storing illicit content that does not conform to local regulation.

• Gateways: Users often access content on Arweave through a gateway [30]. Gateways act as portals, allowing users and developers access to data in the Arweave network without running their own nodes. Just as miners can choose their own content policy with regard to the data that they store, gateways are also able to independently decide which content they index and serve. Further, the interoperability of gateways in the network allows users to choose gateways that conform to their personal beliefs and values.

• Applications: The final layer of content moderation that may affect Arweave users is at the application level. Each application built on data from Arweave may enforce an additional filter on content served by its interfaces, as its developers have programmed. These application-level content policies can be embedded in the source code of the applications themselves and stored immutably on Arweave – giving users the ability to trust how the app will moderate content, permanently [26].

7 Protocol Evolution

Over time, it is inevitable that the circumstances surrounding the Arweave network will change. Truly permanent storage requires a system capable of adapting to such changes as they arise. We note that existing methods of blockchain protocol governance have fallen short of simultaneously allowing protocol adaptability and preservation of user guarantees [39]. In order to address these concerns, the governance mechanism deployed in the Arweave network improves upon the traditional blockchain forking model in order to provide a framework for incentivized evolution of the protocol. The governance procedures of the Arweave network are purely constitutional in nature. They are a social protocol agreed upon by community participants, allowing the technical implementation to mutate as necessary. The full Framework and the Principles that it references can be found on the Arweave network itself [32, 33]. In the remainder of this section, we describe the operation of this mechanism and its emergent incentives.

7.1 Mechanism Overview

At its core, the framework is a system that enables and rewards permissionless innovation, via forks, on top of the Arweave protocol. This mechanism gives rise to a form of evolution: creating powerful incentives to generate, test and select preferable mutations of the protocol. By allowing anyone to offer a mutation of the protocol with an associated reward, a market is created for improvements. Broadly, the mechanism functions as follows:

1. Innovators choose a prior iteration of the protocol on which to base their new work. They select the parent for their innovation based on its perceived adoption and fitness.

2. Innovators then create a mutation of the protocol with new features and offer it to the community, minting a new quantity of tokens that they believe represents a reasonable reward for their contribution.

3. Finally, community members assess the new mutation, opting to use it over alternatives if it offers sufficient improvements at a reasonable dilution.

The outcome of this process is a protocol that evolves to adapt to its environment, maximizing 'fitness' – its utility and robustness – while minimizing dilution. As can be inferred, successful innovators will need to satisfy the two primary market participants: users of the protocol and other innovators. If they achieve this, they will receive a market-determined reward for their efforts.

7.2 Incentives for Cooperation

The permissionless nature of this mechanism invites all market participants to cooperate rather than compete with prior versions of the protocol. Specifically, anyone attempting to build a permanent data storage system is incentivized to take part in this mechanism for three primary reasons:

• Faithful participation in the mechanism allows them to bootstrap adoption by easily inheriting an activated, engaged userbase. This maximizes the reward for their innovation, while minimizing expenses.

• By participating in the mechanism faithfully, innovators can expect that their iteration will become part of the lineage of permanent data storage systems.
Consequently, others will advance their protocol – carrying their data and tokens forward – without further action on their behalf.

• Any builder attempting to create a permanent information storage system will need a flexible and adaptive mechanism to respond to an environment that is certain to change over time. Arweave's evolutionary framework provides a solution to this problem that all can participate in without cost or limitation.

Considering that these advantages are available to innovators with very little true cost – after all, they may choose to dilute as much as they consider appropriate – participation in the mechanism represents a preferable proposition to bootstrapping a competing service.

7.3 Incentives for Dataset Unification

Innovators are exposed to an additional powerful incentive: to maximize the value of their new evolution by merging the data of diverged lineages. This incentive emerges from the dynamic that a unified protocol will generally be more valuable than an evolution of a single strand of a diverged lineage. The most basic strategy for achieving this unification has two components: addressing the technical concerns of the divergent communities, and including the data from diverging forks in the new evolution. If executed correctly, this strategy of unification leaves little reason for community members to continue participating in the divergent fork, encouraging them to migrate their usage and token holdings to the new evolution.

Figure 14: Data from divergent forks can be uploaded to accepted evolutions of the protocol.

To a first approximation, an innovator is incentivized to unify data from divergent forks in the evolutionary tree if the result of the following expression is positive:

$$V = F_M \cdot C_D - F_M \cdot C_N - D_C + D_V$$

where:

$V$ = net value gain or loss as a result of including the data from the divergent fork.
$F_M$ = the market capitalization of the divergent fork.
$C_D$ = the proportion of market capitalization that will move from the old fork to the new one if the old data is included.
$C_N$ = the proportion of market capitalization that will move from the old fork to the new one regardless of whether the data is included.
$D_C$ = the cost – in terms of necessary dilution – of copying the old data into the new evolution.
$D_V$ = the intrinsic social value of having the data from the divergent fork included in the new evolution.

Given that the $D_C$ term in this equation is typically small compared to $F_M$, innovators are often incentivized to include data and unify the most prominent forks of the tree. Consequently, users can have confidence that their data will be included in the canonical version of the system as long as they upload to a network with appropriate prominence and usage.

When assessed in aggregate, the effect of these incentives is that token holders will accrue a basket of assets across different evolutions, while Arweave users can expect their data to be maintained by an ever-improving set of protocols as the network evolves. For a deeper analysis of the components of the framework, we refer readers to the companion to the Framework for Evolving Arweave [36].
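The unification heuristic is directly computable. The sketch below evaluates V for an illustrative fork; all input values are hypothetical.

def unification_value(f_m: float, c_d: float, c_n: float,
                      d_c: float, d_v: float) -> float:
    # V = FM*CD - FM*CN - DC + DV
    return f_m * c_d - f_m * c_n - d_c + d_v

# A fork with market cap 100 where inclusion moves 30% of value across,
# versus 10% without it, at a dilution cost of 5 and social value of 2:
print(unification_value(100, 0.30, 0.10, 5, 2))  # -> 17.0 > 0: include it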
8 Conclusion

In this paper, we have presented the Arweave protocol: a permanent information storage system that allows for disintermediated and peer-to-peer communication over time and space. We have presented the technical and economic mechanisms of this protocol and explored their emergent dynamics, as currently deployed in the network. As a component of this, a novel evolutionary system has also been described, which will allow the protocol to adapt to changing circumstances over time.

9 Acknowledgements

Arweave is the result of the combined efforts of many thousands of contributors and teams that have built infrastructure, services and products at various stages in the development of the network and its ecosystem. The Arweave protocol also enjoys a vibrant community of hundreds of thousands that support the network as it grows. The authors would like to thank them all for helping make this network and the fulfilment of its mission possible.

References

[1] ANS-104: Bundled Data v2.0 – Binary Serialization. URL: https://github.com/ArweaveTeam/arweave-standards/blob/master/ans/ANS-104.md.

[2] Lev Berman. URL: https://github.com/ArweaveTeam/arweave/blob/master/apps/arweave/src/ar_block_index.erl.

[3] Lev Berman. URL: https://github.com/ArweaveTeam/arweave/blob/master/apps/arweave/src/ar_tx.erl.

[4] Lev Berman and Sergii Glushkovskyi. URL: https://github.com/ArweaveTeam/arweave/blob/master/apps/arweave/src/ar_packing_server.erl.

[5] Lev Berman and Sergii Glushkovskyi. URL: https://github.com/ArweaveTeam/arweave/blob/4f5d69a810317bf529a4e7b2ca133fbcbc8c532f/apps/arweave/c_src/ar_mine_randomx.c.

[6] Lev Berman and Sergii Glushkovskyi. URL: https://github.com/ArweaveTeam/arweave/blob/master/apps/arweave/c_src/randomx_long_with_entropy.cpp.

[7] Lev Berman and Sergii Glushkovskyi. URL: https://github.com/ArweaveTeam/arweave/blob/master/apps/arweave/src/ar_vdf.erl.

[8] Lev Berman and Sam Williams. URL: https://github.com/ArweaveTeam/arweave/blob/master/apps/arweave/src/ar_unbalanced_merkle.erl.
[9] Blockchain.com. url: https://www.blockchain.com/explorer/charts/hash-rate. Accessed April 28, 2023.

[10] Dan Boneh et al. Verifiable Delay Functions. Cryptology ePrint Archive, Paper 2018/601. 2018. url: https://eprint.iacr.org/2018/601.

[11] Bram Cohen. Incentives Build Robustness in BitTorrent. 2003. url: http://www.bittorrent.org/bittorrentecon.pdf.

[12] Danny Dolev and H. Raymond Strong. Authenticated Algorithms for Byzantine Agreement. Nov. 1983.

[13] Danny Dolev et al. "An efficient algorithm for byzantine agreement without authentication". In: Information and Control 52.3 (1982), pp. 257–274. issn: 0019-9958. doi: 10.1016/S0019-9958(82)90776-8. url: https://www.sciencedirect.com/science/article/pii/S0019995882907768.

[14] Everpay Docs. url: https://docs.everpay.io/en/docs/guide/dive/deposit.

[15] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. "Impossibility of Distributed Consensus with One Faulty Process". In: J. ACM 32.2 (Apr. 1985), pp. 374–382. issn: 0004-5411. doi: 10.1145/3149.214121. url: https://doi.org/10.1145/3149.214121.

[16] Hard Disk Prices Over Time. Permalink: https://ylt5tv2hmxhqptyvgnfk664ramtyyrptdwiny5min5zi5ckrn37q.arweave.net/wufZ10dlzwfPFTNKr3uRAyeMRfMdkNx1iG9yjolRbv8.

[17] Robert V. Hogg, Joseph W. McKean, and Allen T. Craig. Introduction to Mathematical Statistics. 8th ed. Pearson, 2018.

[18] Hun J. Kang et al. Why Kad Lookup Fails. url: https://www-users.cse.umn.edu/~hoppernj/kad.pdf.

[19] Brennan Lamey. KwilDB: The Decentralized SQL Database. url: https://uploads-ssl.webflow.com/632df6381908134a8b796288/63373a98cfa42878438c5449_KwilDB%20White%20Paper.pdf.

[20] Fabrizio Marozzo, Domenico Talia, and Paolo Trunfio. "A Sleep-and-Wake technique for reducing energy consumption in BitTorrent networks". In: Concurrency and Computation: Practice and Experience 32 (Feb. 2020). doi: 10.1002/cpe.5723.

[21] Petar Maymounkov and David Mazières. "Kademlia: A Peer-to-Peer Information System Based on the XOR Metric". In: Revised Papers from the First International Workshop on Peer-to-Peer Systems. IPTPS '01. Berlin, Heidelberg: Springer-Verlag, 2002, pp. 53–65. isbn: 3540441794.

[22] Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone. Handbook of Applied Cryptography. 5th ed. CRC Press, 2001, p. 251. isbn: 978-0849385230. url: https://archive.org/details/handbookofapplie0000mene/page/250/mode/2up.

[23] Ralph Merkle. "Secrecy, authentication, and public key systems". PhD thesis. 1979. url: https://link.springer.com/content/pdf/10.1007/0-387-34805-0_21.pdf.

[24] Satoshi Nakamoto. "Bitcoin: A Peer-to-Peer Electronic Cash System". In: Cryptography Mailing list at https://metzdowd.com (Mar. 2009).

[25] Outprog. Storage-based Consensus Paradigm. url: https://mirror.xyz/0xDc19464589c1cfdD10AEdcC1d09336622b282652/KCYNKCIhFvTZ1DmD7IpXr3p8di31ecC283HgMDqasmU.

[26] India Raybould. What is the permaweb? Permalink: https://zalcjhqgk4mciidbj6tvalbi4346zwac2rjufv25tlbfeviuljeq.arweave.net/yBYkngZXGCQgYU-nUCwo5vns2ALUU0LXXZrCUlUUWkk.

[27] Sam Williams, Lev Berman, and Sergii Glushkovskyi. url: https://github.com/ArweaveTeam/arweave/commits/master/apps/arweave/src/ar_block.erl.
[28] R.R. Schaller. "Moore's law: past, present and future". In: IEEE Spectrum 34.6 (1997), pp. 52–59. doi: 10.1109/6.591665.

[29] Jiajie Shen et al. "Understanding I/O Performance of IPFS Storage: A Client's Perspective". In: Proceedings of the International Symposium on Quality of Service. IWQoS '19. Phoenix, Arizona: Association for Computing Machinery, 2019. isbn: 9781450367783. doi: 10.1145/3326285.3329052. url: https://doi.org/10.1145/3326285.3329052.

[30] Jack Smith. Gateways. url: https://cookbook.arweave.dev/concepts/gateways.html.

[31] Arweave Team. SmartWeave. June 2022. url: https://github.com/ArweaveTeam/SmartWeave.

[32] Arweave Core Team. Framework For Evolving Arweave. Permalink: https://vmqar5ywl4r2qg5hqurmnk2sgfuiqy3roqxwiap3fhrfkmjrceta.arweave.net/qyAI9xZfI6gbp4UixqtSMWiIY3F0L2QB-yniVTExESY.

[33] Arweave Core Team. Principles of the Arweave network. Permalink: https://22zpxxolm2utk37ovevafqhyrsgptnkos3dahafiabmlrqpa4zsa.arweave.net/1rL73ctmqTVv7qkqAsD4jIz5tU6WxgOAqABYuMHg5mQ.

[34] tevador. RandomX. Dec. 2022. Accessed April 29, 2023. url: https://github.com/tevador/RandomX.

[35] D. Thaler and B. Aboba. RFC 5218 - What Makes for a Successful Protocol? July 2008. Online; accessed 2 May 2023. url: https://www.rfc-editor.org/info/rfc5218.

[36] Understanding the Framework For Evolving Arweave. Permalink: https://keucvwznkccczbu6gcxazh5y54pqcn2dpzpcrcuhve6m42zo4mva.arweave.net/USgq2y1QhCyGnjCuDJ-47x8BN0N-XiiKh6k8zmsu4yo.

[37] Using Other Currencies. Online; accessed 2 May 2023. url: https://docs.bundlr.network/sdk/using-other-currencies.

[38] Sam Williams. Public Square Protocol. url: https://github.com/luckyr13/public-square.

[39] Sam Williams and Abhav Kedia. "Fair Forks: Towards Incentivized Protocol Governance". Oct. 2022. Permalink: https://s2bhuetdx7njjqg7lxsnzk2xcmvomcwqsmtzjyley4h3jxcx2jxa.arweave.net/loJ6EmO_2pTA313k3KtXEyrmCtCTJ5ThZMcPtNxX0m4.

[40] Sam Williams and Dan MacDonald. Permaweb Payment Protocol. url: https://blog.ar-io.dev.

[41] Anatoly Yakovenko. Solana: A new architecture for a high-performance blockchain. Tech. rep. 2020. url: https://solana.com/solana-whitepaper.pdf.