Ethereum Yellow Paper: a formal specification of Ethereum, a programmable blockchain


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER

SHANGHAI VERSION 9fde3f4 – 2024-09-02

DR. GAVIN WOOD
FOUNDER, ETHEREUM & PARITY

GAVIN@PARITY.IO

Abstract. The blockchain paradigm when coupled with cryptographically-secured transactions has demonstrated its
utility through a number of projects, with Bitcoin being one of the most notable ones. Each such project can be seen as
a simple application on a decentralised, but singleton, compute resource. We can call this paradigm a transactional
singleton machine with shared-state.

Ethereum implements this paradigm in a generalised manner. Furthermore it provides a plurality of such resources,
each with a distinct state and operating code but able to interact through a message-passing framework with others.
We discuss its design, implementation issues, the opportunities it provides and the future hurdles we envisage.

1. Introduction

With ubiquitous internet connections in most places
of the world, global information transmission has become
incredibly cheap. Technology-rooted movements like Bit-
coin have demonstrated through the power of the default,
consensus mechanisms, and voluntary respect of the so-
cial contract, that it is possible to use the internet to
make a decentralised value-transfer system that can be
shared across the world and virtually free to use. This
system can be said to be a very specialised version of a
cryptographically secure, transaction-based state machine.
Follow-up systems such as Namecoin adapted this origi-
nal “currency application” of the technology into other
applications, albeit rather simplistic ones.

Ethereum is a project which attempts to build the gen-
eralised technology; technology on which all transaction-
based state machine concepts may be built. Moreover it
aims to provide to the end-developer a tightly integrated
end-to-end system for building software on a hitherto un-
explored compute paradigm in the mainstream: a trustful
object messaging compute framework.

1.1. Driving Factors. There are many goals of this
project; one key goal is to facilitate transactions between
consenting individuals who would otherwise have no means
to trust one another. This may be due to geographical
separation, interfacing difficulty, or perhaps the incompati-
bility, incompetence, unwillingness, expense, uncertainty,
inconvenience, or corruption of existing legal systems. By
specifying a state-change system through a rich and unam-
biguous language, and furthermore architecting a system
such that we can reasonably expect that an agreement will
be thus enforced autonomously, we can provide a means
to this end.

Dealings in this proposed system would have several
attributes not often found in the real world. The incorrupt-
ibility of judgement, often difficult to find, comes naturally
from a disinterested algorithmic interpreter. Transparency,
or being able to see exactly how a state or judgement came
about through the transaction log and rules or instructional
codes, never happens perfectly in human-based systems
since natural language is necessarily vague, information

is often lacking, and plain old prejudices are difficult to
shake.

Overall, we wish to provide a system such that users
can be guaranteed that no matter with which other indi-
viduals, systems or organisations they interact, they can
do so with absolute confidence in the possible outcomes
and how those outcomes might come about.

1.2. Previous Work. Buterin [2013] first proposed the
kernel of this work in late November, 2013. Though now
evolved in many ways, the key functionality of a block-
chain with a Turing-complete language and an effectively
unlimited inter-transaction storage capability remains un-
changed.

Dwork and Naor [1992] provided the first work into the
usage of a cryptographic proof of computational expendi-
ture (“proof-of-work”) as a means of transmitting a value
signal over the Internet. The value-signal was utilised here
as a spam deterrence mechanism rather than any kind
of currency, but critically demonstrated the potential for
a basic data channel to carry a strong economic signal,
allowing a receiver to make a physical assertion without
having to rely upon trust. Back [2002] later produced a
system in a similar vein.

The first example of utilising the proof-of-work as a
strong economic signal to secure a currency was by Vish-
numurthy et al. [2003]. In this instance, the token was
used to keep peer-to-peer file trading in check, providing
“consumers” with the ability to make micro-payments to
“suppliers” for their services. The security model afforded
by the proof-of-work was augmented with digital signatures
and a ledger in order to ensure that the historical record
couldn’t be corrupted and that malicious actors could not
spoof payment or unjustly complain about service deliv-
ery. Five years later, Nakamoto [2008] introduced another
such proof-of-work-secured value token, somewhat wider in
scope. The fruits of this project, Bitcoin, became the first
widely adopted global decentralised transaction ledger.

Other projects built on Bitcoin’s success; the alt-coins
introduced numerous other currencies through alteration
to the protocol. Some of the best known are Litecoin and
Primecoin, discussed by Sprankel [2013]. Other projects
sought to take the core value content mechanism of the pro-
tocol and repurpose it; Aron [2012] discusses, for example,

1


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 2

the Namecoin project which aims to provide a decentralised
name-resolution system.

Other projects still aim to build upon the Bitcoin net-
work itself, leveraging the large amount of value placed in
the system and the vast amount of computation that goes
into the consensus mechanism. The Mastercoin project,
first proposed by Willett [2013], aims to build a richer
protocol involving many additional high-level features on
top of the Bitcoin protocol through utilisation of a number
of auxiliary parts to the core protocol. The Coloured Coins
project, proposed by Rosenfeld et al. [2012], takes a similar
but more simplified strategy, embellishing the rules of a
transaction in order to break the fungibility of Bitcoin’s
base currency and allow the creation and tracking of tokens
through a special “chroma-wallet”-protocol-aware piece of
software.

Additional work has been done in the area with discard-
ing the decentralisation foundation; Ripple, discussed by
Boutellier and Heinzen [2014], has sought to create a “fed-
erated” system for currency exchange, effectively creating
a new financial clearing system. It has demonstrated that
high efficiency gains can be made if the decentralisation
premise is discarded.

Early work on smart contracts has been done by Szabo
[1997] and Miller [1997]. Around the 1990s it became clear
that algorithmic enforcement of agreements could become a
significant force in human cooperation. Though no specific
system was proposed to implement such a system, it was
proposed that the future of law would be heavily affected
by such systems. In this light, Ethereum may be seen as a
general implementation of such a crypto-law system.

For a list of terms used in this paper, refer to Appen-
dix A.

2. The Blockchain Paradigm

Ethereum, taken as a whole, can be viewed as a
transaction-based state machine: we begin with a gen-
esis state and incrementally execute transactions to morph
it into some current state. It is this current state which we
accept as the canonical “version” of the world of Ethereum.
The state can include such information as account bal-
ances, reputations, trust arrangements, data pertaining
to information of the physical world; in short, anything
that can currently be represented by a computer is admis-
sible. Transactions thus represent a valid arc between two
states; the ‘valid’ part is important—there exist far more
invalid state changes than valid state changes. Invalid state
changes might, e.g., be things such as reducing an account
balance without an equal and opposite increase elsewhere.
A valid state transition is one which comes about through
a transaction. Formally:

(1) σt+1 ≡ Υ(σt,T)

where Υ is the Ethereum state transition function. In
Ethereum, Υ, together with σ are considerably more pow-
erful than any existing comparable system; Υ allows com-
ponents to carry out arbitrary computation, while σ allows
components to store arbitrary state between transactions.

Transactions are collated into blocks; blocks are chained
together using a cryptographic hash as a means of refer-
ence. Blocks function as a journal, recording a series of

transactions together with the previous block and an iden-
tifier for the final state (though do not store the final state
itself—that would be far too big).

Formally, we expand to:

σt+1 ≡ Π(σt,B)(2)
B ≡ (..., (T0,T1, ...), ...)(3)

Π(σ,B) ≡ Υ(Υ(σ,T0),T1)...(4)
Where B is this block, which includes a series of trans-

actions amongst some other components and Π is the
block-level state-transition function for transactions1.

This is the basis of the blockchain paradigm, a model
that forms the backbone of not only Ethereum, but all
decentralised consensus-based transaction systems to date.

2.1. Value. In order to incentivise computation within the
network, there needs to be an agreed method for transmit-
ting value. To address this issue, Ethereum has an intrinsic
currency, Ether, known also as ETH and sometimes referred
to by the Old English D̄. The smallest subdenomination
of Ether, and thus the one in which all integer values of
the currency are counted, is the Wei. One Ether is defined
as being 1018 Wei. There exist other subdenominations of
Ether:

Multiplier Name

100 Wei
109 Gwei

1012 Szabo
1015 Finney
1018 Ether

Throughout the present work, any reference to value,
in the context of Ether, currency, a balance or a payment,
should be assumed to be counted in Wei.

2.2. Which History? Since the system is decentralised
and all parties have an opportunity to create a new block
on some older pre-existing block, the resultant structure is
necessarily a tree of blocks. In order to form a consensus
as to which path, from root (the genesis block) to leaf (the
block containing the most recent transactions) through
this tree structure, known as the blockchain, there must
be an agreed-upon scheme. If there is ever a disagreement
between nodes as to which root-to-leaf path down the block
tree is the ‘best’ blockchain, then a fork occurs.

This would mean that past a given point in time (block),
multiple states of the system may coexist: some nodes be-
lieving one block to contain the canonical transactions,
other nodes believing some other block to be canonical,
potentially containing radically different or incompatible
transactions. This is to be avoided at all costs as the un-
certainty that would ensue would likely kill all confidence
in the entire system.

Since the Paris hard fork, reaching consensus on new
blocks is managed by a protocol called the Beacon Chain.
It is known as the consensus layer of Ethereum, and it
defines the rules for determining the canonical history of
Ethereum blocks. This document describes the execution
layer of Ethereum. The execution layer defines the rules for
interacting with and updating the state of the Ethereum
virtual machine. The consensus layer is described in greater
detail in the consensus specifications. How the consensus

1Note that since the Shanghai fork, blocks also needs to process withdrawal operations in order to reach their final state. Withdrawal
operations are defined in sub-section 4.3, and block final state is discussed in greater detail in section 12.

https://github.com/ethereum/consensus-specs


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 3

layer is used to determine the canonical state of Ethereum
is discussed in section 11.

There are many versions of Ethereum, as the protocol
has undergone a number of updates. These updates can
be specified to occur:

• at a particular block number in the case of pre-
Paris updates,
• after reaching a terminal total difficulty in the case

of the Paris update, or
• at a particular block timestamp in the case of

post-Paris updates.

This document describes the Shanghai version.
In order to follow back the history of a path, one must

reference multiple versions of this document. Here are the
block numbers of protocol updates on the Ethereum main
network:2

Name First Block Number

FHomestead 1150000
FTangerineWhistle 2463000
FSpuriousDragon 2675000
FByzantium 4370000
FConstantinople 7280000
FPetersburg 7280000
FIstanbul 9069000
FMuirGlacier 9200000
FBerlin 12244000
FLondon 12965000
FArrowGlacier 13773000
FGrayGlacier 15050000
FParis 15537394
FShanghai 17034870

Occasionally actors do not agree on a protocol change,
and a permanent fork occurs. In order to distinguish be-
tween diverged blockchains, EIP-155 by Buterin [2016]
introduced the concept of chain ID, which we denote by β.
For the Ethereum main network

(5) β = 1

3. Conventions

We use a number of typographical conventions for the
formal notation, some of which are quite particular to the
present work:

The two sets of highly structured, ‘top-level’, state val-
ues, are denoted with bold lowercase Greek letters. They
fall into those of world-state, which are denoted σ (or a
variant thereupon) and those of machine-state, µ.

Functions operating on highly structured values are
denoted with an upper-case Greek letter, e.g. Υ, the
Ethereum state transition function.

For most functions, an uppercase letter is used, e.g. C,
the general cost function. These may be subscripted to
denote specialised variants, e.g. CSSTORE, the cost func-
tion for the SSTORE operation. For specialised and possibly
externally defined functions, we may format as typewriter
text, e.g. the Keccak-256 hash function (as per version
3 of the winning entry to the SHA-3 contest by Bertoni
et al. [2011], rather than the final SHA-3 specification), is

denoted KEC (and generally referred to as plain Keccak).
Also, KEC512 refers to the Keccak-512 hash function.

Tuples are typically denoted with an upper-case letter,
e.g. T, is used to denote an Ethereum transaction. This
symbol may, if accordingly defined, be subscripted to refer
to an individual component, e.g. Tn, denotes the nonce
of said transaction. The form of the subscript is used to
denote its type; e.g. uppercase subscripts refer to tuples
with subscriptable components.

Scalars and fixed-size byte sequences (or, synonymously,
arrays) are denoted with a normal lower-case letter, e.g.
n is used in the document to denote a transaction nonce.
Those with a particularly special meaning may be Greek,
e.g. δ, the number of items required on the stack for a
given operation.

Arbitrary-length sequences are typically denoted as a
bold lower-case letter, e.g. o is used to denote the byte
sequence given as the output data of a message call. For
particularly important values, a bold uppercase letter may
be used.

Throughout, we assume scalars are non-negative inte-
gers and thus belong to the set N. The set of all byte
sequences is B, formally defined in Appendix B. If such
a set of sequences is restricted to those of a particular
length, it is denoted with a subscript, thus the set of all
byte sequences of length 32 is named B32 and the set of
all non-negative integers smaller than 2256 is named N256.
This is formally defined in section 4.4.

Square brackets are used to index into and reference
individual components or subsequences of sequences, e.g.
µs[0] denotes the first item on the machine’s stack. For
subsequences, ellipses are used to specify the intended
range, to include elements at both limits, e.g. µm[0..31]
denotes the first 32 items of the machine’s memory.

In the case of the global state σ, which is a sequence of
accounts, themselves tuples, the square brackets are used
to reference an individual account.

When considering variants of existing values, we follow
the rule that within a given scope for definition, if we
assume that the unmodified ‘input’ value be denoted by
the placeholder � then the modified and utilisable value is
denoted as �′, and intermediate values would be �∗, �∗∗

&c. On very particular occasions, in order to maximise
readability and only if unambiguous in meaning, we may
use alpha-numeric subscripts to denote intermediate values,
especially those of particular note.

When considering the use of existing functions, given a
function f, the function f∗ denotes a similar, element-wise
version of the function mapping instead between sequences.
It is formally defined in section 4.4.

We define a number of useful functions throughout. One
of the more common is `, which evaluates to the last item
in the given sequence:

(6) `(x) ≡ x[‖x‖− 1]

4. Blocks, State and Transactions

Having introduced the basic concepts behind Ethereum,
we will discuss the meaning of a transaction, a block and
the state in more detail.

2Note that while the Paris, Shanghai, and every upcoming forks activated at a given block number (e.g. 15,537,394 for Paris), the trigger
was not the block number, but rather reaching a specified timestamp (or total difficulty for Paris). The trigger for the Paris and subsequent
hard fork are discussed in greater detail in section 10.


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 4

4.1. World State. The world state (state), is a map-
ping between addresses (160-bit identifiers) and account
states (a data structure serialised as RLP, see Appendix B).
Though not stored on the blockchain, it is assumed that
the implementation will maintain this mapping in a modi-
fied Merkle Patricia tree (trie, see Appendix D). The trie
requires a simple database backend that maintains a map-
ping of byte arrays to byte arrays; we name this underlying
database the state database. This has a number of benefits;
firstly the root node of this structure is cryptographically
dependent on all internal data and as such its hash can
be used as a secure identity for the entire system state.
Secondly, being an immutable data structure, it allows any
previous state (whose root hash is known) to be recalled
by simply altering the root hash accordingly. Since we
store all such root hashes in the blockchain, we are able to
trivially revert to old states.

The account state, σ[a], comprises the following four
fields:

nonce: A scalar value equal to the number of trans-
actions sent from this address or, in the case
of accounts with associated code, the number of
contract-creations made by this account. For ac-
count of address a in state σ, this would be for-
mally denoted σ[a]n.

balance: A scalar value equal to the number of Wei
owned by this address. Formally denoted σ[a]b.

storageRoot: A 256-bit hash of the root node of a
Merkle Patricia tree that encodes the storage con-
tents of the account (a mapping between 256-bit
integer values), encoded into the trie as a mapping
from the Keccak 256-bit hash of the 256-bit integer
keys to the RLP-encoded 256-bit integer values.
The hash is formally denoted σ[a]s.

codeHash: The hash of the EVM code of this
account—this is the code that gets executed should
this address receive a message call. All such
code fragments are contained in the state data-
base under their corresponding hashes for later
retrieval. This hash is formally denoted σ[a]c, and
thus the code may be denoted as b, given that
KEC(b) = σ[a]c.

Since we typically wish to refer not to the trie’s root
hash but to the underlying set of key/value pairs stored
within, we define a convenient equivalence:

(7) TRIE
(
L
∗
I (σ[a]s)

)
≡ σ[a]s

The collapse function for the set of key/value pairs in
the trie, L∗I , is defined as the element-wise transformation
of the base function LI, given as:

(8) LI
(
(k,v)

)
≡
(
KEC(k),RLP(v)

)
where:

(9) k ∈ B32 ∧ v ∈ N

It shall be understood that σ[a]s is not a ‘physical’
member of the account and does not contribute to its later
serialisation.

If the codeHash field is the Keccak-256 hash of the
empty string, i.e. σ[a]c = KEC

(
()
)
, then the node represents

a simple account, sometimes referred to as a “non-contract”
account.

Thus we may define a world-state collapse function LS:

(10) LS(σ) ≡{p(a) : σ[a] 6= ∅}

where

(11) p(a) ≡
(
KEC(a),RLP

(
(σ[a]n,σ[a]b,σ[a]s,σ[a]c)

))
This function, LS, is used alongside the trie function

to provide a short identity (hash) of the world state. We
assume:

(12) ∀a : σ[a] = ∅ ∨ (a ∈ B20 ∧ v(σ[a]))

where v is the account validity function:

(13) v(x) ≡ xn ∈ N256∧xb ∈ N256∧xs ∈ B32∧xc ∈ B32

An account is empty when it has no code, zero nonce
and zero balance:
(14)
EMPTY(σ,a) ≡ σ[a]c = KEC

(
()
)
∧σ[a]n = 0∧σ[a]b = 0

Even callable precompiled contracts can have an empty
account state. This is because their account states do not
usually contain the code describing its behavior.

An account is dead when its account state is non-existent
or empty:

(15) DEAD(σ,a) ≡ σ[a] = ∅∨EMPTY(σ,a)

4.2. The Transaction. A transaction (formally, T) is
a single cryptographically-signed instruction constructed
by an actor externally to the scope of Ethereum. The
sender of a transaction cannot be a contract. While it is
assumed that the ultimate external actor will be human
in nature, software tools will be used in its construction
and dissemination3. EIP-2718 by Zoltu [2020] introduced
the notion of different transaction types. As of the London
version of the protocol, there are three transaction types:
0 (legacy), 1 (EIP-2930 by Buterin and Swende [2020b]),
and 2 (EIP-1559 by Buterin et al. [2019]). Further, there
are two subtypes of transactions: those which result in
message calls and those which result in the creation of
new accounts with associated code (known informally as
‘contract creation’). All transaction types specify a number
of common fields:

type: EIP-2718 transaction type; formally Tx.
nonce: A scalar value equal to the number of trans-

actions sent by the sender; formally Tn.
gasLimit: A scalar value equal to the maximum

amount of gas that should be used in executing
this transaction. This is paid up-front, before any
computation is done and may not be increased
later; formally Tg.

to: The 160-bit address of the message call’s recipi-
ent or, for a contract creation transaction, ∅, used
here to denote the only member of B0 ; formally
Tt.

value: A scalar value equal to the number of Wei to
be transferred to the message call’s recipient or,
in the case of contract creation, as an endowment
to the newly created account; formally Tv.

3Notably, such ‘tools’ could ultimately become so causally removed from their human-based initiation—or humans may become so
causally-neutral—that there could be a point at which they rightly be considered autonomous agents. e.g. contracts may offer bounties to
humans for being sent transactions to initiate their execution.


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 5

r, s: Values corresponding to the signature of the
transaction and used to determine the sender of
the transaction; formally Tr and Ts. This is ex-
panded in Appendix F.

EIP-2930 (type 1) and EIP-1559 (type 2) transactions
also have:

accessList: List of access entries to warm up; for-
mally TA. Each access list entry E is a tuple
of an account address and a list of storage keys:
E ≡ (Ea,Es).

chainId: Chain ID; formally Tc. Must be equal to
the network chain ID β.

yParity: Signature Y parity; formally Ty.

Legacy transactions do not have an accessList (TA =
()), while chainId and yParity for legacy transactions
are combined into a single value:

w: A scalar value encoding Y parity and possi-
bly chain ID; formally Tw. Tw = 27 + Ty or
Tw = 2β + 35 +Ty (see EIP-155 by Buterin [2016]).

There are differences in how one’s acceptable gas price
is specified in type 2 transactions versus type 0 and type 1
transactions. Type 2 transactions take better advantage
of the gas market improvements introduced in EIP-1559
by explicitly limiting the priority fee4 that is paid. Type 2
transactions have the following two fields related to gas:

maxFeePerGas: A scalar value equal to the maxi-
mum number of Wei to be paid per unit of gas for
all computation costs incurred as a result of the
execution of this transaction; formally Tm.

maxPriorityFeePerGas: A scalar value equal to
the maximum number of Wei to be paid to the
block’s fee recipient as an incentive to include the
transaction; formally Tf .

In contrast, type 0 and type 1 transactions specify gas
price as a single value:

gasPrice: A scalar value equal to the number of
Wei to be paid per unit of gas for all computation
costs incurred as a result of the execution of this
transaction; formally Tp.

5

Additionally, a contract creation transaction (regardless
of transaction type) contains:

init: An unlimited size byte array specifying the
EVM-code for the account initialisation procedure,
formally Ti.

init is an EVM-code fragment; it returns the body,
a second fragment of code that executes each time the
account receives a message call (either through a trans-
action or due to the internal execution of code). init is
executed only once at account creation and gets discarded
immediately thereafter.

In contrast, a message call transaction contains:

data: An unlimited size byte array specifying the
input data of the message call, formally Td.

Appendix F specifies the function, S, which maps trans-
actions to the sender, and happens through the ECDSA of
the SECP-256k1 curve, using the hash of the transaction
(excepting the latter three signature fields) as the datum

to sign. For the present we simply assert that the sender
of a given transaction T can be represented with S(T).

(16)

LT(T) ≡




(Tn,Tp,Tg,Tt,Tv, p,Tw,Tr,Ts) if Tx = 0

(Tc,Tn,Tp,Tg,Tt,Tv, p,TA,Ty,Tr,Ts) if Tx = 1

(Tc,Tn,Tf,Tm,Tg,Tt,Tv, p,TA,Ty,Tr,Ts) if Tx = 2

where

(17) p ≡

{
Ti if Tt = ∅
Td otherwise

Here, we assume all components are interpreted by the
RLP as integer values, with the exception of the access list
TA and the arbitrary length byte arrays Ti and Td.
(18)

Tx ∈{0, 1, 2} ∧ Tc = β ∧ Tn ∈ N256 ∧
Tp ∈ N256 ∧ Tg ∈ N256 ∧ Tv ∈ N256 ∧
Tw ∈ N256 ∧ Tr ∈ N256 ∧ Ts ∈ N256 ∧
Ty ∈ N1 ∧ Td ∈ B ∧ Ti ∈ B ∧
Tm ∈ N256 ∧ Tf ∈ N256

where

(19) Nn = {P : P ∈ N∧P < 2n}

The address hash Tt is slightly different: it is either a
20-byte address hash or, in the case of being a contract-
creation transaction (and thus formally equal to ∅), it is
the RLP empty byte sequence and thus the member of B0:

(20) Tt ∈

{
B20 if Tt 6= ∅
B0 otherwise

4.3. The Withdrawal. A withdrawal (formally, W ) is
a tuple of data describing a consensus layer validator’s
withdrawal of some amount of its staked Ether. A with-
drawal is created and validated in the consensus layer of
the blockchain and then pushed to the execution layer. A
withdrawal is composed of the following fields:

globalIndex: zero based incrementing withdrawal
index that acts as a unique identifier for this with-
drawal; formally Wg.

validatorIndex: index of consensus layer’s valida-
tor this withdrawal corresponds to; formally Wv.

recipient: the 20-byte address that will receives
Ether form this withdrawal; formally Wr.

amount: a nonzero amount of Ether denominated
in Gwei (109 Wei); formally Wa.

Withdrawal serialisation is defined as:

(21) LW(W) ≡ (Wg,Wv,Wr,Ta)

Here, we assume all components are interpreted by the
RLP as integer values except for Wr which is a 20-byte
address:

(22) Wg ∈ N64 ∧ Wv ∈ N64 ∧
Wr ∈ B20 ∧ Wa ∈ N64

4The priority fee is discussed in greater detail in sections 5 and 6.
5Type 0 and type 1 transactions will get the same gas price behavior as a type 2 transaction with Tm and Tf set to the value of Tp.
6ommer is a gender-neutral term to mean “sibling of parent”; see https://nonbinary.miraheze.org/wiki/Gender_neutral_language_in_

English#Aunt/Uncle

https://nonbinary.miraheze.org/wiki/Gender_neutral_language_in_English#Aunt/Uncle
https://nonbinary.miraheze.org/wiki/Gender_neutral_language_in_English#Aunt/Uncle


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 6

4.4. The Block. The block in Ethereum is the collec-
tion of relevant pieces of information (known as the block
header ), H, together with information corresponding to
the comprised transactions, T, a now deprecated property
U that prior to the Paris hard fork contained headers
of blocks whose parents were equal to the present block’s
parent’s parent (such blocks were known as ommers6), and,
since the Shanghai hard fork, W, a collection of valida-
tor’s withdrawal pushed by the consensus layer. The block
header contains several pieces of information:

parentHash: The Keccak 256-bit hash of the parent
block’s header, in its entirety; formally Hp.

ommersHash: A 256-bit hash field that is now dep-
recated due to the replacement of proof of work
consensus. It is now to a constant, KEC(RLP(()));
formally Ho.

beneficiary: The 160-bit address to which priority
fees from this block are transferred; formally Hc.

stateRoot: The Keccak 256-bit hash of the root
node of the state trie, after all transactions and
withdrawals are executed and finalisations applied;
formally Hr.

transactionsRoot: The Keccak 256-bit hash of the
root node of the trie structure populated with each
transaction in the transactions list portion of the
block; formally Ht.

receiptsRoot: The Keccak 256-bit hash of the root
node of the trie structure populated with the re-
ceipts of each transaction in the transactions list
portion of the block; formally He.

logsBloom: The Bloom filter composed from index-
able information (logger address and log topics)
contained in each log entry from the receipt of
each transaction in the transactions list; formally
Hb.

difficulty: A scalar field that is now deprecated due
to the replacement of proof of work consensus. It
is set to 0; formally Hd.

number: A scalar value equal to the number of an-
cestor blocks. The genesis block has a number of
zero; formally Hi.

gasLimit: A scalar value equal to the current limit
of gas expenditure per block; formally Hl.

gasUsed: A scalar value equal to the total gas used
in transactions in this block; formally Hg.

timestamp: A scalar value equal to the reasonable
output of Unix’s time() at this block’s inception;
formally Hs.

extraData: An arbitrary byte array containing data
relevant to this block. This must be 32 bytes or
fewer; formally Hx.

prevRandao: the latest RANDAO mix7 of the post
beacon state of the previous block; formally Ha.

nonce: A 64-bit value that is now deprecated due
to the replacement of proof of work consensus. It
is set to 0x0000000000000000; formally Hn.

baseFeePerGas: A scalar value equal to the
amount of wei that is burned for each unit of
gas consumed; formally Hf .

withdrawalsRoot: The Keccak 256-bit hash of the
root node of the trie structure populated with each
withdrawal operations pushed by the consensus
layer for this block; formally Hw.

The other three components in the block are a series of
transactions, BT, an empty array which was previously
reserved for ommer block headers, BU, and a series of
withdrawals, BW. Formally, we can refer to a block B:

(23) B ≡ (BH,BT,BU,BW)

4.4.1. Transaction Receipt. In order to encode information
about a transaction concerning which it may be useful
to form a zero-knowledge proof, or index and search, we
encode a receipt of each transaction containing certain in-
formation from its execution. Each receipt, denoted BR[i]
for the ith transaction, is placed in an index-keyed trie
and the root recorded in the header as He.

The transaction receipt, R, is a tuple of five items com-
prising: the type of the transaction, Rx, the status code of
the transaction, Rz, the cumulative gas used in the block
containing the transaction receipt as of immediately after
the transaction has happened, Ru, the set of logs created
through execution of the transaction, Rl and the Bloom
filter composed from information in those logs, Rb:

(24) R ≡ (Rx,Rz,Ru,Rb,Rl)

Rx is equal to the type of the corresponding transaction.
The function LR prepares a transaction receipt for being

transformed into an RLP-serialised byte array:

(25) LR(R) ≡ (Rz,Ru,Rb,Rl)

We assert that the status code Rz is a non-negative
integer:

(26) Rz ∈ N

We assert that Ru, the cumulative gas used, is a non-
negative integer and that the logs Bloom, Rb, is a hash of
size 2048 bits (256 bytes):

(27) Ru ∈ N ∧ Rb ∈ B256

The sequence Rl is a series of log entries, (O0,O1, ...).
A log entry, O, is a tuple of the logger’s address, Oa, a
possibly empty series of 32-byte log topics, Ot and some
number of bytes of data, Od:

(28) O ≡ (Oa, (Ot0,Ot1, ...),Od)

(29) Oa ∈ B20 ∧ ∀x ∈ Ot : x ∈ B32 ∧ Od ∈ B

We define the Bloom filter function, M, to reduce a log
entry into a single 256-byte hash:

(30) M(O) ≡
∨

x∈{Oa}∪Ot
(
M3:2048(x)

)
where M3:2048 is a specialised Bloom filter that sets three
bits out of 2048, given an arbitrary byte sequence. It does
this through taking the low-order 11 bits of each of the
first three pairs of bytes in a Keccak-256 hash of the byte

7RANDAO is a pseudorandom value generated by validators on the Ethereum consensus layer. Refer to the consensus layer specs
(https://github.com/ethereum/consensus-specs) for more detail on RANDAO.

82048 = 211(11 bits), and the low-order 11 bits is the modulo 2048 of the operand, which is in this case is “each of the first three pairs of
bytes in a Keccak-256 hash of the byte sequence.”

https://github.com/ethereum/consensus-specs


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 7

sequence.8 Formally:

M3:2048(x : x ∈ B) ≡ y : y ∈ B256 where:(31)
y = (0, 0, ..., 0) except:(32)

∀i ∈{0, 2, 4} : B2047−m(x,i)(y) = 1(33)
m(x, i) ≡ KEC(x)[i, i + 1] mod 2048(34)

where B is the bit reference function such that Bj(x) equals
the bit of index j (indexed from 0) in the byte array x.
Notably, it treats x as big-endian (more significant bits
will have smaller indices).

4.4.2. Holistic Validity. We can assert a block’s validity
if and only if it satisfies several conditions: the block’s
ommers field BU must be an empty array and the block’s
header must be consistent with the given transactions BT
and withdrawals BW. For the header to be consistent
with the transactions BT and withdrawals BW, state-
Root (Hr) must match the resultant state after executing
all transactions, then all withdrawals, in order on the base
state σ (as specified in section 12), and transactions-
Root (Ht), receiptsRoot (He), logsBloom (Hb) and
withdrawalsRoot (Hw) must be correctly derived from
the transactions themselves, the transaction receipts result-
ing from execution, the resulting logs, and the withdrawals,
respectively.

(35) BU ≡ () ∧
Hr ≡ TRIE(LS(Π(σ,B))) ∧
Ht ≡ TRIE({∀i < ‖BT‖, i ∈ N :

pT(i,BT[i])}) ∧
He ≡ TRIE({∀i < ‖BR‖, i ∈ N :

pR(i,BR[i])}) ∧
Hw ≡ TRIE({∀i < ‖BW‖, i ∈ N :

pW(i,BW[i])}) ∧
Hb ≡

∨
r∈BR

(
rb
)

where pW(k,v) is a pairwise RLP transformation for
withdrawals:

(36) pW(k,W) ≡ (RLP(k),RLP(LW(W)))

similarly, pT(k,v) and pR(k,v) are pairwise RLP trans-
formations, but with a special treatment for EIP-2718
transactions:
(37)

pT(k,T) ≡

(
RLP(k),

{
RLP(LT(T)) if Tx = 0

(Tx) ·RLP(LT(T)) otherwise

)
and
(38)

pR(k,R) ≡

(
RLP(k),

{
RLP(LR(R)) if Rx = 0

(Rx) ·RLP(LR(R)) otherwise

)
(· is the concatenation of byte arrays).

Furthermore:

(39) TRIE(LS(σ)) = P(BH)Hr

Thus TRIE(LS(σ)) is the root node hash of the Merkle
Patricia tree structure containing the key-value pairs of
the state σ with values encoded using RLP, and P (BH) is
the parent block of B, defined directly.

The values stemming from the computation of transac-
tions, specifically the transaction receipts, BR, and that
defined through the transaction’s state-accumulation func-
tion, Π, are formalised later in section 12.3.

4.4.3. Serialisation. The function LB and LH are the prepa-
ration functions for a block and block header respectively.
We assert the types and order of the structure for when
the RLP transformation is required:

LH(H) ≡ ( Hp,Ho,Hc,Hr,Ht,He,Hb,Hd,
Hi,Hl,Hg,Hs,Hx,Ha,Hn,Hf,Hw )

(40)

LB(B) ≡
(
LH(BH), L̃

∗
T(BT),L

∗
H(BU),L

∗
W(BW)

)
(41)

where L̃T takes a special care of EIP-2718 transactions:

(42) L̃T(T) =

{
LT(T) if Tx = 0

(Tx) ·RLP(LT(T)) otherwise

with L̃∗T, L
∗
H, and L

∗
W, being element-wise sequence trans-

formations, thus:
(43)
f
∗(

(x0,x1, ...)
)
≡
(
f(x0),f(x1), ...

)
for any function f

The component types are defined thus:

(44) Hp ∈ B32 ∧ Ho ∈ B32 ∧ Hc ∈ B20 ∧
Hr ∈ B32 ∧ Ht ∈ B32 ∧ He ∈ B32 ∧
Hb ∈ B256 ∧ Hd ∈ N ∧ Hi ∈ N ∧
Hl ∈ N ∧ Hg ∈ N ∧ Hs ∈ N256 ∧
Hx ∈ B ∧ Ha ∈ B32 ∧ Hn ∈ B8 ∧
Hf ∈ N ∧ Hw ∈ B32

where

(45) Bn = {B : B ∈ B∧‖B‖ = n}

We now have a rigorous specification for the construc-
tion of a formal block structure. The RLP function RLP
(see Appendix B) provides the canonical method for trans-
forming this structure into a sequence of bytes ready for
transmission over the wire or storage locally.

4.4.4. Block Header Validity. We define P(BH) to be the
parent block of B, formally:

(46) P(H) ≡ B′ : KEC(RLP(B′H)) = Hp

The block number is the parent’s block number incre-
mented by one:

(47) Hi ≡ P(H)Hi + 1

The London release introduced the block attribute base
fee per gas Hf (see EIP-1559 by Buterin et al. [2019]).
The base fee is the amount of wei burned per unit of gas
consumed while executing transactions within the block.
The value of the base fee is a function of the difference
between the gas used by the parent block and the parent
block’s gas target.

The expected base fee per gas is defined as F(H):

(48) F(H) ≡




1000000000 if Hi = FLondon

P(H)Hf if P(H)Hg = τ

P(H)Hf −ν if P(H)Hg < τ
P(H)Hf + ν if P(H)Hg > τ


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 8

where:

τ ≡b
P(H)Hl

ρ
c(49)

ρ ≡ 2(50)

ν
∗ ≡

{
b
P(H)Hf×(τ−P(H)Hg)

τ
c if P(H)Hg < τ

b
P(H)Hf×(P(H)Hg−τ)

τ
c if P(H)Hg > τ

(51)

ν ≡

{
bν
∗

ξ
c if P(H)Hg < τ

max(bν
∗

ξ
c, 1) if P(H)Hg > τ

(52)

ξ ≡ 8(53)

The gas target, τ, is defined as the gas limit Hl divided
by the elasticity multiplier, ρ, a global constant set to 2. So
while blocks can consume as much gas as the gas limit, the
base fee is adjusted so that on average, blocks consume as
much gas as the gas target. The base fee is increased in the
current block when the parent block’s gas usage exceeds
the gas target, and, conversely, the base fee is decreased
in the current block when the parent block’s gas usage is
less than the gas target.

The magnitude of the increase or decrease in the base
fee, defined as ν, is proportional to the difference between
the amount of gas the parent block consumed and the
parent block’s gas target. The effect on the base fee is
dampened by a global constant called the base fee max
change denominator, formally ξ, set to 8. A value of 8
entails that the base fee can increase or decrease by at
most 12.5% from one block to the next.

The canonical gas limit Hl of a block of header H must
fulfil the relation:

Hl < P(H)Hl′ +

⌊
P(H)Hl′

1024

⌋
∧(54)

Hl > P(H)Hl′ −
⌊
P(H)Hl′

1024

⌋
∧

Hl > 5000

where:

P(H)Hl′ ≡

{
P(H)Hl ×ρ if Hi = FLondon
P(H)Hl if Hi > FLondon

(55)

To avoid a discontinuity in gas usage, the value of the
parent block gas limit for the purpose of validating the
current block’s gas limit is modified at the London fork
block by multiplying it by the elasticity multiplier, ρ. We
call this modified value P(H)Hl′. This ensures that the
gas target for post-London blocks can be set roughly in
line with the gas limit of pre-London blocks.
Hs is the timestamp (in Unix’s time()) of block H and

must fulfil the relation:

(56) Hs > P(H)Hs

The Paris hard fork changed Ethereum’s consensus from
proof of work to proof of stake, and thus deprecated many
block header properties related to proof of work. These
deprecated properties include nonce (Hn), ommersHash
(Ho), difficulty (Hd), and mixHash (Hm).

mixHash has been replaced with a new field prevRan-
dao (Ha). The other header fields related to proof of work
have been replaced with constants:

Ho ≡ KEC(RLP(()))(57)
Hd ≡ 0(58)
Hn ≡ 0x0000000000000000(59)

The value of prevRandao must be determined using
information from the Beacon Chain. While the details of
generating the RANDAO value on the Beacon Chain is
beyond the scope of this paper, we refer to the expected
RANDAO value for the previous block as PREVRANDAO().

Thus we are able to define the block header validity
function V (H):

V (H) ≡ Hg ≤ Hl ∧(60)

Hl < P(H)Hl′ +

⌊
P(H)Hl′

1024

⌋
∧

Hl > P(H)Hl′ −
⌊
P(H)Hl′

1024

⌋
∧

Hl > 5000 ∧
Hs > P(H)Hs ∧
Hi = P(H)Hi + 1 ∧
‖Hx‖≤ 32 ∧
Hf = F(H) ∧
Ho = KEC(RLP(())) ∧
Hd = 0 ∧
Hn = 0x0000000000000000 ∧
Ha = PREVRANDAO()

Note additionally that extraData must be at most 32
bytes.

5. Gas and Payment

In order to avoid issues of network abuse and to sidestep
the inevitable questions stemming from Turing complete-
ness, all programmable computation in Ethereum is subject
to fees. The fee schedule is specified in units of gas (see Ap-
pendix G for the fees associated with various computation).
Thus any given fragment of programmable computation
(this includes creating contracts, making message calls,
utilising and accessing account storage and executing op-
erations on the virtual machine) has a universally agreed
cost in terms of gas.

Every transaction has a specific amount of gas associ-
ated with it: gasLimit. This is the amount of gas which
is implicitly purchased from the sender’s account balance.
The purchase happens at the effective gas price defined
in section 6. The transaction is considered invalid if the
account balance cannot support such a purchase. It is
named gasLimit since any unused gas at the end of the
transaction is refunded (at the same rate of purchase) to
the sender’s account. Gas does not exist outside of the
execution of a transaction. Thus for accounts with trusted
code associated, a relatively high gas limit may be set and
left alone.

Since the introduction of EIP-1559 by Buterin et al.
[2019] in the London hard fork, every transaction included
in a block must pay a base fee, which is specified as wei
per unit of gas consumed and is constant for each trans-
action within a block. The ether that is paid to meet
the base fee is burned (taken out of circulation). The
base fee adjusts dynamically as a function of the previous


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 9

block’s gas consumption relative to its gas target (a value
that is currently half the block’s gas limit, which can be
adjusted by validators). If the previous block’s total gas
consumption exceeds the gas target, this indicates excess
demand for block space at the current base fee, and the
base fee is increased. Conversely, if the gas consumed in
the previous block is lower than the gas target, demand for
block space is lower than the gas target at the current base
fee, and thus the base fee is decreased. This process of
adjusting the base fee should bring the average block’s gas
consumption in line with the gas target. Refer to section
4.4 for greater detail on how the base fee is set.

To incentivize validators to include transactions, there
is an additional fee known as a priority fee, which is also
specified as wei per unit of gas consumed. The total fee
paid by the transactor therefore is the sum of the base
fee per gas and the priority fee per gas multiplied by the
total gas consumed. Ether used to satisfy the priority fee
is delivered to the beneficiary address, the address of an
account typically under the control of the validator.

Transactors using type 2 transactions can specify
the maximum priority fee they are willing to pay
(maxPriorityFeePerGas) as well as the max total fee
they are willing to pay (maxFeePerGas), inclusive of
both the priority fee and the base fee. maxFeePerGas
must be at least as high as the base fee for the transaction
to be included in a block. Type 0 and type 1 transactions
have only one field for specifiying a gas price–gasPrice–
which also must be at least as high as the base fee for
inclusion in a block. The amount by which gasPrice is
higher than the base fee constitutes the priority fee in the
case of a type 0 or type 1 transaction.

Transactors are free to select any priority fee that they
wish, however validators are free to ignore transactions as
they choose. A higher priority fee on a transaction will
therefore cost the sender more in terms of Ether and deliver
a greater value to the validator and thus will more likely
be selected for inclusion. Since there will be a (weighted)
distribution of minimum acceptable priority fees, trans-
actors will necessarily have a trade-off to make between
lowering the priority fee and maximising the chance that
their transaction will be included in a block in a timely
manner.

6. Transaction Execution

The execution of a transaction is the most complex part
of the Ethereum protocol: it defines the state transition
function Υ. It is assumed that any transactions executed
first pass the initial tests of intrinsic validity. These include:

(1) The transaction is well-formed RLP, with no addi-
tional trailing bytes;

(2) the transaction signature is valid;
(3) the transaction nonce is valid (equivalent to the

sender account’s current nonce);
(4) the sender account has no contract code deployed

(see EIP-3607 by Feist et al. [2021]);
(5) the gas limit is no smaller than the intrinsic gas,

g0, used by the transaction;
(6) the sender account balance contains at least the

cost, v0, required in up-front payment;
(7) the maxFeePerGas, Tm, in the case of type 2

transactions, or gasPrice, Tp, in the case of type

0 and type 1 transactions, is greater than or equal
to the block’s base fee, Hf ; and

(8) for type 2 transactions, maxPriorityFeePer-
Gas, Tf , must be no larger than maxFeePerGas,
Tm.

Formally, we consider the function Υ, with T being a
transaction and σ the state:

(61) σ
′

= Υ(σ,T)

Thus σ′ is the post-transactional state. We also define
Υg to evaluate to the amount of gas used in the execution
of a transaction, Υl to evaluate to the transaction’s accrued
log items and Υz to evaluate to the status code resulting
from the transaction. These will be formally defined later.

6.1. Substate. Throughout transaction execution, we ac-
crue certain information that is acted upon immediately
following the transaction. We call this the accrued transac-
tion substate, or accrued substate for short, and represent
it as A, which is a tuple:

(62) A ≡ (As,Al,At,Ar,Aa,AK)

The tuple contents include As, the self-destruct set: a
set of accounts that will be discarded following the trans-
action’s completion. Al is the log series: this is a series of
archived and indexable ‘checkpoints’ in VM code execution
that allow for contract-calls to be easily tracked by onlook-
ers external to the Ethereum world (such as decentralised
application front-ends). At is the set of touched accounts,
of which the empty ones are deleted at the end of a transac-
tion. Ar is the refund balance, increased through using the
SSTORE instruction in order to reset contract storage to
zero from some non-zero value. Though not immediately
refunded, it is allowed to partially offset the total execution
costs. Finally, EIP-2929 by Buterin and Swende [2020a]
introduced Aa, the set of accessed account addresses, and
AK, the set of accessed storage keys (more accurately, each
element of AK is a tuple of a 20-byte account address and
a 32-byte storage slot).

We define the empty accrued substate A0 to have no
self-destructs, no logs, no touched accounts, zero refund bal-
ance, all precompiled contracts in the accessed addresses,
and no accessed storage:

(63) A
0 ≡ (∅, (),∅, 0,π,∅)

where π is the set of all precompiled addresses.

6.2. Execution. We define intrinsic gas g0, the amount of
gas this transaction requires to be paid prior to execution,
as follows:

g0 ≡
∑

i∈Ti,Td

{
Gtxdatazero if i = 0

Gtxdatanonzero otherwise

(64)

+

{
Gtxcreate + R(‖Ti‖) if Tt = ∅
0 otherwise

+ Gtransaction

+

‖TA‖−1∑
j=0

(
Gaccesslistaddress + ‖TA[j]s‖Gaccessliststorage

)
where Ti,Td means the series of bytes of the transaction’s
associated data and initialisation EVM-code, depending on


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 10

whether the transaction is for contract-creation or message-
call. Gtxcreate is added if the transaction is contract-
creating, but not if it is a message call. Gaccesslistaddress and
Gaccessliststorage are the costs of warming up account and
storage access, respectively. G is fully defined in Appendix
G.

We define the initcode cost function, formally R, as
the amount of gas that needs to be paid for each word of
the initcode prior to executing the creation code of a new
contract:

(65) R(x) ≡ Ginitcodeword ×d
x

32
e

We define the effective gas price, formally p, as the
amount of wei the transaction signer will pay per unit of
gas consumed during the transaction’s execution. It is
calculated as follows:

(66) p ≡

{
Tp if Tx = 0 ∨Tx = 1
f + Hf if Tx = 2

where f is the priority fee—the amount of wei the
block’s beneficiary address will receive per unit of gas con-
sumed during the transaction’s execution. It is calculated
as:

(67) f ≡

{
Tp −Hf if Tx = 0 ∨Tx = 1
min(Tf,Tm −Hf ) if Tx = 2

The up-front cost v0 is calculated as:

(68) v0 ≡

{
TgTp + Tv if Tx = 0 ∨Tx = 1
TgTm + Tv if Tx = 2

The validity is determined as:

(69) S(T) 6= ∅ ∧
σ[S(T)]c = KEC

(
()
)
∧

Tn = σ[S(T)]n ∧
g0 6 Tg ∧
v0 6 σ[S(T)]b ∧
m > Hf ∧
n 6 49152 ∧
Tg 6 BHl − `(BR)u

where

(70) m ≡

{
Tp if Tx = 0 ∨Tx = 1
Tm if Tx = 2

and

(71) n ≡

{
‖Ti‖ if Tt 6= ∅
0 otherwise

The penultimate condition ensures that, for create trans-
actions, the length of the initcode is no greater than 49152
bytes. Note the final condition; the sum of the transaction’s
gas limit, Tg, and the gas utilised in this block prior, given
by `(BR)u, must be no greater than the block’s gasLimit,
BHl. Also, with a slight abuse of notation, we assume that
σ[S(T)]c = KEC

(
()
)
, σ[S(T)]n = 0, and σ[S(T)]b = 0 if

σ[S(T)] = ∅.

For type 2 transactions, we add an additional check that
maxPriorityFeePerGas is no larger than maxFeePer-
Gas:

(72) Tm > Tf

The execution of a valid transaction begins with an
irrevocable change made to the state: the nonce of the
account of the sender, S(T ), is incremented by one and the
balance is reduced by part of the up-front cost, Tgp. The
gas available for the proceeding computation, g, is defined
as Tg −g0. The computation, whether contract creation
or a message call, results in an eventual state (which may
legally be equivalent to the current state), the change to
which is deterministic and never invalid: there can be no
invalid transactions from this point.

We define the checkpoint state σ0:

σ0 ≡ σ except:(73)
σ0[S(T)]b ≡ σ[S(T)]b −Tgp(74)
σ0[S(T)]n ≡ σ[S(T)]n + 1(75)

Evaluating σP from σ0 depends on the transaction type;
either contract creation or message call; we define the tuple
of post-execution provisional state σP, remaining gas g

′,
accrued substate A and status code z:
(76)

(σP,g
′
,A,z) ≡




Λ4(σ0,A
∗,S(T),S(T),g,

p,Tv,Ti, 0,∅,>) if Tt = ∅
Θ4(σ0,A

∗,S(T),S(T),Tt,

Tt,g,p,Tv,Tv,Td, 0,>) otherwise

where

A
∗ ≡ A0 except(77)

A
∗
K ≡

⋃
E∈TA

{
∀i < ‖Es‖, i ∈ N : (Ea,Es[i])

}
(78)

A
∗
a ≡

{
a∪Tt if Tt 6= ∅
a otherwise

(79)

a ≡ A0a ∪S(T) ∪Hc ∪E∈TA {Ea}(80)

and g is the amount of gas remaining after deducting
the basic amount required to pay for the existence of the
transaction:

(81) g ≡ Tg −g0
Note we use Θ4 and Λ4 to denote the fact that only the

first four components of the functions’ values are taken;
the final represents the message-call’s output value (a byte
array) and is unused in the context of transaction evalua-
tion.

Then the state is finalised by determining the amount
to be refunded, g∗ from the remaining gas, g′, plus some
allowance from the refund counter, to the sender at the
original rate.

(82) g
∗ ≡ g′ + min

{⌊
Tg −g′

5

⌋
,Ar

}
The total refundable amount is the legitimately remain-

ing gas g′, added to Ar, with the latter component being
capped up to a maximum of one fifth9 (rounded down) of
the total amount used Tg −g′. Therefore, g∗ is the total
gas that remains after the transaction has been executed.

9The max refundable proportion of gas was reduced from one half to one fifth by EIP-3529 by Buterin and Swende [2021] in the London
release


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 11

The validator, whose address is specified as the bene-
ficiary of the present block B, receives the gas consumed
multiplied by the transaction’s priority fee per gas, defined
as f in this section. The ether that is paid by the trans-
actor that goes toward the base fee is debited from the
transactor’s account but credited to no other account, so
it is burned.

We define the pre-final state σ∗ in terms of the provi-
sional state σP:

σ
∗ ≡ σP except(83)

σ
∗
[S(T)]b ≡ σP[S(T)]b + g∗p(84)

σ
∗
[BHc]b ≡ σP[BHc]b + (Tg −g

∗
)f(85)

The final state, σ′, is reached after deleting all accounts
that either appear in the self-destruct set or are touched
and empty:

σ
′ ≡ σ∗ except(86)

∀i ∈ As : σ′[i] = ∅(87)
∀i ∈ At : σ′[i] = ∅ if DEAD(σ∗, i)(88)

And finally, we specify Υg, the total gas used in this
transaction Υl, the logs created by this transaction and
Υz, the status code of this transaction:

Υ
g
(σ,T) ≡ Tg −g∗(89)

Υ
l
(σ,T) ≡ Al(90)

Υ
z
(σ,T) ≡ z(91)

These are used to help define the transaction receipt
and are also used later for state validation.

7. Contract Creation

There are a number of intrinsic parameters used when
creating an account: sender (s), original transactor10 (o),
available gas (g), effective gas price (p), endowment (v)
together with an arbitrary length byte array, i, the ini-
tialisation EVM code, the present depth of the message-
call/contract-creation stack (e), the salt for new account’s
address (ζ) and finally the permission to make modifica-
tions to the state (w). The salt ζ might be missing (ζ = ∅);
formally,

(92) ζ ∈ B32 ∪B0

If the creation was caused by CREATE2, then ζ 6= ∅.
We define the creation function formally as the function

Λ, which evaluates from these values, together with the
state σ and the accrued substate A, to the tuple containing
the new state, remaining gas, new accrued substate, status
code and output (σ′,g′,A′,z, o):

(93) (σ
′
,g
′
,A
′
,z, o) ≡ Λ(σ,A,s,o,g,p,v, i,e,ζ,w)

The address of the new account is defined as being the
rightmost 160 bits of the Keccak-256 hash of the RLP
encoding of the structure containing only the sender and
the account nonce. For CREATE2 the rule is different and
is described in EIP-1014 by Buterin [2018]. Combining
the two cases, we define the resultant address for the new

account a:

a ≡ ADDR(s,σ[s]n − 1,ζ, i)(94)

ADDR(s,n,ζ, i) ≡B96..255
(
KEC
(
LA(s,n,ζ, i)

))
(95)

LA(s,n,ζ, i) ≡

{
RLP
(

(s,n)
)

if ζ = ∅
(255) ·s · ζ ·KEC(i) otherwise

(96)

where · is the concatenation of byte arrays, Ba..b(X) evalu-
ates to a binary value containing the bits of indices in the
range [a,b] of the binary data X, and σ[x] is the address
state of x, or ∅ if none exists. Note we use one fewer than
the sender’s nonce value; we assert that we have incre-
mented the sender account’s nonce prior to this call, and
so the value used is the sender’s nonce at the beginning of
the responsible transaction or VM operation.

The address of the new account is added to the set of
accessed accounts:

(97) A
∗ ≡ A except A∗a ≡ Aa ∪{a}

The account’s nonce is initially defined as one, the bal-
ance as the value passed, the storage as empty and the
code hash as the Keccak 256-bit hash of the empty string;
the sender’s balance is also reduced by the value passed.
Thus the mutated state becomes σ∗:

(98) σ
∗ ≡ σ except:

σ
∗
[a] =

(
1,v + v

′
,TRIE(∅),KEC

(
()
))

(99)

σ
∗
[s] =

{
∅ if σ[s] = ∅ ∧ v = 0
a∗ otherwise

(100)

a
∗ ≡ (σ[s]n,σ[s]b −v,σ[s]s,σ[s]c)(101)

where v′ is the account’s pre-existing value, in the event
it was previously in existence:

(102) v
′ ≡

{
0 if σ[a] = ∅
σ[a]b otherwise

Finally, the account is initialised through the execution
of the initialising EVM code i according to the execution
model (see section 9). Code execution can effect several
events that are not internal to the execution state: the
account’s storage can be altered, further accounts can be
created and further message calls can be made. As such,
the code execution function Ξ evaluates to a tuple of the
resultant state σ∗∗, available gas remaining g∗∗, the re-
sultant accrued substate A∗∗ and the body code of the
account o.

(103) (σ
∗∗
,g
∗∗
,A
∗∗
, o) ≡ Ξ(σ∗,g,A∗,I)

where I contains the parameters of the execution environ-
ment, that is:

10which can differ from the sender in the case of a message call or contract creation not directly triggered by a transaction but coming
from the execution of EVM-code


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 12

Ia ≡ a(104)
Io ≡ o(105)
Ip ≡ p(106)
Id ≡ ()(107)
Is ≡ s(108)
Iv ≡ v(109)
Ib ≡ i(110)
Ie ≡ e(111)
Iw ≡ w(112)

Id evaluates to the empty tuple as there is no input data
to this call. IH has no special treatment and is determined
from the blockchain.

Code execution depletes gas, and gas may not go below
zero, thus execution may exit before the code has come
to a natural halting state. In this (and several other) ex-
ceptional cases we say an out-of-gas (OOG) exception has
occurred: The evaluated state is defined as being the empty
set, ∅, and the entire create operation should have no effect
on the state, effectively leaving it as it was immediately
prior to attempting the creation.

If the initialization code completes successfully, a final
contract-creation cost is paid, the code-deposit cost, c,
proportional to the size of the created contract’s code:

(113) c ≡ Gcodedeposit ×‖o‖

If there is not enough gas remaining to pay this, i.e.
g∗∗ < c, then we also declare an out-of-gas exception.

The gas remaining will be zero in any such exceptional
condition, i.e. if the creation was conducted as the recep-
tion of a transaction, then this doesn’t affect payment of
the intrinsic cost of contract creation; it is paid regardless.
However, the value of the transaction is not transferred
to the aborted contract’s address when we are out-of-gas,
thus the contract’s code is not stored.

If such an exception does not occur, then the remaining
gas is refunded to the originator and the now-altered state
is allowed to persist. Thus formally, we may specify the
resultant state, gas, accrued substate and status code as
(σ′,g′,A′,z) where:

g
′ ≡

{
0 if F

g∗∗ − c otherwise

(114)

σ
′ ≡




σ if F ∨ σ∗∗ = ∅
σ∗∗ except:

σ′[a] = ∅ if DEAD(σ∗∗,a)
σ∗∗ except:

σ′[a]c = KEC(o) otherwise

(115)

A
′ ≡

{
A∗ if F ∨ σ∗∗ = ∅
A∗∗ otherwise

(116)

z ≡

{
0 if F ∨ σ∗∗ = ∅
1 otherwise

(117)

where

F ≡
(
σ[a] 6= ∅ ∧

(
σ[a]c 6= KEC

(
()
)
∨σ[a]n 6= 0

))
∨

(118)

(σ
∗∗

= ∅ ∧ o = ∅) ∨
g
∗∗
< c ∨

‖o‖ > 24576 ∨
o[0] = 0xef

Note the last condition of F indicates that contract
code cannot begin with the byte 0xef (refer to EIP-3541
by Beregszaszi et al. [2021]),

The exception in the determination of σ′ dictates that
o, the resultant byte sequence from the execution of the
initialisation code, specifies the final body code for the
newly-created account.

Note that intention is that the result is either a suc-
cessfully created new contract with its endowment, or no
new contract with no transfer of value. In addition, ob-
serve that if the execution of the initialising code reverts
(σ∗∗ = ∅ ∧ o 6= ∅), the resultant gas g′ is not depleted
(provided there was no other exception), but no new ac-
count is created.

7.1. Subtleties. Note that while the initialisation code
is executing, the newly created address exists but with
no intrinsic body code11. Thus any message call received
by it during this time causes no code to be executed. If
the initialisation execution ends with a SELFDESTRUCT
instruction, the matter is moot since the account will be
deleted before the transaction is completed. For a normal
STOP code, or if the code returned is otherwise empty, then
the state is left with a zombie account, and any remaining
balance will be locked into the account forever.

8. Message Call

In the case of executing a message call, several param-
eters are required: sender (s), transaction originator (o),
recipient (r), the account whose code is to be executed

11During initialization code execution, EXTCODESIZE on the address should return zero, which is the length of the code of the account
while CODESIZE should return the length of the initialization code (as defined in H.2).


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 13

(c, usually the same as recipient), available gas (g), value
(v) and effective gas price (p) together with an arbitrary
length byte array, d, the input data of the call, the present
depth of the message-call/contract-creation stack (e) and
finally the permission to make modifications to the state
(w).

Aside from evaluating to a new state and accrued
transaction substate, message calls also have an extra
component—the output data denoted by the byte array o.
This is ignored when executing transactions, however mes-
sage calls can be initiated due to VM-code execution and
in this case this information is used.
(119)

(σ
′
,g
′
,A
′
,z, o) ≡ Θ(σ,A,s,o,r,c,g,p,v, ṽ, d,e,w)

Note that we need to differentiate between the value that
is to be transferred, v, from the value apparent in the
execution context, ṽ, for the DELEGATECALL instruction.

We define σ1, the first transitional state as the orig-
inal state but with the value transferred from sender to
recipient:

(120) σ1[r]b ≡ σ[r]b + v ∧ σ1[s]b ≡ σ[s]b −v

unless s = r.
Throughout the present work, it is assumed that if σ1[r]

was originally undefined, it will be created as an account
with no code or state and zero balance and nonce. Thus
the previous equation should be taken to mean:

(121) σ1 ≡ σ′1 except:

(122) σ1[s] ≡

{
∅ if σ′1[s] = ∅ ∧ v = 0
a1 otherwise

(123) a1 ≡
(
σ
′
1[s]n,σ

′
1[s]b −v,σ

′
1[s]s,σ

′
1[s]c

)

(124) and σ
′
1 ≡ σ except:

(125)

σ′1[r] ≡ (0,v,TRIE(∅),KEC(())) if σ[r] = ∅∧v 6= 0
σ′1[r] ≡ ∅ if σ[r] = ∅∧v = 0
σ′1[r] ≡ a′1 otherwise

(126) a
′
1 ≡ (σ[r]n,σ[r]b + v,σ[r]s,σ[r]c)

The account’s associated code (identified as the frag-
ment whose Keccak-256 hash is σ[c]c) is executed according
to the execution model (see section 9). Just as with con-
tract creation, if the execution halts in an exceptional
fashion (i.e. due to an exhausted gas supply, stack under-
flow, invalid jump destination or invalid instruction), then
no gas is refunded to the caller and the state is reverted to
the point immediately prior to balance transfer (i.e. σ).

σ
′ ≡

{
σ if σ∗∗ = ∅
σ∗∗ otherwise

(127)

g
′ ≡




0 if σ∗∗ = ∅ ∧
o = ∅

g∗∗ otherwise

(128)

A
′ ≡

{
A if σ∗∗ = ∅
A∗∗ otherwise

(129)

z ≡

{
0 if σ∗∗ = ∅
1 otherwise

(130)

(σ
∗∗
,g
∗∗
,A
∗∗
, o) ≡ Ξ(131)
Ia ≡ r(132)
Io ≡ o(133)
Ip ≡ p(134)
Id ≡ d(135)
Is ≡ s(136)
Iv ≡ ṽ(137)
Ie ≡ e(138)
Iw ≡ w(139)

where

(140) Ξ ≡




ΞECREC(σ1,g,A,I) if c = 1

ΞSHA256(σ1,g,A,I) if c = 2

ΞRIP160(σ1,g,A,I) if c = 3

ΞID(σ1,g,A,I) if c = 4

ΞEXPMOD(σ1,g,A,I) if c = 5

ΞBN ADD(σ1,g,A,I) if c = 6

ΞBN MUL(σ1,g,A,I) if c = 7

ΞSNARKV(σ1,g,A,I) if c = 8

ΞBLAKE2 F(σ1,g,A,I) if c = 9

Ξ(σ1,g,A,I) otherwise

and

(141) KEC(Ib) = σ[c]c

It is assumed that the client will have stored the pair
(KEC(Ib),Ib) at some point prior in order to make the
determination of Ib feasible.

As can be seen, there are nine exceptions to the usage
of the general execution framework Ξ for evaluation of the
message call: these are so-called ‘precompiled’ contracts,
meant as a preliminary piece of architecture that may later
become native extensions. The contracts in addresses 1 to
9 execute the elliptic curve public key recovery function,
the SHA2 256-bit hash scheme, the RIPEMD 160-bit hash
scheme, the identity function, arbitrary precision modular
exponentiation, elliptic curve addition, elliptic curve scalar
multiplication, an elliptic curve pairing check, and the
BLAKE2 compression function F respectively. Their full
formal definition is in Appendix E. We denote the set of
the addresses of the precompiled contracts by π:

(142) π ≡{1, 2, 3, 4, 5, 6, 7, 8, 9}

9. Execution Model

The execution model specifies how the system state is
altered given a series of bytecode instructions and a small
tuple of environmental data. This is specified through a


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 14

formal model of a virtual state machine, known as the
Ethereum Virtual Machine (EVM). It is a quasi-Turing-
complete machine; the quasi qualification comes from the
fact that the computation is intrinsically bounded through
a parameter, gas, which limits the total amount of compu-
tation done.

9.1. Basics. The EVM is a simple stack-based architec-
ture. The word size of the machine (and thus size of stack
items) is 256-bit. This was chosen to facilitate the Keccak-
256 hash scheme and elliptic-curve computations. The
memory model is a simple word-addressed byte array. The
stack has a maximum size of 1024. The machine also has
an independent storage model; this is similar in concept
to the memory but rather than a byte array, it is a word-
addressable word array. Unlike memory, which is volatile,
storage is non volatile and is maintained as part of the
system state. All locations in both storage and memory
are well-defined initially as zero.

The machine does not follow the standard von Neu-
mann architecture. Rather than storing program code
in generally-accessible memory or storage, it is stored
separately in a virtual ROM interactable only through
a specialised instruction.

The machine can have exceptional execution for several
reasons, including stack underflows and invalid instruc-
tions. Like the out-of-gas exception, they do not leave
state changes intact. Rather, the machine halts immedi-
ately and reports the issue to the execution agent (either
the transaction processor or, recursively, the spawning
execution environment) which will deal with it separately.

9.2. Fees Overview. Fees (denominated in gas) are
charged under three distinct circumstances, all three as
prerequisite to the execution of an operation. The first
and most common is the fee intrinsic to the computation
of the operation (see Appendix G). Secondly, gas may be
deducted in order to form the payment for a subordinate
message call or contract creation; this forms part of the
payment for CREATE, CREATE2, CALL and CALLCODE.
Finally, gas may be paid due to an increase in the usage
of the memory.

Over an account’s execution, the total fee for memory-
usage payable is proportional to smallest multiple of 32
bytes that are required such that all memory indices
(whether for read or write) are included in the range. This
is paid for on a just-in-time basis; as such, referencing an
area of memory at least 32 bytes greater than any previ-
ously indexed memory will certainly result in an additional
memory usage fee. Due to this fee it is highly unlikely
addresses will ever go above 32-bit bounds. That said,
implementations must be able to manage this eventuality.

Storage fees have a slightly nuanced behaviour—to in-
centivise minimisation of the use of storage (which corre-
sponds directly to a larger state database on all nodes),
the execution fee for an operation that clears an entry in
the storage is not only waived, a qualified refund is given;
in fact, this refund is effectively paid up-front since the
initial usage of a storage location costs substantially more
than normal usage.

See Appendix H for a rigorous definition of the EVM
gas cost.

9.3. Execution Environment. In addition to the sys-
tem state σ, the remaining gas for computation g, and the
accrued substate A, there are several pieces of important
information used in the execution environment that the
execution agent must provide; these are contained in the
tuple I:

• Ia, the address of the account which owns the code
that is executing.

• Io, the sender address of the transaction that orig-
inated this execution.

• Ip, the price of gas paid by the signer of the transac-
tion that originated this execution. This is defined
as the effective gas price p in section 6.

• Id, the byte array that is the input data to this
execution; if the execution agent is a transaction,
this would be the transaction data.

• Is, the address of the account which caused the
code to be executing; if the execution agent is a
transaction, this would be the transaction sender.

• Iv, the value, in Wei, passed to this account as
part of the same procedure as execution; if the
execution agent is a transaction, this would be the
transaction value.

• Ib, the byte array that is the machine code to be
executed.

• IH, the block header of the present block.
• Ie, the depth of the present message-call or

contract-creation (i.e. the number of CALLs or
CREATE(2)s being executed at present).

• Iw, the permission to make modifications to the
state.

The execution model defines the function Ξ, which can
compute the resultant state σ′, the remaining gas g′, the
resultant accrued substate A′ and the resultant output, o,
given these definitions. For the present context, we will
define it as:

(143) (σ
′
,g
′
,A
′
, o) ≡ Ξ(σ,g,A,I)

where we will remember that A, the accrued substate, is
defined in section 6.1.

9.4. Execution Overview. We must now define the Ξ
function. In most practical implementations this will be
modelled as an iterative progression of the pair comprising
the full system state, σ and the machine state, µ. For-
mally, we define it recursively with a function X. This
uses an iterator function O (which defines the result of a
single cycle of the state machine) together with functions
Z which determines if the present state is an exceptional
halting state of the machine and H, specifying the output
data of the instruction if and only if the present state is a
normal halting state of the machine.

The empty sequence, denoted (), is not equal to the
empty set, denoted ∅; this is important when interpreting
the output of H, which evaluates to ∅ when execution is to
continue but a series (potentially empty) when execution


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 15

should halt.

Ξ(σ,g,A,I) ≡ (σ′,µ′g,A
′
, o)(144)

(σ
′
,µ
′
,A
′
, ..., o) ≡ X

(
(σ,µ,A,I)

)
(145)

µg ≡ g(146)
µpc ≡ 0(147)
µm ≡ (0, 0, ...)(148)
µi ≡ 0(149)
µs ≡ ()(150)
µo ≡ ()(151)

(152)

X
(
(σ,µ,A,I)

)
≡



(
∅,µ,A,I,∅

)
if Z(σ,µ,A,I)(

∅,µ′,A,I, o
)

if w = REVERT

O(σ,µ,A,I) · o if o 6= ∅
X
(
O(σ,µ,A,I)

)
otherwise

where

o ≡ H(µ,I)(153)
(a,b,c,d) ·e ≡ (a,b,c,d,e)(154)

µ
′ ≡ µ except:(155)

µ
′
g ≡ µg −C(σ,µ,A,I)(156)

Note that, when we evaluate Ξ, we drop the fourth
element I′ and extract the remaining gas µ′g from the

resultant machine state µ′.
X is thus cycled (recursively here, but implementations

are generally expected to use a simple iterative loop) until
either Z becomes true indicating that the present state is
exceptional and that the machine must be halted and any
changes discarded or until H becomes a series (rather than
the empty set) indicating that the machine has reached a
controlled halt.

9.4.1. Machine State. The machine state µ is defined as
the tuple (g,pc, m, i, s, o) which are the gas available, the
program counter pc ∈ N256 , the memory contents, the
active number of words in memory (counting continuously
from position 0), the stack contents, and the returndata
buffer. The memory contents µm are a series of zeroes of
size 2256.

For the ease of reading, the instruction mnemonics,
written in small-caps (e.g. ADD), should be interpreted as
their numeric equivalents; the full table of instructions and
their specifics is given in Appendix H.

For the purposes of defining Z, H and O, we define w
as the current operation to be executed:

(157) w ≡

{
Ib[µpc] if µpc < ‖Ib‖
STOP otherwise

We also assume the fixed amounts of δ and α, specifying
the stack items removed and added, both subscriptable
on the instruction and an instruction cost function C eval-
uating to the full cost, in gas, of executing the given
instruction.

9.4.2. Exceptional Halting. The exceptional halting func-
tion Z is defined as:
(158)
Z(σ,µ,A,I) ≡ µg < C(σ,µ,A,I) ∨

δw = ∅ ∨
‖µs‖ < δw ∨
(w = JUMP ∧ µs[0] /∈ D(Ib)) ∨
(w = JUMPI ∧ µs[1] 6= 0 ∧
µs[0] /∈ D(Ib)) ∨

(w = RETURNDATACOPY ∧
µs[1] + µs[2] > ‖µo‖) ∨
‖µs‖− δw + αw > 1024 ∨
(¬Iw ∧ W(w,µ)) ∨
(w = SSTORE ∧ µg 6 Gcallstipend)

where

(159) W(w,µ) ≡ w ∈{CREATE, CREATE2, SSTORE,
SELFDESTRUCT} ∨
LOG0 ≤ w ∧ w ≤ LOG4 ∨
w = CALL ∧ µs[2] 6= 0

This states that the execution is in an exceptional halt-
ing state if there is insufficient gas, if the instruction is
invalid (and therefore its δ subscript is undefined), if there
are insufficient stack items, if a JUMP/JUMPI destination
is invalid, the new stack size would be larger than 1024 or
state modification is attempted during a static call. The
astute reader will realise that this implies that no instruc-
tion can, through its execution, cause an exceptional halt.
Also, the execution is in an exceptional halting state if the
gas left prior to executing an SSTORE instruction is less
than or equal to the call stipend Gcallstipend – see EIP-2200
by Tang [2019] for more information.

9.4.3. Jump Destination Validity. We previously used D
as the function to determine the set of valid jump desti-
nations given the code that is being run. We define this
as any position in the code occupied by a JUMPDEST
instruction.

All such positions must be on valid instruction bound-
aries, rather than sitting in the data portion of PUSH
operations and must appear within the explicitly defined
portion of the code (rather than in the implicitly defined
STOP operations that trail it).

Formally:

(160) D(c) ≡ DJ(c, 0)

where:
(161)

DJ(c, i) ≡



{} if i > ‖c‖
{i}∪DJ(c,N(i, c[i]))

if c[i] = JUMPDEST

DJ(c,N(i, c[i])) otherwise

where N is the next valid instruction position in the
code, skipping the data of a PUSH instruction, if any:
(162)

N(i,w) ≡



i + w − PUSH1 + 2

if w ∈ [PUSH1, PUSH32]
i + 1 otherwise


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 16

9.4.4. Normal Halting. The normal halting function H is
defined:
(163)

H(µ,I) ≡



HRETURN(µ) if w ∈{RETURN, REVERT}
() if w ∈{STOP, SELFDESTRUCT}
∅ otherwise

The data-returning halt operations, RETURN and
REVERT, have a special function HRETURN. Note also
the difference between the empty sequence and the empty
set as discussed here.

9.5. The Execution Cycle. Stack items are added or
removed from the left-most, lower-indexed portion of the
series; all other items remain unchanged:

O
(
(σ,µ,A,I)

)
≡ (σ′,µ′,A′,I)(164)

∆ ≡ αw − δw(165)
‖µ′s‖ ≡ ‖µs‖ + ∆(166)

∀x ∈ [αw,‖µ′s‖) : µ
′
s[x] ≡ µs[x− ∆](167)

The gas is reduced by the instruction’s gas cost and
for most instructions, the program counter increments on
each cycle, for the three exceptions, we assume a function
J, subscripted by one of two instructions, which evaluates
to the according value:

µ
′
g ≡ µg −C(σ,µ,A,I)(168)

µ
′
pc ≡



JJUMP(µ) if w = JUMP

JJUMPI(µ) if w = JUMPI

N(µpc,w) otherwise

(169)

In general, we assume the memory, accrued substate
and system state do not change:

µ
′
m ≡ µm(170)
µ
′
i ≡ µi(171)

A
′ ≡ A(172)
σ
′ ≡ σ(173)

However, instructions do typically alter one or several
components of these values. Altered components listed by
instruction are noted in Appendix H, alongside values for
α and δ and a formal description of the gas requirements.

10. Transition to Proof of Stake

The Paris hard fork changed the underlying consensus
mechanism of Ethereum from proof of work to proof of
stake.

Unlike all previous hard forks of Ethereum, Paris was
not defined to occur at a particular block height, but rather
after a specified terminal total difficulty was reached. To-
tal difficulty was used instead of block height to avoid a
scenario in which a minority of hash power could create a
malicious fork that could race to satisfy the block height
requirement and claim the first proof of stake block.

Thus the terminal block, the last proof of work block
before the Paris fork takes effect, is defined as having:

Bt > 58750000000000000000000(174)

P(BH)t < 58750000000000000000000(175)

where Bt is the total difficulty of block B and P(BH)t
is the total difficulty of its parent.

Total difficulty for a proof of work (pre-Paris ) block B
is defined recursively as:

Bt ≡ P(BH)t + Hd(176)

where Hd is the difficulty of the current block B.
Upon reaching the terminal block, new blocks are pro-

cessed by the Beacon Chain.

10.1. Post-Paris Updates. Because the Beacon Chain
generate a new slot every 12 seconds, post-Paris updates
can be scheduled to occur at a specific timestamp. At
the execution layer, the update will then happen in the
first produced block after the scheduled timestamp. For
example the Shanghai hard fork was scheduled to occur at
2023-04-12 10:27:35 UTC, on Epoch 194,048. Validators
failed to propose a block during the two first slots of this
Epoch, but at slot 6,209,538 a validator finally proposed
the block 17,034,870, marking the transition to Shanghai
on the execution layer.

11. Blocktree to Blockchain

Prior to the transition to proof of stake at the Paris
hard fork, the canonical blockchain was defined as the block
path with the greatest total difficulty, defined in section
10 as Bt.

After reaching the terminal block described in section
10, the greatest total difficulty rule must be removed in
favor of a rule known as LMD Ghost.12

Note that in order to determine what blocks comprise
the canonical Ethereum blockchain after Paris, one must
have additional information from the Beacon Chain, which
is not described herein. We denote events emitted by the
Beacon Chain with the prefix POS .

On each occurrence of a POS FORKCHOICE UPDATED event,
starting with the first at the transition block described in
section 10, the canonical chain is defined as the chain be-
ginning with the genesis block and ending at the block
nominated by the event as the head of the chain.

The head of the chain should be updated if and only if
a POS FORKCHOICE UPDATED is emitted, in which case the
head should set to the block specified by the event. No
optimistic updates to the head of the chain should be made.

The POS FORKCHOICE UPDATED event additionally refer-
ences a finalized block. The most recent finalized block
should be set to this block.

The canonical blockchain must also contain a block
with the hash and number of the terminal block defined in
section 10.

12. Block Finalisation

The process of finalising a block involves three stages:

(1) executing withdrawals;
(2) validate transactions;
(3) verify state.

12LMD GHOST comprises two acronyms, “Latest Message Driven”, and “Greedy Heaviest-Observed Sub-Tree”.


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 17

12.1. Executing Withdrawals. After processing the
block’s transactions, the withdrawals are executed. A
withdrawal is simply an increase of the recipient account’s
balance of the specified Gwei amount. No other balances
are decreased, a withdrawal is not a transfer but a creation
of funds. A withdrawal operation cannot fail and has no
gas cost. We define the function E as the withdrawal state
transition function:

E(σw,W) ≡ σw+1(177)
σw+1 ≡ σw except:(178)

σw+1[Wr]b ≡ σw[Wr]b + (Wa × 109)(179)

Finally, we define K, the block-level state transition
function for withdrawals:

(180) K(σ,B) ≡ E(E(σ,W0),W1)...

12.2. Transaction Validation. The given gasUsed
must correspond faithfully to the transactions listed: BHg,
the total gas used in the block, must be equal to the
accumulated gas used according to the final transaction:

(181) BHg = `(R)u

12.3. State Validation. We may now define the function,
Γ, that maps a block B to its initiation state:
(182)

Γ(B) ≡

{
σ0 if P(BH) = ∅
σi : TRIE(LS(σi)) = P(BH)Hr otherwise

Here, TRIE(LS(σi)) means the hash of the root node of
a trie of state σi; it is assumed that implementations will
store this in the state database, which is trivial and efficient
since the trie is by nature an immutable data structure.

And finally we define Φ, the block transition function,
which maps an incomplete block B to a complete block
B′:

Φ(B) ≡ B′ : B′ = B except:(183)
B
′
Hr = TRIE(LS(K(Π(Γ(B),B),B)))(184)

As specified at the beginning of the present work, Π is
the state-transition function, which is defined in terms of
Υ, the transaction-evaluation function.

As previously detailed, R[n]z, R[n]l and R[n]u are the
nth corresponding status code, logs and cumulative gas
used after each transaction (R[n]b, the fourth component
in the tuple, has already been defined in terms of the logs).
We also define the nth state σ[n], which is defined simply
as the state resulting from applying the corresponding
transaction to the state resulting from the previous trans-
action (or the block’s initial state in the case of the first
such transaction):

(185) σ[n] =

{
Γ(B) if n < 0

Υ(σ[n− 1],BT[n]) otherwise

In the case of BR[n]u, we take a similar approach defin-
ing each item as the gas used in evaluating the correspond-
ing transaction summed with the previous item (or zero,
if it is the first), giving us a running total:

(186) R[n]u =




0 if n < 0

Υg(σ[n− 1],BT[n])
+R[n− 1]u otherwise

For R[n]l, we utilise the Υ
l function that we conve-

niently defined in the transaction execution function.

(187) R[n]l = Υ
l
(σ[n− 1],BT[n])

We define R[n]z in a similar manner.

(188) R[n]z = Υ
z
(σ[n− 1],BT[n])

Finally, we define Π as the final transaction’s resultant
state, `(σ):

(189) Π(σ,B) ≡ `(σ)

Thus the complete block-transition mechanism (before
consensus) is defined.

13. Implementing Contracts

There are several patterns of contracts engineering that
allow particular useful behaviours; two of these that we
will briefly discuss are data feeds and random numbers.

13.1. Data Feeds. A data feed contract is one which pro-
vides a single service: it gives access to information from
the external world within Ethereum. The accuracy and
timeliness of this information is not guaranteed and it is
the task of a secondary contract author—the contract that
utilises the data feed—to determine how much trust can
be placed in any single data feed.

The general pattern involves a single contract within
Ethereum which, when given a message call, replies with
some timely information concerning an external phenome-
non. An example might be the local temperature of New
York City. This would be implemented as a contract that
returned that value of some known point in storage. Of
course this point in storage must be maintained with the
correct such temperature, and thus the second part of the
pattern would be for an external server to run an Ethereum
node, and immediately on discovery of a new block, creates
a new valid transaction, sent to the contract, updating said
value in storage. The contract’s code would accept such
updates only from the identity contained on said server.

13.2. Random Numbers. Providing random numbers
within a deterministic system is, naturally, an impossible
task. However, we can approximate with pseudo-random
numbers by utilising data which is generally unknowable
at the time of transacting. Such data might include the
block’s hash and the block’s beneficiary address. In order
to make it hard for malicious validators to control those
values, one should use the BLOCKHASH operation in order
to use hashes of the previous 256 blocks as pseudo-random
numbers. For a series of such numbers, a trivial solution
would be to add some constant amount and hashing the
result.

14. Future Directions

The state database won’t be forced to maintain all past
state trie structures into the future. It should maintain
an age for each node and eventually discard nodes that
are neither recent enough nor checkpoints. Checkpoints,
or a set of nodes in the database that allow a particular
block’s state trie to be traversed, could be used to place a
maximum limit on the amount of computation needed in
order to retrieve any state throughout the blockchain.

Blockchain consolidation could be used in order to re-
duce the amount of blocks a client would need to download


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 18

to act as a full node. A compressed archive of the trie
structure at given points in time (perhaps one in every
10,000th block) could be maintained by the peer network,
effectively recasting the genesis block. This would reduce
the amount to be downloaded to a single archive plus a
hard maximum limit of blocks.

Finally, blockchain compression could perhaps be con-
ducted: nodes in state trie that haven’t sent/received a
transaction in some constant amount of blocks could be
thrown out, reducing both Ether-leakage and the growth
of the state database.

15. Conclusion

We have introduced, discussed and formally defined the
protocol of Ethereum. Through this protocol the reader
may implement a node on the Ethereum network and join
others in a decentralised secure social operating system.
Contracts may be authored in order to algorithmically
specify and autonomously enforce rules of interaction.

16. Acknowledgements

Many thanks to Aeron Buchanan for authoring the
Homestead revisions, Christoph Jentzsch for authoring the
Ethash algorithm and Yoichi Hirai for doing most of the
EIP-150 changes. Important maintenance, useful correc-
tions and suggestions were provided by a number of others
from the Ethereum DEV organisation and Ethereum com-
munity at large including Gustav Simonsson, Pawe l Bylica,
Jutta Steiner, Nick Savers, Viktor Trón, Marko Simovic,
Giacomo Tazzari and, of course, Vitalik Buterin.

17. Availability

The source of this paper is maintained at https:
//github.com/ethereum/yellowpaper/. An auto-
generated PDF is located at https://ethereum.github.
io/yellowpaper/paper.pdf.

References

Jacob Aron. BitCoin software finds new life. New Scientist,
213(2847):20, 2012. URL http://www.sciencedirect.
com/science/article/pii/S0262407912601055.

Adam Back. Hashcash - Amortizable Publicly Auditable
Cost-Functions, 2002. URL http://www.hashcash.org/
papers/amortizable.pdf.

Alex Beregszaszi, Pawe l Bylica, Andrei Maiboroda, Alexey
Akhunov, Christian Reitwiessner, and Martin Swende.
EIP-3541: Reject new contract code starting with the
0xef byte, March 2021. URL https://eips.ethereum.
org/EIPS/eip-3541.

Guido Bertoni, Joan Daemen, Michal Peeters, and
Gilles Van Assche. The KECCAK SHA-3 sub-
mission, 2011. URL https://keccak.team/files/
Keccak-submission-3.pdf.

Roman Boutellier and Mareike Heinzen. Pirates, Pio-
neers, Innovators and Imitators. In Growth Through
Innovation, pages 85–96. Springer, 2014. URL https:
//www.springer.com/gb/book/9783319040158.

Vitalik Buterin. Ethereum: A Next-Generation Smart
Contract and Decentralized Application Platform,
2013. URL https://github.com/ethereum/wiki/
wiki/White-Paper.

Vitalik Buterin. EIP-2: Homestead hard-fork changes,
2015. URL https://eips.ethereum.org/EIPS/eip-2.

Vitalik Buterin. EIP-155: Simple replay attack protec-
tion, October 2016. URL https://eips.ethereum.org/
EIPS/eip-155.

Vitalik Buterin. EIP-1014: Skinny CREATE2, April 2018.
URL https://eips.ethereum.org/EIPS/eip-1014.

Vitalik Buterin and Martin Swende. EIP-2929: Gas cost in-
creases for state access opcodes, September 2020a. URL
https://eips.ethereum.org/EIPS/eip-2929.

Vitalik Buterin and Martin Swende. EIP-2930: Optional ac-
cess lists, August 2020b. URL https://eips.ethereum.
org/EIPS/eip-2930.

Vitalik Buterin and Martin Swende. EIP-3529: Reduction
in refunds, April 2021. URL https://eips.ethereum.
org/EIPS/eip-3529.

Vitalik Buterin, Eric Conner, Rick Dudley, Matthew
Slipper, Ian Norden, and Abdelhamid Bakhta. EIP-
1559: Fee market change for eth 1.0 chain, 2019. URL
https://eips.ethereum.org/EIPS/eip-1559.

Nicolas T. Courtois, Marek Grajek, and Rahul Naik.
Optimizing SHA256 in Bitcoin Mining, pages 131–
144. Springer Berlin Heidelberg, Berlin, Heidel-
berg, 2014. ISBN 978-3-662-44893-9. doi: 10.
1007/978-3-662-44893-9 12. URL https://doi.org/10.
1007/978-3-662-44893-9_12.

B.A. Davey and H.A. Priestley. Introduction to lattices
and order. 2nd ed. Cambridge: Cambridge University
Press, 2nd ed. edition, 2002. ISBN 0-521-78451-4/pbk.

Cynthia Dwork and Moni Naor. Pricing via pro-
cessing or combatting junk mail. In In 12th An-
nual International Cryptology Conference, pages 139–
147, 1992. URL http://www.wisdom.weizmann.ac.il/

~naor/PAPERS/pvp.pdf.
Dankrad Feist, Dmitry Khovratovich, and Marius van der

Wijden. EIP-3607: Reject transactions from senders
with deployed code, June 2021. URL https://eips.
ethereum.org/EIPS/eip-3607.

Phong Vo Glenn Fowler, Landon Curt Noll. FowlerNol-
lVo hash function, 1991. URL http://www.isthe.com/
chongo/tech/comp/fnv/index.html.

Nils Gura, Arun Patel, Arvinderpal Wander, Hans Eberle,
and Sheueling Chang Shantz. Comparing elliptic curve
cryptography and RSA on 8-bit CPUs. In Cryptographic
Hardware and Embedded Systems-CHES 2004, pages
119–132. Springer, 2004. URL https://www.iacr.org/
archive/ches2004/31560117/31560117.pdf.

Tjaden Hess, Matt Luongo, Piotr Dyraga, and James
Hancock. EIP-152: Add BLAKE2 compression func-
tion ‘F‘ precompile, October 2016. URL https://eips.
ethereum.org/EIPS/eip-152.

Don Johnson, Alfred Menezes, and Scott Van-
stone. The Elliptic Curve Digital Signa-
ture Algorithm (ECDSA), 2001. URL https:
//web.archive.org/web/20170921160141/http://

cs.ucsb.edu/~koc/ccs130h/notes/ecdsa-cert.pdf.
Accessed 21 September 2017, but the original link was
inaccessible on 19 October 2017. Refer to section 6.2 for
ECDSAPUBKEY, and section 7 for ECDSASIGN and
ECDSARECOVER.

Sergio Demian Lerner. Strict Memory Hard Hashing Func-
tions, 2014. URL http://www.hashcash.org/papers/
memohash.pdf.

https://github.com/ethereum/yellowpaper/
https://github.com/ethereum/yellowpaper/
https://ethereum.github.io/yellowpaper/paper.pdf
https://ethereum.github.io/yellowpaper/paper.pdf
http://www.sciencedirect.com/science/article/pii/S0262407912601055
http://www.sciencedirect.com/science/article/pii/S0262407912601055
http://www.hashcash.org/papers/amortizable.pdf
http://www.hashcash.org/papers/amortizable.pdf
https://eips.ethereum.org/EIPS/eip-3541
https://eips.ethereum.org/EIPS/eip-3541
https://keccak.team/files/Keccak-submission-3.pdf
https://keccak.team/files/Keccak-submission-3.pdf
https://www.springer.com/gb/book/9783319040158
https://www.springer.com/gb/book/9783319040158
https://github.com/ethereum/wiki/wiki/White-Paper
https://github.com/ethereum/wiki/wiki/White-Paper
https://eips.ethereum.org/EIPS/eip-2
https://eips.ethereum.org/EIPS/eip-155
https://eips.ethereum.org/EIPS/eip-155
https://eips.ethereum.org/EIPS/eip-1014
https://eips.ethereum.org/EIPS/eip-2929
https://eips.ethereum.org/EIPS/eip-2930
https://eips.ethereum.org/EIPS/eip-2930
https://eips.ethereum.org/EIPS/eip-3529
https://eips.ethereum.org/EIPS/eip-3529
https://eips.ethereum.org/EIPS/eip-1559
https://doi.org/10.1007/978-3-662-44893-9_12
https://doi.org/10.1007/978-3-662-44893-9_12
http://www.wisdom.weizmann.ac.il/~naor/PAPERS/pvp.pdf
http://www.wisdom.weizmann.ac.il/~naor/PAPERS/pvp.pdf
https://eips.ethereum.org/EIPS/eip-3607
https://eips.ethereum.org/EIPS/eip-3607
http://www.isthe.com/chongo/tech/comp/fnv/index.html
http://www.isthe.com/chongo/tech/comp/fnv/index.html
https://www.iacr.org/archive/ches2004/31560117/31560117.pdf
https://www.iacr.org/archive/ches2004/31560117/31560117.pdf
https://eips.ethereum.org/EIPS/eip-152
https://eips.ethereum.org/EIPS/eip-152
https://web.archive.org/web/20170921160141/http://cs.ucsb.edu/~koc/ccs130h/notes/ecdsa-cert.pdf
https://web.archive.org/web/20170921160141/http://cs.ucsb.edu/~koc/ccs130h/notes/ecdsa-cert.pdf
https://web.archive.org/web/20170921160141/http://cs.ucsb.edu/~koc/ccs130h/notes/ecdsa-cert.pdf
http://www.hashcash.org/papers/memohash.pdf
http://www.hashcash.org/papers/memohash.pdf


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 19

Mark Miller. The Future of Law. In pa-
per delivered at the Extro 3 Conference (August
9), 1997. URL https://drive.google.com/file/d/
0Bw0VXJKBgYPMS0J2VGIyWWlocms/edit?usp=sharing.

Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash
system, 2008. URL http://www.bitcoin.org/bitcoin.
pdf.

Meni Rosenfeld, Yoni Assia, Vitalik Buterin, m li-
orhakiLior, Oded Leiba, Assaf Shomer, and Eli-
ran Zach. Colored Coins Protocol Specification,
2012. URL https://github.com/Colored-Coins/
Colored-Coins-Protocol-Specification.

Markku-Juhani Saarinen and Jean-Philippe Aumasson.
RFC 7693: The BLAKE2 cryptographic hash and mes-
sage authentication code (MAC), November 2015. URL
https://tools.ietf.org/html/rfc7693.

Simon Sprankel. Technical Basis of Digital Currencies,
2013. URL http://www.coderblog.de/wp-content/

uploads/technical-basis-of-digital-currencies.

pdf.
Nick Szabo. Formalizing and securing relation-

ships on public networks. First Monday, 2(9),
1997. URL http://firstmonday.org/ojs/index.php/
fm/article/view/548.

Wei Tang. EIP-2200: Structured definitions for net gas me-
tering, 2019. URL https://eips.ethereum.org/EIPS/
eip-2200.

Vivek Vishnumurthy, Sangeeth Chandrakumar, and
Emin Gn Sirer. KARMA: A secure economic framework
for peer-to-peer resource sharing, 2003. URL https://
www.cs.cornell.edu/people/egs/papers/karma.pdf.

J. R. Willett. MasterCoin Complete Specification, 2013.
URL https://github.com/mastercoin-MSC/spec.

Micah Zoltu. EIP-2718: Typed transaction envelope,
June 2020. URL https://eips.ethereum.org/EIPS/
eip-2718.

Appendix A. Terminology

External Actor: A person or other entity able to interface to an Ethereum node, but external to the world of
Ethereum. It can interact with Ethereum through depositing signed Transactions and inspecting the blockchain
and associated state. Has one (or more) intrinsic Accounts.

Address: A 160-bit code used for identifying Accounts.
Account: Accounts have an intrinsic balance and transaction count maintained as part of the Ethereum state.

They also have some (possibly empty) EVM Code and a (possibly empty) Storage State associated with them.
Though homogenous, it makes sense to distinguish between two practical types of account: those with empty
associated EVM Code (thus the account balance is controlled, if at all, by some external entity) and those with
non-empty associated EVM Code (thus the account represents an Autonomous Object). Each Account has a
single Address that identifies it.

Transaction: A piece of data, signed by an External Actor. It represents either a Message or a new Autonomous
Object. Transactions are recorded into each block of the blockchain.

Autonomous Object: A notional object existent only within the hypothetical state of Ethereum. Has an intrinsic
address and thus an associated account; the account will have non-empty associated EVM Code. Incorporated
only as the Storage State of that account.

Storage State: The information particular to a given Account that is maintained between the times that the
Account’s associated EVM Code runs.

Message: Data (as a set of bytes) and Value (specified as Ether) that is passed between two Accounts, either
through the deterministic operation of an Autonomous Object or the cryptographically secure signature of the
Transaction.

Message Call: The act of passing a message from one Account to another. If the destination account is associated
with non-empty EVM Code, then the VM will be started with the state of said Object and the Message acted
upon. If the message sender is an Autonomous Object, then the Call passes any data returned from the VM
operation.

Gas: The fundamental network cost unit. Paid for exclusively by Ether (as of PoC-4), which is converted freely to
and from Gas as required. Gas does not exist outside of the internal Ethereum computation engine; its price is
set by the Transaction and validators are free to ignore Transactions whose Gas price is too low.

Contract: Informal term used to mean both a piece of EVM Code that may be associated with an Account or an
Autonomous Object.

Object: Synonym for Autonomous Object.
App: An end-user-visible application hosted in the Ethereum Browser.
Ethereum Browser: (aka Ethereum Reference Client) A cross-platform GUI of an interface similar to a simplified

browser (a la Chrome) that is able to host sandboxed applications whose backend is purely on the Ethereum
protocol.

Ethereum Virtual Machine: (aka EVM) The virtual machine that forms the key part of the execution model
for an Account’s associated EVM Code.

Ethereum Runtime Environment: (aka ERE) The environment which is provided to an Autonomous Object
executing in the EVM. Includes the EVM but also the structure of the world state on which the EVM relies for
certain I/O instructions including CALL & CREATE.

EVM Code: The bytecode that the EVM can natively execute. Used to formally specify the meaning and
ramifications of a message to an Account.

EVM Assembly: The human-readable form of EVM-code.

https://drive.google.com/file/d/0Bw0VXJKBgYPMS0J2VGIyWWlocms/edit?usp=sharing
https://drive.google.com/file/d/0Bw0VXJKBgYPMS0J2VGIyWWlocms/edit?usp=sharing
http://www.bitcoin.org/bitcoin.pdf
http://www.bitcoin.org/bitcoin.pdf
https://github.com/Colored-Coins/Colored-Coins-Protocol-Specification
https://github.com/Colored-Coins/Colored-Coins-Protocol-Specification
https://tools.ietf.org/html/rfc7693
http://www.coderblog.de/wp-content/uploads/technical-basis-of-digital-currencies.pdf
http://www.coderblog.de/wp-content/uploads/technical-basis-of-digital-currencies.pdf
http://www.coderblog.de/wp-content/uploads/technical-basis-of-digital-currencies.pdf
http://firstmonday.org/ojs/index.php/fm/article/view/548
http://firstmonday.org/ojs/index.php/fm/article/view/548
https://eips.ethereum.org/EIPS/eip-2200
https://eips.ethereum.org/EIPS/eip-2200
https://www.cs.cornell.edu/people/egs/papers/karma.pdf
https://www.cs.cornell.edu/people/egs/papers/karma.pdf
https://github.com/mastercoin-MSC/spec
https://eips.ethereum.org/EIPS/eip-2718
https://eips.ethereum.org/EIPS/eip-2718


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 20

LLL: The Lisp-like Low-level Language, a human-writable language used for authoring simple contracts and general
low-level language toolkit for trans-compiling to.

Appendix B. Recursive Length Prefix

This is a serialisation method for encoding arbitrarily structured binary data (byte arrays).
We define the set of possible structures T:

T ≡ L]B(190)
L ≡ {t : t = (t[0], t[1], ...) ∧ ∀n < ‖t‖ : t[n] ∈ T}(191)
B ≡ {b : b = (b[0], b[1], ...) ∧ ∀n < ‖b‖ : b[n] ∈ O}(192)

Where O is the set of (8-bit) bytes. Thus B is the set of all sequences of bytes (otherwise known as byte arrays, and
a leaf if imagined as a tree), L is the set of all tree-like (sub-)structures that are not a single leaf (a branch node if
imagined as a tree) and T is the set of all byte arrays and such structural sequences. The disjoint union ] is needed only
to distinguish the empty byte array () ∈ B from the empty list () ∈ L, which are encoded differently as defined below; as
common, we will abuse notation and leave the disjoint union indices implicit, inferable from context.

We define the RLP function as RLP through two sub-functions, the first handling the instance when the value is a byte
array, the second when it is a sequence of further values:

(193) RLP(x) ≡

{
Rb(x) if x ∈ B
Rl(x) otherwise

If the value to be serialised is a byte array, the RLP serialisation takes one of three forms:

• If the byte array contains solely a single byte and that single byte is less than 128, then the input is exactly equal
to the output.

• If the byte array contains fewer than 56 bytes, then the output is equal to the input prefixed by the byte equal to
the length of the byte array plus 128.

• Otherwise, the output is equal to the input, provided that it contains fewer than 264 bytes, prefixed by the
minimal-length byte array which when interpreted as a big-endian integer is equal to the length of the input byte
array, which is itself prefixed by the number of bytes required to faithfully encode this length value plus 183.

Byte arrays containing 264 or more bytes cannot be encoded. This restriction ensures that the first byte of the encoding
of a byte array is always below 192, and thus it can be readily distinguished from the encodings of sequences in L.

Formally, we define Rb:

Rb(x) ≡




x if ‖x‖ = 1 ∧ x[0] < 128
(128 + ‖x‖) · x else if ‖x‖ < 56(
183 +

∥∥BE(‖x‖)∥∥) ·BE(‖x‖) · x else if ‖x‖ < 264
∅ otherwise

(194)

BE(x) ≡ (b0,b1, ...) : b0 6= 0 ∧x =
‖b‖−1∑
n=0

bn · 256‖b‖−1−n(195)

(x1, ...,xn) · (y1, ...,ym) = (x1, ...,xn,y1, ...,ym)(196)

Thus BE is the function that expands a non-negative integer value to a big-endian byte array of minimal length and
the dot operator performs sequence concatenation.

If instead, the value to be serialised is a sequence of other items then the RLP serialisation takes one of two forms:

• If the concatenated serialisations of each contained item is less than 56 bytes in length, then the output is equal
to that concatenation prefixed by the byte equal to the length of this byte array plus 192.

• Otherwise, the output is equal to the concatenated serialisations, provided that they contain fewer than 264 bytes,
prefixed by the minimal-length byte array which when interpreted as a big-endian integer is equal to the length
of the concatenated serialisations byte array, which is itself prefixed by the number of bytes required to faithfully
encode this length value plus 247.

Sequences whose concatenated serialized items contain 264 or more bytes cannot be encoded. This restriction ensures
that the first byte of the encoding does not exceed 255 (otherwise it would not be a byte).

Thus we finish by formally defining Rl:

Rl(x) ≡




(192 + ‖s(x)‖) ·s(x) if s(x) 6= ∅∧‖s(x)‖ < 56(
247 +

∥∥BE(‖s(x)‖)∥∥) ·BE(‖s(x)‖) ·s(x) else if s(x) 6= ∅∧‖s(x)‖ < 264
∅ otherwise

(197)

s(x) ≡

{
RLP(x[0]) ·RLP(x[1]) · ... if ∀i : RLP(x[i]) 6= ∅
∅ otherwise

(198)


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 21

If RLP is used to encode a scalar, defined only as a non-negative integer (in N, or in Nx for any x), it must be encoded
as the shortest byte array whose big-endian interpretation is the scalar. Thus the RLP of some non-negative integer i is
defined as:

(199) RLP(i : i ∈ N) ≡ RLP(BE(i))

When interpreting RLP data, if an expected fragment is decoded as a scalar and leading zeroes are found in the byte
sequence, clients are required to consider it non-canonical and treat it in the same manner as otherwise invalid RLP data,
dismissing it completely.

There is no specific canonical encoding format for signed or floating-point values.

Appendix C. Hex-Prefix Encoding

Hex-prefix encoding is an efficient method of encoding an arbitrary number of nibbles as a byte array. It is able to
store an additional flag which, when used in the context of the trie (the only context in which it is used), disambiguates
between node types.

It is defined as the function HP which maps from a sequence of nibbles (represented by the set Y) together with a
boolean value to a sequence of bytes (represented by the set B):

HP(x, t) : x ∈ Y ≡

{
(16f(t), 16x[0] + x[1], 16x[2] + x[3], ...) if ‖x‖ is even
(16(f(t) + 1) + x[0], 16x[1] + x[2], 16x[3] + x[4], ...) otherwise

(200)

f(t) ≡

{
2 if t 6= 0
0 otherwise

(201)

Thus the high nibble of the first byte contains two flags; the lowest bit encoding the oddness of the length and the
second-lowest encoding the flag t. The low nibble of the first byte is zero in the case of an even number of nibbles and the
first nibble in the case of an odd number. All remaining nibbles (now an even number) fit properly into the remaining
bytes.

Appendix D. Modified Merkle Patricia Tree

The modified Merkle Patricia tree (trie) provides a persistent data structure to map between arbitrary-length binary
data (byte arrays). It is defined in terms of a mutable data structure to map between 256-bit binary fragments and
arbitrary-length binary data, typically implemented as a database. The core of the trie, and its sole requirement in terms
of the protocol specification, is to provide a single value that identifies a given set of key-value pairs, which may be either
a 32-byte sequence or the empty byte sequence. It is left as an implementation consideration to store and maintain the
structure of the trie in a manner that allows effective and efficient realisation of the protocol.

Formally, we assume the input value I, a set containing pairs of byte sequences with unique keys:

(202) I = {(k0 ∈ B, v0 ∈ B), (k1 ∈ B, v1 ∈ B), ...}

When considering such a sequence, we use the common numeric subscript notation to refer to a tuple’s key or value,
thus:

(203) ∀I ∈ I : I ≡ (I0,I1)

Any series of bytes may also trivially be viewed as a series of nibbles, given an endian-specific notation; here we assume
big-endian. Thus:

y(I) = {(k′0 ∈ Y, v0 ∈ B), (k
′
1 ∈ Y, v1 ∈ B), ...}(204)

∀n : ∀i < 2‖kn‖ : k′n[i] ≡

{
bkn[i÷ 2] ÷ 16c if i is even
kn[bi÷ 2c] mod 16 otherwise

(205)

We define the function TRIE, which evaluates to the root of the trie that represents this set when encoded in this
structure:

(206) TRIE(I) ≡ KEC
(
RLP(c(I, 0))

)
We also assume a function n, the trie’s node cap function. When composing a node, we use RLP to encode the

structure. As a means of reducing storage complexity, we store nodes whose composed RLP is fewer than 32 bytes directly;
for those larger we assert prescience of the byte array whose Keccak-256 hash evaluates to our reference. Thus we define
in terms of c, the node composition function:

(207) n(I, i) ≡




() ∈ B if I = ∅
c(I, i) if ‖RLP

(
c(I, i)

)
‖ < 32

KEC
(
RLP(c(I, i))

)
otherwise

In a manner similar to a radix tree, when the trie is traversed from root to leaf, one may build a single key-value pair.
The key is accumulated through the traversal, acquiring a single nibble from each branch node (just as with a radix tree).
Unlike a radix tree, in the case of multiple keys sharing the same prefix or in the case of a single key having a unique


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 22

suffix, two optimising nodes are provided. Thus while traversing, one may potentially acquire multiple nibbles from each
of the other two node types, extension and leaf. There are three kinds of nodes in the trie:

Leaf: A two-item structure whose first item corresponds to the nibbles in the key not already accounted for by the
accumulation of keys and branches traversed from the root. The hex-prefix encoding method is used and the
second parameter to the function is required to be 1.

Extension: A two-item structure whose first item corresponds to a series of nibbles of size greater than one that
are shared by at least two distinct keys past the accumulation of the keys of nibbles and the keys of branches as
traversed from the root. The hex-prefix encoding method is used and the second parameter to the function is
required to be 0.

Branch: A 17-item structure whose first sixteen items correspond to each of the sixteen possible nibble values for
the keys at this point in their traversal. The 17th item is used in the case of this being a terminator node and
thus a key being ended at this point in its traversal.

A branch is then only used when necessary; no branch nodes may exist that contain only a single non-zero entry. We
may formally define this structure with the structural composition function c:

(208) c(I, i) ≡




(
HP(I0[i..(‖I0‖− 1)], 1),I1

)
if ‖I‖ = 1 where ∃I : I ∈ I(

HP(I0[i..(j − 1)], 0),n(I,j)
)

if i 6= j where j = max{x : ∃l : ‖l‖ = x∧∀I ∈ I : I0[0..(x− 1)] = l}
(u(0),u(1), ...,u(15),v) otherwise where u(j) ≡ n({I : I ∈ I∧ I0[i] = j}, i + 1)

v =

{
I1 if ∃I : I ∈ I∧‖I0‖ = i
() ∈ B otherwise

D.1. Trie Database. Thus no explicit assumptions are made concerning what data is stored and what is not, since
that is an implementation-specific consideration; we simply define the identity function mapping the key-value set I
to a 32-byte hash and assert that only a single such hash exists for any I, which though not strictly true is accurate
within acceptable precision given the Keccak hash’s collision resistance. In reality, a sensible implementation will not fully
recompute the trie root hash for each set.

A reasonable implementation will maintain a database of nodes determined from the computation of various tries or,
more formally, it will memoise the function c. This strategy uses the nature of the trie to both easily recall the contents of
any previous key-value set and to store multiple such sets in a very efficient manner. Due to the dependency relationship,
Merkle-proofs may be constructed with an O(log N) space requirement that can demonstrate a particular leaf must exist
within a trie of a given root hash.

Appendix E. Precompiled Contracts

For each precompiled contract, we make use of a template function, ΞPRE, which implements the out-of-gas checking.

(209) ΞPRE(σ,g,A,I) ≡

{
(∅, 0,A, ()) if g < gr
(σ,g −gr,A, o) otherwise

The precompiled contracts each use these definitions and provide specifications for the o (the output data) and gr, the
gas requirements.

We define ΞECREC as a precompiled contract for the elliptic curve digital signature algorithm (ECDSA) public key recovery
function (ecrecover). See Appendix F for the definition of the function ECDSARECOVER and the constant secp256k1n. We
also define d to be the input data, well-defined for an infinite length by appending zeroes as required. In the case of an
invalid signature, we return no output.

ΞECREC ≡ ΞPRE where:(210)
gr = 3000(211)

‖o‖ =




0 if v /∈{27, 28} ∨ r = 0 ∨ r ≥ secp256k1n ∨ s = 0 ∨ s ≥ secp256k1n
0 if ECDSARECOVER(h,v − 27,r,s) = ∅
32 otherwise

(212)

if ‖o‖ = 32 :(213)
o[0..11] = 0(214)

o[12..31] = KEC
(
ECDSARECOVER(h,v − 27,r,s)

)
[12..31] where:(215)

d[0..(‖Id‖− 1)] = Id(216)
d[‖Id‖..] = (0, 0, ...)(217)

h = d[0..31](218)

v = d[32..63](219)

r = d[64..95](220)

s = d[96..127](221)


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 23

We define ΞSHA256 and ΞRIP160 as precompiled contracts implementing the SHA2-256 and RIPEMD-160 hash functions
respectively. Their gas usage is dependent on the input data size, a factor rounded up to the nearest number of words.

ΞSHA256 ≡ ΞPRE where:(222)

gr = 60 + 12
⌈‖Id‖

32

⌉
(223)

o[0..31] = SHA256(Id)(224)

ΞRIP160 ≡ ΞPRE where:(225)

gr = 600 + 120
⌈‖Id‖

32

⌉
(226)

o[0..11] = 0(227)

o[12..31] = RIPEMD160(Id)(228)

For the purposes here, we assume we have well-defined standard cryptographic functions for RIPEMD-160 and
SHA2-256 of the form:

SHA256(i ∈ B) ≡ o ∈ B32(229)
RIPEMD160(i ∈ B) ≡ o ∈ B20(230)

The fourth contract, the identity function ΞID simply defines the output as the input:

ΞID ≡ ΞPRE where:(231)

gr = 15 + 3
⌈‖Id‖

32

⌉
(232)

o = Id(233)

The fifth contract performs arbitrary-precision exponentiation under modulo. Here, 00 is taken to be one, and x mod 0
is zero for all x. The first word in the input specifies the number of bytes that the first non-negative integer B occupies.
The second word in the input specifies the number of bytes that the second non-negative integer E occupies. The third
word in the input specifies the number of bytes that the third non-negative integer M occupies. These three words are
followed by B, E and M. The rest of the input is discarded. Whenever the input is too short, the missing bytes are
considered to be zero. The output is encoded big-endian into the same format as M’s.

ΞEXPMOD ≡ ΞPRE except:(234)

gr = max

(
200,

⌊
f
(

max(`M,`B)
)

max(`′E, 1)

Gquaddivisor

⌋)
(235)

Gquaddivisor ≡ 3(236)

f(x) ≡
⌈
x

8

⌉2
(237)

`
′
E =




0 if `E ≤ 32 ∧E = 0
blog2(E)c if `E ≤ 32 ∧E 6= 0
8(`E − 32) + blog2(i[(96 + `B)..(127 + `B)])c if 32 < `E ∧ i[(96 + `B)..(127 + `B)] 6= 0
8(`E − 32) otherwise

(238)

o =
(
B
E

mod M
)
∈ N8`M(239)

`B ≡ i[0..31](240)
`E ≡ i[32..63](241)
`M ≡ i[64..95](242)
B ≡ i[96..(95 + `B)](243)
E ≡ i[(96 + `B)..(95 + `B + `E)](244)
M ≡ i[(96 + `B + `E)..(95 + `B + `E + `M)](245)

i[x] ≡

{
Id[x] if x < ‖Id‖
0 otherwise

(246)

E.1. zkSNARK Related Precompiled Contracts. We choose two numbers, both of which are prime.

p ≡ 21888242871839275222246405745257275088696311157297823662689037894645226208583(247)
q ≡ 21888242871839275222246405745257275088548364400416034343698204186575808495617(248)

Since p is a prime number, {0, 1, . . . ,p− 1} forms a field with addition and multiplication modulo p. We call this field Fp.


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 24

We define a set C1 with

(249) C1 ≡{(X,Y ) ∈ Fp ×Fp | Y 2 = X3 + 3}∪{(0, 0)}

We define a binary operation + on C1 for distinct elements (X1,Y1), (X2,Y2) with

(X1,Y1) + (X2,Y2) ≡

{
(X,Y ) if X1 6= X2
(0, 0) otherwise

(250)

λ ≡
Y2 −Y1
X2 −X1

X ≡ λ2 −X1 −X2
Y ≡ λ(X1 −X) −Y1

In the case where (X1,Y1) = (X2,Y2), we define + on C1 with

(X1,Y1) + (X2,Y2) ≡

{
(X,Y ) if Y1 6= 0
(0, 0) otherwise

(251)

λ ≡
3X21
2Y1

X ≡ λ2 − 2X1
Y ≡ λ(X1 −X) −Y1

(C1, +) is known to form a group. We define scalar multiplication · with

(252) n ·P ≡ (0, 0) + P + · · · + P︸ ︷︷ ︸
n

for a natural number n and a point P in C1.
We define P1 to be a point (1, 2) on C1. Let G1 be the subgroup of (C1, +) generated by P1. G1 is known to be a

cyclic group of order q. For a point P in G1, we define logP1 (P ) to be the smallest natural number n satisfying n ·P1 = P .
logP1 (P) is at most q − 1.

Let Fp2 be a field Fp[i]/(i
2 + 1). We define a set C2 with

(253) C2 ≡{(X,Y ) ∈ Fp2 ×Fp2 | Y
2

= X
3

+ 3(i + 9)
−1}∪{(0, 0)}

We define a binary operation + and scalar multiplication · with the same equations (250), (251) and (252). (C2, +) is also
known to be a group. We define P2 in C2 with

P2 ≡ (11559732032986387107991004021392285783925812861821192530917403151452391805634 × i(254)
+10857046999023057135944570762232829481370756359578518086990519993285655852781,

4082367875863433681332203403145435568316851327593401208105741076214120093531 × i
+8495653923123431417604973247489272438418190587263600148770280649306958101930)

We define G2 to be the subgroup of (C2, +) generated by P2. G2 is known to be the only cyclic group of order q on C2.
For a point P in G2, we define logP2 (P) be the smallest natural number n satisfying n ·P2 = P. With this definition,
logP2 (P) is at most q − 1.

Let GT be the multiplicative abelian group underlying Fq12 . It is known that a non-degenerate bilinear map
e : G1 ×G2 → GT exists. This bilinear map is a type three pairing. There are several such bilinear maps, it does not
matter which is chosen to be e. Let PT = e(P1,P2), a be a set of k points in G1, and b be a set of k points in G2. It
follows from the definition of a pairing that the following are equivalent

logP1 (a1) × logP2 (b1) + · · · + logP1 (ak) × logP2 (bk) ≡ 1 mod q(255)
k∏
i=0

e (ai,bi) = PT(256)

Thus the pairing operation provides a method to verify (255).
A 32 byte number x ∈ P256 might and might not represent an element of Fp.

(257) δp(x) ≡

{
x if x < p

∅ otherwise


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 25

A 64 byte data x ∈ B512 might and might not represent an element of G1.

δ1(x) ≡

{
g1 if g1 ∈ G1
∅ otherwise

(258)

g1 ≡

{
(x,y) if x 6= ∅∧y 6= ∅
∅ otherwise

(259)

x ≡ δp(x[0..31])(260)
y ≡ δp(x[32..63])(261)

A 128 byte data x ∈ B1024 might and might not represent an element of G2.

δ2(x) ≡

{
g2 if g2 ∈ G2
∅ otherwise

(262)

g2 ≡

{
((x0i + y0), (x1i + y1)) if x0 6= ∅∧y0 6= ∅∧x1 6= ∅∧y1 6= ∅
∅ otherwise

(263)

x0 ≡ δp(x[0..31])(264)
y0 ≡ δp(x[32..63])(265)
x1 ≡ δp(x[64..95])(266)
y1 ≡ δp(x[96..127])(267)

We define ΞSNARKV as a precompiled contract which checks if (255) holds, for intended use in zkSNARK verification.

ΞSNARKV ≡ ΞPRE except:(268)
ΞSNARKV(σ,g,A,I) = (∅, 0,A, ()) if F(269)

F ≡ (‖Id‖ mod 192 6= 0 ∨ (∃j. aj = ∅∨ bj = ∅))(270)

k =
‖Id‖
192

(271)

gr = 34000k + 45000(272)

o[0..31] ≡

{
0x0000000000000000000000000000000000000000000000000000000000000001 if v ∧¬F
0x0000000000000000000000000000000000000000000000000000000000000000 if ¬v ∧¬F

(273)

v ≡ (logP1 (a1) × logP2 (b1) + · · · + logP1 (ak) × logP2 (bk) ≡ 1 mod q)(274)
a1 ≡ δ1(Id[0..63])(275)
b1 ≡ δ2(Id[64..191])(276)

...

ak ≡ δ1(Id[(‖Id‖− 192)..(‖Id‖− 129)])(277)
bk ≡ δ2(Id[(‖Id‖− 128)..(‖Id‖− 1)])(278)

We define a precompiled contract for addition on G1.

ΞBN ADD ≡ ΞBN PRE except:(279)
ΞBN ADD(σ,g,A,I) = (∅, 0,A, ()) if x = ∅∨y = ∅(280)

gr = 150(281)

o ≡ δ−11 (x + y) where + is the group operation in G1(282)
x ≡ δ1

(
Īd[0..63]

)
(283)

y ≡ δ1
(
Īd[64..127]

)
(284)

Īd[x] ≡

{
Id[x] if x < ‖Id‖
0 otherwise

(285)

We define a precompiled contract for scalar multiplication on G1, where Īd is defined in (285).

ΞBN MUL ≡ ΞPRE except:(286)
ΞBN MUL(σ,g,A,I) = (∅, 0,A, ()) if x = ∅(287)

gr = 6000(288)

o ≡ δ−11 (n ·x) where · is the scalar multiplication in G1(289)
x ≡ δ1

(
Īd[0..63]

)
(290)

n ≡ Īd[64..95](291)


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 26

E.2. BLAKE2 Precompiled Contract. EIP-152 by Hess et al. [2016] defines ΞBLAKE2 F as a precompiled contract
implementing the compression function F used in the BLAKE2 cryptographic hashing algorithm. The F compression
function is specified in RFC 7693 by Saarinen and Aumasson [2015].

ΞBLAKE2 F ≡ ΞPRE except:(292)
ΞBLAKE2 F(σ,g,A,I) = (∅, 0,A, ()) if ‖Id‖ 6= 213 ∨f /∈{0, 1}(293)

gr = r(294)

o ≡ LE8(h′0) · ... ·LE8(h
′
7)(295)

(h
′
0, . . . ,h

′
7) ≡ F(h,m,tlow, thigh,f) with r rounds and w = 64(296)

BE4(r) ≡ Id[0..4](297)
LE8(h0) ≡ Id[4..12](298)

. . .(299)

LE8(h7) ≡ Id[60..68](300)
LE8(m0) ≡ Id[68..76](301)

. . .(302)

LE8(m15) ≡ Id[188..196](303)
LE8(tlow) ≡ Id[196..204](304)
LE8(thigh) ≡ Id[204..212](305)

f ≡ Id[212](306)
where r ∈ B32, ∀i ∈ 0..7 : hi ∈ B64, ∀i ∈ 0..15 : mi ∈ B64, tlow ∈ B64, thigh ∈ B64, f ∈ B8, BEk is the k-byte big-endian
representation—compare with(195):

(307) BEk(x) ≡ (b0,b1, ...,bk−1) : x =
k−1∑
n=0

bn · 256k−1−n

and LEk is the k-byte little-endian representation:

(308) LEk(x) ≡ (b0,b1, ...,bk−1) : x =
k−1∑
n=0

bn · 256n

Appendix F. Signing Transactions

Transactions are signed using recoverable ECDSA signatures. This method utilises the SECP-256k1 curve as described
by Courtois et al. [2014], and is implemented similarly to as described by Gura et al. [2004] on p. 9 of 15, para. 3.

It is assumed that the sender has a valid private key pr, which is a randomly selected positive integer (represented as a
byte array of length 32 in big-endian form) in the range [1,secp256k1n− 1].

We assume the existence of functions ECDSAPUBKEY, ECDSASIGN and ECDSARECOVER. These are formally defined in the
literature, e.g. by Johnson et al. [2001].

ECDSAPUBKEY(pr ∈ B32) ≡ pu ∈ B64(309)
ECDSASIGN(e ∈ B32,pr ∈ B32) ≡ (v ∈ B1,r ∈ B32,s ∈ B32)(310)

ECDSARECOVER(e ∈ B32,v ∈ B1,r ∈ B32,s ∈ B32) ≡ pu ∈ B64(311)
Where pu is the public key, assumed to be a byte array of size 64 (formed from the concatenation of two positive

integers each < 2256), pr is the private key, a byte array of size 32 (or a single positive integer in the aforementioned
range) and e is the hash of the transaction, h(T). It is assumed that v is the ‘recovery identifier’. The recovery identifier
is a 1 byte value specifying the parity and finiteness of the coordinates of the curve point for which r is the x-value; this
value is in the range of [0, 3], however we declare the upper two possibilities, representing infinite values, invalid. The
value 0 represents an even y value and 1 represents an odd y value.

We declare that an ECDSA signature is invalid unless all the following conditions are true:

0 < r < secp256k1n(312)

0 < s < secp256k1n÷ 2 + 1(313)
v ∈{0, 1}(314)

where:

secp256k1n = 115792089237316195423570985008687907852837564279074904382605163141518161494337(315)

Note that this restriction on s is more stringent than restriction 212 in the ΞECREC precompile; see EIP-2 by Buterin [2015]
for more detail.

For a given private key, pr, the Ethereum address A(pr) (a 160-bit value) to which it corresponds is defined as the
rightmost 160-bits of the Keccak-256 hash of the corresponding ECDSA public key:

(316) A(pr) = B96..255
(
KEC
(
ECDSAPUBKEY(pr)

))


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 27

The message hash, h(T), to be signed is the Keccak-256 hash of the transaction. Four different flavours of signing
schemes are available:

LX(T) ≡




(Tn,Tp,Tg,Tt,Tv, p) if Tx = 0 ∧Tw ∈{27, 28}
(Tn,Tp,Tg,Tt,Tv, p,β, (), ()) if Tx = 0 ∧Tw ∈{2β + 35, 2β + 36}
(Tc,Tn,Tp,Tg,Tt,Tv, p,TA) if Tx = 1

(Tc,Tn,Tf,Tm,Tg,Tt,Tv, p,TA) if Tx = 2

(317)

where

p ≡

{
Ti if Tt = ∅
Td otherwise

h(T) ≡

{
KEC(RLP(LX(T))) if Tx = 0

KEC(Tx ·RLP(LX(T))) otherwise
(318)

The signed transaction G(T,pr) is defined as:

G(T,pr) ≡ T except:(319)
(Ty,Tr,Ts) = ECDSASIGN(h(T),pr)(320)

Reiterating from previously:

Tr = r(321)

Ts = s(322)

and Tw of legacy transcations is either 27 + Ty or 2β + 35 + Ty.
We may then define the sender function S of the transaction as:

S(T) ≡ B96..255
(
KEC
(
ECDSARECOVER(h(T),v,Tr,Ts)

))
(323)

v ≡



Tw − 27 if Tx = 0 ∧Tw ∈{27, 28}
(Tw − 35) mod 2 if Tx = 0 ∧Tw ∈{2β + 35, 2β + 36}
Ty if Tx = 1 ∨Tx = 2

(324)

The assertion that the sender of a signed transaction equals the address of the signer should be self-evident:

(325) ∀T : ∀pr : S(G(T,pr)) ≡ A(pr)


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 28

Appendix G. Fee Schedule

The fee schedule G is a tuple of scalar values corresponding to the relative costs, in gas, of a number of abstract
operations that a transaction may effect.

Name Value Description

Gzero 0 Nothing paid for operations of the set Wzero.
Gjumpdest 1 Amount of gas to pay for a JUMPDEST operation.
Gbase 2 Amount of gas to pay for operations of the set Wbase.
Gverylow 3 Amount of gas to pay for operations of the set Wverylow.
Glow 5 Amount of gas to pay for operations of the set Wlow.
Gmid 8 Amount of gas to pay for operations of the set Wmid.
Ghigh 10 Amount of gas to pay for operations of the set Whigh.
Gwarmaccess 100 Cost of a warm account or storage access.
Gaccesslistaddress 2400 Cost of warming up an account with the access list.
Gaccessliststorage 1900 Cost of warming up a storage with the access list.
Gcoldaccountaccess 2600 Cost of a cold account access.
Gcoldsload 2100 Cost of a cold storage access.
Gsset 20000 Paid for an SSTORE operation when the storage value is set to non-zero from zero.
Gsreset 2900 Paid for an SSTORE operation when the storage value’s zeroness remains unchanged or

is set to zero.
Rsclear 4800 Refund given (added into refund counter) when the storage value is set to zero from

non-zero. The refund amount is defined as Gsreset + Gaccessliststorage.
Gselfdestruct 5000 Amount of gas to pay for a SELFDESTRUCT operation.
Gcreate 32000 Paid for a CREATE operation.
Gcodedeposit 200 Paid per byte for a CREATE operation to succeed in placing code into state.
Ginitcodeword 2 Paid per word of the initcode at the beginning of a deploy transaction, a CREATE,

or a CREATE2 operation.
Gcallvalue 9000 Paid for a non-zero value transfer as part of the CALL operation.
Gcallstipend 2300 A stipend for the called contract subtracted from Gcallvalue for a non-zero value transfer.
Gnewaccount 25000 Paid for a CALL or SELFDESTRUCT operation which creates an account.
Gexp 10 Partial payment for an EXP operation.
Gexpbyte 50 Partial payment when multiplied by the number of bytes in the exponent for the EXP operation.
Gmemory 3 Paid for every additional word when expanding memory.
Gtxcreate 32000 Paid by all contract-creating transactions after the Homestead transition.
Gtxdatazero 4 Paid for every zero byte of data or code for a transaction.
Gtxdatanonzero 16 Paid for every non-zero byte of data or code for a transaction.
Gtransaction 21000 Paid for every transaction.
Glog 375 Partial payment for a LOG operation.
Glogdata 8 Paid for each byte in a LOG operation’s data.
Glogtopic 375 Paid for each topic of a LOG operation.
Gkeccak256 30 Paid for each KECCAK256 operation.
Gkeccak256word 6 Paid for each word (rounded up) for input data to a KECCAK256 operation.
Gcopy 3 Partial payment for *COPY operations, multiplied by words copied, rounded up.
Gblockhash 20 Payment for each BLOCKHASH operation.

Appendix H. Virtual Machine Specification

When interpreting 256-bit binary values as integers, the representation is big-endian.
When a 256-bit machine datum is converted to and from a 160-bit address or hash, the rightwards (low-order for BE)

20 bytes are used and the leftmost 12 are discarded or filled with zeroes, thus the integer values (when the bytes are
interpreted as big-endian) are equivalent.


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 29

H.1. Gas Cost. The general gas cost function, C, is defined as:

(326)

C(σ,µ,A,I) ≡ Cmem(µ′i)−Cmem(µi) +




CSSTORE(σ,µ,A,I) if w = SSTORE

Gexp if w = EXP ∧µs[1] = 0
Gexp + Gexpbyte × (1 + blog256(µs[1])c) if w = EXP ∧µs[1] > 0
Gverylow + Gcopy ×dµs[2] ÷ 32e if w ∈ Wcopy
Caaccess(µs[0] mod 2

160,A) + Gcopy ×dµs[3] ÷ 32e if w = EXTCODECOPY
Caaccess(µs[0] mod 2

160,A) if w ∈ Wextaccount
Glog + Glogdata ×µs[1] if w = LOG0
Glog + Glogdata ×µs[1] + Glogtopic if w = LOG1
Glog + Glogdata ×µs[1] + 2Glogtopic if w = LOG2
Glog + Glogdata ×µs[1] + 3Glogtopic if w = LOG3
Glog + Glogdata ×µs[1] + 4Glogtopic if w = LOG4
CCALL(σ,µ,A) if w ∈ Wcall
CSELFDESTRUCT(σ,µ) if w = SELFDESTRUCT

Gcreate + R(µs[2]) if w = CREATE

Gcreate + Gkeccak256word ×dµs[2] ÷ 32e + R(µs[2]) if w = CREATE2
Gkeccak256 + Gkeccak256word ×dµs[1] ÷ 32e if w = KECCAK256
Gjumpdest if w = JUMPDEST

CSLOAD(µ,A,I) if w = SLOAD

Gzero if w ∈ Wzero
Gbase if w ∈ Wbase
Gverylow if w ∈ Wverylow
Glow if w ∈ Wlow
Gmid if w ∈ Wmid
Ghigh if w ∈ Whigh
Gblockhash if w = BLOCKHASH

(327) w ≡

{
Ib[µpc] if µpc < ‖Ib‖
STOP otherwise

where:

(328) Cmem(a) ≡ Gmemory ·a +
⌊
a2

512

⌋

(329) Caaccess(x,A) ≡

{
Gwarmaccess if x ∈ Aa
Gcoldaccountaccess otherwise

with CCALL, CSELFDESTRUCT, CSLOAD and CSSTORE as specified in the appropriate section below. We define the following
subsets of instructions:
Wzero = {STOP, RETURN, REVERT}
Wbase = {ADDRESS, ORIGIN, CALLER, CALLVALUE, CALLDATASIZE, CODESIZE, GASPRICE, COINBASE,

TIMESTAMP, NUMBER, PREVRANDAO, GASLIMIT, CHAINID, RETURNDATASIZE, POP, PC, MSIZE, GAS,
BASEFEE, PUSH0}

Wverylow = {ADD, SUB, NOT, LT, GT, SLT, SGT, EQ, ISZERO, AND, OR, XOR, BYTE, SHL, SHR, SAR,
CALLDATALOAD, MLOAD, MSTORE, MSTORE8, PUSH1, ..., PUSH32, DUP*, SWAP*}

Wlow = {MUL, DIV, SDIV, MOD, SMOD, SIGNEXTEND, SELFBALANCE}
Wmid = {ADDMOD, MULMOD, JUMP}
Whigh = {JUMPI}
Wcopy = {CALLDATACOPY, CODECOPY, RETURNDATACOPY}
Wcall = {CALL, CALLCODE, DELEGATECALL, STATICCALL}
Wextaccount = {BALANCE, EXTCODESIZE, EXTCODEHASH}
Note the memory cost component, given as the product of Gmemory and the maximum of 0 & the ceiling of the number

of words in size that the memory must be over the current number of words, µi in order that all accesses reference valid
memory whether for read or write. Such accesses must be for non-zero number of bytes.

Referencing a zero length range (e.g. by attempting to pass it as the input range to a CALL) does not require memory
to be extended to the beginning of the range. µ′i is defined as this new maximum number of words of active memory;
special cases are given where these two are not equal.


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 30

Note also that Cmem is the memory cost function (the expansion function being the difference between the cost before
and after). It is a polynomial, with the higher-order coefficient divided and floored, and thus linear up to 704B of memory
used, after which it costs substantially more.

While defining the instruction set, we defined the memory-expansion for range function, M, thus:

(330) M(s,f, l) ≡

{
s if l = 0

max(s,d(f + l) ÷ 32e) otherwise

Another useful function is “all but one 64th” function L defined as:

(331) L(n) ≡ n−bn/64c

H.2. Instruction Set. As previously specified in section 9, these definitions take place in the final context there. In
particular we assume O is the EVM state-progression function and define the terms pertaining to the next cycle’s state
(σ′,µ′) such that:

(332) O(σ,µ,A,I) ≡ (σ′,µ′,A′,I) with exceptions, as noted

Here given are the various exceptions to the state transition rules given in section 9 specified for each instruction,
together with the additional instruction-specific definitions of J and C. For each instruction, also specified is α, the
additional items placed on the stack and δ, the items removed from stack, as defined in section 9.


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 31

0s: Stop and Arithmetic Operations
All arithmetic is modulo 2256 unless otherwise noted. The zero-th power of zero 00 is defined to be one.

Value Mnemonic δ α Description

0x00 STOP 0 0 Halts execution.

0x01 ADD 2 1 Addition operation.
µ′s[0] ≡ µs[0] + µs[1]

0x02 MUL 2 1 Multiplication operation.
µ′s[0] ≡ µs[0] ×µs[1]

0x03 SUB 2 1 Subtraction operation.
µ′s[0] ≡ µs[0] −µs[1]

0x04 DIV 2 1 Integer division operation.

µ′s[0] ≡

{
0 if µs[1] = 0

bµs[0] ÷µs[1]c otherwise

0x05 SDIV 2 1 Signed integer division operation (truncated).

µ′s[0] ≡




0 if µs[1] = 0

−2255 if µs[0] = −2
255 ∧ µs[1] = −1

sgn(µs[0] ÷µs[1])b|µs[0] ÷µs[1]|c otherwise
Where all values are treated as two’s complement signed 256-bit integers.
Note the overflow semantic when −2255 is negated.

0x06 MOD 2 1 Modulo remainder operation.

µ′s[0] ≡

{
0 if µs[1] = 0

µs[0] mod µs[1] otherwise

0x07 SMOD 2 1 Signed modulo remainder operation.

µ′s[0] ≡

{
0 if µs[1] = 0

sgn(µs[0])(|µs[0]| mod |µs[1]|) otherwise
Where all values are treated as two’s complement signed 256-bit integers.

0x08 ADDMOD 3 1 Modulo addition operation.

µ′s[0] ≡

{
0 if µs[2] = 0

(µs[0] + µs[1]) mod µs[2] otherwise

All intermediate calculations of this operation are not subject to the 2256

modulo.

0x09 MULMOD 3 1 Modulo multiplication operation.

µ′s[0] ≡

{
0 if µs[2] = 0

(µs[0] ×µs[1]) mod µs[2] otherwise
All intermediate calculations of this operation are not subject to the 2256

modulo.

0x0a EXP 2 1 Exponential operation.

µ′s[0] ≡ µs[0]
µs[1]

0x0b SIGNEXTEND 2 1 Extend length of two’s complement signed integer.

∀i ∈ [0..255] : µ′s[0]i ≡

{
µs[1]t if i 6 t where t = 256 − 8(µs[0] + 1)
µs[1]i otherwise

µs[x]i gives the ith bit (counting from zero) of µs[x]


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 32

10s: Comparison & Bitwise Logic Operations
Value Mnemonic δ α Description

0x10 LT 2 1 Less-than comparison.

µ′s[0] ≡

{
1 if µs[0] < µs[1]

0 otherwise

0x11 GT 2 1 Greater-than comparison.

µ′s[0] ≡

{
1 if µs[0] > µs[1]

0 otherwise

0x12 SLT 2 1 Signed less-than comparison.

µ′s[0] ≡

{
1 if µs[0] < µs[1]

0 otherwise

Where all values are treated as two’s complement signed 256-bit integers.

0x13 SGT 2 1 Signed greater-than comparison.

µ′s[0] ≡

{
1 if µs[0] > µs[1]

0 otherwise

Where all values are treated as two’s complement signed 256-bit integers.

0x14 EQ 2 1 Equality comparison.

µ′s[0] ≡

{
1 if µs[0] = µs[1]

0 otherwise

0x15 ISZERO 1 1 Simple not operator.

µ′s[0] ≡

{
1 if µs[0] = 0

0 otherwise

0x16 AND 2 1 Bitwise AND operation.
∀i ∈ [0..255] : µ′s[0]i ≡ µs[0]i ∧µs[1]i

0x17 OR 2 1 Bitwise OR operation.
∀i ∈ [0..255] : µ′s[0]i ≡ µs[0]i ∨µs[1]i

0x18 XOR 2 1 Bitwise XOR operation.
∀i ∈ [0..255] : µ′s[0]i ≡ µs[0]i ⊕µs[1]i

0x19 NOT 1 1 Bitwise NOT operation.

∀i ∈ [0..255] : µ′s[0]i ≡

{
1 if µs[0]i = 0

0 otherwise

0x1a BYTE 2 1 Retrieve single byte from word.

∀i ∈ [0..255] : µ′s[0]i ≡

{
µs[1](i−248+8µs[0]) if i ≥ 248 ∧µs[0] < 32
0 otherwise

For the Nth byte, we count from the left (i.e. N=0 would be the most significant
in big endian).

0x1b SHL 2 1 Left shift operation.

µ′s[0] ≡ (µs[1] × 2
µs[0]) mod 2256

0x1c SHR 2 1 Logical right shift operation.

µ′s[0] ≡bµs[1] ÷ 2
µs[0]c

0x1d SAR 2 1 Arithmetic (signed) right shift operation.

µ′s[0] ≡bµs[1] ÷ 2
µs[0]c

Where µ′s[0] and µs[1] are treated as two’s complement signed 256-bit integers,
while µs[0] is treated as unsigned.

20s: KECCAK256

Value Mnemonic δ α Description

0x20 KECCAK256 2 1 Compute Keccak-256 hash.
µ′s[0] ≡ KEC(µm[µs[0] . . . (µs[0] + µs[1] − 1)])
µ′i ≡ M(µi,µs[0],µs[1])


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 33

30s: Environmental Information

Value Mnemonic δ α Description

0x30 ADDRESS 0 1 Get address of currently executing account.
µ′s[0] ≡ Ia

0x31 BALANCE 1 1 Get balance of the given account.

µ′s[0] ≡

{
σ[µs[0] mod 2

160]b if σ[µs[0] mod 2
160] 6= ∅

0 otherwise

A′a ≡ Aa ∪{µs[0] mod 2
160}

0x32 ORIGIN 0 1 Get execution origination address.
µ′s[0] ≡ Io
This is the sender of original transaction; it is never an account with
non-empty associated code.

0x33 CALLER 0 1 Get caller address.
µ′s[0] ≡ Is
This is the address of the account that is directly responsible for
this execution.

0x34 CALLVALUE 0 1 Get deposited value by the instruction/transaction responsible for
this execution.
µ′s[0] ≡ Iv

0x35 CALLDATALOAD 1 1 Get input data of current environment.
µ′s[0] ≡ Id[µs[0] . . . (µs[0] + 31)] with Id[x] = 0 if x > ‖Id‖
This pertains to the input data passed with the message call
instruction or transaction.

0x36 CALLDATASIZE 0 1 Get size of input data in current
environment.
µ′s[0] ≡‖Id‖
This pertains to the input data passed with the message call
instruction or transaction.

0x37 CALLDATACOPY 3 0 Copy input data in current environment to memory.

∀i ∈{0 . . .µs[2] − 1} : µ
′
m[µs[0] + i] ≡

{
Id[µs[1] + i] if µs[1] + i < ‖Id‖
0 otherwise

The additions in µs[1] + i are not subject to the 2
256 modulo.

µ′i ≡ M(µi,µs[0],µs[2])
This pertains to the input data passed with the message call instruction
or transaction.

0x38 CODESIZE 0 1 Get size of code running in current environment.
µ′s[0] ≡‖Ib‖

0x39 CODECOPY 3 0 Copy code running in current environment to memory.

∀i ∈{0 . . .µs[2] − 1} : µ
′
m[µs[0] + i] ≡

{
Ib[µs[1] + i] if µs[1] + i < ‖Ib‖
STOP otherwise

µ′i ≡ M(µi,µs[0],µs[2])
The additions in µs[1] + i are not subject to the 2

256 modulo.

0x3a GASPRICE 0 1 Get price of gas in current environment.
This is the effective gas price defined in section 6.
Note that as of the London hard fork, this value no longer
represents what is received by the validator,
but rather just what is paid by the sender.
µ′s[0] ≡ Ip

0x3b EXTCODESIZE 1 1 Get size of an account’s code.

µ′s[0] ≡

{
‖b‖ if σ[µs[0] mod 2

160] 6= ∅
0 otherwise

where KEC(b) ≡ σ[µs[0] mod 2
160]c

A′a ≡ Aa ∪{µs[0] mod 2
160}

0x3c EXTCODECOPY 4 0 Copy an account’s code to memory.

∀i ∈{0 . . .µs[3] − 1} : µ
′
m[µs[1] + i] ≡

{
b[µs[2] + i] if µs[2] + i < ‖b‖
STOP otherwise

where KEC(b) ≡ σ[µs[0] mod 2
160]c

We assume b ≡ () if σ[µs[0] mod 2
160] = ∅.

µ′i ≡ M(µi,µs[1],µs[3])
The additions in µs[2] + i are not subject to the 2

256 modulo.
A′a ≡ Aa ∪{µs[0] mod 2

160}


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 34

0x3d RETURNDATASIZE 0 1 Get size of output data from the previous call from the current
environment.
µ′s[0] ≡‖µo‖

0x3e RETURNDATACOPY 3 0 Copy output data from the previous call to memory.

∀i ∈{0 . . .µs[2] − 1} : µ
′
m[µs[0] + i] ≡

{
µo[µs[1] + i] if µs[1] + i < ‖µo‖
0 otherwise

The additions in µs[1] + i are not subject to the 2
256 modulo.

µ′i ≡ M(µi,µs[0],µs[2])

0x3f EXTCODEHASH 1 1 Get hash of an account’s code.

µ′s[0] ≡

{
0 if DEAD(σ,µs[0] mod 2

160)

σ[µs[0] mod 2
160]c otherwise

A′a ≡ Aa ∪{µs[0] mod 2
160}

40s: Block Information

Value Mnemonic δ α Description

0x40 BLOCKHASH 1 1 Get the hash of one of the 256 most recent complete blocks.
µ′s[0] ≡ P(IHp,µs[0], 0)
where P is the hash of a block of a particular number, up to a maximum
age. 0 is left on the stack if the looked for block number is greater than or
equal to the current block number or more than 256 blocks behind the current block.

P(h,n,a) ≡




0 if n > Hi ∨a = 256 ∨h = 0
h if n = Hi

P(Hp,n,a + 1) otherwise

and we assert the header H can be determined from its hash h unless h is zero
(as is the case for the parent hash of the genesis block).

0x41 COINBASE 0 1 Get the current block’s beneficiary address.
µ′s[0] ≡ IHc

0x42 TIMESTAMP 0 1 Get the current block’s timestamp.
µ′s[0] ≡ IHs

0x43 NUMBER 0 1 Get the current block’s number.
µ′s[0] ≡ IHi

0x44 PREVRANDAO 0 1 Get the latest RANDAO mix of the post beacon state of the previous block.
µ′s[0] ≡ IHa

0x45 GASLIMIT 0 1 Get the current block’s gas limit.
µ′s[0] ≡ IHl

0x46 CHAINID 0 1 Get the chain ID.
µ′s[0] ≡ β

0x47 SELFBALANCE 0 1 Get balance of currently executing account.
µ′s[0] ≡ σ[Ia]b

0x48 BASEFEE 0 1 Get the current block’s base fee.
µ′s[0] ≡ IHf


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 35

50s: Stack, Memory, Storage and Flow Operations

Value Mnemonic δ α Description

0x50 POP 1 0 Remove item from stack.

0x51 MLOAD 1 1 Load word from memory.
µ′s[0] ≡ µm[µs[0] . . . (µs[0] + 31)]
µ′i ≡ max(µi,d(µs[0] + 32) ÷ 32e)
The addition in the calculation of µ′i is not subject to the 2

256 modulo.

0x52 MSTORE 2 0 Save word to memory.
µ′m[µs[0] . . . (µs[0] + 31)] ≡ µs[1]
µ′i ≡ max(µi,d(µs[0] + 32) ÷ 32e)
The addition in the calculation of µ′i is not subject to the 2

256 modulo.

0x53 MSTORE8 2 0 Save byte to memory.
µ′m[µs[0]] ≡ (µs[1] mod 256)
µ′i ≡ max(µi,d(µs[0] + 1) ÷ 32e)
The addition in the calculation of µ′i is not subject to the 2

256 modulo.

0x54 SLOAD 1 1 Load word from storage.
µ′s[0] ≡ σ[Ia]s[µs[0]]
A′K ≡ AK ∪{(Ia,µs[0])}

CSLOAD(µ,A,I) ≡

{
Gwarmaccess if (Ia,µs[0]) ∈ AK
Gcoldsload otherwise

0x55 SSTORE 2 0 Save word to storage.
σ′[Ia]s[µs[0]] ≡ µs[1]
A′K ≡ AK ∪{(Ia,µs[0])}
CSSTORE(σ,µ) and A

′
r are specified by EIP-2200 as follows.

We remind the reader that the checkpoint (“original”) state σ0 is the state
if the current transaction were to revert.
Let v0 = σ0[Ia]s[µs[0]] be the original value of the storage slot.
Let v = σ[Ia]s[µs[0]] be the current value.
Let v′ = µs[1] be the new value.
Then:

CSSTORE(σ,µ,A,I) ≡

{
0 if (Ia,µs[0]) ∈ AK
Gcoldsload otherwise

+



Gwarmaccess if v = v

′ ∨ v0 6= v
Gsset if v 6= v′ ∧ v0 = v ∧ v0 = 0
Gsreset if v 6= v′ ∧ v0 = v ∧ v0 6= 0

A′r ≡ Ar +



Rsclear if v 6= v′ ∧ v0 = v ∧ v′ = 0
rdirtyclear + rdirtyreset if v 6= v′ ∧ v0 6= v
0 otherwise

where

rdirtyclear ≡



−Rsclear if v0 6= 0 ∧ v = 0
Rsclear if v0 6= 0 ∧ v′ = 0
0 otherwise

rdirtyreset ≡



Gsset −Gwarmaccess if v0 = v′ ∧ v0 = 0
Gsreset −Gwarmaccess if v0 = v′ ∧ v0 6= 0
0 otherwise

0x56 JUMP 1 0 Alter the program counter.
JJUMP(µ) ≡ µs[0]
This has the effect of writing said value to µpc. See section 9.

0x57 JUMPI 2 0 Conditionally alter the program counter.

JJUMPI(µ) ≡

{
µs[0] if µs[1] 6= 0
µpc + 1 otherwise

This has the effect of writing said value to µpc. See section 9.

0x58 PC 0 1 Get the value of the program counter prior to the increment
corresponding to this instruction.
µ′s[0] ≡ µpc


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 36

0x59 MSIZE 0 1 Get the size of active memory in bytes.
µ′s[0] ≡ 32µi

0x5a GAS 0 1 Get the amount of available gas, including the corresponding reduction
for the cost of this instruction.
µ′s[0] ≡ µg

0x5b JUMPDEST 0 0 Mark a valid destination for jumps.
This operation has no effect on machine state during execution.

5f, 60s & 70s: Push Operations

Value Mnemonic δ α Description

0x5f PUSH0 0 1 Place 0 on the stack.
µ′s[0] ≡ 0

0x60 PUSH1 0 1 Place 1 byte item on stack.
µ′s[0] ≡ c(µpc + 1)

where c(x) ≡

{
Ib[x] if x < ‖Ib‖
0 otherwise

The bytes are read in line from the program code’s bytes array.
The function c ensures the bytes default to zero if they extend past the limits.
The byte is right-aligned (takes the lowest significant place in big endian).

0x61 PUSH2 0 1 Place 2-byte item on stack.
µ′s[0] ≡ c

(
(µpc + 1) . . . (µpc + 2)

)
with c(x) ≡ (c(x0), ...,c(x‖x‖−1)) with c as defined as above.
The bytes are right-aligned (takes the lowest significant place in big endian).

...
...

...
...

...

0x7f PUSH32 0 1 Place 32-byte (full word) item on stack.
µ′s[0] ≡ c

(
(µpc + 1) . . . (µpc + 32)

)
where c is defined as above.
The bytes are right-aligned (takes the lowest significant place in big endian).

80s: Duplication Operations

Value Mnemonic δ α Description

0x80 DUP1 1 2 Duplicate 1st stack item.
µ′s[0] ≡ µs[0]

0x81 DUP2 2 3 Duplicate 2nd stack item.
µ′s[0] ≡ µs[1]

...
...

...
...

...

0x8f DUP16 16 17 Duplicate 16th stack item.
µ′s[0] ≡ µs[15]

90s: Exchange Operations

Value Mnemonic δ α Description

0x90 SWAP1 2 2 Exchange 1st and 2nd stack items.
µ′s[0] ≡ µs[1]
µ′s[1] ≡ µs[0]

0x91 SWAP2 3 3 Exchange 1st and 3rd stack items.
µ′s[0] ≡ µs[2]
µ′s[2] ≡ µs[0]

...
...

...
...

...

0x9f SWAP16 17 17 Exchange 1st and 17th stack items.
µ′s[0] ≡ µs[16]
µ′s[16] ≡ µs[0]


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 37

a0s: Logging Operations

For all logging operations, the state change is to append an additional log entry on to the substate’s log series:
A′l ≡ Al · (Ia, t,µm[µs[0] . . . (µs[0] + µs[1] − 1)])
and to update the memory consumption counter:
µ′i ≡ M(µi,µs[0],µs[1])
The entry’s topic series, t, differs accordingly:

Value Mnemonic δ α Description

0xa0 LOG0 2 0 Append log record with no topics.
t ≡ ()

0xa1 LOG1 3 0 Append log record with one topic.
t ≡ (µs[2])

...
...

...
...

...

0xa4 LOG4 6 0 Append log record with four topics.
t ≡ (µs[2],µs[3],µs[4],µs[5])


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 38

f0s: System operations

Value Mnemonic δ α Description

0xf0 CREATE 3 1 Create a new account with associated code.
i ≡ µm[µs[1] . . . (µs[1] + µs[2] − 1)]
ζ ≡ ∅

(σ′,g′,A′,z, o) ≡




Λ(σ∗,A,Ia,Io,L(µg),Ip,µs[0], i,Ie + 1,ζ,Iw) if µs[0] 6 σ[Ia]b ∧
Ie < 1024 ∧‖i‖6 49152(

σ,L(µg),A, 0, ()
)

otherwise

σ∗ ≡ σ except σ∗[Ia]n = σ[Ia]n + 1
µ′g ≡ µg −L(µg) + g

′

µ′s[0] ≡ x
where x = 0 if z = 0, i.e., the contract creation process failed,
or ‖i‖ > 49152 (initcode is longer than 49152 bytes),
or Ie = 1024 (the maximum call depth limit is reached),
or µs[0] > σ[Ia]b (balance of the caller is too low to fulfil the value transfer);
and otherwise x = ADDR(Ia,σ[Ia]n,ζ, i), the address of the newly created account (95).
µ′i ≡ M(µi,µs[1],µs[2])

µ′o ≡

{
() if z = 1

o otherwise

Thus the operand order is: value, input offset, input size.

0xf1 CALL 7 1 Message-call into an account.
i ≡ µm[µs[3] . . . (µs[3] + µs[4] − 1)]

(σ′,g′,A′,x, o) ≡




Θ(σ,A∗,Ia,Io, t, t,CCALLGAS(σ,µ,A),

Ip,µs[2],µs[2], i,Ie + 1,Iw)

if µs[2] 6 σ[Ia]b ∧
Ie < 1024

(σ,CCALLGAS(σ,µ,A),A
∗, 0, ()) otherwise

n ≡ min({µs[6],‖o‖})
µ′m[µs[5] . . . (µs[5] + n− 1)] = o[0 . . . (n− 1)]
µ′o = o
µ′g ≡ µg −CCALLGAS(σ,µ,A) + g

′

µ′s[0] ≡ x
A∗ ≡ A except A∗a ≡ Aa ∪{t}
t ≡ µs[1] mod 2

160

µ′i ≡ M(M(µi,µs[3],µs[4]),µs[5],µs[6])
where x = 0 if the code execution for this operation failed, or if
µs[2] > σ[Ia]b (not enough funds) or Ie = 1024 (call depth limit reached); x = 1
otherwise.
Thus the operand order is: gas, to, value, in offset, in size, out offset, out size.
CCALL(σ,µ,A) ≡ CGASCAP(σ,µ,A) + CEXTRA(σ,µ,A)

CCALLGAS(σ,µ,A) ≡

{
CGASCAP(σ,µ,A) + Gcallstipend if µs[2] 6= 0
CGASCAP(σ,µ,A) otherwise

CGASCAP(σ,µ,A) ≡

{
min{L(µg −CEXTRA(σ,µ,A)),µs[0]} if µg ≥ CEXTRA(σ,µ,A)
µs[0] otherwise

CEXTRA(σ,µ,A) ≡ Caaccess(t,A) + CXFER(µ) + CNEW(σ,µ)

CXFER(µ) ≡

{
Gcallvalue if µs[2] 6= 0
0 otherwise

CNEW(σ,µ) ≡

{
Gnewaccount if DEAD(σ, t) ∧µs[2] 6= 0
0 otherwise

0xf2 CALLCODE 7 1 Message-call into this account with an alternative account’s code.
Exactly equivalent to CALL except:

(σ′,g′,A′,x, o) ≡




Θ(σ,A∗,Ia,Io,Ia, t,CCALLGAS(σ,µ,A),

Ip,µs[2],µs[2], i,Ie + 1,Iw)

if µs[2] 6 σ[Ia]b ∧
Ie < 1024

(σ,CCALLGAS(σ,µ,A),A
∗, 0, ()) otherwise

Note the change in the fourth parameter to the call Θ from the 2nd stack value
µs[1] (as in CALL) to the present address Ia. This means that the recipient is in
fact the same account as at present, simply that the code is overwritten.

0xf3 RETURN 2 0 Halt execution returning output data.
HRETURN(µ) ≡ µm[µs[0] . . . (µs[0] + µs[1] − 1)]
This has the effect of halting the execution at this point with output defined.
See section 9.
µ′i ≡ M(µi,µs[0],µs[1])


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 39

0xf4 DELEGATECALL 6 1 Message-call into this account with an alternative account’s code, but
persisting the current values for sender and value.
Compared with CALL, DELEGATECALL takes one fewer arguments. The
omitted argument is µs[2]. As a result, µs[3], µs[4], µs[5] and µs[6] in the
definition of CALL should respectively be replaced with µs[2], µs[3], µs[4] and
µs[5]. Otherwise it is equivalent to CALL except:

(σ′,g′,A′,x, o) ≡




Θ(σ,A∗,Is,Io,Ia, t,CCALLGAS(σ,µ,A),

Ip, 0,Iv, i,Ie + 1,Iw)
if Ie < 1024

(σ,CCALLGAS(σ,µ,A),A
∗, 0, ()) otherwise

Note the changes (in addition to that of the fourth parameter) to the second
and ninth parameters to the call Θ.
This means that the recipient is in fact the same account as at present, simply
that the code is overwritten and the context is almost entirely identical.

0xf5 CREATE2 4 1 Create a new account with associated code.
Exactly equivalent to CREATE except:
The salt ζ ≡ µs[3].

0xfa STATICCALL 6 1 Static message-call into an account.
Exactly equivalent to CALL except:
The argument µs[2] is replaced with 0.
The deeper argument µs[3], µs[4], µs[5] and µs[6] are respectively replaced
with µs[2], µs[3], µs[4] and µs[5].
The last argument of Θ is ⊥.

0xfd REVERT 2 0 Halt execution reverting state changes but returning data and remaining gas.
HRETURN(µ) ≡ µm[µs[0] . . . (µs[0] + µs[1] − 1)]
The effect of this operation is described in (152).
For the gas calculation, we use the memory expansion function,
µ′i ≡ M(µi,µs[0],µs[1])

0xfe INVALID ∅ ∅ Designated invalid instruction.

0xff SELFDESTRUCT 1 0 Halt execution and register account for later deletion. Readers should be warned,
since the Shanghai update, this opcode has been marked as deprecated.
Its behavior might be subject to breaking change in an upcoming update.
A′s ≡ As ∪{Ia}
A′a ≡ Aa ∪{r}

σ′[r] ≡



∅ if σ[r] = ∅ ∧ σ[Ia]b = 0
(σ[r]n,σ[r]b + σ[Ia]b,σ[r]s,σ[r]c) if r 6= Ia
(σ[r]n, 0,σ[r]s,σ[r]c) otherwise

where r = µs[0] mod 2
160

σ′[Ia]b = 0

CSELFDESTRUCT(σ,µ) ≡ Gselfdestruct +

{
0 if r ∈ Aa
Gcoldaccountaccess otherwise

+

{
Gnewaccount if DEAD(σ,r) ∧σ[Ia]b 6= 0
0 otherwise

Appendix I. Genesis Block

The genesis block is 15 items, and is specified thus:

(333)
((

0256,KEC
(
RLP
(
()
))
, 0160,stateRoot, 0, 0, 02048, 2

34
, 0, 0, 3141592, time, 0, 0256,KEC

(
(42)

))
, (), ()

)
Where 0256 refers to the parent hash, a 256-bit hash which is all zeroes; 0160 refers to the beneficiary address, a 160-bit

hash which is all zeroes; 02048 refers to the log bloom, 2048-bit of all zeros; 2
34 refers to the difficulty; the transaction trie

root, receipt trie root, gas used, block number and extradata are both 0, being equivalent to the empty byte array. The
sequences of both ommers and transactions are empty and represented by (). KEC

(
(42)

)
refers to the Keccak-256 hash of

a byte array of length one whose first and only byte is of value 42, used for the nonce. KEC
(
RLP
(
()
))

value refers to the
hash of the ommer list in RLP, both empty lists.

The proof-of-concept series include a development premine, making the state root hash some value stateRoot. Also
time will be set to the initial timestamp of the genesis block. The latest documentation should be consulted for those
values.


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 40

Appendix J. Ethash

J.1. Deprecation. This section is kept for historical purposes, but the Ethash algorithm is no longer used for consensus
as of the Paris hard fork.

J.2. Definitions. We employ the following definitions:

Name Value Description

Jwordbytes 4 Bytes in word.
Jdatasetinit 2

30 Bytes in dataset at genesis.
Jdatasetgrowth 2

23 Dataset growth per epoch.
Jcacheinit 2

24 Bytes in cache at genesis.
Jcachegrowth 2

17 Cache growth per epoch.
Jepoch 30000 Blocks per epoch.
Jmixbytes 128 mix length in bytes.
Jhashbytes 64 Hash length in bytes.
Jparents 256 Number of parents of each dataset element.
Jcacherounds 3 Number of rounds in cache production.
Jaccesses 64 Number of accesses in hashimoto loop.

J.3. Size of dataset and cache. The size for Ethash’s cache c ∈ B and dataset d ∈ B depend on the epoch, which in
turn depends on the block number.

(334) Eepoch(Hi) =

⌊
Hi

Jepoch

⌋
The size of the dataset growth by Jdatasetgrowth bytes, and the size of the cache by Jcachegrowth bytes, every epoch. In
order to avoid regularity leading to cyclic behavior, the size must be a prime number. Therefore the size is reduced by a
multiple of Jmixbytes, for the dataset, and Jhashbytes for the cache. Let dsize = ‖d‖ be the size of the dataset. Which is
calculated using

(335) dsize = Eprime(Jdatasetinit + Jdatasetgrowth ·Eepoch −Jmixbytes,Jmixbytes)

The size of the cache, csize, is calculated using

(336) csize = Eprime(Jcacheinit + Jcachegrowth ·Eepoch −Jhashbytes,Jhashbytes)

(337) Eprime(x,y) =

{
x if x/y ∈ N
Eprime(x− 2 ·y,y) otherwise

J.4. Dataset generation. In order to generate the dataset we need the cache c, which is an array of bytes. It depends
on the cache size csize and the seed hash s ∈ B32.

J.4.1. Seed hash. The seed hash is different for every epoch. For the first epoch it is the Keccak-256 hash of a series of 32
bytes of zeros. For every other epoch it is always the Keccak-256 hash of the previous seed hash:

(338) s = Cseedhash(Hi)

(339) Cseedhash(Hi) =

{
032 if Eepoch(Hi) = 0

KEC(Cseedhash(Hi −Jepoch)) otherwise

With 032 being 32 bytes of zeros.

J.4.2. Cache. The cache production process involves using the seed hash to first sequentially filling up csize bytes of
memory, then performing Jcacherounds passes of the RandMemoHash algorithm created by Lerner [2014]. The initial cache
c′, being an array of arrays of single bytes, will be constructed as follows.

We define the array ci, consisting of 64 single bytes, as the ith element of the initial cache:

(340) ci =

{
KEC512(s) if i = 0

KEC512(ci−1) otherwise

Therefore c′ can be defined as

(341) c
′
[i] = ci ∀ i < n

(342) n =

⌊
csize

Jhashbytes

⌋
The cache is calculated by performing Jcacherounds rounds of the RandMemoHash algorithm to the initial cache c

′:

(343) c = Ecacherounds(c
′
,Jcacherounds)


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 41

(344) Ecacherounds(x,y) =




x if y = 0

ERMH(x) if y = 1

Ecacherounds(ERMH(x),y − 1) otherwise

Where a single round modifies each subset of the cache as follows:

(345) ERMH(x) =
(
Ermh(x, 0),Ermh(x, 1), ...,Ermh(x,n− 1)

)
(346) Ermh(x, i) = KEC512(x

′
[(i− 1 + n) mod n] ⊕ x′[x′[i][0] mod n])

with x
′

= x except x
′
[j] = Ermh(x,j) ∀ j < i

J.4.3. Full dataset calculation. Essentially, we combine data from Jparents pseudorandomly selected cache nodes, and hash
that to compute the dataset. The entire dataset is then generated by a number of items, each Jhashbytes bytes in size:

(347) d[i] = Edatasetitem(c, i) ∀ i <
⌊

dsize
Jhashbytes

⌋
In order to calculate the single item we use an algorithm inspired by the FNV hash (Glenn Fowler [1991]) in some cases
as a non-associative substitute for XOR.

(348) EFNV(x, y) = (x · (0x01000193 ⊕ y)) mod 232

The single item of the dataset can now be calculated as:

(349) Edatasetitem(c, i) = Eparents(c, i,−1,∅)

(350) Eparents(c, i,p, m) =

{
Eparents(c, i,p + 1,Emix(m, c, i,p + 1)) if p < Jparents − 2
Emix(m, c, i,p + 1) otherwise

(351) Emix(m, c, i,p) =

{
KEC512(c[i mod csize] ⊕ i) if p = 0
EFNV

(
m, c[EFNV(i⊕p, m[p mod bJhashbytes/Jwordbytesc]) mod csize]

)
otherwise

J.5. Proof-of-work function. Essentially, we maintain a “mix” Jmixbytes bytes wide, and repeatedly sequentially fetch
Jmixbytes bytes from the full dataset and use the EFNV function to combine it with the mix. Jmixbytes bytes of sequential
access are used so that each round of the algorithm always fetches a full page from RAM, minimizing translation lookaside
buffer misses which ASICs would theoretically be able to avoid.

If the output of this algorithm is below the desired target, then the nonce is valid. Note that the extra application
of KEC at the end ensures that there exists an intermediate nonce which can be provided to prove that at least a small
amount of work was done; this quick outer PoW verification can be used for anti-DDoS purposes. It also serves to provide
statistical assurance that the result is an unbiased, 256-bit number.

The PoW-function returns an array with the compressed mix as its first item and the Keccak-256 hash of the
concatenation of the compressed mix with the seed hash as the second item:
(352)
PoW(Hn,Hn, d) = {mc(KEC(RLP(LH(Hn))),Hn, d),KEC(sh(KEC(RLP(LH(Hn))),Hn) + mc(KEC(RLP(LH(Hn))),Hn, d))}

With Hn being the hash of the header without the nonce. The compressed mix mc is obtained as follows:

(353) mc(h, n, d) = Ecompress(Eaccesses(d,

nmix∑
i=0

sh(h, n), sh(h, n),−1),−4)

The seed hash being:

(354) sh(h, n) = KEC512(h + Erevert(n))

Erevert(n) returns the reverted bytes sequence of the nonce n:

(355) Erevert(n)[i] = n[‖n‖− i]

We note that the “+”-operator between two byte sequences results in the concatenation of both sequences.
The dataset d is obtained as described in section J.4.3.
The number of replicated sequences in the mix is:

(356) nmix =

⌊
Jmixbytes
Jhashbytes

⌋
In order to add random dataset nodes to the mix, the Eaccesses function is used:

(357) Eaccesses(d, m, s, i) =

{
Emixdataset(d, m, s, i) if i = Jaccesses − 2
Eaccesses(Emixdataset(d, m, s, i), s, i + 1) otherwise

(358) Emixdataset(d, m, s, i) = EFNV(m,Enewdata(d, m, s, i))


ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER SHANGHAI VERSION 42

Enewdata returns an array with nmix elements:

(359) Enewdata(d, m, s, i)[j] = d[EFNV(i⊕ s[0], m[i mod
⌊
Jmixbytes
Jwordbytes

⌋
]) mod

⌊
dsize/Jhashbytes

nmix

⌋
·nmix + j] ∀ j < nmix

The mix is compressed as follows:

(360) Ecompress(m, i) =

{
m if i > ‖m‖− 8
Ecompress(EFNV(EFNV(EFNV(m[i + 4], m[i + 5]), m[i + 6]), m[i + 7]), i + 8) otherwise

Appendix K. Anomalies on the Main Network

K.1. Deletion of an Account Despite Out-of-gas. At block 2675119, in the transaction 0xcf416c536ec1a19ed1fb89e
4ec7ffb3cf73aa413b3aa9b77d60e4fd81a4296ba, an account at address 0x03 was called and an out-of-gas occurred during
the call. Against the equation (209), this added 0x03 in the set of touched addresses, and this transaction turned σ[0x03]
into ∅.

Appendix L. List of mathematical symbols

Symbol Latex Command Description∨
\bigvee This is the least upper bound, supremum, or join of all elements operated on. Thus it

is the greatest element of such elements (Davey and Priestley [2002]).


	1. Introduction
	1.1. Driving Factors
	1.2. Previous Work

	2. The Blockchain Paradigm
	2.1. Value
	2.2. Which History?

	3. Conventions
	4. Blocks, State and Transactions
	4.1. World State
	4.2. The Transaction
	4.3. The Withdrawal
	4.4. The Block

	5. Gas and Payment
	6. Transaction Execution
	6.1. Substate
	6.2. Execution

	7. Contract Creation
	7.1. Subtleties

	8. Message Call
	9. Execution Model
	9.1. Basics
	9.2. Fees Overview
	9.3. Execution Environment
	9.4. Execution Overview
	9.5. The Execution Cycle

	10. Transition to Proof of Stake
	10.1. Post-Paris Updates

	11. Blocktree to Blockchain
	12. Block Finalisation
	12.1. Executing Withdrawals
	12.2. Transaction Validation
	12.3. State Validation

	13. Implementing Contracts
	13.1. Data Feeds
	13.2. Random Numbers

	14. Future Directions
	15. Conclusion
	16. Acknowledgements
	17. Availability
	References
	Appendix A. Terminology
	Appendix B. Recursive Length Prefix
	Appendix C. Hex-Prefix Encoding
	Appendix D. Modified Merkle Patricia Tree
	D.1. Trie Database

	Appendix E. Precompiled Contracts
	E.1. zkSNARK Related Precompiled Contracts
	E.2. BLAKE2 Precompiled Contract

	Appendix F. Signing Transactions
	Appendix G. Fee Schedule
	Appendix H. Virtual Machine Specification
	H.1. Gas Cost
	H.2. Instruction Set

	Appendix I. Genesis Block
	Appendix J. Ethash
	J.1. Deprecation
	J.2. Definitions
	J.3. Size of dataset and cache
	J.4. Dataset generation
	J.5. Proof-of-work function

	Appendix K. Anomalies on the Main Network
	K.1. Deletion of an Account Despite Out-of-gas

	Appendix L. List of mathematical symbols