key: cord-206872-t6lr3g1m
authors: Huang, Huawei; Kong, Wei; Zhou, Sicong; Zheng, Zibin; Guo, Song
title: A Survey of State-of-the-Art on Blockchains: Theories, Modelings, and Tools
date: 2020-07-07
journal: nan
DOI: nan
sha:
doc_id: 206872
cord_uid: t6lr3g1m

To draw a roadmap of current research activities in the blockchain community, we first conduct a brief overview of state-of-the-art blockchain surveys published in the past 5 years. We found that those surveys mainly study blockchain-based applications, such as blockchain-assisted Internet of Things (IoT), business applications, security-enabled solutions, and many other applications in diverse fields. However, we think that a comprehensive survey of the essentials of blockchains, covering the state-of-the-art theoretical modelings, analytic models, and useful experiment tools, is still missing. To fill this gap, we perform a thorough survey by identifying and classifying the most recent high-quality research outputs that are closely related to the theoretical findings and essential mechanisms of blockchain systems and networks. Several promising open issues are also summarized for future research directions. We hope this survey can serve as a useful guideline for researchers, engineers, and educators on the cutting-edge development of blockchains from the perspectives of theories, modelings, and tools.

Blockchains have penetrated many applications that are closely related to every aspect of our daily life, such as cryptocurrencies, business applications, smart cities, Internet-of-Things (IoT) applications, and so on. In the following, before discussing the motivation of this survey, we first give a brief exposition of the state-of-the-art blockchain survey articles published in the past few years.

To identify the position of our survey, we first collected 66 state-of-the-art blockchain-related survey articles.
The number of surveys in each category is shown in Fig. 1. We see that the three most popular topics of blockchain-related surveys are IoT & IIoT, Consensus Protocols, and Security & Privacy. We also classify those existing surveys and show their chronological distribution in Fig. 2, from which we discover that i) the number of surveys published each year increases dramatically, and ii) the diversity of topics also grows over time. In detail, we summarize the publication years, topics, and other metadata of these surveys in Table 1 and Table 2. Those surveys can be classified into the following 7 groups, which are briefly reviewed as follows.

1.1.1 Blockchain Essentials. The first group is related to the essentials of the blockchain. A large number of consensus protocols, algorithms, and mechanisms have been reviewed and summarized in [1] [2] [3] [4] [5] [6] [7] [8]. For example, motivated by the lack of a comprehensive literature review of the consensus protocols for blockchain networks, Wang et al. [3] emphasized both the system design and the incentive mechanism behind distributed blockchain consensus protocols such as Byzantine Fault Tolerant (BFT)-based protocols and Nakamoto protocols. From a game-theoretic viewpoint, the authors also studied how such consensus protocols affect the consensus participants in blockchain networks.

Among the surveys of smart contracts [9] [10] [11], Atzei et al. [9] paid attention to the security vulnerabilities and programming pitfalls that can arise in Ethereum smart contracts. Dwivedi et al. [10] built a systematic taxonomy of smart-contract languages, while Zheng et al. [11] conducted a survey on the challenges, recent technical advances and typical platforms of smart contracts.

Sharding techniques are viewed as promising solutions to the scalability and low-performance problems of blockchains.
Several survey articles [12, 13] provide systematic reviews of sharding-based blockchain techniques. For example, Wang et al. [12] focused on the general design flow and critical design challenges of sharding protocols. Next, Yu et al. [13] mainly discussed the intra-consensus security, atomicity of cross-shard transactions, and other advantages of sharding mechanisms. Regarding scalability, Chen et al. [14] analyzed the scalability technologies in terms of efficiency-improving and function-extension of blockchains, while Zhou et al. [15] compared and classified the existing scalability solutions in

Manuscript submitted to ACM

roles for the performance, security, healthy conditions of blockchain systems and blockchain networks. For example, Salah et al. [26] studied how blockchain technologies benefit key problems of AI. Zheng et al. [27] proposed the concept of blockchain intelligence and pointed out the opportunities for the two fields to benefit each other. Next, Chen et al. [28] discussed the privacy-preserving and secure design of machine learning when blockchain techniques are introduced. Liu et al. [29] gave an overview of the opportunities and applications that arise when integrating blockchains and machine learning technologies in the context of communications and networking. Recently, game-theoretical solutions [30] have been reviewed as applied to blockchain security issues such as malicious attacks and selfish mining, as well as to resource allocation in the management of mining. Both the advantages and disadvantages of game-theoretical solutions and models were discussed.

Networking. First, Park et al. [31] discussed how to take advantage of blockchains in cloud computing with respect to security solutions. Xiong et al. [32] then investigated how to facilitate blockchain applications in mobile IoT and edge computing environments. Yang et al.
[33] identified various perspectives including motivations, frameworks, and functionalities when integrating blockchain with edge computing. Nguyen et al. [34] presented a comprehensive survey of blockchain for 5G networks and beyond. The authors focused on the opportunities that blockchain can bring to 5G technologies, which include cloud computing, mobile edge computing, SDN/NFV, network slicing, D2D communications, 5G services, and 5G IoT applications.

Table 2. Taxonomy of existing blockchain-related surveys (Part 2).
Category | Ref. | Year | Topic
IoT, IIoT | Christidis [35] | 2016 | Blockchains and Smart Contracts for IoT
 | Ali [36] | 2018 | Applications of blockchains in IoT
 | Fernandez [37] | 2018 | Usage of Blockchain for IoT
 | Kouicem [38] | 2018 | IoT security
 | Panarello [39] | 2018 | Integration of Blockchain and IoT
 | Dai [40] | 2019 | Blockchain for IoT
 | Wang [41] | 2019 | Blockchain for IoT
 | Nguyen [42] | 2019 | Integration of Blockchain and Cloud of Things
 | Restuccia [43] | 2019 | Blockchain technology for IoT
 | Cao [44] | 2019 | Challenges in distributed consensus of IoT
 | Park [45] | 2020 | Blockchain Technology for Green IoT
 | Lao [46] | 2020 | IoT Applications in Blockchain Systems
 | Alladi [47] | 2019 | Blockchain Applications in Industry 4.0 and IIoT
 | Zhang [48] | 2019 | 5G Beyond for IIoT based on Edge Intelligence and Blockchain
UAV | Alladi [49] | 2020 | Blockchain-based UAV applications
Group-6: | Lu [50] | 2018 | Functions, applications and open issues of Blockchain
 | Casino [51] | 2019 | Current status, classification and open issues of Blockchain Apps
Agriculture | Bermeo [52] | 2018 | Blockchain technology in agriculture
 | Ferrag [53] | 2020 | Blockchain solutions to Security and Privacy for Green Agriculture
SDN | Alharbi [54] | 2020 | Deployment of Blockchains for Software Defined Networks
Business Apps | Konst. [55] | 2018 | Blockchain-based business applications
Smart City | Xie [56] | 2019 | Blockchain technology applied in smart cities
Smart Grids | Alladi [57] | 2019 | Blockchain in Use Cases of Smart Grids
 | Aderibole [58] | 2020 | Smart Grids based on Blockchain Technology
File Systems | Huang [59] | 2020 | Blockchain-based Distributed File Systems, IPFS, Filecoin, etc.
Space Industry | Torky [60] | 2020 | Blockchain in Space Industry
COVID-19 | Nguyen [61] | 2020 | Combat COVID-19 using Blockchain and AI-based Solutions
General & Outlook | Yuan [62] | 2016 | The state of the art and future trends of Blockchain
 | Zheng [63] | 2017 | Architecture, Consensus, and Future Trends of Blockchains
 | Zheng [64] | 2018 | Challenges and opportunities of Blockchain
 | Yuan [65] | 2018 | Blockchain and cryptocurrencies
 | Kolb [66] | 2020 | Core Concepts, Challenges, and Future Directions in Blockchains

1.1.5 IoT & IIoT. The blockchain-based applications for Internet of Things (IoT) [35] [36] [37] [38] [39] [40] [41] [42] [43] [44] [45] [46] and Industrial Internet of Things (IIoT) [47, 48] have received the largest amount of attention from both academia and industry. For example, as a pioneering work in this category, Christidis et al. [35] provided a survey of how blockchains and smart contracts promote IoT applications. Later on, Nguyen et al. [42] investigated the integration of blockchain technologies and the cloud of things, with in-depth discussion of backgrounds, motivations, concepts and architectures. Recently, Park et al. [45] emphasized the topic of introducing blockchain technologies to the sustainable ecosystem of green IoT. For the IIoT, Zhang et al. [48] discussed the integration of blockchain and edge intelligence to empower a secure IIoT framework in the context of 5G and beyond. In addition, on applying blockchains to unmanned aerial vehicles (UAVs), Alladi et al. [49] reviewed numerous application scenarios covering both commercial and military domains such as network security, surveillance, etc.
The research areas covered by the existing surveys on blockchain-based applications include general applications [50, 51], agriculture [52, 53], Software-defined Networking (SDN) [54], business applications [55], smart city [56], smart grids [57, 58], distributed file systems [59], space industry [60], and COVID-19 [61]. Some of those surveys are reviewed as follows. Lu et al. [50] performed a literature review of the fundamental features of blockchain-enabled applications. Through the review, the authors aim to outline the development path of blockchain technologies. Then, Casino et al. [51] presented a systematic survey of blockchain-enabled applications in the context of multiple sectors and industries. Both the current status and the prospective characteristics of blockchain technologies were identified. In more specific

Summary of Survey-Article Review: Through the brief review of the state-of-the-art surveys, we have found that blockchain technologies have been integrated into a growing range of application sectors. The blockchain theory and technology will bring substantial innovations, incentives, and a great number of application scenarios in diverse fields. Based on the analysis of those survey articles, we believe that more survey articles will be published in the near future, very likely in the areas of sharding techniques, scalability, interoperability, smart contracts, big data, AI technologies, 5G and Beyond, edge computing, cloud computing, and many other fields. Via the overview, shown in Table 1, Table 2, Fig. 1 and Fig. 2

In summary, with this article, we would like to fill the gap by emphasizing the cutting-edge theoretical studies, modelings, and useful tools for blockchains. In particular, we try to include the latest high-quality research outputs that have not been covered by other existing survey articles. We believe that this survey can shed new light on the further development of blockchains.
Our survey presented in this article makes the following contributions.
• We conduct a brief classification of existing blockchain surveys to highlight the significance of the literature review presented in this survey.
• We then present a comprehensive investigation of the state-of-the-art theoretical modelings, analytic models, performance measurements, and useful experiment tools for blockchains, blockchain networks, and blockchain systems.
• Several promising directions and open issues for future studies are also envisioned.

The structure of this survey is shown in Fig. 3 and organized as follows. Section 2 introduces the preliminaries of blockchains. Section 3 summarizes the state-of-the-art theoretical studies that improve the performance of blockchains. In Section 4, we then review various modelings and analytic models that help understand blockchains. Diverse measurement approaches, datasets, and useful tools for blockchains are overviewed in Section 5. We outline the open issues in Section 6. Finally, Section 7 concludes this article.

Blockchain is a promising paradigm for content distribution and distributed consensus over P2P networks. In this section, we present the basic concepts, definitions and terminologies of blockchains that appear in this article.

2.1 Prime Blockchain Platforms
2.1.1 Bitcoin. Bitcoin is viewed as the blockchain system that runs the first cryptocurrency in the world. It builds upon two major techniques, i.e., Nakamoto Consensus and the UTXO Model, which are introduced as follows.

Nakamoto Consensus. To reach agreement on blocks, Bitcoin adopts the Nakamoto Consensus, in which miners generate new blocks by solving a puzzle. In this puzzle-solving process, also referred to as mining, miners need to find a nonce value that satisfies the required difficulty level [67].
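The puzzle-solving step just described can be illustrated with a minimal sketch. It is a toy model rather than Bitcoin's actual implementation: the byte layout of the header, the 8-byte nonce encoding, and the names `block_hash`, `meets_difficulty`, and `mine` are assumptions made for illustration, and the difficulty is expressed simply as a number of leading zero bits.

```python
import hashlib
from typing import Optional


def block_hash(header: bytes, nonce: int) -> str:
    """Double SHA-256 of a toy header with the nonce appended (hypothetical layout)."""
    payload = header + nonce.to_bytes(8, "big")
    return hashlib.sha256(hashlib.sha256(payload).digest()).hexdigest()


def meets_difficulty(digest_hex: str, difficulty_bits: int) -> bool:
    """True if the 256-bit hash value lies below the target 2^(256 - difficulty_bits)."""
    target = 1 << (256 - difficulty_bits)
    return int(digest_hex, 16) < target


def mine(header: bytes, difficulty_bits: int, max_nonce: int = 2**32) -> Optional[int]:
    """Try nonces in order and return the first one that satisfies the difficulty."""
    for nonce in range(max_nonce):
        if meets_difficulty(block_hash(header, nonce), difficulty_bits):
            return nonce
    return None  # search space exhausted without a solution


if __name__ == "__main__":
    found = mine(b"toy-header", difficulty_bits=12)
    print("nonce:", found, "hash:", block_hash(b"toy-header", found))
```

Finding the nonce takes many hash evaluations on average, while checking a claimed nonce takes a single call to `block_hash`, which is the asymmetry the puzzle relies on. Each additional difficulty bit doubles the expected number of attempts.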
By adjusting the difficulty, the Bitcoin system maintains a stable block-generation rate of about one block per 10 minutes. When a miner generates a new block, it broadcasts this block to all the other miners in the network. When other miners receive the new block, they add it to their local chain. If all of the other miners receive the new block in time, the length of the main chain increases by one. However, because of network delays, the other miners do not always receive a new block in time. When a miner generates a block before it receives the previous one, a fork occurs. Bitcoin addresses this issue by following the longest-chain rule.

UTXO Model. The Unspent Transaction Output (UTXO) model is adopted by cryptocurrencies like Bitcoin and by other popular blockchain systems [68, 69]. A UTXO is an amount of digital money; each UTXO represents a chain of ownership between the owners and the receivers based on cryptography. In a blockchain, all UTXOs form a set, in which each element denotes the unspent output of a transaction and can be used as an input for a future transaction. A client may own multiple UTXOs, and the total balance of this client is calculated by summing up all associated UTXOs. Using this model, blockchains can prevent double-spend [70] attacks efficiently.

Ethereum [71] is an open-source blockchain platform that supports smart contracts. As the token of Ethereum, Ether is rewarded to the miners who perform computation to secure the consensus of the blockchain. Ethereum runs on decentralized Ethereum Virtual Machines (EVMs), in which scripts are executed on a network of public Ethereum nodes. Compared with Bitcoin, the EVM's instruction set is believed to be Turing-complete. Ethereum also introduces an internal pricing mechanism, called gas. A unit of gas measures the amount of computational effort needed to execute operations in a transaction.
Thus, the gas mechanism is useful for restraining spam in smart contracts. Ethereum 2.0 is an upgraded version of the original Ethereum. The upgrades include a transition from PoW to Proof-of-Stake (PoS), and a throughput improvement based on sharding technologies.

EOSIO is another popular blockchain platform, released by the company block.one in 2018. Unlike in Bitcoin and Ethereum, the smart contracts of EOSIO do not need to pay transaction fees. Its throughput is claimed to reach millions of transactions per second. Furthermore, EOSIO also enables low block-confirmation latency, low-overhead BFT finality, and so on. These features have attracted a large number of users and developers to quickly and easily deploy decentralized applications in a governed blockchain. For example, in total 89,800,000 EOSIO blocks were generated in less than one and a half years after its first launch.

The consensus mechanism in blockchains provides fault tolerance so that the network can agree on the same state, such as a single state of all transactions in a cryptocurrency blockchain. Popular proof-based consensus protocols include PoW and PoS. In PoW, miners compete with each other to solve a puzzle whose solution is difficult to produce but easy for others to verify. Once a miner finds a required nonce value through a huge number of attempts, it is paid a certain amount of cryptocurrency for creating a new block. In contrast, PoS does not have miners. Instead, the new block is forged by validators selected randomly within a committee. The probability of being chosen as a validator is linearly related to the size of the candidate's stake. PoW and PoS are both adopted as consensus protocols for the security of cryptocurrencies. The former is based on CPU power, and the latter on coin age. Therefore, PoS has a lower energy cost and is less likely to suffer from the 51% attack.
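The statement that a PoS validator is chosen with probability linear in its stake can be made concrete with a short sketch. This is only an illustration under simplifying assumptions: the stake table, the function names, and the use of a local pseudo-random generator are hypothetical, whereas real PoS protocols derive randomness from a shared, verifiable source and typically add rules such as coin age or slashing.

```python
import random
from typing import Dict


def selection_probabilities(stakes: Dict[str, float]) -> Dict[str, float]:
    """Probability of each candidate being chosen: its stake divided by the total stake."""
    total = sum(stakes.values())
    if total <= 0:
        raise ValueError("total stake must be positive")
    return {name: stake / total for name, stake in stakes.items()}


def select_validator(stakes: Dict[str, float], rng: random.Random) -> str:
    """Draw one validator with probability proportional to its stake."""
    names = sorted(stakes)  # fixed order so a seeded run is reproducible
    weights = [stakes[name] for name in names]
    if not names or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("stakes must be non-negative with a positive total")
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    stakes = {"alice": 10.0, "bob": 30.0, "carol": 60.0}  # hypothetical committee
    print(selection_probabilities(stakes))
    print(select_validator(stakes, random.Random(2020)))
```

Under this rule, doubling a candidate's stake doubles its chance of forging the next block, which is the linear relation described above.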
Blockchain, as a distributed and public database of transactions, has become a platform for decentralized applications. Despite its increasing popularity, blockchain technology faces a scalability problem: throughput does not scale with the increasing network size. Thus, scalable blockchain protocols are still urgently needed. Many different directions, such as Off-chain, DAG, and Sharding techniques, have been explored to address the scalability of blockchains. Here, we present several representative terms related to scalability.

Mathematically, a directed acyclic graph (DAG) is a finite directed graph in which no directed cycles exist. In the context of blockchain, DAG is viewed as a revolutionary technology that can upgrade blockchain to a new generation. This is because a DAG is blockless, and all transactions link to multiple other transactions following a topological order on a DAG network. Thus, data can move directly between network participants. This results in a faster, cheaper and more scalable solution for blockchains. In fact, the bottleneck of blockchains mainly lies in the structure of blocks. Thus, the blockless DAG could be a promising solution for improving the scalability of blockchains substantially.

Technique. The consensus protocol of Bitcoin, i.e., Nakamoto Consensus, has significant drawbacks in terms of transaction throughput and network scalability. To address these issues, the sharding technique is one of the outstanding approaches. It improves throughput and scalability by partitioning the blockchain network into several small shards such that each shard can process a batch of unconfirmed transactions in parallel to generate medium blocks. Such medium blocks are then merged together into a final block. Sharding techniques include Network Sharding, Transaction Sharding and State Sharding.
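The partition-then-merge flow of sharding described above can be sketched as follows. This is a simplified illustration, not a specific protocol: transactions are assigned to shards by hashing a hypothetical `sender` field, each shard's batch stands in for a medium block, and consensus inside each shard is omitted entirely.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def shard_of(account: str, num_shards: int) -> int:
    """Map an account address to a shard id by hashing it (assumed assignment rule)."""
    digest = hashlib.sha256(account.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def build_medium_blocks(transactions: List[dict], num_shards: int) -> Dict[int, List[dict]]:
    """Group unconfirmed transactions by the shard of their sender."""
    shards = defaultdict(list)
    for tx in transactions:
        shards[shard_of(tx["sender"], num_shards)].append(tx)
    # Every shard gets an entry, even if it received no transactions.
    return {sid: shards.get(sid, []) for sid in range(num_shards)}


def merge_final_block(medium_blocks: Dict[int, List[dict]]) -> List[dict]:
    """Concatenate the medium blocks in shard order to form the final block."""
    final_block = []
    for sid in sorted(medium_blocks):
        final_block.extend(medium_blocks[sid])
    return final_block


if __name__ == "__main__":
    txs = [{"sender": f"acct-{i % 5}", "amount": i} for i in range(12)]
    medium = build_medium_blocks(txs, num_shards=4)
    print({sid: len(batch) for sid, batch in medium.items()})
    print(len(merge_final_block(medium)))
```

Because the assignment ignores how much work each transaction needs, some shards can end up far busier than others, which is the imbalance that gas-aware relocation schemes try to correct.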
One shortcoming of the sharding technique is that malicious network nodes residing in the same shard may collude with each other, resulting in security issues. Therefore, sharding-based protocols use a reshuffling strategy to address such security threats. However, reshuffling brings cross-shard data migration. Thus, how to efficiently handle cross-shard transactions becomes an emerging topic in the context of sharding blockchains.

3.1.1 Throughput & Latency. Aiming to reduce the confirmation latency of transactions to milliseconds, Hari et al. [72] proposed a high-throughput, low-latency, deterministic confirmation mechanism called ACCEL for accelerating Bitcoin's block confirmation. The key findings of this paper include how to identify singular blocks, and how to use singular blocks to reduce the confirmation delay. Once the confirmation delay is reduced, the throughput increases accordingly.

Two obstacles have hindered the scalability of cryptocurrency systems. The first one is the low throughput, and the other is the requirement for every node to duplicate the communication, storage, and state representation of the entire blockchain network. Wang et al. [73] studied how to overcome these obstacles. Without weakening decentralization and security, the proposed Monoxide technique offers a linear scale-out ability by partitioning the workload. It preserves the simplicity of the blockchain system while amplifying its capacity. The authors also proposed a novel Chu-ko-nu mining mechanism, which ensures the cross-zone atomicity, efficiency and security of the system.

However, the authors of Prism [74] also admitted that although the proposed Prism has a high throughput, its confirmation latency remains as large as 10 seconds since there is only a single voter chain in Prism. A promising solution is to introduce a large number of such voter chains, each of which is not necessarily secure.
Even if every voter chain is attacked with a probability as high as 30%, the probability of successfully attacking half of all voter chains is still theoretically very low. Thus, the authors believed that using multiple voter chains would be a good way to reduce the confirmation latency without sacrificing system security.

Because Ethereum simply allocates transactions to shards according to their account addresses rather than according to the workload or complexity of transactions, the resource consumption of transactions in each shard is unbalanced. As a consequence, the network transaction throughput is reduced. To solve this problem, Woo et al. [75] proposed a heuristic algorithm named GARET, which is a gas consumption-aware relocation mechanism for improving throughput in sharding-based Ethereum environments. In particular, GARET can relocate the transaction workloads of each shard according to their gas consumption. The experimental results show that GARET achieves a higher transaction throughput and a lower transaction latency compared with existing techniques.

Transactions generated in real time keep the size of blockchains growing. For example, the storage efficiency of the original Bitcoin has received much criticism since it requires every Bitcoin peer to store the full transaction history. Although some revised protocols advocate that only the full-size nodes store the entire copy of the whole ledger, the transactions still consume a large storage space in those full-size nodes. To alleviate this problem, several pioneering studies proposed storage-efficient solutions for blockchain networks. For example, by exploiting an erasure code-based approach, Perard et al. [76] proposed a low-storage blockchain mechanism, aiming to lower the storage requirement of blockchains. The new low-storage nodes only have to store the linearly encoded fragments of each block.
The original blockchain data can be easily recovered by retrieving fragments from other nodes under the erasure-code framework. Thus, this type of blockchain node allows blockchain clients to reduce the storage capacity.

Table 3. Latest Theories of Improving the Performance of Blockchains.
Throughput & Latency | [72] | Reduce confirmation delay | Authors proposed a high-throughput, low-latency, deterministic confirmation mechanism, aiming to accelerate Bitcoin's block confirmation.
 | [73] | Monoxide | The proposed Monoxide offers a linear scale-out by partitioning workloads. Particularly, Chu-ko-nu mining mechanism enables the cross-zone atomicity, efficiency and security of the system.
 | [74] | Prism | Authors proposed a new blockchain protocol, i.e., Prism, aiming to achieve a scalable throughput with a full security of bitcoin.
 | [75] | GARET | Authors proposed a gas consumption-aware relocation mechanism for improving throughput in sharding-based Ethereum.
Storage Efficiency | [76] | Erasure code-based | Authors proposed a new type of low-storage blockchain nodes using erasure code theory to reduce the storage space of blockchains.
 | [77] | Jidar: Data-Reduction Strategy | Authors proposed a data reduction strategy for Bitcoin namely Jidar, in which each node only has to store the transactions of interest and the related Merkle branches from the complete blocks.
 | [78] | Segment blockchain | Authors proposed a data-reduced storage mechanism named segment blockchain such that each node only has to store a segment of the blockchain.
Reliability Analysis | [79] | Availability of blockchains | Authors studied the availability for blockchain-based systems, where the read and write availabilities are conflict to each other.
 | [80] | Reliability prediction | Authors proposed H-BRP to predict the reliability of blockchain peers by extracting their reliability parameters.
The authors also tested their system on the low-configuration Raspberry Pi to show its effectiveness, which demonstrates the possibility of running blockchains on IoT devices. Then, Dai et al. [77] proposed Jidar, which is a data reduction strategy for Bitcoin. In Jidar, each node only has to store the transactions of interest and the related Merkle branches from the complete blocks. All nodes verify transactions collaboratively by a query mechanism. This approach seems very promising for the storage efficiency of Bitcoin. However, their experiments show that the proposed Jidar can only reduce the storage overhead of each peer by about 1% compared with the original Bitcoin. Following a similar idea, Xu et al. [78] reduced the storage of blockchains using a segment blockchain mechanism, in which each node only needs to store one segment of the blockchain. The authors also proved that the proposed mechanism has a failure probability of (ϕ/n)^m if an adversary colludes with fewer than ϕ nodes and each segment is stored by m nodes. This theoretical result is useful for the storage design of blockchains when developing a particular segment mechanism for data-heavy distributed applications.

In public blockchains, system clients basically join the blockchain network through a third-party peer. Thus, the reliability of the selected blockchain peer is critical to the security of clients in terms of both resource efficiency and monetary issues. To enable clients to evaluate and choose reliable blockchain peers, Zheng et al. [80] proposed a hybrid reliability prediction model for blockchains named H-BRP, which is able to predict the reliability of blockchain peers by extracting their reliability parameters.
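Returning to the segment blockchain bound of Xu et al. [78] discussed above, the expression (ϕ/n)^m is easy to evaluate numerically when sizing the replication factor. The sketch below is a plain calculator and not the authors' code; it assumes that n denotes the total number of nodes, which the text above does not state explicitly, and the helper `min_replicas` is a hypothetical convenience function.

```python
def segment_failure_probability(phi: int, n: int, m: int) -> float:
    """Evaluate (phi / n) ** m, with n assumed to be the total number of nodes."""
    if n <= 0 or m <= 0 or not 0 <= phi <= n:
        raise ValueError("require n > 0, m > 0 and 0 <= phi <= n")
    return (phi / n) ** m


def min_replicas(phi: int, n: int, target: float, max_m: int = 10_000) -> int:
    """Smallest m for which the bound drops to the target or below (hypothetical helper)."""
    if not 0 < target < 1:
        raise ValueError("target must lie strictly between 0 and 1")
    for m in range(1, max_m + 1):
        if segment_failure_probability(phi, n, m) <= target:
            return m
    raise ValueError("no replication factor up to max_m reaches the target")


if __name__ == "__main__":
    # Example numbers chosen for illustration only.
    print(segment_failure_probability(phi=10, n=100, m=3))
    print(min_replicas(phi=50, n=100, target=0.01))
```

The bound shrinks geometrically in m, so each additional replica of a segment multiplies the failure probability by ϕ/n.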
Sharding blockchains | [81] | RapidChain | Authors proposed a new sharding-based protocol for public blockchains that achieves a nonlinear increase of intra-committee communications with the number of committee members.
 | [82] | SharPer | Authors proposed a permissioned blockchain system named SharPer, which adopts sharding techniques to improve scalability of cross-shard transactions.
 | [83] | D-GAS | Authors proposed a dynamic load balancing mechanism for Ethereum shards, i.e., D-GAS. It reallocates Tx accounts by their gas consumption on each shard.
 | [84] | NRSS | Authors proposed a node-rating based new Sharding scheme, i.e., NRSS, for blockchains, aiming to improve the throughput of committees.
 | [85] | OptChain | Authors proposed a new sharding paradigm, called OptChain, mainly used for optimizing the placement of transactions into shards.
 | [86] | Sharding-based scaling system | Authors proposed an efficient shard-formation protocol that assigns nodes into shards securely, and a distributed transaction protocol that can guard against malicious Byzantine fault coordinators.
 | [87] | SSChain | Authors proposed a non-reshuffling structure called SSChain, which supports both transaction sharding and state sharding while eliminating huge data-migration across shards.
 | [88] | Eumonia | Authors proposed Eumonia, which is a permissionless parallel-chain protocol for realizing a global ordering of blocks.
 | [89] | Vulnerability of Sybil attacks | Authors systematically analyzed the vulnerability of Sybil attacks in protocol Elastico.
 | [90] | n/2 BFT Sharding approach | Authors proposed a new blockchain sharding approach that can tolerate up to 1/2 of the Byzantine nodes within a shard.
 | [91] | CycLedger | Authors proposed a protocol CycLedger to pave a way towards scalability, security and incentive for sharding blockchains.
Interoperability of multiple-chain systems | [92] | Interoperability architecture | Authors proposed a novel interoperability architecture that supports the cross-chain cooperations among multiple blockchains, and a novel Monitor Multiplexing Reading (MMR) method for the passive cross-chain communications.
 | [93] | HyperService | Authors proposed a programming platform that provides interoperability and programmability over multiple heterogeneous blockchains.
 | [94] | Protocol Move | Authors proposed a programming model for smart-contract developers to create dApps that can interoperate and scale in a multiple-chain environment.
 | [95] | Cross-cryptocurrency Tx protocol | Authors proposed a decentralized cryptocurrency exchange protocol enabling cross-cryptocurrency transactions based on smart contracts deployed on Ethereum.
 | [16] | Cross-chain comm. | Authors conducted a systematic classification of cross-chain communication protocols.

One of the critical bottlenecks of today's blockchain systems is scalability. For example, the throughput of a blockchain does not scale when the network size grows. To address this dilemma, a number of scalability approaches have been proposed. In this part, we conduct an overview of the most recent solutions with respect to sharding techniques, interoperability among multiple blockchains, and other solutions.

Some early-stage sharding blockchain protocols (e.g., Elastico) improve scalability by making multiple groups of committees work in parallel. However, this approach still requires a large amount of communication for verifying every transaction, which increases linearly with the number of nodes within a committee. Thus, the benefit of sharding was not fully exploited. As an improved solution, Zamani et al. [81] proposed a Byzantine-resilient sharding-based protocol, namely RapidChain, for permissionless blockchains. Taking advantage of block pipelining, RapidChain improves the throughput by using a sound intra-committee consensus.
The authors also developed an efficient cross-shard verification method to avoid flooding the whole network with broadcast messages.

To make the throughput scale with the network size, Gao et al. [96] proposed a scalable blockchain protocol, which leverages both sharding and Proof-of-Stake consensus techniques. Their experiments were performed in an Amazon EC2-based simulation network. Although the results showed that the throughput of the proposed protocol increases with the network size, the performance was still modest: the maximum throughput was 36 transactions per second and the transaction latency was around 27 seconds.

Aiming to improve the efficiency of cross-shard transactions, Amiri et al. [82] proposed a permissioned blockchain system named SharPer, which strives for the scalability of blockchains by dividing and reallocating different data shards to various network clusters. The major contributions of SharPer include the algorithm and protocol associated with the SharPer model. In their previous work, the authors had already proposed a permissioned blockchain, while in this paper they extended it by introducing a consensus protocol for processing both intra-shard and cross-shard transactions. Finally, SharPer was devised by adopting sharding techniques. One of the important contributions is that SharPer can be used in networks where there is a high percentage of non-faulty nodes. Furthermore, this paper also contributes a flattened consensus protocol with respect to the order of cross-shard transactions among all involved clusters.

Because Ethereum places each group of transactions on a shard by their account addresses, the workloads and complexity of transactions in shards are unbalanced. This further damages the network throughput. To address this imbalance, Kim et al. [83] proposed D-GAS, which is a dynamic load balancing mechanism for Ethereum shards.
Using D-GAS, the transaction workloads of accounts on each shard can be reallocated according to their gas consumption, with the goal of maximizing transaction throughput. The evaluation results showed that D-GAS achieved up to 12% higher transaction throughput and 74% lower transaction latency compared with other existing techniques.

The random sharding strategy causes performance gaps among different committees in a blockchain network, and those gaps become a bottleneck for transaction throughput. Thus, Wang et al. [84] proposed a new sharding policy for blockchains named NRSS, which uses node rating to assess network nodes according to their performance in transaction verification. After this evaluation, all network nodes are reallocated to different committees so as to close the performance gaps. In experiments conducted on a local blockchain system, NRSS improved throughput by around 32% under sharding techniques.

Sharding has been proposed mainly to improve the scalability and throughput of blockchains. A good sharding policy should minimize cross-shard communication as much as possible. A classic sharding design is Transactions Sharding. However, Transactions Sharding relies on a random sharding policy, which leads to the dilemma that most transactions are cross-shard. To this end, Nguyen et al. [85] proposed a new sharding paradigm that differs from random sharding, called OptChain, which minimizes the number of cross-shard transactions. The authors achieved their goal in two steps. First, they designed two metrics, named T2S-score (Transaction-to-Shard) and L2S-score (Latency-to-Shard).
T2S-score measures how likely it is that a transaction should be placed into a shard, while L2S-score measures the confirmation latency when placing a transaction into a shard. Next, they used the well-known PageRank analysis to calculate the T2S-score and proposed a mathematical model to estimate the L2S-score. Finally, to place transactions into shards based on both scores, they introduced another metric composed of T2S and L2S, called the temporal fitness score. For a given transaction $u$ and a shard $S_i$, OptChain computes the temporal fitness score for the pair $\langle u, S_i \rangle$ and then puts transaction $u$ into the shard with the highest temporal fitness score.

Similar to [85], Dang et al. [86] proposed a new shard-formation protocol, in which the nodes of different shards are reassigned to different committees to reach a certain safety degree. In addition, they proposed a coordination protocol that handles cross-shard transactions while guarding against Byzantine (malicious) coordinators. The experimental results showed that the throughput reaches a few thousand TPS both in a local cluster with 100 nodes and on a large-scale Google cloud platform testbed.

Considering that the reshuffling operations lead to huge data migration in the sharding-based protocols, Chen et al. [...]

Although existing sharding-based protocols, e.g., Elastico, OmniLedger, and RapidChain, have gained a lot of attention, they still have some drawbacks. For example, the mutual connections among all honest nodes require a large amount of communication resources. Furthermore, there is no incentive mechanism that drives nodes to participate actively in the sharding protocol. To solve those problems, Zhang et al. [91] proposed CycLedger, a protocol designed for sharding-based distributed ledgers that targets scalability, reliable security, and incentives.
CycLedger selects a leader and a subset of nodes for each committee, which handle the intra-shard consensus and the synchronization with other committees. A semi-commitment strategy and a recovery procedure were also proposed to deal with system crashes. In addition, the authors proposed a reputation-based incentive policy to encourage nodes to behave honestly.

Following the widespread adoption of smart contracts, the role of blockchains has been upgraded from token exchange to programmable state machines. Thus, blockchain interoperability must evolve accordingly. To help realize this new type of interoperability among multiple heterogeneous blockchains, Liu et al. [93] proposed HyperService, which includes two major components: a programming framework that allows developers to create cross-chain applications, and a universal interoperability protocol for the secure implementation of dApps on blockchains. The authors implemented a 35,000-line prototype to demonstrate the practicality of HyperService. Using the prototype, the end-to-end delays of cross-chain dApps and the aggregated platform throughput can be measured conveniently.

In an ecosystem that consists of multiple blockchains, interoperability among those different blockchains is an essential issue. To help smart-contract developers build dApps, Fynn et al. [94] proposed a practical Move protocol that works across multiple blockchains. The basic idea of the protocol is to support a move operation that transfers objects and smart contracts from one blockchain to another. Recently, to enable cross-cryptocurrency transactions, Tian et al. [95] proposed a decentralized cryptocurrency exchange strategy implemented on Ethereum through smart contracts. Additionally, a large number of studies on cross-chain communication are covered in [16], where readers can find a systematic classification of cross-chain communication protocols.
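To make the idea of a move operation [94] more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-step, lock-then-recreate flow; the `Chain` class, the `move_out` and `move_in` method names, and the hash used as a "proof" are all hypothetical stand-ins for the on-chain state proofs a real protocol would need.

```python
import hashlib
import json


def state_digest(state: dict) -> str:
    """Hash a contract's state; a stand-in for a real on-chain state proof."""
    encoded = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


class Chain:
    """Toy blockchain that stores contracts as plain dictionaries (hypothetical)."""

    def __init__(self, name: str):
        self.name = name
        self.contracts = {}  # contract id -> state dict
        self.moved_to = {}   # contract id -> name of the chain it was moved to

    def deploy(self, cid: str, state: dict) -> None:
        self.contracts[cid] = dict(state)

    def update(self, cid: str, key: str, value) -> None:
        # A contract that has been moved away is locked on this chain.
        if cid in self.moved_to:
            raise ValueError(f"{cid} is locked on chain {self.name}")
        self.contracts[cid][key] = value

    def move_out(self, cid: str, target: str):
        """Step 1 (assumed): lock the contract here and export state plus proof."""
        if cid in self.moved_to:
            raise ValueError(f"{cid} was already moved")
        state = dict(self.contracts[cid])
        self.moved_to[cid] = target
        return state, state_digest(state)

    def move_in(self, cid: str, state: dict, proof: str, source: "Chain") -> None:
        """Step 2 (assumed): recreate the contract here after checking the proof."""
        if source.moved_to.get(cid) != self.name:
            raise ValueError("source chain did not lock the contract for this chain")
        if state_digest(state) != proof:
            raise ValueError("state does not match the proof")
        self.contracts[cid] = dict(state)


# Usage: move a toy token contract from chain A to chain B.
chain_a, chain_b = Chain("A"), Chain("B")
chain_a.deploy("token", {"alice": 10})
exported_state, proof = chain_a.move_out("token", target="B")
chain_b.move_in("token", exported_state, proof, source=chain_a)
```

After the move, the contract can be updated on chain B, while any update on chain A is rejected because the source copy stays locked. Locking the source before recreating the contract on the target is the design choice that prevents the same object from being active on two chains at once.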
Category | Ref. | Proposal | Summary
New Protocols | [97] | Ouroboros Praos | Authors proposed a new secure Proof-of-Stake protocol named Ouroboros Praos, which is proved secure in the semi-synchronous adversarial setting.
New Protocols | [98] | Tendermint | Authors proposed a new BFT consensus protocol for the wide area network organized by the gossip-based P2P network under adversarial conditions.
New Protocols | [73] | Chu-ko-nu mining | Authors proposed a novel proof-of-work scheme, named Chu-ko-nu mining, which incentivizes miners to create multiple blocks in different zones with only a single PoW mining.
New Protocols | [99] | Proof-of-Trust (PoT) | Authors proposed a novel Proof-of-Trust consensus for the online services of crowdsourcing.
New [...] | [100] | StreamChain | Authors proposed to shift the block-based distributed ledgers to a new paradigm of stream transaction processing to achieve low end-to-end latencies without much affecting throughput.

In Monoxide, proposed in [73], the authors devised a novel proof-of-work scheme named Chu-ko-nu mining. This scheme encourages a miner to create multiple blocks in different zones simultaneously with a single PoW solving effort. As a result, the effective mining power in each zone is almost equal to the total physical mining power of the entire network. Thus, Chu-ko-nu mining increases the attack threshold for each zone to 50%. Furthermore, Chu-ko-nu mining improves the energy efficiency of mining because many more blocks can be produced in each round of normal PoW mining.

Online crowdsourcing services face the challenge of finding a suitable consensus protocol. By leveraging the advantages of the blockchain, such as the traceability of service contracts, Zou et al. [99] proposed a new consensus protocol, named Proof-of-Trust (PoT) consensus, for crowdsourcing and the general online service industries.
The PoT consensus protocol relies on trust management of all service participants, and it works as a hybrid blockchain architecture in which a consortium blockchain is integrated with a public service network.

Conventionally, permissionless blockchain systems adopt a block-based data structure because blocks efficiently amortize the cost of cryptography. However, the benefits of blocks are saturated in today's permissioned blockchains, since block processing introduces large batching latencies. For distributed ledgers that are neither geo-distributed nor PoW-required, István et al. [100] proposed to shift the traditional block-based data structure to a paradigm of stream-like transaction processing. The main advantage of this paradigm shift is that it largely shrinks the end-to-end latencies of permissioned blockchains. The authors developed a prototype of their concept based on Hyperledger Fabric. The results showed that the end-to-end latencies were below 10 ms and the throughput was close to 1500 TPS.

Permissioned blockchains have a number of limitations, such as poor performance, privacy leakage, and inefficient handling of cross-application transactions. To address those issues, Amiri et al. [101] proposed CAPER, a permissioned blockchain that handles cross-application transactions for distributed applications. In particular, CAPER constructs its blockchain ledger as a DAG and handles cross-application transactions by adopting three specific consensus protocols, i.e., a global consensus using a separate set of orderers, a hierarchical consensus protocol, and a one-level consensus protocol.

Then, Chang et al. [102] proposed an edge computing-based blockchain [105] architecture, in which edge-computing providers supply computational resources for blockchain miners.
The authors then formulated a two-phase Stackelberg game for the proposed architecture, aiming to find the Stackelberg equilibrium of the theoretically optimal mining scheme. Next, Zheng et al. [103] proposed a new infrastructure for practical PoW blockchains called AxeChain, which exploits the precious computing power of miners to solve arbitrary practical problems submitted by system users. The authors also analyzed the trade-off between the energy consumption and the security guarantees of AxeChain. This study opens up a new direction for pursuing high energy efficiency in meaningful PoW protocols.

With non-linear (e.g., graph-based) structures being adopted by blockchain networks, researchers are becoming interested in the performance improvement brought by new data structures. To gain insights into such non-linear blockchain systems, Chen et al. [104] performed a systematic analysis that takes three critical metrics into account, i.e., full verification, scalability, and finality-duration. The authors revealed that it is impossible for a blockchain to achieve all three at the same time, so blockchain designers must consider the trade-off among these three properties.

Graphs are widely used in blockchain networks. For example, the Merkle tree has been adopted by Bitcoin, and several blockchain protocols, such as Ghost [106], Phantom [107], and Conflux [108], construct their blocks using the directed acyclic graph (DAG) technique. Different from those generalized graph structures, in this part we review the most recent studies that exploit graph theory for a better understanding of blockchains. Since the transactions in blockchains are easily structured into graphs, graph theory and graph-based data mining techniques are viewed as good tools for discovering interesting findings in the graphs of blockchain networks. Some representative recent studies are reviewed as follows.

Leveraging the techniques of graph analysis, Chen et al.
[109] characterized three major activities on Ethereum, i.e., money transfer, the creation of smart contracts, and the invocation of smart contracts. The major contribution of this paper is that it performed the first systematic investigation and proposed new approaches based on cross-graph analysis, which can address two security issues in Ethereum: attack forensics and anomaly detection. With respect to graph theory, the authors concentrated on the following two aspects: (1) Graph Construction: they identified four types of transactions that are not related to money transfer, smart contract creation, or smart contract invocation. (2) Graph Analysis: they then divided the remaining transactions into three groups according to the activities they triggered, i.e., the money flow graph (MFG), the smart contract creation graph (CCG), and the contract invocation graph (CIG). In this manner, the authors delivered many useful insights into transactions that help address the security issues of Ethereum.

Similarly, by processing the Bitcoin transaction history, Akcora et al. [110] and Dixon et al. [111] modeled the transfer network as an extreme transaction graph. By analyzing chainlet activities [112] in the constructed graph, they proposed to use GARCH-based forecasting models to identify the financial risk of the Bitcoin market for cryptocurrency users.

An emerging research direction associated with blockchain-based cryptocurrencies is to understand the network dynamics behind the graphs of those blockchains, such as the transaction graph, because people wonder what the connection is between the price of a cryptocurrency and the dynamics of the overlying transaction graph. To answer this question, Abay et al. [113] proposed Chainnet, a computationally lightweight method for learning the graph features of blockchains. The authors also disclosed several insightful findings.
For example, it is the topological features of the transaction graph, rather than its degree distribution, that impact the prediction of Bitcoin price dynamics.

Furthermore, using the Mt. Gox transaction history, Chen et al. [114] also exploited a graph-based data-mining approach to uncover market manipulation of Bitcoin. The authors constructed three graphs, i.e., the extreme high graph (EHG), the extreme low graph (ELG), and the normal graph (NMG), based on initial processing of the transaction dataset. Then, they discovered many correlations between market manipulation patterns and the price of Bitcoin.

In another direction, based on address graphs, Victor et al. [115] studied ERC20 token networks by analyzing smart contracts on the Ethereum blockchain. Different from other graph-based approaches, the authors focused their attention on address graphs, i.e., token networks. Each token network is viewed as an overlay graph over the addresses of the entire Ethereum network. Similar to [109], the authors presented the relationships between transactions through graph-based analysis, in which the arrows can denote the invoking functions between transactions and smart contracts, as well as the token transfers between transactions. The findings of this study help us better understand token networks in terms of time-varying characteristics, such as the usage patterns of the blockchain system. An interesting finding is that around 90% of all transfers stem from the top 1000 token contracts. That is to say, fewer than 10% of token recipients have transferred their tokens. This finding is contrary to the viewpoint of [116], where Somin et al. showed that the full transfers seem to obey a power-law distribution. However, the study [115] indicated that those transfers in token networks likely do not follow a power law.
The authors attributed these observations to the following three possible reasons: 1) most token users have no incentive to transfer their tokens and simply hold them; 2) the majority of inactive tokens are treated as something like unwanted spam; 3) a small portion of users, approximately 8%, intended to sell their tokens to a market exchange.

Recently, Zhao et al. [117] explored the account creation, account vote, money transfer, and contract authorization activities of early-stage EOSIO transactions through graph-based metric analysis. Their study revealed abnormal transactions such as voting gangs and frauds.

Latencies in block transfer and processing are common in blockchain networks, since the large number of miner nodes are geographically distributed. Such delays increase the probability of forking and the vulnerability to malicious attacks. Thus, it is critical to know how the network dynamics caused by block propagation latencies and by fluctuations in miners' hashing power impact blockchain performance, such as the block generation rate. To find the connection between those factors, Papadis et al. [118] developed stochastic models to derive the blockchain evolution in a wide-area network. Their results provide practical insights into blockchain design issues, for example, how to change the mining difficulty in PoW consensus while guaranteeing an expected block generation rate or a level of immunity to adversarial attacks. The authors then performed analytical studies and simulations to evaluate the accuracy of their models. This stochastic analysis opens a door to a deeper understanding of the dynamics in a blockchain network.

Towards the stability and scalability of blockchain systems, Gopalan et al. [119] also proposed a stochastic model for a blockchain system. During their modeling, a structural asymptotic property called one-endedness was identified.
The authors also proved that a blockchain system is one-ended if it is stochastically stable. The upper and lower bounds of the stability region were also studied. The authors found that the stability bounds are closely related to the conductance of the P2P blockchain network. These findings are insightful in that they allow researchers to assess the scalability of blockchain systems deployed on large-scale P2P networks.

Although sharding is viewed as a very promising solution to the scalability problem of blockchains and has been adopted by multiple well-known blockchains such as RapidChain [81], OmniLedger [69], and Monoxide [73], the failure probability of a committee under a sharding protocol is still unknown. To fill this gap, Hafid et al. [120] [121] [122] proposed a stochastic model that captures the security analysis of sharding-based blockchains using a probabilistic approach. With the proposed mathematical model, an upper bound of the failure probability was derived for a committee. In particular, three probability inequalities were used in their model, i.e., Chebyshev, Hoeffding, and Chvátal. The authors claim that the proposed stochastic model can be used to analyze the security of any sharding-based protocol.

In blockchain networks, several stages of the mining process and the generation of new blocks can be formulated as queueing systems, such as the transaction-arrival queue, the transaction-confirmation queue, and the block-verification queue. Thus, a growing number of studies exploit queueing theory to analyze the mining and consensus mechanisms of blockchains. Some recent representative works are reviewed as follows.

To develop a queueing theory of blockchain systems, Li et al. [123, 124] devised a batch-service queueing system to describe the mining and the creation of new blocks in a miners' pool. For the blockchain queueing system, the authors exploited a continuous-time Markov process of GI/M/1 type.
They then derived the stability condition and the stationary probability matrix of the queueing system using matrix-geometric techniques.

Observing that the confirmation delay of Bitcoin transactions is larger than that of conventional credit card systems, Ricci et al. [125] proposed a theoretical framework that integrates queueing theory and machine learning techniques to gain a deep understanding of the transaction confirmation time. The authors chose queueing theory because a queueing model is well suited to revealing how the different blockchain parameters affect transaction latencies. Their measurement results showed that Bitcoin users experience a delay that is slightly larger than the residual time of a block confirmation.

Frolkova et al. [126] formulated the synchronization process of the Bitcoin network as an infinite-server model. The authors derived a closed-form expression for the model that can be used to capture the stationary queue distribution. Furthermore, they also proposed a random-style fluid limit under service latencies.

On the other hand, to evaluate and optimize the performance of blockchain-based systems, Memon et al. [128] proposed a queueing theory-based model that captures critical statistical metrics of blockchain networks, such as the number of transactions in every new block, the mining interval of a block, the transaction throughput, and the waiting time in the memory pool.

Next, Fang et al. [127] proposed a queueing analytical model to allocate mining resources for general PoW-based blockchain networks. The authors formulated the queueing model using Lyapunov optimization techniques. Based on this stochastic theory, a dynamic allocation algorithm was designed to find a trade-off between mining energy and queueing delay. Different from the aforementioned work [123] [124] [125], the proposed Lyapunov-based algorithm does not need to make any statistical assumptions on the arrivals and services.

Topic | Ref. | Metrics | Summary
Graph-based theories | [109] | [...] | Via graph analysis, authors extracted three major activities, i.e., money transfer, smart contracts creation, and smart contracts invocation.
Graph-based theories | [113] | Features of transaction graphs | Proposed an extendable and computationally efficient method for graph representation learning on blockchains.
Graph-based theories | [114] | Market manipulation patterns | Authors exploited the graph-based data-mining approach to reveal the market manipulation evidence of Bitcoin.
Graph-based theories | [117] | Clustering coefficient, assortativity of Tx graph | Authors exploited the graph-based analysis to reveal the abnormal transactions of EOSIO.
Token networks | [115] | Token-transfer distributions | Authors studied the token networks through analyzing smart contracts of Ethereum blockchain based on graph analysis.
Graph-based theories | [110, 111] | Extreme chainlet activity | Authors proposed graph-based analysis models for assessing the financial investment risk of Bitcoin.
Blockchain network analysis | [118] | Block completion rates, and the probability of a successful adversarial attack | Authors derived stochastic models to capture critical blockchain properties, and to evaluate the impact of blockchain propagation latency on key performance metrics. This study provides useful insights into design issues of blockchain networks.
Stability analysis | [119] | Time to consistency, cycle length, consistency fraction, age of information | Authors proposed a network model which can identify the stochastic stability of blockchain systems.
Failure probability analysis | [120] [121] [122] | Failure probability of a committee, sums of upper-bounded hypergeometric and binomial distributions for each epoch | Authors proposed a probabilistic model to derive the security analysis under sharding blockchain protocols. This study can tell how to keep the failure probability smaller than a defined threshold for a specific sharding protocol.
Mining procedure and block generation | [123, 124] | The average # of Tx in the arrival queue and in a block, and average confirmation time of Tx | Authors developed a Markovian batch-service queueing system to express the mining process and the generation of new blocks in a miners' pool.
Block-confirmation time | [125] | The residual lifetime of a block till the next block is confirmed | Authors proposed a theoretical framework to deeply understand the transaction confirmation time, by integrating the queueing theory and machine learning techniques.
Synchronization process of Bitcoin network | [126] | Stationary queue-length distribution | Authors proposed an infinite-server model with random fluid limit for Bitcoin network.
Mining resources allocation | [127] | Mining resource for miners, queueing stability | Authors proposed a Lyapunov optimization-based queueing analytical model to study the allocation of mining resources for the PoW-based blockchain networks.
Blockchain's theoretical working principles | [128] | # of Tx per block, mining interval of each block, memory pool size, waiting time, # of unconfirmed Tx | Authors proposed a queueing theory-based model to better understand the theoretical working principle of blockchain networks.

For those considering whether a blockchain system is needed for their business, a notable fact is that blockchain is not applicable to all real-life use cases. To help analyze whether blockchain is appropriate for a specific application scenario, Wust et al. [129] provided the first structured analytical methodology and applied it to three representative scenarios, i.e., supply chain management, interbank payments, and decentralized autonomous organizations.

Topic | Ref. | Metrics | Summary
[...] | [129] | [...] | Authors proposed the first structured analytical methodology that can help decide whether a particular application system indeed needs a blockchain, either permissioned or permissionless, as its technical solution.
Exploration of Ethereum transactions | [130] | Temporal information and the multiplicity features of Ethereum transactions | Authors proposed an analytical model based on the multiplex network theory for understanding Ethereum transactions.
Exploration of Ethereum transactions | [131] | Pending time of Ethereum transactions | Authors conducted a characterization study of Ethereum by focusing on the pending time, and attempted to find the correlation between pending time and fee-related parameters of Ethereum.
Modeling the competition over multiple miners | [132] | Competing mining resources of miners of a cryptocurrency blockchain | Authors exploited game theory to find a Nash equilibrium while peers are competing for mining resources.
A neat bound of consistency latency | [133] | Consistency of a PoW blockchain | Authors derived a neat bound of mining latencies that helps understand the consistency of Nakamoto's blockchain consensus in asynchronous networks.
Network connectivity | [134] | Consensus security | Authors proposed an analytical model to evaluate the impact of network connectivity on the consensus security of PoW blockchain under different adversary models.
How Ethereum responds to sharding | [135] | Balance among shards, # of Tx that would involve multiple shards, the amount of data relocated across shards | Authors studied how sharding impacts Ethereum by first modeling Ethereum as a graph, and then assessing the three metrics mentioned when partitioning the graph.
Required properties of sharding protocols | [136] | Consistency and scalability | Authors proposed an analytical model to evaluate whether a protocol for sharded distributed ledgers fulfills necessary properties.
Vulnerability by forking attacks | [137] | Hashrate power, net cost of an attack | Authors proposed a fine-grained vulnerability analytical model of blockchain networks incurred by intentional forking attacks, taking advantage of large deviation theory.
Counterattack to double-spend attacks | [70] | Robustness parameter, vulnerability probability | Authors studied how to defend against and even counterattack the double-spend attacks in PoW blockchains.
Limitations of PBFT-based blockchains | [138] | Performance of blockchain applications, persistence, possibility of forks | Authors studied and identified several misalignments between the requirements of permissioned blockchains and the classic BFT protocols.

Although Ethereum has gained much popularity since its debut in 2014, the systematic analysis of Ethereum transactions still suffers from insufficient exploration. Therefore, Lin et al. [130] proposed to model the transactions using multiplex network techniques. The authors then devised several random-walk strategies for graph representation of the transaction network. This study could help us better understand the temporal data and the multiplicity features of Ethereum transactions.

To better understand the network features of an Ethereum transaction, Sousa et al. [131] focused on the pending time, defined as the latency from the time a transaction is observed to the time it is packed into the blockchain. The authors tried to find correlations between the pending time and fee-related parameters such as gas and gas price. Surprisingly, their data-driven empirical analysis found no clear correlation between these two factors, which is counterintuitive.

To achieve consensus about the state of a blockchain, miners have to compete with each other by invoking a certain proof mechanism, say PoW. Such competition among miners is the key module of public blockchains such as Bitcoin. To model the competition over multiple miners of a cryptocurrency blockchain, Altman et al.
[132] exploited game theory to find a Nash equilibrium while peers are competing for mining resources. The proposed approach helps researchers better understand such competition. However, the authors also mentioned that they did not study punishment and cooperation between miners over repeated games. Those open topics will be very interesting for future studies.

To ensure the consistency of a PoW blockchain in an asynchronous network, Zhao et al. [133] performed an analysis and derived a neat bound around $2\mu \ln(\mu/\nu)$, where $\mu + \nu = 1$, with $\mu$ and $\nu$ denoting the fraction of computation power controlled by the honest and adversarial miners, respectively. Such a neat bound on mining latencies helps us understand the consistency of Nakamoto's blockchain consensus in asynchronous networks.

Bitcoin's consensus security is built upon the honest-majority assumption. Under this assumption, the blockchain system is considered secure only if the majority of miners are honest while voting towards a global consensus. Recent research suggests that network connectivity, the forks of a blockchain, and the mining strategy are major factors that impact the security of consensus in the Bitcoin blockchain. To provide pioneering concrete modeling and analysis, Xiao et al. [134] proposed an analytical model to evaluate the impact of network connectivity on the consensus security of PoW blockchains. To validate the effectiveness of the proposed analytical model, the authors applied it to two adversary scenarios, i.e., the honest-but-potentially-colluding and selfish mining models.

Although sharding is viewed as a prevalent technique for improving the scalability of blockchain systems, several essential questions remain: what can we expect from introducing sharding to Ethereum, and what price must be paid for it? To answer those questions, Fynn et al. [135] studied how sharding works for Ethereum by modeling Ethereum as a graph.
By partitioning the graph, they evaluated the trade-off between edge-cut and balance. Several practical insights were disclosed. For example, three major components, e.g., computation, storage, and bandwidth, play a critical role when partitioning Ethereum; a good design of incentives is also necessary for adopting a sharding mechanism.

As mentioned multiple times, sharding is viewed as a promising solution for improving the scalability of blockchains. However, the properties of a sharded blockchain under a fully adaptive adversary are still unknown. To this end, Avarikioti et al. [136] defined consistency and scalability for sharded blockchain protocols. The limitations of the security and efficiency of sharding protocols were also derived. Then, they analyzed these two properties in the context of multiple popular sharding-based protocols such as OmniLedger, RapidChain, Elastico, and Monoxide. Several interesting conclusions were drawn. For example, the authors concluded that Elastico and Monoxide fail to guarantee the balance between the consistency and scalability properties, while OmniLedger and RapidChain fulfill all requirements of a robust sharded blockchain protocol.

Forking attacks have become a common threat faced by the blockchain market. Existing related studies mainly focus on detecting such attacks through transactions. However, detection cannot prevent forking attacks from happening. To resist forking attacks, Wang et al. [137] studied the fine-grained vulnerability of blockchain networks caused by intentional forks using large deviation theory. This study can help set the robustness parameters of a blockchain network, since the vulnerability analysis provides the correlation between the robustness level and the vulnerability probability.
In detail, the authors found that it is much more cost-efficient to set the robustness-level parameters than to spend computational capability on lowering the attack probability.

The existing economic analysis [139] reported that attacks on PoW mining-based blockchain systems can be cheap under a specific condition, namely when sufficient hashrate can be rented. Moroz et al. [70] studied how to defend against double-spend attacks from an interesting reverse direction. The authors found that the counterattack of victims can lead to a classic game-theoretic War of Attrition model. This study showed that double-spend attacks on some PoW-based blockchains are actually cheap. However, defense or even counterattack against such double-spend attacks is possible when victims own the same capacity as the attacker.

Although BFT protocols have attracted a lot of attention, a number of fundamental limitations remain unaddressed when running blockchain applications on classical BFT protocols. Those limitations include one related to low performance, and two related to the gaps between the state machine replication and blockchain models (i.e., the lack of strong persistence guarantees and the occurrence of forks). To identify those limitations, Bessani et al. [138] first studied them using a digital-coin blockchain application called SmartCoin and a popular BFT replication library called BFT-SMART; they then discussed how to tackle these limitations in a protocol-agnostic manner. The authors also implemented an experimental permissioned blockchain platform, namely SmartChain. Their evaluation results showed that SmartChain can address the aforementioned limitations and significantly improve the performance of a blockchain application.

Topic | Ref. | Metrics | Contribution
Cryptojacking detection | [140] | Hardware performance counters | Authors proposed a machine learning-based solution to prevent cryptojacking attacks.
Cryptojacking detection | [141] | Various system resource utilization | Authors proposed an in-browser cryptojacking detection approach (CapJack), based on the latest CapsNet.
Market-manipulation mining | [114] | Various graph characteristics of transaction graph | Authors proposed a mining approach using the exchanges collected from the transaction networks.
Predicting volatility of Bitcoin price | [111] | Various graph characteristics of extreme chainlets | Authors proposed a graph-based analytic model to predict the intraday financial risk of the Bitcoin market.
Money-laundering detection | [142] | Various graph characteristics of transaction graph | Authors exploited machine learning models to detect potential money laundering activities from Bitcoin transactions.
Ponzi-scheme detection | [143] | Factors that affect scam persistence | Authors analyzed the demand and supply perspectives of Ponzi schemes in the Bitcoin ecosystem.
Ponzi-scheme detection | [144, 145] | Account and code features of smart contracts | Authors detected Ponzi schemes on Ethereum based on data mining and machine learning approaches.
Design problem of cryptoeconomic systems | [146] | Price of XNS token, subsidy of App developers | Authors presented a practical evidence-based example to show how data science and stochastic modeling can be applied to designing cryptoeconomic blockchains.
Pricing mining hardware | [147] | Miner revenue, ASIC value | Authors studied the correlation between the price of mining hardware (ASIC) and the value volatility of the underlying cryptocurrency.

in mining. Thus, all web users face severe risks from cryptocurrency-hungry hackers. For example, cryptojacking attacks [148] have attracted growing attention. In this type of attack, a hacker secretly embeds a mining script without the user noticing. When the script is loaded, mining begins in the background of the system, and a large portion of hardware resources is requisitioned for mining. To tackle cryptojacking attacks, Tahir et al.
[140] proposed a machine learning-based solution, which leverages hardware performance counters as the critical features and achieves high accuracy when classifying parasitic miners. The authors also built their approach into a browser extension to provide widespread real-time protection for web users. Similarly, Ning et al. [141] proposed CapJack, an in-browser cryptojacking detector based on the deep capsule network (CapsNet) [149] technology.

As mentioned previously, to detect potential manipulation of the Bitcoin market, Chen et al. [114] proposed a graph-based mining approach to study the evidence from the transaction network built from the Mt. Gox transaction history. The findings of this study suggest that the cryptocurrency market requires regulation.

To predict drastic price fluctuations of Bitcoin, Dixon et al. [111] studied the impact of extreme transaction graph (ETG) activity on the intraday dynamics of Bitcoin prices. The authors utilized chainlets [112] (subgraphs of the transaction graph) to develop their predictive models.

Manuscript submitted to ACM

The work in [151] mentioned that money laundering conducted in the underground market can be detected using the Bitcoin mixing services. However, it did not present an essential anti-money-laundering strategy. In contrast, utilizing a transaction dataset collected over three years, Hu et al. [142] performed in-depth detection to discover money laundering activities on the Bitcoin network. To distinguish money laundering transactions from regular ones, the authors proposed four types of classifiers based on the graph features of the transaction graph, i.e., immediate neighbors, deepwalk embeddings, node2vec embeddings and decision tree-based.

It is not common to introduce data science and stochastic simulation modeling into the design of cryptoeconomic engineering. Laskowski et al.
[146] presented a practical evidence-based example to show how this approach can be applied to designing cryptoeconomic blockchains. Yaish et al. [147] discussed the relationship between cryptocurrency mining and the market price of the special hardware (ASICs) that supports PoW consensus. The authors showed that the decreasing volatility of Bitcoin's price has a counterintuitive negative impact on the value of mining hardware. This is because miners are not financially incentivized to participate in mining when Bitcoin becomes widely adopted and its volatility therefore decreases. This study also revealed that a mining ASIC could be imitated by bonds and underlying cryptocurrencies such as bitcoins.

Although diverse blockchains have been proposed in recent years, very few efforts have been devoted to measuring the performance of different blockchain systems. Thus, this part reviews the representative studies of performance measurements for blockchains. The measurement metrics include throughput, security, scalability, etc.

As a pioneering work in this direction, Gervais et al. [152] proposed a quantitative framework, with which they studied the security and performance of several PoW blockchains, such as Bitcoin, Litecoin, Dogecoin and Ethereum. The authors focused on multiple metrics of the security model, e.g., stale block rate, mining power, mining costs, the number of block confirmations, propagation ability, and the impact of eclipse attacks. They also conducted extensive simulations for the four aforementioned blockchains with respect to the impact of block interval, the impact of block size, and throughput. By evaluating how network parameters affect the security of PoW blockchains, researchers can compare security performance objectively, which helps them reason about optimal adversarial strategies and the security provisions of PoW blockchains.
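Two of the quantities named above, the stale block rate and the block interval, are linked by a simple back-of-the-envelope relation. The sketch below is a simplification of our own and not the simulator of [152]: it assumes that block discovery is a Poisson process and that every new block needs one fixed propagation delay to reach the rest of the network. The function name and the numeric inputs are hypothetical.

```python
import math


def stale_rate(propagation_delay_s, block_interval_s):
    """Probability that a competing block is found while a new block is still
    propagating, assuming Poisson block discovery with the given mean interval."""
    if propagation_delay_s < 0 or block_interval_s <= 0:
        raise ValueError("delay must be >= 0 and interval must be > 0")
    return 1.0 - math.exp(-propagation_delay_s / block_interval_s)


# Illustrative inputs only: the same 2-second delay under three block intervals.
for interval in (600.0, 150.0, 15.0):
    print(f"interval={interval:>5.0f}s  stale rate ~ {stale_rate(2.0, interval):.4f}")
```

Under these assumptions, shortening the block interval while the propagation delay stays fixed raises the chance of stale blocks. This is one reason why block interval and block size appear as security-relevant parameters rather than pure throughput knobs.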
Ref. | Target blockchains | Metrics | Implementation and experiments
[73] | General mining-based blockchains, e.g., Bitcoin and Ethereum | TPS, the overheads of cross-zone transactions, the confirmation latency of transactions, etc. | Monoxide was implemented in C++. RocksDB was used to store blocks and transactions. The real-world testing system was deployed on a distributed configuration consisting of 1200 virtual machines, each having 8 cores and 32 GB memory. In total, 48,000 blockchain nodes were used in the testbed.
[74] | General blockchains | Throughput and confirmation latency, scalability under different numbers of clients, forking rate, and resource utilization (CPU, network bandwidth) | The Prism testbed is deployed on Amazon EC2 instances, each with 16 CPU cores, 16 GB RAM, 400 GB NVMe SSD, and a 10 Gbps network interface. In total, 100 Prism client instances are connected in a random 4-regular graph topology.
[75] | Ethereum

Nasir et al. [153] conducted performance measurements and a discussion of two versions of Hyperledger Fabric. The authors focused on metrics including execution time, transaction latency, throughput, and scalability versus the number of nodes in blockchain platforms. Several useful insights were revealed for the two versions of Hyperledger Fabric. As mentioned previously, the authors of [73] evaluated their proposed Monoxide w.r.t. metrics including the scalability of TPS as the number of network zones increases, the overhead of both cross-zone transactions and storage size, the confirmation latency of transactions, and the orphan rate of blocks. In [74], the authors performed rich measurements of their proposed blockchain protocol Prism under limited network bandwidth and CPU resources. The evaluated performance includes the distribution of block propagation delays, the relationship between block size and mining rate, block size versus assembly time, the expected time to reach consensus on a block hash, the expected time to reach consensus on blocks, etc. Later, Zheng et al.
[154] proposed a scalable framework for monitoring the real-time performance of blockchain systems. This work evaluated four popular blockchain systems, i.e., Ethereum, Parity [158], Cryptape Inter-enterprise Trust Automation (CITA) [159] and Hyperledger Fabric [160] i) data analysis based on off-chain data to provide off-chain user behavior for blockchain developers, ii) exploring new features of EOSIO data that are different from those of Ethereum, and iii) conducting a joint analysis of EOSIO with other blockchains.

Kalodner et al. [164] proposed BlockSci, an open-source software platform for blockchain analysis. Under the architecture of BlockSci, the raw blockchain data is parsed to produce the core blockchain data, including the transaction graph, indexes and scripts, which are then provided to the analysis library. Together with auxiliary data including P2P data, price data and user tags, a client can either query the data directly or read it through a Jupyter notebook interface.

To evaluate the performance of private blockchains, Dinh et al. [155] proposed a benchmarking framework named Blockbench, which can measure the data processing capability and the performance of various layers of a blockchain system. Using Blockbench, the authors then performed detailed measurements and analysis of three blockchains, i.e., Ethereum, Parity and Hyperledger. The results disclosed some useful lessons about those three blockchain systems. For example, today's blockchains are not scalable w.r.t. data processing workloads, and several bottlenecks should be considered when designing the different layers of a blockchain from a software engineering perspective.

Ethereum has received enormous attention regarding mining challenges, the analytics of smart contracts, and the management of block mining. However, few efforts have been spent on the information dissemination in authors also made this simulator open-source on GitHub.
In this section, we envision the open issues and promising directions for future studies.

6.1.3 Cross-Shard Performance. Although a number of committee-based sharding protocols [69, 73, 81, 165] have been proposed, those protocols can tolerate at most 1/3 adversaries. Thus, more robust Byzantine agreement protocols need to be devised. Furthermore, all sharding-based protocols incur additional cross-shard traffic and latency because of cross-shard transactions. Therefore, cross-shard performance in terms of throughput, latency and other metrics has to be well guaranteed in future studies. On the other hand, cross-shard transactions are inherent to cross-shard protocols. Thus, the pros and cons of such correlation between different shards are worth investigating using certain models and theories such as graph-based analysis.

6.1.4 Cross-Chain Transaction Accelerating Mechanisms. On cross-chain operations, [92] is essentially a pioneering step towards practical blockchain-based ecosystems. Following the roadmap paved by [92], we anticipate that subsequent related investigations will appear in the near future. For example, although the inter-chain transaction experiments achieved initial success, we believe that secure cross-chain transaction accelerating mechanisms are still on the way. In addition, further improvements are still required for the interoperability among multiple blockchains, such as decentralized load-balancing smart contracts for sharded blockchains.

6.1.5 Ordering Blocks for Multiple-Chain Protocols. Although multiple-chain techniques can improve throughput by exploiting the parallel mining of multiple chain instances, how to construct and manage the blocks in all chains in a globally consistent order is still a challenge for multiple-chain-based scalability protocols and solutions.

6.1.6 Hardware-assisted Accelerating Solutions for Blockchain Networks.
To improve the performance of blockchains, for example, to reduce the latency of transaction confirmation, some advanced network technologies, such as RDMA (Remote Direct Memory Access) and high-speed network cards, can be exploited to accelerate data access among miners in blockchain networks.

6.1.7 Performance Optimization in Different Blockchain Network Layers. The blockchain network is built over P2P networks, which include several typical layers, such as the MAC layer, routing layer, network layer, and application layer. BFT-based protocols essentially work at the network layer. In fact, performance improvements can also be achieved by proposing various protocols, algorithms, and theoretical models for other layers of the blockchain network.

6.1.8 Blockchain-assisted BigData Networks. Big data and blockchain have several performance metrics that are contrary to each other. For example, big data is a centralized management technology with an emphasis on privacy preservation across diverse computing environments. The data processed by big data technology should be non-redundant and unstructured in a large-scale computing network. In contrast, blockchain technology builds on a decentralized, transparent and immutable architecture, in which the data type is simple and the data is structured and highly redundant. Furthermore, the performance of blockchains requires scalability and the off-chain computing paradigm. Thus, how to integrate those two technologies and pursue mutual benefit is an open issue that is worth in-depth study. For example, potential research topics include how to design a suitable new blockchain architecture for big data technologies, and how to break isolated data islands using blockchains while addressing the privacy issues of big data.
• Exploiting more general queueing theories to capture the real-world arrival process of transactions, the mining of new blocks, and other queueing-related blockchain phases.
• Performing priority-based service policies when dealing with transactions and new blocks, to meet a predefined security or regulation level.
• Developing more general probabilistic models to characterize the correlations among the multiple performance parameters of blockchain systems.

6.2.2 Privacy-Preserving for Blockchains. From the previous overview, we observe that most existing works in this category discuss blockchain-based security and privacy-preserving applications. In fact, security and privacy are also critical issues of the blockchain itself. For example, the privacy of transactions could be compromised by attackers. However, dedicated studies focusing on those issues are still insufficient.

Mechanisms for Malicious Miners. Cryptojacking miners reportedly exist in web browsers according to [140]. This type of malicious code commandeers hardware resources of web users, such as computational capability and memory. Thus, anti-cryptojacking mechanisms and strategies need to be developed to protect normal browser users.

6.2.4 Security Issues of Cryptocurrency Blockchains. The security issues of cryptocurrency blockchains, such as double-spend attacks and frauds in smart contracts, have attracted growing attention from both industry and academia. However, little effort has been devoted to theoretical investigations of these security issues. For example, the exploration of punishment and cooperation between miners over multiple chains is an interesting topic for cryptocurrency blockchains. Thus, we expect to see broader perspectives on modeling the behaviors of both attackers and counterattackers in the context of monetary blockchain attacks.
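As a baseline for such attacker modeling, the attacker-only calculation from the Bitcoin whitepaper ("Bitcoin: A peer-to-peer electronic cash system" in the reference list) can be sketched as follows. It gives the probability that an attacker holding a share q of the hashrate eventually overtakes the honest chain after a merchant waits for z confirmations. The function name is ours, and the model deliberately ignores the counterattacks, rented hashrate, and multi-chain effects discussed above, so it is a starting point rather than one of the surveyed models.

```python
import math


def double_spend_success(q, z):
    """Probability that an attacker with hashrate share q ever catches up from
    z blocks behind (the calculation given in the Bitcoin whitepaper)."""
    if not 0.0 <= q <= 1.0 or z < 0:
        raise ValueError("q must be in [0, 1] and z must be >= 0")
    p = 1.0 - q
    if q >= p:
        return 1.0  # an attacker with at least half the hashrate always catches up
    lam = z * q / p  # expected attacker progress while z honest blocks are mined
    prob = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        prob -= poisson * (1.0 - (q / p) ** (z - k))
    return prob


# Illustrative inputs: a 10% attacker against 1, 2 and 5 confirmations.
for z in (1, 2, 5):
    print(z, round(double_spend_success(0.10, z), 7))
```

The probability falls quickly with the number of confirmations for a minority attacker. Extending this single-attacker view with defender strategies, such as the counterattacks studied in [70], is the kind of broader modeling called for above.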
Most beginners in the blockchain field face a dilemma: they lack powerful simulation/emulation tools for verifying their new ideas or protocols. Therefore, powerful simulation/emulation platforms on which scalable testbeds can be easily deployed for experiments would be very helpful to the research community.

Through a brief review of state-of-the-art blockchain surveys, we found that a dedicated survey focusing on the theoretical modelings, analytical models and useful experiment tools for blockchains is still missing. To fill this gap, we conducted a comprehensive survey of the state of the art on blockchains, particularly from the perspectives of theories, modelings, and measurement/evaluation tools. The taxonomy of each topic presented in this survey is intended to convey the new protocols, ideas, and solutions that can improve the performance of blockchains and help people better understand blockchains at a deeper level. We believe our survey provides timely guidance on the theoretical insights of blockchains for researchers, engineers, educators, and general readers.
Survey of consensus protocols on blockchain applications Blockchain consensus algorithms: the state of the art and future trends A survey on consensus mechanisms and mining management in blockchain networks Sok: A consensus taxonomy in the blockchain era A survey about consensus algorithms used in blockchain A survey on consensus mechanisms and mining strategy management in blockchain networks SoK: Consensus in the Age of Blockchains A survey of distributed consensus protocols for blockchain networks A survey of attacks on ethereum smart contracts Blockchain-Based Smart-Contract Languages: A Systematic Literature Review An overview on smart contracts: Challenges, advances and platforms Sok: Sharding on blockchain Survey: Sharding in blockchains Research on scalability of blockchain technology: Problems and methods Solutions to scalability of blockchain: A survey Sok: Communication across distributed ledgers A systematic literature review of blockchain cyber security A survey of blockchain from security perspective A survey of blockchain technology on security, privacy, and trust in crowdsourcing services The security of big data in fog-enabled iot applications including blockchain: a survey A survey on privacy protection in blockchain system A comprehensive survey on blockchain: Working, security analysis, privacy threats and potential applications Blockchain data analysis: A review of status, trends and challenges Dissecting ponzi schemes on ethereum: identification, analysis, and impact Blockchain for cloud exchange: A survey Blockchain for ai: review and open research challenges Blockchain intelligence: When blockchain meets artificial intelligence When machine learning meets blockchain: A decentralized, privacy-preserving and secure design Blockchain and machine learning for communications and networking systems A survey on blockchain: A game theoretical perspective Blockchain security in cloud computing: Use cases, challenges, and solutions When mobile 
blockchain meets edge computing Integrated blockchain and edge computing systems: A survey, some research issues and challenges Blockchain for 5g and beyond networks: A state of the art survey Blockchains and Smart Contracts for the Internet of Things Applications of blockchains in the internet of things: A comprehensive survey A review on the use of blockchain for the internet of things Internet of things security: A top-down survey Blockchain and iot integration: A systematic survey Blockchain for internet of things: A survey Survey on blockchain for internet of things Integration of blockchain and cloud of things: Architecture, applications and challenges Blockchain for the internet of things: Present and future When internet of things meets blockchain: Challenges in distributed consensus Blockchain technology toward green iot: Opportunities and challenges A survey of iot applications in blockchain systems: Architecture, consensus, and traffic modeling Blockchain applications for industry 4.0 and industrial iot: A review Edge intelligence and blockchain empowered 5g beyond for the industrial internet of things Applications of blockchain in unmanned aerial vehicles: A review Blockchain: A survey on functions, applications and open issues A systematic literature review of blockchain-based applications: current status, classification and open issues Blockchain in agriculture: A systematic literature review Security and privacy for green iot based agriculture review blockchain solutions and challenges Deployment of blockchain technology in software defined networks: A survey Blockchain for business applications: A systematic literature review A survey of blockchain technology applied to smart cities: Research issues and challenges Blockchain in smart grids: A review on different use cases Blockchain technology for smart grids: Decentralized nist conceptual model When blockchain meets distributed file systems: An overview, challenges, and open issues Blockchain in 
Space Industry: Challenges and Solutions Blockchain and AI-based Solutions to Combat Coronavirus (COVID-19)-like Epidemics: A Survey Blockchain: the state of the art and future trends An overview of blockchain technology: Architecture, consensus, and future trends Blockchain challenges and opportunities: A survey Blockchain and cryptocurrencies: Model, techniques, and applications Core Concepts, Challenges, and Future Directions in Blockchain: A Centralized Tutorial Bitcoin: A peer-to-peer electronic cash system A secure sharding protocol for open blockchains Omniledger: A secure, scale-out, decentralized ledger via sharding Double-Spend Counterattacks: Threat of Retaliation in Proof-of-Work Systems Ethereum: A secure decentralised generalised transaction ledger Accel: Accelerating the bitcoin blockchain for high-throughput, low-latency applications Monoxide: Scale out Blockchains with Asynchronous Consensus Zones Prism: Scaling bitcoin by 10,000 x Garet: improving throughput using gas consumption-aware relocation in ethereum sharding environments Erasure code-based low storage blockchain node Jidar: A jigsaw-like data reduction approach without trust assumptions for bitcoin system Segment blockchain: A size reduced storage mechanism for blockchain On availability for blockchain-based systems Selecting reliable blockchain peers via hybrid blockchain reliability prediction Rapidchain: Scaling blockchain via full sharding Sharper: Sharding permissioned blockchains over network clusters Gas consumption-aware dynamic load balancing in ethereum sharding environments A Node Rating Based Sharding Scheme for Blockchain Optchain: optimal transactions placement for scalable blockchain sharding Towards scaling blockchain systems via sharding Sschain: A full sharding protocol for public blockchain without data migration overhead Eunomia: A Permissionless Parallel Chain Protocol Based on Logical Clock On the feasibility of sybil attacks in shard-based permissionless blockchains 
An n/2 byzantine node tolerate blockchain sharding approach Cycledger: A scalable and secure parallel protocol for distributed ledger via sharding Towards a Novel Architecture for Enabling Interoperability amongst Multiple Blockchains Hyperservice: Interoperability and programmability across heterogeneous blockchains Smart Contracts on the Move Enabling cross-chain transactions: A decentralized cryptocurrency exchange protocol Scalable Blockchain Protocol Based on Proof of Stake and Sharding Ouroboros praos: An adaptively-secure, semi-synchronous proof-of-stake blockchain The latest gossip on bft consensus A proof-of-trust consensus protocol for enhancing accountability in crowdsourcing services Streamchain: Do blockchains need blocks Caper: a cross-application permissioned blockchain Incentive mechanism for edge computing-based blockchain AxeChain: A Secure and Decentralized blockchain for solving Easily-Verifiable problems Nonlinear blockchain scalability: a game-theoretic perspective Credit-based payments for fast computing resource trading in edge-assisted internet of things Secure high-rate transaction processing in bitcoin Phantom: A scalable blockdag protocol Scaling nakamoto consensus to thousands of transactions per second Understanding ethereum via graph analysis Bitcoin risk modeling with blockchain graphs Blockchain analytics for intraday financial risk modeling Forecasting bitcoin price with graph chainlets Chainnet: Learning on blockchain graphs with topological features Market manipulation of bitcoin: Evidence from mining the mt. 
gox transaction network Measuring ethereum-based erc20 token networks Network analysis of erc20 tokens trading on ethereum blockchain Exploring eosio via graph characterization Stochastic models and wide-area network measurements for blockchain design and analysis Stability and Scalability of Blockchain Systems A probabilistic security analysis of sharding-based blockchain protocols A methodology for a probabilistic security analysis of sharding-based blockchain protocols New mathematical model to analyze security of sharding-based blockchain protocols Blockchain queue theory Markov processes in blockchain systems Learning blockchain delays: a queueing theory approach A bitcoin-inspired infinite-server model with a random fluid limit Toward low-cost and stable blockchain networks Simulation model for blockchain systems using queuing theory Do you need a blockchain? Modeling and understanding ethereum transaction records via a complex network approach An Analysis of the Fees and Pending Time Correlation in Ethereum Blockchain competition between miners: a game theoretic perspective An analysis of blockchain consistency in asynchronous networks: Deriving a neat bound Modeling the Impact of Network Connectivity on Consensus Security of Proof-of-Work Blockchain Challenges and pitfalls of partitioning blockchains Divide and Scale: Formalization of Distributed Ledger Sharding Protocols Corking by forking: Vulnerability analysis of blockchain From byzantine replication to blockchain: Consensus is only the beginning The economic limits of bitcoin and the blockchain The browsers strike back: Countering cryptojacking and parasitic miners on the web CapJack: Capture In-Browser Crypto-jacking by Deep Capsule Network through Behavioral Analysis Characterizing and detecting money laundering activities on the bitcoin network Analyzing the bitcoin ponzi scheme ecosystem Detecting ponzi schemes on ethereum: Towards healthier blockchain technology Exploiting blockchain data to 
detect smart ponzi schemes on ethereum Evidence based decision making in blockchain economic systems: From theory to practice Pricing ASICs for Cryptocurrency Mining A first look at browser-based cryptojacking Dynamic routing between capsules Data mining for detecting bitcoin ponzi schemes Money laundering in the bitcoin network: Perspective of mixing services On the security and performance of proof of work blockchains Performance analysis of hyperledger fabric platforms A detailed and real-time performance monitoring framework for blockchain systems Blockbench: A framework for analyzing private blockchains Measuring Ethereum Network Peers Local bitcoin network simulator for performance evaluation using lightweight virtualization Parity documentation Cita technical whitepaper Hyperledger fabric: a distributed operating system for permissioned blockchains Performance Monitoring Xblock-eth: Extracting and exploring blockchain data from etherem Xblock-eos: Extracting and exploring blockchain data from eosio BlockSci: Design and applications of a blockchain analysis platform The honey badger of bft protocols