Paper—Interconnection Structures, Management and Routing Challenges in Cloud-Service Data Center…

Interconnection Structures, Management and Routing Challenges in Cloud-Service Data Center Networks: A Survey

https://doi.org/10.3991/ijim.v12i1.7573

Ahmad Nahar Quttoum
The Hashemite University, Jordan
Quttoum@hu.edu.jo

Abstract—Today's data center networks employ expensive networking equipment in structures that were not designed to meet the increasing requirements of current large-scale data center services. The resulting limitations in reliability, resource utilization, and cost are challenging. The era of cloud computing promises to enable large-scale data centers. The computing platforms of such cloud-service data centers consist of a large number of low-priced commodity servers that, with a virtualization layer on top, can match the performance of expensive high-end servers at only a fraction of the price. Recently, research in data center networks has started to evolve rapidly. This has opened the path for addressing many of their design and management challenges, such as scalability, reliability, bandwidth capacity, virtual machine migration, and cost. Bandwidth resource fragmentation limits network agility and leads to low utilization rates, not only for the bandwidth resources but also for the servers that run the applications. With Traffic Engineering methods, managers of such networks can adapt to rapid changes in the network traffic among their servers, which can help to provide better resource utilization and lower costs. The market is going through exciting changes, and the need to run services at demanding scale drives the work toward cloud networks. These networks are enabled by the notion of autonomic management and the availability of low-priced commodity network equipment.
This work provides readers with a survey that presents the management challenges and the design and operational constraints of cloud-service data center networks.

Keywords—Cloud-based Data Center Networks; Structures of Data Center Networks; Network Management; Routing in Data Center Networks.

http://www.i-jim.org

1 Introduction

Over the last few decades, we have lived, and are still living, through a huge Internet era and a big rise in Web-based technologies that make data centers more strategic than ever. Data Center Networks (DCNs) are mainly proposed to provide appropriate network structures, with associated protocols, that can interconnect different servers holding varying applications so that they act together as one single network [1]. In many organizations, the heartbeat of the business lies in data centers, where different parties (i.e. employees, partners, and customers) physically rely on the same data and network resources of a single DCN to interact, collaborate, and create services. As a consequence, such a theme receives great attention from Information Technology (IT) specialists seeking to enhance business processes, accelerate change, and improve productivity. Managers of DCNs face several challenges in satisfying such objectives, while demands on their networks are growing rapidly and the need to keep up with today's economic and technical growth is pressing. Mainly, an efficient DCN should provide (1) balanced network capacities [2]; (2) low-cost equipment; (3) a high degree of scalability; and (4) reliability, where DCNs must be reliable with a substantial level of tolerance against network failures.
Therefore, for DCN managers, the challenge now is: (1) to utilize the network resources efficiently and maximize the number of services provided with the same amount of resources, while maintaining a reliable network state that is robust against link or server faults; and (2) to provide scalable, cost-effective interconnection structures that can accommodate large server populations, along with efficient bidirectional bandwidth capacities between the network components.

Today, most institutions still build redundant sites as backups, and usually the data on such secondary sites are manually replicated and managed. Although such backup sites represent an insurance policy in the case of failure, they are also a non-performing asset most of the time [3]. This is considered a waste of time and power. However, by introducing the concept of virtualization [4], the resources of a DCN and its backup sites can be turned into continuously available resources that function in distributed scenarios. Regardless of location, with such a virtualization scheme DCNs can provide lower costs, with higher performance and better reliability for their data and applications. In this direction, research in cloud-service DCNs is tackling the issue of improving the services provided by DCNs. Existing interconnection structures, routing protocols, and Traffic Engineering techniques are all evolving to support virtualization management schemes.

DCN interconnection structures play a great role in overcoming the aforementioned challenges and provide for better virtualized cloud services while reducing costs and the probability of network failure. These structures define how the network components (i.e. servers, switches, transits, links) are to be interconnected, and the characteristics of each component. Traditional interconnection structures usually come in the form of hierarchical trees that interconnect a set of connecting devices through a set of links [5].
The specifications and characteristics of such elements may vary, and hence both the performance and the cost of the whole DCN may be affected.

Routing is another crucial player in exploiting the capacities of DCN structures [6]. Hence, several DCN routing protocols have been proposed in the literature. In general, such protocols differ from those of the Internet, in that DCN routing protocols are specially designed to accommodate DCN topologies. Most basic routing schemes seek routes between any pair of network nodes under certain conditions (e.g. shortest routes, or other traffic metrics). In DCNs, routing is a bit more sophisticated, as it requires further constraints, like energy and throughput, to be taken into consideration. As this can be considered a Traffic Engineering (TE) problem, the focus in DCNs is on the internal routing schemes (intra-routing), since most of the communication patterns of a DCN are internal ones [7].

Therefore, as with interconnection structures, the design of TE solutions in DCNs should take into account the principles of reliability, load balancing, and energy efficiency. In this study, we survey the state of the art of the research proposals that have targeted DCNs over the last few years.

iJIM ‒ Vol. 12, No. 1, 2018

1.1 Paper Organization

The remainder of this survey is organized as follows: Section 2 discusses cloud-service DCNs, while Section 3 presents and compares the different DCN interconnection structures. In turn, Section 4 provides an overview of the routing protocols and the TE techniques in DCNs. Finally, directions for open issues and future research are presented in Section 5, and Section 6 concludes the paper.
2 Cloud Service Data Center Networks

Within the IT community, cloud computing has emerged as a prominent theme that provides new management schemes to accommodate the growing challenges and the dynamic changes in service demands in efficient and cost-effective ways [8]. Relying on traditional networks is no longer satisfactory; nowadays the trend is toward dynamic, scalable networks that can efficiently satisfy changing demands and varying workloads. Interest in cloud computing is increasing; however, there is still some confusion in many areas as to what cloud computing really means. How does it differ from traditional enterprise networks? What are the benefits of adopting such a theme? And what are the risks or side effects if it is applied to next-generation management technologies?

2.1 What is Cloud Computing?

Cloud computing can be basically defined as a computing style that employs high-level IT resources in a scalable, virtualized scheme in order to provide a wide platform for computing services [9]. Cloud networks deliver services rather than computing products, where the employed resources, software, and information are provided to the end stations (i.e. users) as utilities. Hence, users do not need to know how the network is implemented, how it is managed, or even what technologies are used. What concerns them is only that they have access to reliable computing systems that can meet their applications' requirements in a cost-effective way. Such architectures facilitate dynamic on-demand access to a shared pool of reliable and highly available network resources, with easier management and a pay-per-use pricing scheme.

2.2 Cloud vs. Traditional Computing Architectures

Until fairly recently, traditional distributed computing architectures were dominant in supporting most enterprise IT services. In such architectures, network resources are physically partitioned into several portions, each assigned to exclusively serve certain applications. This approach might serve well and provide good performance; however, it requires significant investments and accurate forecasting techniques to utilize the provisioned network resources efficiently and reduce the corresponding resource costs. On the other hand, the emergence of virtualization technologies provides new management methodologies that allow high levels of autonomic management for virtually partitioned resources [10]. With such a virtualization scheme, as depicted in Figure 1, DCN managers gained the ability to utilize their resources better and maximize the agility of the provided services.

Accordingly, by virtually partitioning the resources of their physical server machines into several Virtual Machines (VMs), DCN managers can consolidate varying applications on the same physical server machine [11], [12]. Indeed, a cloud-service DCN that contains thousands of physical server machines, with a modest amount of virtualization applied on top, can provide capacity for millions of end nodes running varying applications. The different applications are distributed over the VMs in a way that enables better utilization of all the resources of the hosting physical machine, such as CPU, memory, and disk space. Hence, applications that use more CPU are consolidated with others that use less CPU but more memory or disk space, and so on [13]. The success of such virtualization technologies opened the path for developing reliable virtualized data center architectures [14], called cloud data center networks.
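The consolidation idea described above, pairing CPU-heavy applications with memory-heavy ones on the same physical machine, can be illustrated with a small sketch. The survey does not prescribe a placement algorithm; a simple first-fit heuristic is swapped in here purely for illustration, and the function name, the two-resource model, and the demand figures are all assumptions.

```python
def first_fit(apps, capacity):
    """Place (cpu, mem) demands on hosts, opening a new host only when none fits.

    Hypothetical first-fit heuristic; demands and capacity are in percent of one host.
    """
    hosts = []       # remaining [cpu, mem] per host
    placement = []   # host index chosen for each application
    for cpu, mem in apps:
        for idx, free in enumerate(hosts):
            if cpu <= free[0] and mem <= free[1]:
                free[0] -= cpu
                free[1] -= mem
                placement.append(idx)
                break
        else:
            # No existing host has room, so start a new one.
            hosts.append([capacity[0] - cpu, capacity[1] - mem])
            placement.append(len(hosts) - 1)
    return placement


# Two CPU-heavy and two memory-heavy applications (made-up numbers).
apps = [(70, 20), (20, 70), (70, 20), (20, 70)]
print(first_fit(apps, (100, 100)))
```

With these made-up demands, each CPU-heavy application ends up sharing a host with a memory-heavy one, so two hosts suffice instead of four.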
Through such networks, the physical resources are leveraged to provide agile and scalable on-demand access to a pool of different services and IT applications. Unlike traditional networks, in the cloud a high level of autonomic resource management is applied through a suite of virtualization software to handle dynamic demand fluctuations and changes in the network state.

Fig. 1. Server's Resource Virtualization [11]

Why do existing proposals for traditional DCNs not work for cloud service DCNs? As mentioned above, there are many differences between traditional and cloud DCNs, mainly related to management and architecture. Hence, existing proposals for better designs and management of traditional DCNs may not work for cloud DCNs. There are several reasons:

• First, a big portion of the costs in traditional DCNs goes to operational IT staff expenses: due to only partial automation [15], a ratio of 1 : 100 can describe the number of IT staff members to the number of server machines. This means higher IT staff expenses. Moreover, it can lead to higher rates of human error, which has a large impact on performance. In cloud DCNs the scenario is different: due to mandatory automation requirements [16], the ratio of IT staff members to network servers goes down to 1 : 1000. Consequently, operational expenses are distributed differently in the two types of networks, and so proposals for cost reduction in traditional networks may not work well for cloud ones.
• Second, cloud DCNs are usually built to support large networks that interconnect hundreds of thousands of servers.
Such environments necessitate the use of economical commodity servers, unlike traditional DCNs, which contain fewer servers and, comparatively, rely on non-commodity ones.
• Third, optimizing a traditional DCN usually means using less physical space and fewer machines. Hence, optimization may follow the scale-up form (i.e. north to south). This is usually expensive and requires replacing commodity machines with expensive high-end ones. The scenario in cloud DCNs is completely different: space is not a priority, and scaling takes the form of scale-out (i.e. east to west) using commodity machines that allow for scalable, low-priced interconnection structures. Such structures distribute the workload over a larger number of cheap network machines [17].

Traditional enterprise networks are moving toward the cloud, and so solutions proposed for better cloud network structures are expected to enhance traditional networks as well.

2.3 Benefits of Adopting Cloud Computing

The benefits of cloud services are many, and the adoption of such technology is driven by many factors. In DCNs, in addition to the massively scalable networks offered by cloud service providers, such networks allow different applications to be delivered reliably and economically under changing traffic patterns. Moreover, intelligent cloud networks can enable their providers to offer new applications and services that open the way to new markets. In general, the following are among the main benefits that a cloud DCN can provide to end users.

Economic Drivers: Starting with economics, shrinking IT budgets along with the increasing demand for dynamic IT services represent one of the leading drivers for adopting cloud technologies. Building and maintaining a facility with thousands or more servers is quite expensive, especially since the biggest portion of DCN costs comes from the installed servers. According to [18], and as shown in Table 1, server costs are dominant at around 45% of the total cost of a DCN, compared to only 15% for the other network equipment (i.e. switches, links, transits, etc.). So why take on building such networks, which absorb high costs and time-consuming development and configuration efforts, when we can instead run our applications and services on someone else's machines or network? Do we need to pay for a full DCN when we can pay only for the exact amount of resources we use?

Table 1. How Cost is Distributed in a DCN [18]

Average Cost   Component               Sub-Components
45%            Servers                 CPU, Memory, Storage
25%            Infrastructure          Power Outlets, Cooling
15%            Power Utilities         Electrical Costs
15%            Networking Equipment    Switching Machines, Links, Transits

Scalability: From the market perspective, the ability to support rapid growth in dynamic service demands without compromising the network's efficiency and cost is critical. Converting both infrastructure and operational costs into scalable expenses that reflect the actual use of resources is a promising option for many operators, especially those interested in getting more while spending less on their infrastructure [19]. Moreover, the logically infinite on-demand capacity of cloud DCNs is another attractive feature: it allows applications with immediate demands to be deployed easily, without complex management and time-consuming operations.

Resource Utilization: Cloud DCNs also provide for efficient resource utilization based on the real-time dynamic demands of the provisioned applications. In cloud DCNs, operators have the ability to meet changing demands that vary from low to peak load states. This delivers better use of the available resources and reduces blocking rates and the cost of the provided services [18].

Ease of Maintenance: Another attractive feature of cloud DCNs is the ease of maintenance. As an advantage of virtualization, cloud network architectures are built from fewer physical machines than ordinary computing environments. Intuitively, a network with fewer hardware devices requires less effort for maintenance and management. This reduces not only the maintenance time, but also the number of IT technicians needed to maintain the integrity of the network. It is worth mentioning that such a cloud scheme allows for a win-win scenario: cloud providers can set up their networks to run the required applications in a cheaper, easier to manage, time-saving, and reliable manner, which is in the interest of the end users, and in the same way these cloud providers make money from this type of business.

2.4 Risks for Adopting the Cloud DCNs

Although cloud-service DCNs are proposed to benefit traditional enterprise users, there are still a few points of stress that may negatively impact performance and compliance with service level agreements in such networks. For cloud service providers to deliver efficient services to DCN end users, features like performance, availability, and security are all of top importance.

Virtualized networks are proposed to enhance performance and availability; however, an inefficient virtualization scheme for the servers' physical resources may increase server latency and decrease reliability [20]. This can badly affect the applications' performance and the systems' availability.
Hence, the way in which the servers' resources are virtually partitioned among the provided applications is crucial, and such a process must therefore be carried out with great care.

Security, or the integrity of the applications, is another challenge in cloud-service DCNs. The idea in cloud DCNs is to integrate varying applications and their related computing environments on single servers. However, not to mention the increase in heat and power consumption, such consolidation within the same physical machine magnifies the problem of a single point of failure. This deepens the security threats at such points, affecting their reliability, availability, and performance [9]. However, there is room for optimism: the security protections offered by service providers are steadily improving. This gives hope that better management technologies are on the way, and that next-generation DCNs will provide more robust deployments.

3 Interconnection Structures for Data Center Networks

Growing proficiency in building clusters of commodity PCs has made it possible to provision both storage and computation power in a cost-efficient scheme. In large institutions like universities, clusters can consist of thousands of nodes. Building a communication structure for such large-scale clusters can follow one of two options. The first is to use high-end hardware components with specialized protocols, as in Myrinet [21]. Such an option can deliver large-scale clusters that interconnect thousands of nodes while providing high bandwidth capacities between the connected entities. However, these high-end (non-commodity) hardware components are expensive and usually require special configuration to be compatible with TCP/IP applications [22]. The second option is to use cheaper hardware components (i.e.
commodity Ethernet switches) to handle the interconnections among the cluster nodes. This allows familiar management infrastructures to be deployed without any modifications to the network applications and operating systems.

One major problem in building large-scale clusters is the poor aggregate bandwidth capacity of the network [22]: bandwidth capacity does not scale well with the cluster size, and achieving better capacity comes with a non-linear increase in cost that depends on the cluster size. More precisely, bandwidth capacities in large clusters may become oversubscribed by certain percentages due to the network hierarchy and the different physical specifications of the network components. Even when employing high-end hardware components, statistics show that the resulting topologies could only support 50% of the aggregate bandwidth capacity available at the network edge.

Accordingly, building such communication structures from commodity hardware can be considered the dominant option. Network architects are therefore working to design efficient interconnection structures that deliver high-performance networks along with low-cost infrastructures compared to today's high-end solutions. In this section, we survey the state of the art in interconnection structures for DCNs.

Our focus is mainly on structures that employ commodity designs, for which we discuss how they work, show their topologies, and present their switching techniques. Moreover, we also provide a comparison in terms of the supported bandwidth capacities and the associated cost metrics.

3.1 Background

Traditional interconnection structures usually come in the form of a hierarchical tree that consists of routing and switching elements.
The specifications and characteristics of such elements may vary, and hence both the performance and the cost of the whole DCN may be affected. As an example, one structure may employ commodity GigE switches, while another uses 10 GigE switches [24].

Efficient DCN structures need to satisfy the following design goals:

• Scalability; DCN structures need to be designed in a way that easily allows for network expansion and dynamic changes [25]. This involves the ability to smoothly accommodate future upgrades of servers and any other networking equipment.
• Reliability; any proposed DCN structure must be reliable, with a high degree of tolerance against network failures.
• Cost-efficiency; DCNs in general need to reduce cost in terms of both network assets and power requirements.
• Resource capacities; to avoid blocking and bottleneck states, DCNs should be able to provide high aggregate capacities.

In tree-based structures, a hierarchy of network switches is used to interconnect the hosted servers. The current DCN practice is to use the switch-based tree structure to interconnect the increasing number of servers. At the lowest level of the tree, servers are placed in racks and connected to an edge-level rack switch (usually called a Top of Rack, or ToR, switch). At the higher levels, ToR switches are interconnected using higher-layer switches with the capacity to aggregate the traffic of hundreds of ToRs. In this context, it is worth mentioning that the root switches in such scenarios may become bandwidth bottlenecks and central points of failure.

3.2 The Fat-Tree Structure

Driven by the price difference between commodity and non-commodity network switches, network architects have tended to build their large-scale DCNs using many commodity switches rather than fewer expensive non-commodity ones. With such a price incentive, and to deliver high bandwidth, in the 1950s Charles Clos chose to build a multi-stage telephone switching network from interconnected commodity switches [26]. This interconnection scheme is known today as the Clos network.

As an instance of the Clos network, the fat-tree topology [22], [27] was proposed in the form of a multi-rooted tree that employs commodity Ethernet switches to interconnect the DCN servers. In the fat-tree, redundant aggregation points are used to reduce the problem of bandwidth bottlenecks and central points of failure.

Topology: The fat-tree topology is organized as depicted in Figure 2: a tree-based hierarchy of layered network switches interconnects a group of network servers. Each set of servers is placed in a rack, and each rack has an edge switch that connects all of the underlying servers to each other and to the rest of the network.

Edge and aggregate switches are grouped in pods: in a fat-tree structure with $k$ pods, there are $k$ switches in each pod ($k/2$ edge and $k/2$ aggregate switches). Edge switches come with $k$ ports; $k/2$ of these ports connect directly to $k/2$ servers, and the remaining $k/2$ ports connect to $k/2$ aggregate switches. At the highest level of the network structure, $(k/2)^2$ core switches interconnect all the underlying aggregate switches; each core switch comes with $k$ ports that interconnect the underlying $k$ pods.
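The pod arithmetic above can be checked with a short sketch. The Python function below is illustrative only: its name and the keys of the returned dictionary are assumptions made here, not part of [22]. It derives the switch and server counts of a fat-tree from the port count k, using only the rules stated in this section.

```python
def fat_tree_counts(k):
    """Element counts for a fat-tree built from k-port switches (k even).

    Hypothetical helper for illustration; follows the pod rules in the text.
    """
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    edge = k * half          # k pods, k/2 edge switches per pod
    aggregate = k * half     # k pods, k/2 aggregate switches per pod
    core = half * half       # (k/2)^2 core switches
    servers = edge * half    # each edge switch connects k/2 servers
    return {"pods": k, "edge": edge, "aggregate": aggregate,
            "core": core, "servers": servers}


print(fat_tree_counts(4))
```

For k = 4 this gives 8 edge switches, 8 aggregate switches, 4 core switches, and 16 servers; the server count is k pods times k/2 edge switches times k/2 servers, that is, k^3/4.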
Based on these specifications, a designer can determine the number of physical hosts (i.e. servers) a DCN can support from the switch degree: a fat-tree DCN built from $k$-port switches can physically interconnect $k^3/4$ servers [22].

Fig. 2. A Fat-Tree interconnection Structure, with n = 4 [22]

Features and Summary: The switches used in the fat-tree topology are all identical, which means that all network levels of a fat-tree DCN use switching components with the same specifications [26]. This allows the use of cheap commodity switches, and thus low-cost DCNs. Hence, it is worth noting that the term hierarchy in the fat-tree refers to the structural level only, not to the types of equipment used.

In terms of bandwidth capacity, the fat-tree is designed to support full bisection bandwidth between the network servers through the use of multi-rooted trees [28]. This is intended to deliver non-blocking communication between the interconnected servers. Multi-rooted trees are built with multiple core switches, which interconnect the large number of aggregate and edge switches that gather the tree branches (i.e. servers). This means additional cost. What is more, in terms of scaling, fat-tree structures are limited by the number of ports physically available at their switches.

3.3 The DCell Structure

Motivated by the goals of providing scalable interconnection structures and high bandwidth availability among the interconnected hosts, and of avoiding single points of failure, DCell is proposed in [29] as a recursively defined structure to interconnect DCN servers. Unlike the fat-tree, in DCell the interconnection between the network entities is mainly built through the servers.
More precisely, high-level DCells are built from many low-level ones, where each server is connected to different DCells via multiple links, in such a way that the low-level DCells form a fully connected graph; see Figure 3.

In terms of structure, DCell employs low-end commodity switches to scale out, instead of scale-up approaches that require expensive high-end switches. As a server-centric structure [30], it scales doubly exponentially with respect to the node degree of the employed servers. In practice, a DCell with a small server degree (say, 4) can interconnect as many as several million servers without the need for expensive high-end switches. Being a structure with no single aggregation points, DCell can be considered fault tolerant, with no central points of failure. Moreover, fault tolerance also comes from the rich physical connectivity a DCell has. However, it can be considered a structure with high wiring costs among the servers, since it uses more and longer communication links than the tree-based models.

Topology: The DCell topology is organized as depicted in Figure 3 and comes as a level-based structure. As shown in the figure, $n$ servers are interconnected via an $n$-port switch to build a $DCell_0$. In such a recursively defined topology, a $DCell_{k+1}$ is built from $n_k + 1$ units of $DCell_k$, where $n_k$ is the number of servers in a $DCell_k$ (so $n_0 = n$). Accordingly, since $n_k$ servers are required to build a $DCell_k$, and $n_k + 1$ units of $DCell_k$ are required to build a $DCell_{k+1}$, the number of servers $n_{k+1}$ in a $DCell_{k+1}$ is given by $n_{k+1} = n_k (n_k + 1) = n_k^2 + n_k$.

Fig. 3. A DCell Interconnection Structure with n = 4 [29]

Switching: In DCell, switching is done with the goal of connecting a huge number of servers in a way that accommodates dynamic traffic changes.
Accordingly, in DCell interconnection structures, global link-state routing schemes are not recommended, as they create a huge control overhead in the network [29]. To avoid points of failure and bandwidth bottlenecks, the Open Shortest Path First (OSPF) routing protocol is not a suitable option either [31], as it imposes a huge traffic overhead. Therefore, the authors of [29] proposed a fault-tolerant routing protocol that is claimed to provide a decentralized, near-optimal routing scheme. This fault-tolerant protocol is claimed to handle effectively failures ranging from hardware and software faults to power issues.

3.4 The BCube Structure

Designed for shipping-container-based modular data center networks, the BCube structure is proposed in [32] as a server-centric network interconnection structure that employs multiple commodity switches in a hierarchical style to interconnect large numbers of multi-port servers.

In BCube, as shown in Figure 4, commodity four-port switches are employed to create multiple parallel short paths between pairs of servers in a structure that interconnects sixteen different servers. This not only provides high one-to-one bandwidth, but also improves fault tolerance and load balancing. BCube accelerates one-to-many and one-to-all traffic. Moreover, due to its low diameter, BCube provides high network capacity for all-to-all traffic [32].

Fig. 4. A BCube Interconnection Structure with n = 4 [31]

Compared to the fat-tree and DCell structures, the authors of [32] claimed that BCube is better, as it does not have performance bottlenecks and provides larger network capacities with direct one-to-x support and only minimal upgrade requirements. On the other hand, such direct connection between the network switches and servers imposes high wiring expenses.
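To make the sixteen-server example concrete, the following sketch enumerates the servers of a BCube with n = 4 and two switch levels and lists the parallel paths between two servers. It assumes the digit-based addressing commonly used to describe BCube, in which a server is a tuple of base-n digits and servers that differ in exactly one digit share a switch. That addressing and all function names are assumptions introduced here rather than details taken from this survey.

```python
from itertools import permutations, product


def bcube_servers(n, levels):
    """All server addresses: one base-n digit per switch level."""
    return list(product(range(n), repeat=levels))


def parallel_paths(src, dst):
    """Server-level paths obtained by correcting differing digits in every order.

    Each hop changes one digit, i.e. crosses one switch (assumed addressing).
    """
    differing = [i for i in range(len(src)) if src[i] != dst[i]]
    paths = []
    for order in permutations(differing):
        hop, path = list(src), [tuple(src)]
        for i in order:
            hop[i] = dst[i]
            path.append(tuple(hop))
        paths.append(path)
    return paths


servers = bcube_servers(4, 2)
print(len(servers))  # 16 servers for n = 4 and two levels
for p in parallel_paths((0, 0), (1, 2)):
    print(p)
```

For servers (0, 0) and (1, 2), the sketch yields two paths through different intermediate servers, (1, 0) and (0, 2), which illustrates the parallel short paths mentioned above.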
Topology: As mentioned before, the BCube structure is mainly composed of commodity low-end switches with a limited number of ports, and multi-port servers that are connected to the switches of the upper layers. Being a recursively defined structure, higher-level BCubes are built from lower-level ones. The basic unit, BCube_0, interconnects n servers through a single n-port switch. In general, a BCube_{k+1} is built from n units of BCube_k using $n^{k+1}$ additional n-port switches. A BCube_k can interconnect $n^{k+1}$ servers, each with k + 1 ports connected to k + 1 levels of switches, where each level consists of $n^k$ n-port switches. It is worth noting that in the BCube topology, switches are connected only to servers and never to other switches. Consequently, switches in BCube act as dummy crossbars that merely interconnect the underlying servers.

Switching: In BCube, servers have multiple ports and are connected to multiple layers of commodity switches, which provides multiple short paths between the interconnected servers. This richness of parallel paths can provide higher aggregate bandwidth capacities, along with improved fault tolerance. Taking advantage of this multi-path property, BCube runs a source routing protocol on the network servers to balance the traffic and handle link failures. In the case of server or switch failures, this protocol allows the bandwidth capacities of the network to degrade gracefully. The authors of [32] also proposed a new BCube routing protocol suite, which they claim provides a fast packet forwarding protocol that decides the next hop of any received packet through a single table lookup. The proposed protocol can be implemented in both software and hardware.

3.5 The FiConn Structure

FiConn is also a server-centric structure; however, its new contribution lies in utilizing the servers' Ethernet backup ports.
More precisely, the idea came from the observation that the commodity server machines used in today's DCNs usually have two built-in Ethernet ports, one for connecting to the switch and the other for backup purposes. Accordingly, the authors of [28] proposed that activating the servers' backup ports for network connections is an opportunity to build lower-cost interconnection structures. Indeed, having more ports at the server level allows direct interconnection between servers without going through switching machines. In this way, low-end commodity switches can handle the aggregation to form scalable and effective structures.

Fig. 5. A FiConn Interconnection Structure with n = 4 [28]

As depicted in Figure 5, a high-level FiConn is constructed from many low-level FiConns. When constructing a higher-level FiConn, the lower-level FiConns use half of their available backup ports to connect to other servers and form a mesh. In this way, the number of servers grows rapidly with the FiConn level. This is unlike the Fat-Tree, where the scale is limited by the number of ports at the switches, and unlike DCell, which requires a higher number of server ports to scale. However, FiConn works only with servers that have a node degree of two. Although FiConn can provide scalable low-price structures, it adds higher control overhead than tree-based structures. Moreover, employing low-cost commodity switches can reduce the network cost, but such a server-centric structure adds more wiring costs, in addition to the higher CPU overhead for packet forwarding at the servers' side. In terms of aggregate bisection bandwidth, Fat-Tree and DCell prove to provide better capacities.
Hence, we can easily recognize that the lower switching costs of such a structure come at the expense of the provided bandwidth capacities.

Topology: As shown in Figure 5, the FiConn topology mainly consists of multiple servers of node degree two, with one level of commodity low-price switches. In this recursively defined structure, high-level FiConns are constructed from low-level ones. Compared to the Fat-Tree, if we assume N servers, the number of n-port switches needed to interconnect the FiConn structure is N/n, while it is 5N/n for the Fat-Tree. FiConn_0 is the basic FiConn level, composed of n servers and one n-port switch. The number of servers in FiConn structures is usually even. A FiConn_k is built from a set of FiConn_{k-1} entities that are interconnected using their servers' backup ports. If S denotes the number of servers with available backup ports in a FiConn_{k-1} structure, then g_k structures of FiConn_{k-1} are needed to build a FiConn_k, where

$g_k = S/2 + 1$.

Only S/2 servers of each FiConn_{k-1} use their backup ports, connecting to the other S/2 structures of FiConn_{k-1} through those structures' backup ports as well. These selected S/2 servers are called level-k servers. For example, with n = 4, a FiConn_0 has S = 4 servers with available backup ports, so g_1 = 3 and a FiConn_1 contains 3 × 4 = 12 servers.

Switching: In FiConn, servers are configured with two ports and are connected to commodity low-end switches. The servers are configured to use half of their available backup ports for interconnection with servers in other FiConns, forming a kind of mesh. Routing in FiConn structures is claimed to balance the usage of the different network links while improving resource utilization under dynamic traffic changes.
Deploying the traffic-oblivious routing scheme in FiConn balances the traffic loads well over the different levels of network links; however, this scheme has the following limitations: (1) A pair of communicating servers is allowed to use only two of the available backup ports. Using more ports is not allowed, even when doing so would improve the end-to-end throughput. (2) Due to such rigid settings, it cannot cope dynamically with real-time changes in the network's traffic demands.

Therefore, to overcome these limitations, the authors of [28] proposed the traffic-aware routing scheme. Briefly, this scheme does not schedule the network traffic on central server entities; instead, it distributes the scheduling over all the network servers. Accordingly, each server is responsible for balancing its outgoing traffic over its outgoing ports, where the port with the highest available bandwidth is always selected to carry new outgoing traffic.

3.6 Summary

This section of the survey discussed four main structures proposed in the literature for DCNs, namely Fat-Tree, DCell, BCube, and FiConn, and reviewed their topological, cost, and switching characteristics. Through this presentation, we observed that in DCNs the objectives of scalability, fault tolerance, and bandwidth capacity receive higher priority than other metrics. In this regard, Table 2 summarizes the behavior of the aforementioned four structures. In this context, it is worth noting that although the server-centric proposals provide scalable structures, they still impose a huge CPU overhead at the servers' side.
This can be considered a limitation, since Table 1 clearly shows that, when it comes to cost, servers lead all other assets a DCN requires.

Table 2. Structural Proposals for Data Center Networks (DCNs)

| Proposal | Structure | Switches Cost | Cables Cost | Routing Overhead | Scalability | Bandwidth | Fault Tolerance |
|---|---|---|---|---|---|---|---|
| Fat-Tree | Switch-centric | Many commodity units | Low | On switches | Limited by switch degree | Best | High |
| DCell | Server-centric | Few commodity units | High | On servers | Limited by server degree | Less than BCube | Average |
| BCube | Server-centric | More commodity units | High | On servers | Less than DCell | High | High |
| FiConn | Server-centric | Few commodity units | Average | On servers | Limited by backup ports | Less than DCell | Below average |

4 Routing and Traffic Engineering in Data Center Networks

As shown in Section 3, different interconnection structures have been proposed in the literature for DCNs. The challenges are many, and so are the objectives that these structures are meant to achieve; they range over cost, reliability, scalability, and bandwidth capacity. Routing protocols play an important role in the efficiency of such interconnection structures, since they help exploit the bandwidth capacities available between the interconnected machines in a DCN. Providing significant bisection bandwidth is a fundamental aspect of DCNs. Accordingly, intensive research effort has been spent on delivering efficient interconnection structures that allow for scalable, non-blocking topologies.

Generally speaking, DCN interconnection structures can be categorized into two main schemes, server-centric and switch-centric. Each scheme has certain characteristics that distinguish it from the other.
In contrast to the protocols proposed for the Internet, researchers have developed a set of routing schemes that are specially designed to suit DCN topologies.

4.1 Routing Schemes

To review the routing schemes of the proposed interconnection structures, such as Fat-Tree, DCell, BCube, FiConn and others, this section categorizes them as follows:

Server-Centric Schemes: As the name suggests, in server-centric schemes the interconnection responsibilities in a DCN are mainly placed on the servers. Consequently, the servers play a double role: in addition to being end hosts, they are relay nodes for other communication paths in the network. FiConn [28], DCell [29], and BCube [32] all fall into this category.

Switch-Centric Schemes: Unlike server-centric schemes, switches are the only relay nodes in switch-centric DCNs. In such a scheme, all interconnection sessions among the network hosts pass through the upper-layer switches (i.e., edge, aggregation, and core). In general, such interconnection structures follow a special instance of the Clos topology [26] (proposed for telephone networks in the 1950s), known as the Fat-Tree. In the Fat-Tree structure, commodity Ethernet switches are generally used to interconnect a massive number of hosts. In such DCNs, the proposed routing schemes usually follow the topological hierarchy. The structures of Portland [7] and VL2 [33] are further examples of models that fit in this category.

4.2 Data Forwarding Techniques

A huge portion of Internet communications and their related computing and storage processes is migrating toward DCNs. To accommodate this, DCNs must be carefully engineered to be scalable and fault tolerant.
The routing and data forwarding protocols currently deployed in DCNs were originally developed for Local Area Networks (LANs). However, such protocols do not perform well in networks that interconnect a large number of hosts, such as a medium-size DCN. Assume a DCN that hosts 100,000 servers, each running 20 Virtual Machines (VMs). This comes to approximately 2,000,000 IP and MAC addresses. Even without counting the required switches, a network of this size imposes a huge management overhead on the provider. For DCNs, an efficient routing and forwarding protocol should support a scalable and fault-tolerant environment. Such an environment should provide [6]:

• Easy migration of any VM to any physical server in the network. This should involve keeping the original IP addresses to avoid breaking existing TCP connections and any other application-level sessions.
• Self-learning switches.
• Fault-tolerance scaling.
• No forwarding loops.
• Hosts in the DCN can communicate with each other efficiently over any available path in the network.

Existing layer 2 and layer 3 network protocols face some challenges in satisfying these requirements. To some extent, achieving the first two points requires deploying a layer 2 fabric through the entire DCN. In a layer 3 fabric, it requires high management overhead: each network switch must be configured individually with its sub-network information, synchronized with the DHCP servers, in order to distribute the appropriate IP addresses among the network hosts. Further, this complicates VM migration, since a VM that migrates to another sub-network must change its IP address to match the addressing scheme of its new physical location.

Concerning the scaling requirements, layer 2 fabrics are not an optimal option. Indeed, broadcasting at layer 2 is a challenging issue.
Satisfying such scale targets requires special protocols that can quickly propagate urgent topology updates to their points of interest. Unfortunately, current routing protocols (e.g., OSPF) are broadcast based. This imposes a kind of configuration overhead, which contradicts the second point. Regarding the forwarding loops problem, neither layer 2 nor layer 3 protocols can avoid it, since such loops can occur during routing convergence, especially in DCN topologies that provide redundant paths between source-destination server pairs. It is, however, less of an issue at layer 3, since the Time To Live (TTL) counter limits the resources that looping packets consume while the forwarding tables are being updated. As for the last point and the agility issue, it still seems impractical to build layer 2 forwarding tables with millions of entries. Therefore, a truly scalable and agile network fabric for DCNs has not yet been achieved: as presented above, there is always a tradeoff among the available network protocols in terms of flexibility, scalability, reliability, and performance.

Addressing in layer 3 is done by assigning IP addresses to the network hosts hierarchically, according to each host's directly connected switch. However, layer 3 forwarding has the following limitations [7]:

• Network topology updates (e.g., adding a new switch) can be risky processes, since they require manual configuration by the network provider.
• Improper synchronization and configuration errors (e.g., in DHCP servers or sub-network identifiers) among the network components can lead to unreachable network segments.
• Poor support for scalability and server virtualization.
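The last two limitations can be made concrete with a small sketch. It assumes a hypothetical addressing plan in which each edge switch owns one /24 sub-network; the subnet values and the names `EDGE_SUBNETS`, `can_keep_address`, and `misconfigured_hosts` are illustrative and not taken from [7]. The sketch uses Python's standard `ipaddress` module to check whether a VM could keep its IP address after migrating under a different switch, and to flag hosts whose configured address does not match their switch.

```python
import ipaddress

# Hypothetical layer 3 addressing plan: one sub-network per edge switch.
EDGE_SUBNETS = {
    "edge-1": ipaddress.ip_network("10.0.1.0/24"),
    "edge-2": ipaddress.ip_network("10.0.2.0/24"),
}


def can_keep_address(vm_ip, target_switch):
    """True if vm_ip belongs to the sub-network served by target_switch."""
    return ipaddress.ip_address(vm_ip) in EDGE_SUBNETS[target_switch]


def misconfigured_hosts(hosts):
    """Names of hosts whose address is outside their switch's sub-network.

    hosts maps a host name to an (ip, switch) pair.
    """
    return sorted(
        name
        for name, (ip, switch) in hosts.items()
        if not can_keep_address(ip, switch)
    )


if __name__ == "__main__":
    # A VM addressed under edge-1 cannot keep its IP under edge-2.
    print(can_keep_address("10.0.1.7", "edge-1"))
    print(can_keep_address("10.0.1.7", "edge-2"))
    print(misconfigured_hosts({
        "vm-a": ("10.0.1.7", "edge-1"),
        "vm-b": ("10.0.1.9", "edge-2"),
    }))
```

In this toy plan, a VM at 10.0.1.7 passes the check for edge-1 but fails it for edge-2, so migrating it would force a new address and break its existing sessions. A host left with a stale address after such a move shows up as misconfigured and would be unreachable.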
Consequently, to reduce the administrative overhead and avoid risky configurations, some networks deploy layer 2 forwarding, which is performed based on MAC addresses. Still, layer 2 fabrics have the following limitations:

• Relying on Ethernet bridging techniques limits the scalability of the network. In a DCN with 100,000 hosts, how can broadcast be supported through the entire network?
• In topologies that have multiple equal-cost paths, how can performance be enhanced while relying on a single forwarding tree?

Some proposals suggest a hybrid approach that integrates the positive characteristics of layers 2 and 3, reducing the broadcast problem of layer 2 while providing a higher level of scalability. Employing Virtual LANs (VLANs) in layer 2 fabrics facilitates crossing multiple switch boundaries. However, the VLAN technology itself has some problems:

• Splitting the network into separate virtual domains requires bandwidth reservation for each VLAN at each switch. Such allocations, if not made dynamically, provide less flexibility and low bandwidth utilization rates.
• With such broadcast domains, switches must keep state for each host they connect. This limits scalability and network agility.
• In VLANs, the use of a single forwarding table requires large update messages between the network switches, which affects network performance.

4.3 Traffic Engineering in Data Center Networks

Internet routing schemes usually look for routes that connect two network nodes under certain latency constraints. In DCNs the scenario is somewhat different, as other sophisticated requirements are taken into consideration, such as high reliability, energy consumption, and specific performance metrics [6].
Satisfying such constraints requires special Traffic Engineering (TE) efforts. In TE, network providers adapt the routing decisions for the network traffic according to the network conditions. This can help optimize the network performance in accordance with the dynamic traffic status and the behavior of the transmitted data patterns. In DCNs, traffic is divided into two main parts, inter-DCN and intra-DCN traffic. The intra-DCN traffic is of greater concern, since the performance of a DCN mainly depends on its internal communication patterns [34]. Inter-DCN traffic is routed via the well-known Border Gateway Protocol (BGP), like any other external traffic in the Internet.

A challenging problem for TE in DCNs is how to predict the traffic patterns. Such patterns may vary significantly across applications, and in some cases the traffic traces may be confidential. Moreover, DCNs are growing rapidly and the need for scaling is evolving, which makes it more complex and challenging to control and manage such expansion and variety of applications efficiently.

Design Principles: When proposing a DCN TE model, the following principles should be taken into account:

• Reliability: The first goal when proposing a TE model should be to optimize the routing scheme so that it provides reliable and fault-tolerant data forwarding. DCNs mostly carry important information that supports crucial services and important applications for different business operations and end users. Thus, a successful DCN is one that provides reliable and robust services to its users. Consequently, reliability is a concern for both DCN service providers and their subscribers.
• Resource Utilization: Reliability and fault tolerance depend heavily on how the network resources are utilized. Better bandwidth utilization allows for higher throughput, lower blocking, and less latency.
Moreover, it greatly affects both the capital and the operational expenses of the network. Hence, an efficient TE model should adapt the routing schemes to utilize the network bandwidth capacities in order to serve varying applications, each with a different traffic pattern, while providing Quality of Service (QoS) and performance guarantees.
• Power Expenses: To provide efficient services at competitive prices, DCN providers should minimize the network expenses as much as possible. Operating network machines consumes energy; in this context, efficient TE models should direct the routing schemes to use the fewest possible links and switches. This can reduce the energy expenses and consequently maximize the DCN's profits, while allowing it to offer its services at market-competitive prices.

How does TE in Data Center Networks differ from that in the Internet? The designs of DCNs differ from that of the Internet [6], and some features of DCNs require new design directions. Accordingly, when designing a TE model for a DCN, the following points should be considered:

• Node Location: Traditional TE problems usually deal with fixed source-destination locations, and the traffic is distributed over the Internet links. In DCNs the scenario is different, as a VM that runs service x can dynamically change its location for better performance and agility. This allows for adopting better, more efficient routing schemes.
• Topology: Interconnection structures in DCNs are mostly symmetric, with multiple paths between the interconnected network servers. TE models that utilize such redundant paths to improve performance require special routing schemes different from those in the Internet.
• Centralized TE: Unlike the Internet, DCNs are a convenient network style in which centralized TE and management schemes can be deployed efficiently. In such schemes, a centralized network operator entity can control the whole set of underlying network components and collect their performance metrics. Although this may impose higher control overhead, it simplifies implementation.
• Infrastructure: Driven by cost-efficiency requirements, DCNs are usually built from commodity layer 2/3 switches with higher link densities. So, DCN nodes are not expected to be as reliable as the high-end routers of the Internet, which are built under looser cost constraints.
• Multi-rooted Designs: To provide full bisection bandwidth in commodity DCN interconnection designs, multi-rooted tree topologies are a necessity, since aggregating bandwidth over such multi-rooted paths may deliver the desired capacities among the network hosts. In Internet topologies, such redundant paths are not allowed, as they create undesired forwarding loops.

Moreover, compared to the Internet, routing in DCNs has the following unique characteristics:

• Common Topologies: To increase network performance and scalability, DCN designs mostly employ very similar routing protocols.
• Short packet life: Statistics show that most traffic flows in DCNs are short-lived, which makes it challenging to predict the dynamic traffic patterns and to employ the proper TE design.
• Agility: In DCNs, agility is necessary for load balancing and availability.

5 Directions for Open Issues and Future Research

The works surveyed throughout this paper shed light on the structural and routing challenges in the area of cloud DCNs.
Proposals of network structures are many; however, only those that provide efficient services and high performance prevail. Providers of cloud DCNs will always seek proposals that allow for easy scaling and agile topologies, that is, topologies that do not require frequent structural updates while providing sufficient service levels at competitive prices. In this context, and beside the operational behavior of any proposed structure, the providers of cloud DCNs must tackle the challenges of efficient service provision and service leasing. Indeed, the theme of cloud DCNs is mainly the provision of computing resources in the form of services, which range over software, platform, and infrastructure. Efficient service provision in such environments comes through (1) efficient resource utilization and (2) efficient service allocation schemes. Therefore, we consider these provision challenges to be hot research topics and open fields for deeper study. We discuss them briefly in the following:

5.1 Efficient Resource Allocation

Almost all cloud DCN structures proposed in the literature provide high resource capacities, such as switching and link bandwidth capacities. Richness in resource capacities is a fundamental aspect of such networks, as the theme of cloud DCNs is to provide the networks' end users (i.e., service tenants) with their required levels of service at reasonable unit prices. This type of business necessitates certain levels of guarantees that give the end users the necessary satisfaction with regard to resource availability and service unit prices. To provide these guarantees, cloud DCN resources need to be utilized smartly. In this context, various research works have been proposed in the literature.
Among the reviewed proposals, many modeled the problems of resource allocation and reservation as auctions, where the cloud DCN resources are leased to bidders who satisfy predefined conditions. In [35], the network's bandwidth reservation process is modeled using a Vickrey-Clarke-Groves (VCG) auction [36], a mechanism drawn from game theory, through which the cloud provider assigns bandwidth reservations among the cloud service tenants based on (1) their offered bids, and (2) the effect of their presence in the network on a social welfare value that is calculated by the system. Instead of VCG, the authors of [37] proposed using the Shapley value [38] for unit price calculations. According to their work, the price of the bandwidth resources allocated to a service tenant is calculated by the cloud provider in accordance with the average marginal charge for the allocated resources. The work in [39] proposes a model that tackles both bandwidth reservation and allocation through a two-tier approach, in which the cloud provider first runs an auction for bandwidth reservation and then, after the reservation round ends, auctions the remaining bandwidth resources for allocation to the service tenants. Price calculation in [39] differs between the reservation and allocation processes. In the reservation auction, the unit price is initially set to a premium price to encourage high bid offers [40]. In the allocation phase, the model considers a market clearing price, which is defined according to the lowest accepted bid received in the auction. Accepted bids here are those for which the provider has sufficient bandwidth resources to satisfy the requests, regardless of the offered bid price. This assumes fair pricing for all bidders [41]. The aforementioned works consider either the provider's interest or the tenant's, but not both. In [41] and [42], the authors proposed a model that considers the interest of both the provider and the tenant. They proposed a resource allocation model based on a bargaining game, through which they studied the resource allocation problem for Virtual Machines (VMs) over a set of physical DCN servers. The allocation model presented in this work is formulated with the objective of maximizing the utilities of the providers and the tenants together.

5.2 The Challenge of DCN Migration

The approaches discussed in the previous section may provide optimal allocation and reservation decisions for a service instance or a VM. But is an optimal approach for allocating a full VDCN over a physical DCN feasible? Among the service themes provided by cloud DCNs is infrastructure as a service, in which a user can lease a whole Virtual DCN (VDCN) infrastructure from a physical cloud DCN. For load and scale reasons, a VDCN provider can choose to migrate from its current place (i.e., the physical DCN that currently hosts the VDCN) to a new place (i.e., a new physical DCN) that provides larger resource and scale capacities to suit the dynamic VDCN load requirements. Research on this problem is a novel open direction that is still in its early stages, and motivated researchers may choose to tackle it in their future work.

6 Conclusions

Computation is moving into the cloud, and thus into DCNs. Within the DCN, proposed interconnection structures must be aware of the end-to-end system requirements before providing their suggested solutions. Hence, structures should provide agility, reliability, cost efficiency, and high resource utilization.
In cloud data centers, automation is a necessity for scale, and it is accordingly considered a fundamental design principle. The soul of the DCN lies in virtualization, which promises higher performance and maximum reliability, and which can be deployed in both server and storage equipment. Moreover, employing consolidation alongside such virtualization technologies in DCNs can enable IT organizations to turn computing and storage resources from monolithic systems into a shared pool of resources. Such pools consist of standardized components that can be dynamically aggregated, tiered, provisioned, and accessed through an intelligent network. However, virtualizing such networks has some constraints, which range from real-time replication to whether the considered application can be clustered. The journey to fully virtualized and autonomic DCNs is still in its early stages. Nevertheless, we should acknowledge that we no longer design for individual or single-server applications; the work now evolves toward the cloud and huge clustered network applications. Finally, we can say that a properly planned DCN is one that protects application and data integrity, optimizes their availability and performance, and allows for scale and change according to market requirements and business priorities.

7 References

[1] L. A. Barroso and U. Hölzle. The datacenter as a computer: An introduction to the design of warehouse-scale machines. A Publication in the Morgan and Claypool Publishers series, 2009.
[2] W. Ni, C. Huang, and J. Wu. Provisioning high-availability datacenter networks for full bandwidth communication. Computer Networks, 68, pages 71–94, 2014. https://doi.org/10.1016/j.comnet.2013.12.006
[3] E. Giesa. Data center virtualization Q & A. F5 White Paper, 2011.
[4] A. Singh, M. Korupolu, and D. Mohapatra. Server-storage virtualization: Integration and load balancing in data centers. In in the Proceedings of the ACM/IEEE conference on Su- perComputing, SC’08, pages 1–12. IEEE, Nov 2008. [5] D. Li, C. Guo, H. Wu, K. Tan, Y. Zhang, S. Lu, and J. Wu. Scalable and cost-effective in- terconnection of data-center servers using dual server ports. IEEE/ACM TRANSACTIONS ON NETWORKING, 19(1), Feb 2011. https://doi.org/10.1109/TNET.2010.2053718 [6] K. Chen, C. Hu, X. Zhang, K. Zheng, Y. Chen, and A. V. Vasilakos. Survey on routing in data centers: Insights and future directions. IEEE Networks, 25(4), Aug 2010. [7] R. N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat. Portland: a scalable fault-tolerant layer 2 data center network fabric. In in Proceedings of the ACM SIGCOMM ’09 conference on Data communication, pages 17–21. IEEE, Aug 2009. [8] A. Wiess. Computing in the cloudes. NetWorker Magazine: Cloud computing, PC func- tions move onto the web, 11(4), Dec 2007. [9] S. Long. Taking the enterprise data center into the cloud, achieving a flexible, high- availability cloud computing infrastructure. A White Paper from the Experts in Business- Critical Continuity, Dec 2010. [10] K. M. Sup, T. Ali, A. Leon-Garcia, and H. J. Won-Ki. Virtual network based autonomic network resource control and management system. In Proceedings of the IEEE Com ’05. IEEE, 2005. [11] G. Khanna, K. Beaty, G. Kar, and A. Kochut. Application performance management in virtualized server environments. In in the Proceedings of the Network Operation and Management Symposium, pages 373–381. IEEE, Apr 2006. https://doi.org/10.1109/ NOMS.2006.1687567 [12] M. Cardosa, M. R. Korupolu, and A. Singh. Shares and utilities based power consolidation in virtualized server environments. In Proceedings of the International Symposium of Inte- grated Network Management, pages 327–334, Long Island, NY, USA, IEEE. Jun 2009. 
https://doi.org/10.1109/INM.2009.5188832
[13] M. Steinder, I. Whalley, D. Carrera, I. Gaweda, and D. Chess. Server virtualization in autonomic management of heterogeneous workloads. In Proceedings of the IFIP/IEEE International Symposium on Integrated Network Management, IM 07. IEEE, May 2007.
[14] R. D. Couto, S. Secci, M. E. Camptisa, and L. H. Costa. Reliability and Survivability Analysis of Data Center Network Topologies. In ACM Journal of Network and Systems Management, 24(2), pages 346–392, April 2016.
[15] W. Enck, T. Moyer, P. McDaniel, S. Sen, P. Sebos, S. Spoerel, A. Greenberg, S. Y.-W. Eric, R. Sanjay, and W. Aiello. Configuration management at massive scale: System design and experience. IEEE JSAC, Network Infrastructure Configuration, 27(3), 2009.
[16] Z. Kerravala. Configuration management delivers business resilience. The Yankee Group, Nov 2002.
[17] M. Isard. Autopilot: Automatic data center management. Operating Systems Review, 41(2), 2007. https://doi.org/10.1145/1243418.1243426
[18] A. Greenberg, J. Hamilton, D. A. Maltz, and P. Patel. The cost of a cloud: Research problems in data center networks. ACM SIGCOMM '09 Computer Communication Review, 39(1), Jan 2009.
[19] A. Hammadi, M. Mohammad, T. El-Gorashi, et al. Resource Provisioning for Cloud PON AWGR-Based Data Center Architecture. In Proceedings of the 21st European IEEE Conference on Networks and Optical Communication, pages 178–182, Lisbon, Portugal, June 2016. https://doi.org/10.1109/NOC.2016.7507009
[20] A. Curtis, T. Carpenter, M. Elsheikh, A. López-Ortiz, and S. Keshav. REWIRE: An optimization-based framework for unstructured data center network design. In Proceedings of the IEEE INFOCOM, pages 1116–1124, 2012.
[21] N. Boden, D. Cohen, R. Felderman, A. Kulawik, C. Seitz, and J. Seizovic. Myrinet: A gigabit-per-second local area network.
IEEE Micro, 15(1), 1995.
[22] M. Al-Fares, A. Loukissas, and A. Vahdat. A scalable, commodity data center network architecture. In Proceedings of the ACM SIGCOMM '08 Conference on Data Communication, pages 63–74. IEEE, Aug 2008.
[23] C. Kachris and I. Tomkos. A survey on optical interconnects for data centers. In IEEE Commun. Surv. Tutor., 14(4), pages 1021–1036, 2012. https://doi.org/10.1109/SURV.2011.122111.00069
[24] A. Greenberg, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. Towards a next generation data center architecture: Scalability and commoditization. In Proceedings of the ACM Workshop on Programmable Routers for Extensible Services of Tomorrow, Seattle, Washington, USA, Aug 2008.
[25] Q. Zhang, M. Zhani, R. Boutaba, and J. Hellerstein. Dynamic heterogeneity-aware resource provisioning in the cloud. In IEEE Trans. Cloud Computing, 2(1), pages 14–28, 2014.
[26] C. Clos. A study of non-blocking switching networks. Bell System Technical Journal, 32(2), 1953. https://doi.org/10.1002/j.1538-7305.1953.tb01433.x
[27] A. F. Mohammad, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan. Data center TCP (DCTCP). In Proceedings of the ACM SIGCOMM 2010 Conference, Seattle, Washington, USA. IEEE, Sep 2010.
[28] D. Li, C. Guo, H. Wu, K. Tan, Y. Zhang, S. Lu, and J. Wu. FiConn: Using backup port for server interconnection in data centers. In Proceedings of IEEE INFOCOM '09, pages 2276–2285. IEEE, Apr 2009.
[29] C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, and S. Lu. DCell: A scalable and fault-tolerant network structure for data centers. In Proceedings of the ACM SIGCOMM Conference on Data Communication. IEEE, Oct 2008. https://doi.org/10.1145/1402958.1402968
[30] A. Hammadi, T. El-Gorashi, M. Mohammad, et al. Server-Centric PON Data Center Architecture. In Proceedings of the 18th International IEEE Conference on Transparent Optical Networks, Trento, Italy, July 2016.
https://doi.org/10.1109/ICTON.2016.7550695
[31] J. Moy. OSPF version 2. RFC 2328, Apr 1998.
[32] C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, and S. Lu. BCube: A high performance, server-centric network architecture for modular data centers. In Proceedings of the ACM SIGCOMM Conference on Data Communication. IEEE, Oct 2009. https://doi.org/10.1145/1592568.1592577
[33] A. Greenberg, J. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, and P. Patel. VL2: A scalable and flexible data center network. In Proceedings of the ACM SIGCOMM 2009 Conference on Data Communication, Barcelona, Spain, Aug 2009. IEEE. https://doi.org/10.1145/1592568.1592576
[34] N. F. et al. Helios: A hybrid electrical/optical switch architecture for modular data centers. In Proceedings of the IEEE SIGComm '10. IEEE, 2010.
[35] Yang, G.; Zhenzhe, Z.; Fan, W.; Xiaofeng, G.; Guihai, C. SOAR: Strategy-proof auction mechanisms for distributed cloud bandwidth reservation. Proceedings of the 2014 IEEE International Conference on Communication Systems (ICCS). Macau, China; IEEE, Nov 2014. https://doi.org/10.1109/ICCS.2014.7024786
[36] Quttoum, A.N.; Otrok, H.; Dzion, Z. ARMM: An Autonomic Resource Management Model for Virtual Private Networks. Proceedings of the 2010 IEEE International Conference on Consumer Communications and Networking Conference (CCNC). Las Vegas, NV, USA; IEEE, Jan 2010. https://doi.org/10.1109/CCNC.2010.5421818
[37] Jinwu, G.; Xiangfeng, Y.; Di, L. Uncertain Shapley value of coalitional game with application to supply chain alliance. MDPI, Journal of Sensors, 56, pp. 551–556, July 2017. https://doi.org/10.1016/j.asoc.2016.06.018
[38] Shi, W.; Wu, C.; Li, Z. A Shapley-value Mechanism for Bandwidth On Demand between Datacenters. IEEE Transactions on Cloud Computing, 2015.
[39] Wee, K.T.; Dinil, M.D.; Mohan, G. Uniform Price Auction for Allocation of Dynamic Cloud Bandwidth. Proceedings of the 2014 IEEE International Conference on Communications (ICC). Sydney, NSW, Australia; IEEE, June 2014.
[40] Baseem, W.; Nancy, S.; Ahmed, K. Modeling and pricing cloud service elasticity for geographically distributed applications. Proceedings of the 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM). Ottawa, ON, Canada; IEEE, May 2015. https://doi.org/10.1109/INM.2015.7140337
[41] Jian, G.; Fangming, L.; Haowen, T.; Yingnan, L.; Hai, J.; John C.; Lui S. Falloc: Fair network bandwidth allocation in IaaS datacenters via a bargaining game approach. Proceedings of the 21st International IEEE Conference on Network Protocols (ICNP). Goettingen, Germany; IEEE, Oct 2013. https://doi.org/10.1109/ICNP.2013.6733583
[42] Jian, G.; Fangming, L.; Dan, Z.; John, C.S.L.; Hai, J. A cooperative game based allocation for sharing data center networks. Proceedings of the IEEE 2013 INFOCOM Conference. Turin, Italy; IEEE, April 2013. https://doi.org/10.1109/INFCOM.2013.6567016

8 Author

Ahmad Nahar Quttoum is an Assistant Professor in the Computer Engineering Department at the Hashemite University, Jordan. Prior to that, he worked as a postdoctoral researcher at the LTIR lab at the Université du Québec à Montréal (UQAM), Montreal, Canada. There, he worked on NetVirt, a project for Ericsson-Canada in which he was mainly concerned with Cloud-Service Data Center Networks. In Oct 2011, he obtained a Ph.D. degree from the Department of Electrical and Computer Engineering at the University of Quebec, Montreal, Canada. His Ph.D. research, a project for Bell Canada, addressed Resource Management for Virtualized Networks. In late 2007, he obtained an M.Sc. degree in Network Systems from the Department of Engineering, Computing & Technology at the University of Sunderland, United Kingdom. During his M.Sc. studies, he worked on various research topics in network security, concluding with a thesis on security attacks, detection, and prevention. In early 2006, he obtained a B.Eng. degree in Electrical and Computer Engineering from Jordan University of Science and Technology, Irbid, Jordan. His research interests include cloud computing, data center networks, virtualized networks, autonomic resource management, and network security. He is also a technical reviewer for different journals and specialized magazines.

Article submitted 11 August 2017. Published as resubmitted by the authors 03 October 2017.