FACTA UNIVERSITATIS
Series: Electronics and Energetics Vol. 32, No 1, March 2019, pp. 105-118
https://doi.org/10.2298/FUEE1901105K

PARALLEL OVERLOADED CDMA CROSSBAR FOR NETWORK ON CHIP

Ashok Kumar K, Dananjayan P
Department of ECE, Pondicherry Engineering College, Puducherry, India

Abstract. The Code Division Multiple Access (CDMA) technique has recently been used to achieve high performance in Network on Chip (NoC) due to its fixed communication delay, reduced area utilisation and low power consumption. The CDMA system uses Walsh based spreading codes, which improve the bandwidth efficiency. However, it is not effective when the number of nodes present in the system increases. Overloaded CDMA (OCDMA) has been presented for such large network systems. In this paper, the OCDMA crossbar is modified and advanced with parallel encoding and decoding operations using orthogonal gold codes to improve the speed of the crossbar, thereby obtaining high performance in the NoC switch. A modified crossbar with extra processing elements is used to enhance the performance of the NoC based System on Chip (SoC). This work is simulated on the Xilinx tool and implemented on a Virtex-6 (XC6VLX760) Field Programmable Gate Array (FPGA) device. The proposed work is implemented for four, eight and sixteen ports with the deterministic X-Y routing algorithm in a 3×3 NoC design with mesh topology. This NoC switch shows a 9.79% improvement in delay and a 20.76% improvement in power consumption when compared to the existing CDMA NoCs for an 8-bit data packet.

Key words: CDMA, Gold code, NoC, Arbiter, FIFO buffer, FPGA.

1. INTRODUCTION

As the end user requirements have increased, integrated circuits have scaled down over the past few decades. According to ITRS [1], communication issues have evolved due to the down scaling of technology.
Existing communication protocols such as bus technology, shared bus and point to point technology, which achieve high performance in chip multiprocessors (CMP), have an inherent drawback when sharing resources [2]. Hence these protocols do not meet the performance requirements of system on chip (SoC) [3]. Network on chip (NoC) is a scalable communication paradigm which provides high performance in CMP with the aid of parallel processors. However, when the number of processors increases, the design of the NoC becomes complicated and affects the communication latency, area occupancy and power consumption.

Received April 24, 2018; received in revised form July 17, 2018
Corresponding author: Ashok Kumar K, Department of ECE, Pondicherry Engineering College, Puducherry, India (e-mail: kashok483@gmail.com)

A conventional NoC switch has five input ports, five output ports (four directional and one local) and a crossbar with a control module (arbiter). The four bi-directional ports are connected with neighboring switches for transfer of data between the source and destination [4]. The local port is used as a processing element (PE) and is responsible for communication between the port and the crossbar. This paper proposes a new method for NoC with fixed latency, reduced system cost and reduced power consumption. Recently, CDMA has been used to transfer data between the input and output in the NoC switch [5]. Fig. 1 depicts the structure of the CDMA NoC switch. The NoC switch has neither a fixed design nor a standard protocol and hence can be designed flexibly to meet the user requirements. The proposed method is implemented for a NoC switch using a CDMA crossbar with 2-D mesh topology, and a 3×3 NoC is designed with the deterministic X-Y routing algorithm. The routing of data initially searches along the X-direction of the destination router and then proceeds in the Y-direction.
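The X-then-Y order described above can be illustrated with a minimal sketch. It assumes switches are addressed by hypothetical (x, y) coordinates in the mesh; the function name `xy_route` and the coordinate convention are illustrative and not taken from the implemented design.

```python
def xy_route(src, dst):
    """Return the switches visited after src on the way to dst.

    Deterministic X-Y routing: move along X until the destination
    column is reached, then move along Y. Coordinates are (x, y)
    tuples; this addressing scheme is an assumption of the sketch.
    """
    x, y = src
    dst_x, dst_y = dst
    path = []
    # Phase 1: travel in the X-direction.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Phase 2: travel in the Y-direction.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


# Route across a 3x3 mesh from corner (0, 0) to switch (2, 1).
route = xy_route((0, 0), (2, 1))
print(route)
```

Because every packet finishes its X movement before any Y movement, the route between a given source and destination is always the same.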
Depending on the destination availability, the distance between the source and destination switch is calculated and transferred to the neighboring switches. Since each port has a FIFO buffer, store and forward packet switching [6] is used for the proposed work.

The crossbar is the key module of the NoC switch, as it affects the switch performance and provides multiple access for the data packets. The primary multiple access technique, Time Division Multiple Access (TDMA), is simple but not efficient for CMP. In TDMA only one port sends a data packet at a time, leaving the other ports to wait until it releases the physical link, thereby increasing the packet latency. In Space Division Multiple Access (SDMA), a dedicated path is created between the ports. CDMA is another traditional multiple access technique, where the spreading code enables the medium access sharing. This method provides error-free data in CMP and reduces the multiple access interference (MAI) by appropriately selecting a spreading code sequence with low cross correlation. The performance of CDMA depends on its spreading code and hence choosing the sequence is crucial. Recently, overloaded CDMA has emerged as the most suitable medium sharing technique for CMP, as it increases the performance of the classical CDMA crossbar with more available spreading codes. Most CDMA systems use Walsh codes, but these codes are suitable only for NoC systems with fewer processors. A Walsh code generator provides N sequences, out of which only N-1 sequences can be used for spreading. On the other hand, orthogonal gold codes are of much use for NoC systems with more PEs.

Fig. 1 NoC switch architecture with CDMA crossbar of N input and N output ports

The rest of the paper is organized as follows. Section 2 discusses the related work on CDMA interconnects. Section 3 describes the classical CDMA operation with mathematical expressions. Section 4 presents the generation of orthogonal gold codes.
Section 5 presents the NoC router with parallel OCDMA encoder and decoder. Section 6 shows the implementation of the OCDMA system, and finally, the conclusion is presented in section 7.

2. RELATED WORK

Recently, the CDMA technique has been favored for the crossbar of the NoC switch because of its fixed latency and reduced system cost. Kim et al. [7] proposed and implemented a Walsh based CDMA crossbar. This Walsh based CDMA gave suitable results for the NoC switch in terms of throughput and latency. A star-mesh based NoC switch is suggested to control large systems, in which seven resources are connected to each local switch and each local switch is linked to the central switch. Wang et al. [8] proposed a CDMA technique for both synchronous and asynchronous systems, such as the Globally Asynchronous Locally Synchronous (GALS) scheme. A 6-node NoC was simulated and the results were compared with a PTP NoC. Kim et al. [9] developed the source synchronous CDMA interconnect (SSCDMA-I), thereby reducing the system overhead compared to a TDMA bus. Nikolic et al. [10] presented two types of bus wrappers, i.e. a master wrapper along with an arbiter module and a slave wrapper with peripheral modules, for a CDMA based shared bus architecture. The transaction delay was reduced by bundling the different connections as single, two and four to reduce the parallel lines. Halak et al. [11] introduced dynamic assignment of spreading codes for CDMA users and developed a novel CDMA protocol (D protocol) for dynamic assignment. Two different architectures were proposed for CDMA, i.e. a serial CDMA implementation, where the data chips from all users are arithmetically summed according to their bit position, and a parallel CDMA implementation, where the data bits are transferred in parallel in the same cycle.
The serial and parallel implementation schemes were compared with traditional CDMA, mesh based NoC and TDMA bus, and it was observed that the clock frequency improved for the parallel CDMA implementation. Wang et al. [12] preferred the standard basis (SB) code in place of Walsh based codes. The SB method resembles the TDMA technique, as each spreading code consists of a single chip of one and the remaining chips are zeros. This method further decreases the latency and maximizes the throughput of the NoC. Ahmed et al. [13] presented the overloaded CDMA crossbar interconnect to improve the performance of NoC. Two different types of overloaded CDMA interconnect (OCI) have been suggested, i.e. TDMA-Overloaded CDMA Interconnect (T-OCI) and Parallel-Overloaded CDMA Interconnect (P-OCI), and they are compared with bus wrappers [10] and the parallel CDMA implementation [11]. By combining P-OCI and T-OCI, the speed of the CDMA crossbar is improved, whereas the overall system gets complicated in terms of area utilization. To improve the results of OCDMA, this paper proposes an encoder and decoder operated in parallel. By advancing the existing work, this paper provides better results for NoC with a CDMA crossbar. To the best of the authors' knowledge, this paper is the first to investigate an OCDMA crossbar with orthogonal gold codes.

3. CLASSICAL CDMA

As CDMA provides the same bandwidth for all users, it is more popular than TDMA or SDMA. Among the various spread spectrum techniques in the literature, direct sequence spread spectrum (DSSS) is the dominant method for multiple access. DSSS-CDMA is a method of multiplexing using unique high-frequency spreading codes. The two types of spreading codes used for CDMA are orthogonal codes and non-orthogonal codes. The orthogonal Walsh-Hadamard code is frequently used in CDMA systems, as its cross-correlation is zero and its impulse autocorrelation property is unity. PN sequences, gold codes and Kasami codes are a few of the non-orthogonal spreading codes in use.
These codes are used in encoding and decoding of the original data and in protecting it from interference in CDMA. In the CDMA encoder, the input data signal is applied to the modulator with a unique spreading code, and these modulated signals are added arithmetically before transmission. The encoded signal is transmitted through the channel and received in the decoder module. The decoder demodulates the encoded signal with the same unique spreading code. The encoded multi-bit sum is accumulated either in the positive accumulator register (if the spreading code bit is 0) or in the negative accumulator register (if the spreading code bit is 1), and these accumulated values are sent to the comparison module. After comparison, if the positive accumulator value is high the transmitted data signal is 1; otherwise the signal is 0. Unique spreading codes are assigned to each user to avoid multiple access interference. The different spreading code protocols are reviewed and analyzed in [8]. Among the several protocols, the transmitter based protocol (T protocol) gives better performance by assigning a unique spreading code to the CDMA system. Table 1 describes the conventional CDMA operation with suitable notation.

Table 1 Definition of Notations

Notation    Description
$D_j^i$     Data bit of jth sender for ith code sequence
$C_j^i$     Orthogonal code ith sequence for jth sender
$E_j^i$     Encoded chip value of ith code and jth sender
$S^i$       Arithmetic sum of ith code sequence
$PR^i$      Positive register value for ith value of code sequence
$NR^k$      Negative register value for kth value of code sequence
N           Number of code sequences

The data sent by each sender is XORed with the unique code to generate the chip. These chips are added arithmetically to get the multi-bit sum. This multi-bit sum is sent to the decoder for reconstruction of the original data. The encoding process is shown mathematically in the equations below.

$E_j^i = D_j^i \oplus C_j^i$    (1)

$S^i = \sum_{j=1}^{N} E_j^i$    (2)

where $\oplus$ means XOR operation.
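The encoding and decoding steps described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implemented hardware: it uses length-8 Walsh-Hadamard codes built by the Sylvester doubling construction, excludes the all-zero code because it is not balanced, and processes one data bit per sender. The function names `walsh_codes`, `encode` and `decode` are hypothetical.

```python
def walsh_codes(n):
    """Build n Walsh-Hadamard codes of length n (n a power of two) as 0/1 lists."""
    codes = [[0]]
    while len(codes) < n:
        # Doubling step: each code c yields (c, c) and (c, NOT c).
        codes = [c + c for c in codes] + [c + [1 - b for b in c] for c in codes]
    return codes


def encode(data_bits, codes):
    """XOR each sender's data bit with its code, then sum the chips per position."""
    length = len(codes[0])
    return [sum(d ^ c[i] for d, c in zip(data_bits, codes)) for i in range(length)]


def decode(sums, code):
    """Accumulate sums where the code bit is 0 (positive) or 1 (negative), then compare."""
    pac = sum(s for s, c in zip(sums, code) if c == 0)
    nac = sum(s for s, c in zip(sums, code) if c == 1)
    return 1 if pac > nac else 0


codes = walsh_codes(8)[1:]          # drop the all-zero code: 7 usable codes
data = [1, 0, 1, 1, 0, 0, 1]        # one data bit per sender (example values)
sums = encode(data, codes)          # multi-bit sum, one value per chip position
decoded = [decode(sums, c) for c in codes]
print(sums, decoded)
```

Dropping the all-zero row is what leaves only N-1 usable codes out of N in this sketch; with balanced, mutually orthogonal codes the contributions of the other senders fall equally into both accumulators and cancel in the comparison.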
The decoding process is expressed mathematically as below.

$PR^i = S^i$ if $C_j^i = 0$ and $NR^k = S^k$ if $C_j^k = 1$    (3)

$PAC = \sum_{i} PR^i$ and $NAC = \sum_{k} NR^k$    (4)

where $S^i$ is the multi-bit sum at the ith chip position, $C_j^i$ is the ith code bit of the jth sender, PR and NR are the positive and negative registers, and PAC and NAC are the positive register with an accumulator and the negative register with an accumulator.

Problem Statement

CDMA, through its multiple access technology, enables a number of transmitters to transfer data simultaneously to a number of receivers. For efficient data transfer, spreading sequences must satisfy the orthogonal, balance and run-length properties. Though an orthogonal sequence like the Walsh code is used for improving bandwidth utilization, its code utilization is low, the cross correlation between some shifts is not zero and the total delay for generating the code is not fixed. A 16-node CDMA needs a 32-bit Walsh code, as a 16-bit Walsh code provides only 15 orthogonal sequences, and hence it leads to wastage of code sequences in the system [12]. The proposed work provides a suitable solution for this problem. The contributions of the proposed work are as follows:
(i) Implementing CDMA with orthogonal gold codes to increase the code utilisation and to reduce the MAI.
(ii) Modifying and advancing a novel approach of the OCDMA system and adding extra PEs to the router for improving the performance of NoC [13].
(iii) Simulating the proposed design in Xilinx software for synthesis and comparison of the results with existing work.

4. GENERATION OF SPREADING CODE FOR CDMA SYSTEM

As described above, Walsh code is not suitable for high data rate systems. Therefore, orthogonal gold code is implemented instead of Walsh code. The gold code is generated through proper selection of PN-sequences used as initial values for linear feedback shift registers (LFSR) [14]. Fig. 2 describes the generation of gold code with a proper m-sequence. Gold sequence generator gives codes with a N bit sequence. Experiments show that orthogonal gold codes are obtained by affixing '0' to the non-orthogonal gold sequence.
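The generation procedure above can be sketched as follows. This is a minimal illustration, assuming two length-7 m-sequences from the polynomials x^3 + x + 1 and x^3 + x^2 + 1 with seed 1,0,0; these polynomials, the seed and the function names are choices made for the sketch and are not taken from the implemented generator. Each gold sequence is the XOR of the first m-sequence with a cyclic shift of the second, and a '0' is appended to obtain the orthogonal set.

```python
def m_sequence(taps, seed):
    """Generate one period of an LFSR sequence.

    Uses the recurrence s[t + n] = XOR of s[t + k] for k in taps,
    where n = len(seed). The taps used below are assumed to come
    from primitive polynomials, giving a period of 2**n - 1.
    """
    n = len(seed)
    s = list(seed)
    while len(s) < 2 ** n - 1:
        t = len(s) - n
        bit = 0
        for k in taps:
            bit ^= s[t + k]
        s.append(bit)
    return s


def orthogonal_gold_codes(u, v):
    """XOR u with every cyclic shift of v, keep u itself, then append a 0 chip."""
    family = [list(u)]
    for k in range(len(u)):
        shifted = v[k:] + v[:k]
        family.append([a ^ b for a, b in zip(u, shifted)])
    return [code + [0] for code in family]


u = m_sequence((0, 1), [1, 0, 0])   # x^3 + x + 1   (assumed polynomial)
v = m_sequence((0, 2), [1, 0, 0])   # x^3 + x^2 + 1 (assumed polynomial)
codes = orthogonal_gold_codes(u, v)
for code in codes:
    print(code)
```

In this sketch, two codes count as orthogonal when they differ in exactly half of their chip positions, which is the 0/1 equivalent of zero cross-correlation. The sketch checks only orthogonality; it does not check the balance or run-length properties of the individual codes.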
Further, there is no wastage of code sequences in the gold code set and the sequences are utilized efficiently. Hence, orthogonal gold sequences are suitable for NoC systems with a large number of nodes (ports) whose data width is 16, 32 or 64 bits. Nevertheless, with regard to BER, orthogonal gold codes provide performance similar to that of Walsh codes.

Fig. 2 Generation of gold code sequence with correct selection of PN-sequences

5. NOC ROUTER WITH PARALLEL OCDMA CROSSBAR

Each PE of the NoC is connected with a Network Interface (NI), i.e. either a transmit NI or a receive NI. The input port consists of a FIFO and a Finite State Machine (FSM) controller, and the FSM controller directs the data packets based on the FIFO [16]. The data is divided into packets before being transferred to the FIFO from the transmit NI. The size of the FIFO is decided by the width of the data packet. The distributed round robin arbiter provides grants for the packets which are ready to transmit from the FIFO. The arbiter selects the input port and output port based on the FIFO memories. The transmit NI asserts the request to the arbiter; then, depending on the FIFO memory, the arbiter provides the grants in round robin fashion. Hence only one data packet is sent to the CDMA system at a time, thereby avoiding conflict between data packets during transmission. From the NI, the data packets are sent to the Parallel to Serial converter (PTS) module, and the PTS provides the data packets serially to the encoder of the CDMA. Then, the serialized data is encoded with the spreading code and the bits are summed arithmetically to form the multi-bit sum. This multi-bit sum is sent to the decoder module, where the original data bits are reconstructed based on the decoder logic, and these serialized data are then forwarded to the Serial to Parallel converter (STP) module. The STP converts the data bits into a data packet again, and the data packets are sent to the receive NI of the output port.
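The round robin grant mentioned above can be illustrated with a minimal sketch. It models a single, centralized arbiter over a list of request flags, which is a simplifying assumption; the distributed arbiter of the design is not reproduced here, and the function name `round_robin_grant` is hypothetical.

```python
def round_robin_grant(requests, last_granted):
    """Return the index of the next requesting port after last_granted.

    requests is a list of booleans, one per input port. The search
    starts at the port after the previous winner and wraps around,
    so every requesting port is served in turn. Returns None when
    no port is requesting.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        port = (last_granted + offset) % n
        if requests[port]:
            return port
    return None


# Ports 0, 2 and 3 request; port 0 won the previous round.
requests = [True, False, True, True]
winner = round_robin_grant(requests, last_granted=0)
print(winner)
```

Starting the search one position after the previous winner is what prevents a single busy port from holding the crossbar indefinitely.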
Finally, the receive NI sends the data packets to the FIFO port. Store and forward packet switching suits the proposed NoC system, as the ports use FIFOs. Similarly, OCDMA replaces the crossbar of the NoC switch configuration. The deterministic X-Y routing protocol is used for data transfer from the source NoC switch to the destination switch, as it is straightforward, flexible for 2-D mesh design and free from deadlock. The control block, which contains the arbiter, is used to operate the spreading sequence assignment and provide data transaction permission for the winning ports.

The concept of the overloaded CDMA system is implemented in wireless communication networks for increasing the number of transmitting and receiving ports without increasing the system complexity [13]. The difference between OCDMA technology and standard CDMA is in terms of code length, i.e. L>N1, where L is the code length for OCDMA and N1 is the code length for classical CDMA. OCDMA facilitates multi-bit port transmission with minimal changes to the traditional CDMA system. Hence, the OCDMA system needs a long sequence generator such as a gold code generator.

The proposed work is implemented for a NoC switch with CDMA with fixed latency and limited system cost, without increasing the system complexity. To improve the bandwidth and reduce the area overhead, an extra PE is connected to each NoC switch, which reduces the requirement for more switches and also reduces the per-PE area overhead. Increasing the number of PEs per NoC switch increases the communication requests, thereby increasing the inter-communication links [15]. Modification of the standard CDMA system is required to achieve these objectives. The total encoding or decoding process of CDMA depends on the spreading code length, which equals the number of clock cycles for one data transaction. The completion of a single transaction requires N clock cycles, which are also synchronized with the counter.

Fig.
3 N input and N output ports of NoC switch with OCDMA crossbar

Building blocks of OCDMA crossbar

The OCDMA crossbar is built from three main components: (i) encoder, (ii) decoder and (iii) control block. These modules are shown in Fig. 3 along with the components of the NoC. The control block mainly controls the data transmission by selecting the proper input port, assigning the code sequence and running the counter that measures the clock cycles.

1) Encoder module

The encoding process is the same as in conventional CDMA, but the data is encoded bit-wise in a parallel manner. The multi-bit sum of the data is transferred to the decoder module in parallel [11]; therefore, one clock cycle is sufficient to encode one bit from each of the nodes. The data chips from the ports are XORed and added simultaneously; hence the proposed encoder needs fewer clock cycles to complete the encoding process than the standard CDMA. Fig. 4 shows the parallel encoding method for OCDMA with orthogonal gold code. The nodes send the data bits serially to the encoder block, and then the multi-bit sum is sent in parallel to the decoder block. The CDMA requires a total of 24 bits to transfer 8 bits of original data, because the multi-bit sum of each bit requires 3 bits when it is added arithmetically.

Fig. 4 Parallel process of encoding in OCDMA crossbar

2) Decoder module

Fig. 5 describes the parallel decoding process of OCDMA. The parallel multi-bit sum is received by the decoder module through the channel, and the encoded sum value first reaches the de-multiplexer stage. The encoded data bit is sent to the positive register (if the spreading code bit is zero) or the negative register (if the spreading code bit is one), and then the values of both registers are accumulated.

Fig. 5 Parallel process of decoder architecture of OCDMA crossbar

Finally, these positive accumulated values and negative accumulated values are sent to the comparison module.
The original data bit is 1 if the PAC is high; otherwise the original data bit is 0. These registers are usually of length N/2 because of the balance property of the orthogonal spreading code. Therefore, both registers are of the same length, which is half of the spreading code length. The decoding process is executed in parallel for each spreading code of the multi-bit sum.

3) Control Block

At the initial stage of data transfer, the control block provides spreading code sequences for the transmitter, and then the transmitter transfers the code to the receiver. The arbiter eliminates congestion and provides the grant signal to the input port for transferring the data to the crossbar in round robin fashion [15]. The counter within the arbiter module initializes the spreading sequences for all the senders. The control block sends the handshake signals to verify the codes of the corresponding encoder and decoder. The code pool assigns a unique spreading code to each transmitter when it receives a request from the arbiter module.

Fig. 6 describes the encoding and decoding process of the 8-bit orthogonal gold code. The sender sends the data bits serially, and orthogonal codes are assigned to each sender by the gold code generator. The data bit is XORed with the code bit in parallel, and the encoded first bit of each sender is sent to the decoder section. The first bit of the code for each sender is zero, but the multi-bit sum of the first encoded data is four after XORing with the data bits. This process continues for each sender, and the multi-bit sum is calculated for each encoded bit. The multi-bit sum is sent to the accumulators depending on the code bit value. For decoding of the first bit, the positive accumulator register is more than the negative accumulator register; hence the data bit is re-constructed as zero. The process continues until the system gets the 8-bit original data.

6.
IMPLEMENTATION

The simulation and synthesis results are presented in terms of area, delay and power consumption for the parallel OCDMA crossbar of the NoC switch with 2 PEs. The proposed work is simulated in Xilinx software and implemented on a Virtex-6 (XC6VLX760) FPGA. The NoC switch is implemented using different OCDMA crossbars with spreading code lengths N = {4, 8, 16}, and a comparison with existing NoC switches is also provided. The parameters used for simulation of the NoC switch are tabulated in Table 2.

Table 2 Simulation parameters

Simulation Parameter    Values
Topology                2D Mesh
Arbiter                 Distributed Round Robin
Switching               Store and Forward
Routing algorithm       Minimal Adaptive
Crossbar                OCDMA
Data packet length      8 bit
Buffer                  Yes
Simulator               Riviera-Pro
Traffic Scenario        Uniform Random
Traffic Distribution    Poisson

Fig. 6 Transmission and reception of OCDMA with 8 orthogonal gold codes

The performance metrics considered are area utilization (Slice Registers, Slice LUTs and LUT-FF pairs), maximum clock frequency (delay) and power consumption (dynamic power). The encoder and decoder of OCDMA are implemented individually and applied to the crossbar of the NoC switch. Fig. 7 shows the implementation results for 4, 8 and 16 nodes of the OCDMA crossbar for the NoC switch. From Fig. 7(a), it is evident that the area utilization increases with the number of bits, because the NoC switch requires a larger architecture for transmission and reception of the data packet. Fig. 7(b) shows that the maximum clock frequency decreases as the data packet size increases, because of the converters (STP and PTS) present in the NI. The power consumption of this NoC switch increases with its data packet size because the transition activity is higher when the data passes through the STP/PTS block; hence the dynamic power consumption also increases, as shown in Fig. 7(c).
The throughput is calculated as

$\text{Throughput} = \dfrac{N_{pe} \, N_{bpp}}{N_c \, t_c}$    (5)

where $N_c$ is the number of required clock cycles, $N_{bpp}$ is the number of bits in a packet, $N_{pe}$ is the number of received packets at the PE and $t_c$ is the clock period for complete data transmission. From Fig. 7(d), it is inferred that the throughput increases with increasing data because more PEs receive data packets within the specified clock period.

Fig. 7 (a-d) Implementation results in terms of area utilization, maximum clock frequency, power consumption and throughput for NoC switch with OCDMA crossbar of 4, 8, 16 nodes

Table 3 shows the comparison results for the different 8-bit CDMA crossbars of the NoC switch in terms of area utilization (LUT-FF pairs), delay (ns) and power consumption (mW), all implemented on the Virtex-6 FPGA device. The proposed work provides better results than WB-CDMA [7], SB-CDMA [12] and OCDMA [13], as the encoding and decoding processes are executed in parallel. This parallel OCDMA crossbar switch requires less area because orthogonal gold codes are used for spreading and because the selector (multiplexer) and the additional non-orthogonal sequence generator of OCDMA [13] are eliminated. Even though the number of spreading codes is higher, the area overhead of parallel OCDMA is lower than that of OCDMA [13] because of PE clustering, which reduces the number of switches required for complete data transfer from the source port to the destination port.

Table 3 Comparison of parallel OCDMA with existing CDMA crossbar of 8-bit data packet per switch

CDMA (8-node)     Area (no. of LUT-FF)    Delay (ns)    Power consumption (mW)
WB-CDMA [7]       782                     2.82          17.53
SB-CDMA [12]      684                     2.71          15.21
OCDMA [13]        692                     2.96          11.46
Parallel OCDMA    663                     2.673         9.08

Fig.
8 (a-d) shows the comparison of parallel OCDMA with OCDMA [13] for different numbers of nodes. From the figure, it is inferred that the performance of parallel OCDMA is improved compared to the existing work because of efficient code utilization and PE clustering. This crossbar switch shows a 9.79% improvement in delay over OCDMA [13] with minor modifications to the simple CDMA operation. The major power consumption is due to the buffers in the bi-directional ports. In the proposed method, however, the encoder and decoder modules are placed in the NI of the NoC; hence the buffers are not operated while the data packets are being encoded and decoded. Consequently, a 20.76% improvement in power consumption is obtained over OCDMA [13].

Fig. 8 (a-d) Comparison for OCDMA [13] with Parallel OCDMA for different number of nodes in terms of area utilization, clock frequency, power consumption and throughput

The parallel OCDMA NoC switch with 2 PEs is extended to a 3×3 mesh based NoC system, which consists of 9 bi-directional routers and 18 PEs. For analyzing the packet latency and throughput, the mesh based NoC is simulated on Riviera-Pro for Windows. A number of experiments are conducted with a uniform-random traffic pattern to observe these performance metrics. The data packet latency (in clock cycles) for the NoC with 2 PEs is obtained and compared with that of the NoC switch with a single PE, as shown in Fig. 9(a). From this figure, it is evident that the proposed work shows reduced packet latency because of the parallel processing of the encoder and decoder for transmission of the data packet. The throughput performance of the NoC switch with 2 PEs is also obtained and compared with that of the NoC switch with a single PE. From the figure, it is understood that as the number of PEs increases, the latency and throughput performances improve.

Fig.
9 Simulation results for network latency with injection load (a) and throughput with injection load (b) in uniform-random traffic pattern

7. CONCLUSION

This paper proposed the overloaded CDMA crossbar for NoC with parallel encoding and decoding, with Walsh codes being replaced by orthogonal gold codes. The parallel encoder and decoder transfer the data in the same clock cycle; hence the performance of the proposed OCDMA crossbar is increased. The results are improved with respect to latency, area usage and power consumption when compared with the existing CDMA crossbars. The parallel OCDMA crossbar switch showed a 9.79% reduction in delay and a 20.76% improvement in power consumption over OCDMA [13]. In future work, the NoC switch will be presented with different fault-tolerant routing algorithms for handling permanent and transient faults.

REFERENCES

[1] International Technology Roadmap for Semiconductors 2012 (www.itrs.net).
[2] M. C. Chiang, G. S. Sohi, “Evaluating design choices for shared Bus multiprocessors in a throughput oriented environment,” IEEE Transactions on Computers, vol. 41, no. 3, pp. 297-317, March 1992.
[3] D. Sigüenza-Tortosa, T. Ahonen, and J. Nurmi, “Issues in the development of a practical NoC: The Proteo concept,” Integration, the VLSI Journal, vol. 38, no. 1, pp. 95-105, October 2004.
[4] T. Bjerregaard and S. Mahadevan, “A survey of research and practices of network-on-chip,” ACM Computing Surveys, vol. 38, no. 1, pp. 1-50, March 2006.
[5] S. A. Hosseini, O. Javidbakht, P. Pad, and F. Marvasti, “A review on synchronous CDMA systems: Optimum overloaded codes, channel capacity, and power control,” EURASIP Journal on Wireless Communications and Networking, vol. 1, pp. 1-22, December 2011.
[6] L. Benini and D. Bertozzi, “Xpipes: A network-on-chip architecture for gigascale systems-on-chip,” IEEE Circuits and Systems Magazine, vol. 4, no. 2, pp. 18-31, September 2005.
[7] D. Kim, M. Kim, and G. E.
Sobelman, “CDMA-based network-on-chip architecture,” In Proceedings of the IEEE Asia-Pacific Conference Circuits Systems, December 2004, pp. 137-140.
[8] X. Wang, T. Ahonen, and J. Nurmi, “Applying CDMA technique to network-on-chip,” IEEE Transactions on Very Large Scale Integration Systems, vol. 15, no. 10, pp. 1091-1100, October 2007.
[9] J. Kim, I. Verbauwhede, and M.-C. F. Chang, “Design of an interconnect architecture and signaling technology for parallelism in communication,” IEEE Transactions on Very Large Scale Integration Systems, vol. 15, no. 8, pp. 881-894, August 2007.
[10] T. Nikolic, M. Stojcev, and G. Djordjevic, “CDMA bus-based on-chip interconnect infrastructure,” Microelectronics Reliability, vol. 49, no. 4, pp. 448-459, April 2009.
[11] B. Halak, T. Ma, and X. Wei, “A dynamic CDMA network for multicore systems,” Microelectronics Journal, vol. 45, no. 4, pp. 424-434, April 2014.
[12] J. Wang, Z. Lu and Y. Li, “A New CDMA Encoding/Decoding Method for on-Chip Communication Network,” IEEE Transactions on Very Large Scale Integration Systems, vol. 24, no. 4, pp. 1607-1611, April 2016.
[13] K. E. Ahmed, R. Rizk and M. M. Farag, “Overloaded CDMA crossbar for Network on Chip,” IEEE Transactions on Very Large Scale Integration Systems, vol. 25, no. 6, pp. 1842-1855, January 2017.
[14] L. Hanzo and T. Keller, “OFDM and MC-CDMA: A Primer,” John Wiley & Sons, Ltd., ISBN: 0-470-03007-0, 2006.
[15] R. Kumar and A. Gordon-Ross, “MACS: A Highly Customizable Low-Latency Communication Architecture,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 1, pp. 237-249, January 2016.
[16] A. K. K, P. D., “A Survey for Silicon on Chip Communication,” Indian Journal of Science and Technology, vol. 10, no. 1, January 2017.