# A SCALABLE TEST STRATEGY FOR NETWORK-ON-CHIP ROUTERS

Alexandre M. Amory<sup>1</sup>, Eduardo Brião<sup>1</sup>, Érika Cota<sup>1</sup>, Marcelo Lubaszewski<sup>1,2</sup>, Fernando G. Moraes<sup>3</sup>

<sup>1</sup> PPGC - Instituto de Informática - UFRGS - Av. Bento Gonçalves, 9500, Porto Alegre, RS – Brazil
<sup>2</sup> PPGEE - Depto. Eng. Elétrica - UFRGS, Av. Oswaldo Aranha, 103, Porto Alegre, RS – Brazil
<sup>3</sup> PPGCC - FACIN - PUCRS - Av. Ipiranga, 6681, Porto Alegre, RS – Brazil

{amamory,ewbriao,erika}@inf.ufrgs.br, luba@eletro.ufrgs.br, moraes@inf.pucrs.br

#### Abstract

Network-on-Chip (NoC) has recently emerged as an alternative communication architecture for complex systems-on-chip, and different aspects of NoC design have been studied in the literature. However, the test of the NoC itself for manufacturing faults has been only marginally addressed. This paper proposes a scalable test strategy for the routers in a NoC, based on partial scan and on an IEEE 1500-compliant test wrapper. The proposed test strategy takes advantage of the regular design of the NoC to reduce both test area overhead and test time. Experimental results show that a good tradeoff of area overhead, fault coverage, test data volume, and test time is achieved by the proposed technique. Furthermore, the method can be applied to large NoCs and it does not depend on the network routing and control algorithms, which makes the method suitable to test a large class of network models.

#### 1. Introduction

As the number of IP cores in systems increases, the implementation of a single broadcast or bus-based communication architecture becomes a time-consuming task in the system design cycle. For systems with intensive parallel communication requirements, buses may not meet the required bandwidth, latency, and power consumption [8]. One emerging solution for such a communication bottleneck is the use of an embedded switching network, called Network-on-Chip (NoC), to interconnect the IP cores in a System-on-Chip (SoC) [5].

Several authors have presented different aspects regarding the design and implementation of on-chip networks [5][8][13]. Recently, industrial NoCs have also been proposed [11]. Furthermore, the reuse of the NoC as a Test Access Mechanism (TAM) has been presented as a cost-effective strategy for the test of embedded IP cores, with reduced area, pin count, and test time costs [2][4]. Although one may claim that the network operation is also tested when it is transmitting test data, for diagnosis purposes and complete fault coverage it is important to define a test scheme for the network before its reuse as a TAM. In effect, test strategies that reuse the NoC assume that it has been tested by a specific method before being reused.

Some test approaches for NoCs have been discussed in the literature [1][10][11]. Aktouf [1] suggests the use of a boundary scan wrapper. Other approaches [10][11] suggest that a wide variety of standard Design-for-Test (DfT) solutions can be used, from BIST for FIFOs to functional testing of wrapped routers. However, to the best of the authors' knowledge, those proposals have not been applied to actual NoCs.

In this paper, we first verify the efficiency of some of the previously suggested NoC test approaches, as well as the applicability of standard DfT techniques [3][6][12] to NoC testing. Experiments show that existing approaches may lead to considerable area overhead and test time, making NoC testing a major bottleneck for the system design. Hence, we propose a scalable and cost-effective DfT strategy for the routers of the NoC.

The proposed method is based on partial scan and on an IEEE 1500-compliant test wrapper, and it takes advantage of the NoC regularity. Moreover, the test strategy is scalable and independent of the network functional operation, which makes it suitable for a large class of network models and implementations. The method is applied to three versions of a NoC model with different sizes to demonstrate its effectiveness. The results are analyzed in terms of area overhead, test time, test data volume, fault coverage, and power dissipation.

The contributions of this paper are twofold. First, it shows that the application of standard DfT techniques to NoC testing is not straightforward, and may lead to excessive costs if applied indiscriminately. Second, it presents a structured, scalable, and cost-effective test scheme for NoC routers.

Paper 25.1

The paper is organized as follows: Section 2 presents a brief overview of NoC design and a description of the network used in the remainder of the paper. Section 3 presents the results of the application of standard DfT methods in the test of a NoC. Section 4 describes the proposed test strategy, while Section 5 discusses the experimental results. Section 6 concludes the paper.

#### 2. Network-on-Chip Background

NoCs typically use the message-passing communication model, where the processing IP cores attached to the network communicate by sending and receiving request and response messages. Depending on the network implementation, messages can be split into smaller structures named packets, which are individually routed. A packet is composed of a header, a payload, and a trailer. The header and the trailer frame the packet, and the payload carries the data being transferred. The header also carries the information needed to establish the path between the sender and the receiver.

Besides its topology, a NoC can be described by the approaches used to implement the mechanisms of flow control, routing, arbitration, switching, and buffering [5]. Flow control deals with data traffic on the channels. Routing is the mechanism that defines the path a message takes from a sender to a receiver. Arbitration establishes priority rules when two or more messages request the same resource. Switching is the mechanism that establishes the internal path between an input and an output port of the router. Finally, buffering is the strategy used to store messages when a requested output channel is busy. Current IP cores usually need to use network interfaces (functional wrappers) to adapt their interfaces and protocols to the ones of the target NoC. Such wrappers pack and unpack data exchanged by the processing IP cores.

Figure 1 presents the conceptual model of a router. It contains a centralized control logic module, responsible for arbitration and routing, and bi-directional ports. Each input port has FIFOs for temporary data storage, and the output port implements the flow control algorithm. The local port connects the router to an IP core. The other ports are connected to its neighbor routers.

Figure 2 illustrates a typical structure of a NoC-based design. As one can observe, primary inputs and outputs of the NoC may not be accessible through the primary I/Os of the system. Thus, the NoC can be seen as another IP core in the system, and it should be integrated into the global test strategy of the SoC. Hence, a TAM must be defined to carry the test patterns/responses to this IP core, while a test wrapper can be used to isolate the NoC during test.

In another view, the routers that compose the NoC can be seen as identical IP cores in the system. In this case, a test wrapper for each router is required, and the NoC can be seen as a hierarchical IP core, where the routers are the sub-cores.

This work aims at defining an efficient test strategy for the routers of a NoC. The proposed strategy is flexible in the sense that it can be applied independently or in conjunction (sharing a TAM, for example) with other cores in the system. A single test wrapper is assumed for the network and test data is internally distributed so that costs are optimized. The test of the communication channels and the test of the network interface with the system are not addressed in this paper.



Figure 1 - A typical organization of a router.



Figure 2 - NoC-based system

#### 2.1 Case Study: the SoCIN Network

In order to evaluate the proposed test strategy, a packet-switched network named SoCIN (*System-on-Chip Interconnection Network*) [13] is used. The router that implements the network protocol is called RASoC (*Router Architecture for System-on-Chip*). This router uses input buffering, a round-robin algorithm for arbitration, a handshake algorithm for flow control, and an oblivious routing algorithm. Switching is based on the wormhole approach, where a packet is broken up into flits (flow control units). The channels used in the experiments are 20 bits wide (16 data bits and 4 control bits). RASoC has four ports to connect to its neighbors and one port to connect to the embedded IP core. The network implements a 2-D torus topology, where each router is configured with a 16-bit flit width and a FIFO depth of 4.
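As a rough illustration of how a packet might be broken into 16-bit flits, the following sketch frames a payload with a header flit and a trailer flit. It is only a sketch under stated assumptions: the `Flit` fields, the use of a separate trailer flit, and the `packetize` helper are hypothetical, and the real encoding of the four control bits in SoCIN is not reproduced here.

```python
from dataclasses import dataclass
from typing import List

DATA_BITS = 16  # data bits per flit, as in the case study


@dataclass(frozen=True)
class Flit:
    """Hypothetical flit: 16 data bits plus two framing flags."""
    data: int
    is_header: bool
    is_trailer: bool


def packetize(route_info: int, payload: bytes) -> List[Flit]:
    """Split a payload into 16-bit flits framed by a header and a trailer.

    The header carries the routing information; odd-length payloads are
    padded with a zero byte. Both choices are assumptions of this sketch.
    """
    mask = (1 << DATA_BITS) - 1
    words = [
        int.from_bytes(payload[i:i + 2].ljust(2, b"\0"), "big")
        for i in range(0, len(payload), 2)
    ]
    flits = [Flit(route_info & mask, True, False)]
    flits += [Flit(word, False, False) for word in words]
    flits.append(Flit(0, False, True))
    return flits
```

For example, a five-byte payload yields one header flit, three payload flits, and one trailer flit.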

Notice that, although the experiments are performed for the SoCIN network, most NoCs proposed in the literature are based on packet switching, and present input buffering and regular topologies like mesh and torus [8]. In addition, the proposed test approach does not depend on the network routing and control algorithms. Thus, the conclusions drawn here can be applied to a large class of network implementations, even when different routing algorithms are used.

Table 1 presents some characteristics of the original RASoC router, without test circuitry. The characteristics of a processor core called Plasma [plasma] are also presented in the table for comparison.

Table 1 – Comparing the RASoC router and the Plasma processor.

| Module | # gates | # flip-flops | flip-flop relative area | I/O pins | power (µW) |
|--------|---------|--------------|-------------------------|----------|------------|
| RASoC  | 4605    | 425          | 45%                     | 202      | 2.24       |
| Plasma | 20118   | 1444         | 36%                     | 105      | 6.12       |

Plasma is a typical example of a small-to-medium size IP core. The processor is compatible with the MIPS-I instruction set, and has a 32-bit multiplier, a shifter, a 32x32 register bank, and a 3-stage pipeline. Plasma has fewer I/O pins and, consequently, a smaller area for the test wrapper than the RASoC router. In addition, in spite of the 32x32 register bank, Plasma has a lower density of flip-flops (36%) than RASoC (45%).

The power consumption presented in Table 1 is characterized considering the dynamic and static power consumption of the module. To evaluate the dynamic consumption, the module is initially synthesized to an ASIC technology library for which the power consumption of technology cells is available. The resulting netlist description is simulated. During this simulation, the switching activity of each cell of the technology library is captured. Then, the power consumption per clock cycle is computed by multiplying the number of toggles by the power consumption per cell. The total power consumption for the whole simulation is given by the average power consumption of the module over all clock cycles. Plasma power consumption was characterized for an arbitrary application. For RASoC, the power was characterized considering four packets being routed in parallel, which maximizes the switching activity. All packets have the same number of flits and random payload data.
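The characterization procedure described above can be sketched as follows. This is a minimal illustration, not the flow actually used by the authors: the function names, the per-cell-type toggle dictionaries, and the idea of adding a constant static term are assumptions made for the example.

```python
from typing import Dict, List


def cycle_power(toggles: Dict[str, int], power_per_toggle: Dict[str, float]) -> float:
    """Dynamic power of one clock cycle: toggles times power, summed over cell types."""
    return sum(count * power_per_toggle[cell] for cell, count in toggles.items())


def average_power(
    trace: List[Dict[str, int]],
    power_per_toggle: Dict[str, float],
    static_power: float = 0.0,
) -> float:
    """Average the per-cycle dynamic power over the simulation and add static power."""
    if not trace:
        raise ValueError("the switching-activity trace is empty")
    dynamic = sum(cycle_power(cycle, power_per_toggle) for cycle in trace) / len(trace)
    return dynamic + static_power
```

Here `trace` holds one dictionary per simulated clock cycle, mapping a library cell type to the number of toggles captured for it in that cycle.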

One can observe that Plasma presents higher power consumption than RASoC, since Plasma has more flip-flops, which are the main source of power consumption. However, considering the power per gate, RASoC has a ratio (2.24 µW / 4605 gates) of about 0.5 nW/gate against 0.3 nW/gate for the Plasma processor. Thus, although a single router is smaller than most functional IP cores, it may have a higher switching activity. Therefore, the total power consumed by the NoC may easily be higher than the consumption of other IP cores in the system.
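The per-gate ratios can be reproduced directly from the figures in Table 1. The short sketch below does only that arithmetic; the dictionary layout and the helper name are illustrative choices.

```python
# Figures taken from Table 1: size in gates, power in microwatts.
TABLE_1 = {
    "RASoC": {"gates": 4605, "power_uw": 2.24},
    "Plasma": {"gates": 20118, "power_uw": 6.12},
}


def power_per_gate_nw(power_uw: float, gates: int) -> float:
    """Average power per gate, converted from microwatts to nanowatts."""
    return power_uw * 1000.0 / gates


ratios = {
    name: power_per_gate_nw(row["power_uw"], row["gates"])
    for name, row in TABLE_1.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} nW/gate")
```

Rounded to one decimal place, the results match the 0.5 nW/gate and 0.3 nW/gate quoted in the text.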

This comparison between RASoC and Plasma highlights the challenge of finding a cost-effective test strategy for NoCs. The router has fewer gates per I/O port, a higher density of flip-flops, and higher power consumption per gate. Those features indicate, for instance, that test wrapper and full-scan implementations may be too costly if applied to each router independently. On the other hand, NoCs usually have very regular designs. Although different implementations of routers exist, most of them follow the conceptual model presented in Figure 1. The combination of the regularity of a NoC and the predictable router structure is exploited in this work to reduce test costs.

#### 3. Evaluating Standard Test Strategies in NoCs

Some authors argue that the NoC is another IP core (flat or hierarchical) in the system. In this case, its test can be defined using traditional core-based testing strategies [10,11]. This means the use of an IEEE 1500-compliant test wrapper [6] and the use of scan-based approaches to test the routers. Considering the NoC as a *flat core*, a single test wrapper is inserted in the NoC interface. Otherwise, if the NoC is considered as a *hierarchical core*, one test wrapper for each router is necessary.

Considering the regular design and the presence of identical IP cores in the NoC structure (routers), test strategies previously proposed for similar systems may be applied to the on-chip network. Wu and MacDonald [12], and Arabi [3] propose test strategies for *identical IP cores* in a single chip. Both approaches require full-scan and IEEE 1500 wrapped cores with registered I/O pins. The test consists of applying test patterns to all identical IP cores in parallel and comparing the responses within the chip. Whenever a fault is detected, a special circuit at the output of each IP core allows each block to be tested individually for diagnosis. Those approaches result in a considerable reduction in test time (due to the test parallelism), in test volume (the test responses need not be stored in the tester), and in ATPG CPU time (ATPG runs for a single IP core).

Another approach, proposed by Aktouf [1], is a *boundary-scan based strategy* for testing massively parallel machines. The method takes advantage of the regularity of the communication architecture to reduce the test time. Boundary scan cells surround each router in the architecture. Full-scan routers are assumed.

We evaluate these four test configurations with respect to the area overhead, and the results are presented in Table 2. The reference design is a 3x3 SoCIN network with nine RASoC routers implementing a torus topology as described in Section 2.1. The size of this NoC without test circuitry is 41,445 gates. The second column in Table 2 (Flat core full scan) shows the results when the network is considered as a single IP core, i.e., a single test wrapper and a full-scan strategy are implemented for the network as a whole. Column 3 (Boundary scan) shows the results for the boundary-scan approach. In this configuration, all routers have full scan and a boundary scan test wrapper, as proposed in [1]. The last two configurations assume the network is a hierarchical IP core, i.e., each router is treated independently. In the third configuration, shown in Column 4 of Table 2 (Hierarchical core-Full scan), each router has a test wrapper and implements full-scan testing. The fourth test model (column Hierarchical core-Full scan with comparator) repeats the third one, but includes internal comparators, as proposed in [3][12], to reduce the test volume. The fault coverage for all configurations is above 98%.

Table 2 – Standard test strategies applied to a 3x3 NoC.

|                    | Flat core, full scan | Boundary scan | Hierarchical core, full scan | Hierarchical core, full scan with comparator |
|--------------------|----------------------|---------------|------------------------------|----------------------------------------------|
| Total area (gates) | 49437                | 79290         | 62037                        | 62994                                        |
| Area overhead      | 19%                  | 91%           | 50%                          | 52%                                          |

One can observe in Table 2 that the area overhead for three out of four approaches is prohibitive. The only exception is the first configuration, which considers the NoC as a flat core with full scan. Although this configuration presents an affordable area overhead, the flat approach typically ignores the internal organization of the IP core. Indeed, we show further in the paper that less area overhead can be achieved when the internal structure of the NoC is considered.
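The overhead percentages in Table 2 follow from the total areas and the 41,445-gate reference design. The sketch below simply redoes that calculation; the dictionary keys are shorthand labels for the four configurations.

```python
ORIGINAL_AREA = 41445  # 3x3 SoCIN without test circuitry (gates)

# Total areas from Table 2 (gates).
TABLE_2 = {
    "flat core, full scan": 49437,
    "boundary scan": 79290,
    "hierarchical core, full scan": 62037,
    "hierarchical core, full scan with comparator": 62994,
}


def area_overhead_percent(area_with_dft: int, original_area: int) -> float:
    """Relative area increase caused by the test circuitry, in percent."""
    return 100.0 * (area_with_dft - original_area) / original_area


overheads = {
    name: area_overhead_percent(area, ORIGINAL_AREA)
    for name, area in TABLE_2.items()
}
for name, value in overheads.items():
    print(f"{name}: {value:.1f}%")
```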

Therefore, new methods must be developed to meet the specificities of this new communication platform. Such a method is proposed in the next section and applied to the same 3x3 SoCIN network for comparison.

#### 4. Proposed Test Strategy

We propose a NoC testing approach that combines the best features of the "*Flat core full scan*" approach (reduced area overhead) and the "*Hierarchical core Full scan with comparator*" (reduced test volume) configurations explained in Section 3. The proposed strategy considers the NoC as a flat core (a single test wrapper for the whole network is required), but it does not require a full-scan implementation, which further reduces the area overhead. Moreover, unlike the flat approach, we exploit the regular design of the NoC to reduce test time and data volume.

The proposed strategy is presented in three parts:

- a) the router testing, which shows how to configure the router internal scan chains to reduce area,
- b) the NoC testing, which explains how the scan chains of the routers are connected together at NoC level, and
- c) the NoC test wrapper, which details the definition of the IEEE 1500 compliant test wrapper for the NoC exploring the network regular structure to reduce area.

#### 4.1. Router Testing

As detailed in Section 2, a router is composed of control logic (routing, arbitration, and flow control modules) and input FIFOs. The control logic is considerably simpler to test, because it contains a small number of flip-flops and gates. The input FIFO poses the main problems for the router testability. Figure 3.a illustrates the architecture of the primary inputs of a router, i.e., the FIFO implementation. If full scan is directly implemented in this structure, all flip-flops of the FIFO will become scan flip-flops and will be chained together.

In the proposed approach, we split the FIFO and define a single scan chain using only the first position of the queue, as illustrated in Figure 3.b. This single scan chain provides the controllability and observability required to test the whole structure, since the FIFO is usually not very deep and there is no feedback logic in this block. Thus, any sequential ATPG tool can generate test patterns for the whole FIFO with an affordable effort.



Figure 3 – Splitting the input FIFOs: (a) original and (b) modified for testing.

To complete the router testing, a second scan chain is defined with the remaining flip-flops of the control logic, which are the flip-flops used to implement the routing algorithm, for example.

This approach avoids expensive solutions like full-scan and BIST [10][11], while the ATPG tool can still generate high fault coverage at reasonable CPU time.
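The intuition behind scanning only the first FIFO position can be illustrated with a toy model. The sketch below assumes a shift-register FIFO whose last word is directly observable and injects a single stuck-at fault into one storage bit; the `ShiftFifo` class, the fault tuple, and the two-pattern check are all hypothetical simplifications and do not represent the RASoC FIFO or the sequential ATPG tool.

```python
from typing import List, Optional, Tuple

DEPTH = 4    # FIFO depth of the case-study router
WIDTH = 16   # data bits per FIFO word
MASK = (1 << WIDTH) - 1

Fault = Tuple[int, int, int]  # (word position, bit index, stuck value)


class ShiftFifo:
    """Toy shift-register FIFO; only word 0 is reachable through scan."""

    def __init__(self, fault: Optional[Fault] = None) -> None:
        self.words: List[int] = [0] * DEPTH
        self.fault = fault

    def _force_fault(self) -> None:
        """Override the faulty storage bit with its stuck value."""
        if self.fault is None:
            return
        pos, bit, value = self.fault
        if value:
            self.words[pos] |= 1 << bit
        else:
            self.words[pos] &= ~(1 << bit)

    def scan_load(self, value: int) -> None:
        """Scan a test word into the first FIFO position."""
        self.words[0] = value & MASK
        self._force_fault()

    def clock(self) -> None:
        """Functional cycle: every word moves one position deeper."""
        self.words = [0] + self.words[:-1]
        self._force_fault()

    def output(self) -> int:
        return self.words[-1]


def response(fifo: ShiftFifo, pattern: int) -> int:
    """Scan in a word, apply DEPTH - 1 functional cycles, observe the output."""
    fifo.scan_load(pattern)
    for _ in range(DEPTH - 1):
        fifo.clock()
    return fifo.output()


def detects(fault: Fault) -> bool:
    """True if an all-zeros or all-ones word exposes the fault at the output."""
    return any(
        response(ShiftFifo(), pattern) != response(ShiftFifo(fault), pattern)
        for pattern in (0x0000, MASK)
    )
```

In this simplified model, a word scanned into the first position reaches every deeper position after a few functional cycles, so stuck-at faults in unscanned words still change the observed output.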

#### 4.2 NoC Testing

After defining the test structure for each router, one must define an access mechanism to transmit test data from the network interface to each router and vice-versa. We propose a generic test communication protocol, which can be applied to regular NoC topologies, such as mesh and torus. In this protocol, test patterns, coming from the external tester, are simultaneously applied to all identical routers. Test responses of the routers, on the other hand, are internally compared, as proposed in [3][12]. If test responses are different, a mechanism for diagnosis can be activated. Otherwise, the test continues.

Test vectors are broadcast to the routers through a single pin in the network interface, as shown in Figure 4. Note that, unlike Figure 2, Figure 4 shows only test-related ports and wires. The block marked with an equal sign denotes a comparator that checks test responses against each other. The number of comparators required in the NoC depends on the number of scan chains in the routers. For each scan chain in the routers there must be a comparator. In Figure 4, for instance, a single scan chain per router is assumed, and the four chains feed the single comparator. Ideally, all routers are tested in parallel and a single comparator is used. However, there may be limitations in the maximum fan-out of the scan input pin (SI in Figure 4) and in the test time achieved by a single scan chain in the routers. The NoC designer can then define an alternative solution by increasing the number of scan chains per router (area overhead does not change) to reduce test time, and by increasing the number of comparators, whose area can be easily estimated.



Figure 4 – Testing multiple identical routers.
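The broadcast-and-compare scheme of Figure 4 can be sketched at the scan-chain level. In the illustration below, each router's scan chain is modeled as a list of bits that already holds a captured response; the function name and the list-based chain model are assumptions of the sketch.

```python
from typing import List


def shift_and_compare(chains: List[List[int]], pattern: List[int]) -> List[int]:
    """Shift one pattern into every chain through a shared scan input.

    While the new pattern is shifted in, the previously captured response
    is shifted out of each chain and the outgoing bits are compared.
    Returns one mismatch flag per shift cycle (1 means the chains disagree).
    """
    flags = []
    for bit in pattern:
        outgoing = [chain[-1] for chain in chains]  # bits leaving each chain
        flags.append(int(any(out != outgoing[0] for out in outgoing)))
        for chain in chains:
            chain.insert(0, bit)  # the same stimulus bit enters every chain
            chain.pop()
    return flags


# Four routers; the last one captured a response that differs in one bit.
chains = [[1, 0, 1, 1, 0] for _ in range(3)] + [[1, 0, 0, 1, 0]]
flags = shift_and_compare(chains, [1, 1, 0, 0, 1])
print(flags)
```

After the shift, every chain holds the same new stimulus, and the flag raised in the third cycle marks the position at which the responses differ.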

#### 4.2.2 The Comparator

Figure 5 presents a circuit for output comparison, which is similar to the modules proposed by Wu and MacDonald [12], and Arabi [3]. When running in test mode, ports *compEnable* (*compEnb*<sub>x</sub>) and *enable detection* (*en\_det*) are assigned to '1', while the diagnostic port *diag* is set to '0'. Signals *compInput* (*compIn*<sub>x</sub>) receive one scan chain output of each router being tested. All corresponding bits unloaded from each router scan chain are compared against each other, as depicted in Figure 4. If there is any difference, the *xor* gate generates an error signal '1' in the *so* pin.



Figure 5 – Comparator block.

The comparison logic also supports diagnosis. In diagnosis mode, signal *diag* is initially set to '1'. Then, signal *compEnb<sub>x</sub>* corresponding to a single router is set to '1', while signals *compEnb<sub>x</sub>* of the remaining routers are set to '0'. Test vectors are applied again to all routers, but the output of only one router will be captured. This procedure is repeated until the defective router is found.
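The two operating modes can be captured in a small behavioral model. Since the gate-level circuit of Figure 5 is not reproduced here, the sketch below is an assumption-laden approximation: it treats *en\_det* as a gate on the mismatch output, lets the diagnosis mode pass through the bit of the enabled router, and assumes that the tester holds the expected response stream during diagnosis. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def comparator(
    comp_in: Sequence[int], comp_enb: Sequence[int], en_det: int, diag: int
) -> int:
    """Behavioral model of one comparator cycle; returns the value on `so`."""
    selected = [bit for bit, enb in zip(comp_in, comp_enb) if enb]
    if diag:
        # Diagnosis mode: expose the bit of the enabled router.
        return int(any(selected))
    if not en_det or not selected:
        return 0
    # Test mode: flag any disagreement among the enabled routers.
    return int(any(bit != selected[0] for bit in selected))


def error_flags(responses: List[List[int]]) -> List[int]:
    """Test mode: compare the response streams of all routers bit by bit."""
    n = len(responses)
    return [comparator(bits, [1] * n, 1, 0) for bits in zip(*responses)]


def locate_defective(
    responses: List[List[int]], expected: Sequence[int]
) -> Optional[int]:
    """Diagnosis mode: enable one router at a time until a stream mismatches."""
    n = len(responses)
    for router in range(n):
        enb = [int(i == router) for i in range(n)]
        stream = [comparator(bits, enb, 1, 1) for bits in zip(*responses)]
        if stream != list(expected):
            return router
    return None
```

With four response streams in which the third differs from the others, `error_flags` raises a flag in the differing cycle and `locate_defective` returns index 2.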

Notice that other testing approaches that consider the NoC as a non-hierarchical core are not able to identify the defective router, since they abstract the internal structure of the network. On the other hand, the hierarchical approach has a large area overhead. Our approach is based on the flat design, but it is structured in such a way that the defective router can be identified.

#### 4.3 Test Wrapper for NoCs

To complete the definition of the test strategy, we briefly describe an IEEE 1500-compliant wrapper implementation for the network.

In order to support on-chip test response comparison, one must provide the same test stimuli for all the routers. Thus, functional pins and scan chains of the routers must receive the same test stimuli. Identical stimuli for scan chains are provided by the strategy presented in Section 4.2. However, since there is a single test wrapper for the whole NoC, the test wrapper design must also ensure that the functional ports of the routers receive the same test stimuli.



Figure 6 – Proposed test wrapper for NoCs.

Figure 6 presents the proposed IEEE 1500-compliant test wrapper for the NoC. In test mode, functional inputs receive test patterns through the $ci_x$ cells, as in the original test wrapper. However, the difference from the original 1500 test wrapper design is that the number of *ci* cells is not equal to the total number of functional inputs of the NoC. Instead, there are as many *ci* cells as the network channel bitwidth, i.e., the number of input pins necessary to connect the NoC to one IP core. In Figure 6, we are assuming 20 bits (see $ci_0$ to $ci_{19}$). Within the wrapper, these input pins feed the functional inputs of each router.

Similarly, the number of wrapper test output cells is smaller than the number of network functional outputs. Each functional output pin of the NoC is connected to a comparator, similar to what is done with the routers' scan chains. Comparator results are chained together ($co_1$ to $co_{19}$ in Figure 6) and assigned to a single wrapper scan output pin. For the wrapper shown in Figure 6, for instance, there are 20 comparators, since we are assuming router outputs of 20 bits (bitwidth of the channel connecting the NoC to the IP core). Such a structure reduces not only the area overhead of the wrapper (by reducing the number of wrapper scan cells), but also the NoC test time (by reducing the number of shift operations during test). The wrapper cell definitions are compliant with the IEEE 1500 standard.

It can be observed in Figure 6 that, for example, functional inputs *Din\_R0[0]* to *Din\_Rn[0]* receive the same value through *ci* wrapper cells during test, and the comparator presented in Figure 5 is attached to the functional outputs. The result from the comparison can be loaded in the *co* wrapper cells for scan out. During the diagnosis mode, the diagnosis control block is responsible for setting the *diag* and *se<sub>x</sub>* ports, presented in Figure 5, for the appropriate router.
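To give a feel for the saving in wrapper cells, the sketch below compares the cell count of a conventional wrapper with the shared-cell arrangement described above. It rests on a simplifying assumption that is not stated in the text: each router contributes exactly one local port of `channel_width` inputs and `channel_width` outputs to the NoC boundary, and clock, reset, and test-control pins are ignored. The function names are hypothetical.

```python
def standard_wrapper_cells(n_routers: int, channel_width: int) -> int:
    """One wrapper cell per functional input and output of every local port."""
    return 2 * n_routers * channel_width


def proposed_wrapper_cells(channel_width: int) -> int:
    """Shared ci cells plus one co cell per output comparator."""
    return 2 * channel_width


for side in (3, 4, 5):
    n = side * side
    print(
        f"{side}x{side}: standard={standard_wrapper_cells(n, 20)}, "
        f"proposed={proposed_wrapper_cells(20)}"
    )
```

Under these assumptions the conventional count grows with the number of routers, while the shared-cell count depends only on the channel width.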

We note that the test access mechanism that connects the network to the system interface and the external tester is defined by the SoC designer, since the network is considered as another IP core in the system.

#### 5. Experimental Results

We evaluated the proposed test strategy for three different network sizes, to show the scalability of the method.

All the experiments were carried out using DFTAdvisor (scan insertion tool) and Fastscan (ATPG tool) [7] from Mentor Graphics, using the ADK (TSMC 0.35) technology library. For the evaluation of the fault coverage, the stuck-at fault model is assumed, and all faults classified by the tools as possibly detectable are counted as undetected. The ATPG was executed on a 2.6 GHz Pentium 4 with 1 GB of RAM running Linux.

| NoC size | Area costs                   |                                       | Test efficiency          |               |                          |                          | Test time (cycles)        |                         |                         | CPU         |
|----------|------------------------------|---------------------------------------|--------------------------|---------------|--------------------------|--------------------------|---------------------------|-------------------------|-------------------------|-------------|
|          | Original<br>area<br>(#gates) | Area with<br>proposed DfT<br>(#gates) | fault<br>coverage<br>(%) | #<br>patterns | test<br>volume<br>(bits) | #<br>collapsed<br>faults | 3<br>unbalanced<br>chains | 3<br>balanced<br>chains | 8<br>balanced<br>chains | time<br>(s) |
| 3x3      | 41445                        | 45107<br>(+8.8%)                      | 98.82                    | 383           | 254465                   | 105024                   | 32187                     | 21407                   | 9087                    | 2876        |
| 4x4      | 73680                        | 79923<br>(+8.4%)                      | 98.93                    | 395           | 273845                   | 186702                   | 33271                     | 22155                   | 9451                    | 11350       |
| 5x5      | 115125                       | 124616<br>(+8.2%)                     | 98.93                    | 466           | 315065                   | 291712                   | 39286                     | 26182                   | 11206                   | 33916       |

Table 3 – Results for the proposed test strategy.

Table 3 presents the testing results using the proposed approach for each network size. Columns 2 and 3 show, respectively, the area of the original NoC and of the NoC with the proposed test structures. The area is measured in number of gates and does not include wiring. Column 3 also presents the percentage increase in area due to the DfT hardware. The fault coverage, the number of test patterns, the test volume, and the number of collapsed faults are presented in Columns 4 through 7, respectively. Different test times are presented for each network in Columns 8, 9, and 10, to demonstrate that the designer can use different scan configurations for the routers. Finally, the CPU time spent by the ATPG is shown in Column 11.

The proposed approach has three sources of area overhead: the partial scan chain in the router, the comparators in the NoC, and the test wrapper of the NoC.

Although not shown in Table 3, the area of the router with the proposed partial scan is 4,859 gates, which represents an overhead of 5.5% over the original router. A full-scan implementation in this structure results in 14.7% of area overhead. Notice in Table 3 that the area overhead of the proposed method for the 3x3 NoC (+8.8%) is smaller than the one achieved by the pure application of full scan, which was presented earlier in Table 2 (19%).

For the sake of a fair comparison of wrapper approaches, let us assume a configuration where the proposed partial scan approach for the routers is combined with the traditional IEEE 1500 wrapper. In this case, the area of the resulting NoC is still smaller (46,756 gates compared to the 49,437 gates presented in Table 2). This same configuration (traditional test wrapper) implemented in the 5x5 NoC results in 14.2% of area overhead against 8.2% if the optimized wrapper is used.

It is important to notice that the area overhead of the router test structures decreases as the FIFO depth increases (scan chains are inserted only in the first FIFO word). Hence, more complex network implementations will not only benefit from the proposed method, but may present a better tradeoff of test costs. Moreover, the area of the test wrapper increases linearly with the number of routers (one wrapper scan pin per router scan cell), and, most importantly, it increases sublinearly with the channel width, since the number of wrapper scan cells for the routers' functional inputs may be kept constant.

We did not find in the literature the size of a complete system using NoCs. Therefore, we cannot accurately estimate the overall increase in chip size. Nevertheless, according to the systems evaluated in [9], current system sizes range from 0.5 to 10 million gates. On this basis, we estimate the overall contribution of the NoC testing circuitry<sup>1</sup> to be about 0.7-1.9% for a 0.5 million-gate design, and 0.04-0.095% for a 10 million-gate design.
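These ranges can be reproduced from the areas in Table 3 using the footnote formula, (NoC area with DfT - original NoC area) / total circuit size. The sketch below performs that calculation for the two system sizes; the variable and function names are illustrative.

```python
# (original area, area with proposed DfT) in gates, from Table 3.
TABLE_3_AREAS = {
    "3x3": (41445, 45107),
    "4x4": (73680, 79923),
    "5x5": (115125, 124616),
}


def dft_contribution_percent(original: int, with_dft: int, total_gates: int) -> float:
    """Share of the whole chip taken by the NoC test circuitry, in percent."""
    return 100.0 * (with_dft - original) / total_gates


for total in (500_000, 10_000_000):
    for size, (original, with_dft) in TABLE_3_AREAS.items():
        share = dft_contribution_percent(original, with_dft, total)
        print(f"{total:>10} gates, {size} NoC: {share:.3f}%")
```

The 3x3 and 5x5 networks give the lower and upper ends of each range.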

The test data volume presented in Table 3 does not consider the expected responses, since they are evaluated on-chip. Considering the same test patterns and their corresponding test responses, the total test volume would be 382,004, 405,380, and 470,243 bits, for the 3x3, 4x4, and 5x5 NoCs, respectively. Thus, the test volume saving of the circuit presented in Figure 4 is about 32%. Moreover, one can observe that high fault coverage is still achieved independent of the number of routers in the NoC and for the same relative cost in area.
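The saving can be checked against the volumes quoted above and in Table 3. The following sketch computes, for each network size, the fraction of tester data avoided by comparing responses on-chip; the names are illustrative.

```python
# Test volume in bits: stimuli only (Table 3) versus stimuli plus responses.
STIMULI_ONLY = {"3x3": 254465, "4x4": 273845, "5x5": 315065}
WITH_RESPONSES = {"3x3": 382004, "4x4": 405380, "5x5": 470243}


def volume_saving_percent(stimuli_bits: int, total_bits: int) -> float:
    """Share of the tester data avoided by on-chip response comparison."""
    return 100.0 * (1.0 - stimuli_bits / total_bits)


savings = {
    size: volume_saving_percent(STIMULI_ONLY[size], WITH_RESPONSES[size])
    for size in STIMULI_ONLY
}
for size, saving in savings.items():
    print(f"{size}: {saving:.1f}% saved")
```

For the three networks the computed savings fall between roughly 32% and 33%.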

The NoC designer can define different scan chain configurations to connect the routers' functional inputs in the wrapper, thus generating different test times. Some possible configurations are presented in Table 3 (Columns 8, 9, and 10). Considering the 5x5 NoC with eight balanced scan chains, for instance, the test time using a standard test wrapper would be 80,002 clock cycles, while the proposed wrapper achieves 11,206 clock cycles.

<sup>1</sup> (*NoC\_Area\_with\_DfT - Original\_NoC\_Area*) / *Total\_Circuit\_Size*

Finally, power consumption may be an important limitation to test parallelization within the NoC. Hence, we have characterized the power consumption per router during test. The characterization considers both the dynamic and static power consumption during test execution with the actual test patterns. As explained in Section 2, to evaluate the dynamic consumption, the router is initially synthesized to an ASIC technology library for which the power consumption of technology cells is available. The resulting netlist is simulated with the actual test patterns calculated by the ATPG tool. During this simulation, the switching activity of each cell of the technology library is captured. Then, the power consumption per clock cycle is computed by multiplying the number of toggles by the power consumption per cell. The total power consumption for the whole simulation is given by the average power consumption of a router over the clock cycles of the test procedure. For the network configurations presented in Table 3, the power consumption per router is $4.34\,\mu\mathrm{W}$. Since all routers receive exactly the same test patterns, the power of the NoC depends only on the number of routers. Thus, considering a 10-million-gate design [9], the total area of a NoC is small (less than 1% for a 3x3 NoC), and so its contribution to the system power consumption is expected to be small too. However, if the test power of the NoC becomes an issue, one can test groups of routers at a time instead of all of them in parallel.
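The power characterization described above can be sketched as follows. The cell names, toggle counts, and per-cell power values are hypothetical placeholders, not data from the experiments; only the 4.34 µW per-router figure comes from the text. The sketch also assumes that NoC test power is simply the per-router power multiplied by the number of routers tested concurrently, which follows from all routers receiving the same patterns.

```python
def average_test_power(toggles_per_cycle, power_per_toggle_uw):
    """Average, over all clock cycles, of sum(toggles * per-cell power) in uW."""
    per_cycle = [
        sum(count * power_per_toggle_uw[cell] for cell, count in cycle.items())
        for cycle in toggles_per_cycle
    ]
    return sum(per_cycle) / len(per_cycle)

def noc_test_power(router_power_uw, routers_tested_together):
    """Assumed linear scaling: identical patterns give identical router power."""
    return router_power_uw * routers_tested_together

# Hypothetical switching activity for two clock cycles of one router.
toggles = [{"NAND2": 10, "DFF": 4}, {"NAND2": 6, "DFF": 2}]
cell_power = {"NAND2": 0.01, "DFF": 0.05}  # uW per toggle, illustrative only
print(f"average router power: {average_test_power(toggles, cell_power):.2f} uW")

ROUTER_POWER_UW = 4.34  # per-router figure reported in the text
for mesh, routers in (("3x3", 9), ("4x4", 16), ("5x5", 25)):
    print(f"{mesh}: {noc_test_power(ROUTER_POWER_UW, routers):.2f} uW")
```

Testing routers in groups corresponds to calling `noc_test_power` with the group size instead of the full router count, which trades a lower peak power for a longer test session.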

Routing the comparators may cause routing congestion, timing, and power problems because of the long wires involved. In these situations, one can place comparators between neighboring routers, thus reducing the wire length. We are considering manufacturing faults, which are usually randomly distributed in the design. Thus, the probability that multiple faulty routers produce exactly the same erroneous output, and therefore escape the comparison, is low.
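The idea of comparing only neighboring routers can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the comparator circuit of the paper: the mesh coordinates, the response values, and the choice of comparing each router with its east and south neighbors are all hypothetical.

```python
def neighbor_mismatches(responses):
    """Compare each router's response with its east and south neighbors.

    `responses` maps mesh coordinates (x, y) to a test response word.
    Returns the list of neighboring pairs whose responses differ.
    """
    mismatches = []
    for (x, y), value in sorted(responses.items()):
        for neighbor in ((x + 1, y), (x, y + 1)):
            if neighbor in responses and responses[neighbor] != value:
                mismatches.append(((x, y), neighbor))
    return mismatches

# Hypothetical 3x3 mesh: every router returns 0b1011 except a faulty one at (1, 1).
responses = {(x, y): 0b1011 for x in range(3) for y in range(3)}
responses[(1, 1)] = 0b1001

for pair in neighbor_mismatches(responses):
    print("mismatch between routers", pair)
```

In this model a fault goes unnoticed only if every router compared with the faulty one returns the same erroneous word, which is the unlikely case discussed above.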

Figure 7 shows the scalability of the proposed method. As the size of the NoC increases (see the number-of-gates curve), the test data volume and test time increase at a much lower rate, while the fault coverage is kept constant and the area overhead is reduced. This graph demonstrates that the approach can scale to test very large NoCs that connect many IP cores.



Figure 7 – Scalable test approach.

## 6. Final Remarks

In this paper, we have shown that routers of on-chip networks pose additional challenges in finding a cost-effective test strategy when compared to functional cores. The large number of I/O pins, small area, and high density of flip-flops make the application of standard DfT techniques a more complex task for those structures. However, the test cost of a NoC can be significantly reduced if its regular design is considered.

We have proposed a scalable test strategy for NoCs based on partial scan and an IEEE 1500-compliant test wrapper, which reduces the test time and area overhead by exploiting the regular design of the NoC. An academic network has been used to demonstrate the feasibility of our approach. From the obtained results, one can conclude that the proposed test strategy is indeed a cost-effective solution for testing the routers that compose an on-chip network. We gave up part of the ability to isolate the routers for test in exchange for a lower area overhead, while keeping high fault coverage, low test time, and low test data volume. The proposed test strategy can be implemented using only a scan insertion tool and an ATPG tool.

Current work includes the evaluation of delay faults, the evaluation of area overhead considering wiring length, the test of the network interfaces, the impact on test of using buffers of different depths, and the test of the router-to-router channels of the NoC.

#### Acknowledgments

The authors thank Mike R. Jones from Mentor Graphics Corporation for his valuable support on the DfT tools.

## 7. References

- [1] Aktouf, C. "A Complete Strategy for Testing an on-chip Multiprocessor Architecture". *IEEE Design & Test of Computers*, vol. 19-1, 2002, pp. 18-28.
- [2] Amory, A.M.; Cota, E.; Lubaszewski, M. and Moraes, F.G. "Reducing Test Time with Processor Reuse in Network-on-Chip Based Systems". Symposium on Integrated Circuits and Systems Design, ACM Press, 2004, pp. 111-116.
- [3] Arabi, K. "Logic BIST and Scan Test Techniques for Multiple Identical Blocks". IEEE VLSI Test Symposium, 2002, pp. 60-68.
- [4] Cota, E.; Kreutz, M.E.; Zeferino, C.A.; Carro, L.; Lubaszewski, M. and Susin, A.A. "The Impact of NoC Reuse on the Testing of Core-based Systems". IEEE VLSI Test Symposium, 2003, pp. 77-82.
- [5] Guerrier, P. and Greiner, A. "A Generic Architecture for on-Chip Packet-Switched Interconnections". Design, Automation and Test in Europe Conference, 2000, pp. 250-256.
- [6] Marinissen, E.J.; et al. "On IEEE P1500's Standard for Embedded Core Test". *Journal of Electronic Testing: Theory and Applications*, vol. 18, 2002, pp. 365-383.
- [7] Mentor Graphics DfT Tools. www.mentor.com/dft/.

- [8] Moraes, F.G.; et al. "HERMES: an Infrastructure for Low Area Overhead Packet-Switching Networks on Chip". *Integration, the VLSI Journal*, vol. 38-1, 2004, pp. 69-93.
- [9] Rajski, J.; et al. "Embedded Deterministic Test for Low-Cost Manufacturing". *IEEE Design & Test of Computers*, vol. 20-5, 2003, pp. 58-66.
- [10] Ubar, R. and Raik, J. "Testing Strategies for Network on Chip". in *Networks on Chip*, A. Jantsch and H. Tenhunen, Eds.: Kluwer Academic Publisher, 2003, pp. 131-152.
- [11] Vermeulen, B.; Dielissen, J.; Goossens, K. and Ciordas, C. "Bringing Communication Networks On Chip: Test and Verification Implications". *IEEE Communications Magazine*, vol. 41-9, 2003, pp. 74-81.
- [12] Wu, Y. and MacDonald, P. "Testing ASICs with Multiple Identical Cores". *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 22-3, 2003, pp. 327-336.
- [13] Zeferino, C.A. and Susin, A.A. "SoCIN: A Parametric and Scalable Network-on-Chip". Symposium on Integrated Circuits and Systems Design, 2003, pp. 121-126.
- [14] Plasma microprocessor description (http://www.opencores.org)