8
IEEE TRANSACTIONS ON RELIABILITY, VOL. 39, NO. 3, 1990 AUGUST Fault-Tolerant Hypercube Multiprocessors ~ 361 Shahram Latifi, Member IEEE University of Nevada, Las Vegas Key Words - Communication-link reliability, Hypercube, Markov chain, Sheaf, Subcube reliability Reader Aids - Purpose: Propose a new model Special math needed for explanations: Combinatorial theory, Special math needed to use results: Same Results useful to: Hypercube-network designers probability theory Summary & Conclusions - This paper presents a new design called Fault Tolerunt Hypercube (FTH), obtained by augmenting the hypercube topology with some extra links. The FTH has a graceful degradation in performance with the existence of faults. The hardware (link redundancy) is small and negligible for hyper- cubes with large dimensions. A probabilistic model based on a Markov chain characterizes the FTH-subcube reliability. The mean time to failure is at least 22% better than that for the conventional hypercube. The results have been verified by Monte Carlo simula- tion. The new FTH design is simple and easy to implement. This network cm lend itself to the execution of many parallel algorithms designed to run on hypercubes. The FTH contains many more sub- cubes compared to the standard hypercube, and thus executes tasks requiring various cube sizes. Where allocation and deallocation of tasks to various subcubes is a common practice, this design achieves an excellent processor usage by efkiently and compactly assign- ing subcubes. 1. INTRODUCTION Hypercube structures have received much attention due to the attractive properties inherent in their topology [ 11. Parallel algorithms targeted at this topology can be partitioned into many tasks, each of which runs on one node processor. A high degree of performance is achievable by running every task individual- ly and concurrently on each node processor available in the hypercube. In the n-cube, the high degree of connectivity prevents the non-faulty processors from being disconnected [2], and if the number of faulty nodes does not exceed n, the diameter of the surviving network can be reasonably bounded [3]. But despite this high resilience, the universality of the hypercube can be lost if, as a consequence of the topology changes, the fundamental algorithms must be reorganized in a nonuniform and inefficient way. Fortunately most of the parallel algorithms developed for the hypercube networks can be formulated with the network- dimension as a parameter. More precisely, such algorithms can be performed on any nonfaulty k-dimensional substructure with a slow-down factor of 2”-k [4, 51. This motivates the question, how many dimensions could be lost when the n-cube contains faulty processors. Clearly, failure of a single node destroys one dimension. Failure of as few as two processors could destroy two dimensions (in this case the address of one failed node is the bitwise complement of the other’s). A rough answer to the above question has been obtained by bounding the number of lost dimensions [6]. This study enhances the subcube reliabili- ty of the hypercube through adding some extra links such that fewer dimensions are destroyed as a result of some node failures. The degradation in performance of the n-cube due to failure of a single component (link or node) is very high, eg, if one node or link fails in an n-cube, the best one can do is to extract an operational (n - 1)-cube from the damaged n-cube. This leaves approximately half the nodes and links of the network underused. This problem can be substantially alleviated by augmenting the n-cube with more interconnection links. The resultant network is referred to as Fault Tolerant Hypercube (FTH). By offering many more available subcubes, the FTH can use the network components more efficiently in a faulty environment. Section 2 describes the FTH topology and reveals its at- tractive properties. Section 3 addresses the communication-link reliability in the FTH. Section 4 treats the subcube reliability. A node-failure model characterizes the reliability of any sub- cube in the FTH. The results are closely compared with those obtained for the n-cube. fault-tolerant hypercube communication-link reliability sheaf reliability communication-linkreliability for the standard i-cube communication-link reliability for the FTH of size i ratio of k-cubes in the FTH to k-cubes in the standard hypercube Pr{system is in state i at time t} cumulative probability corresponding to state i mean time-to-failure corresponding to state i underusage factor in the standard cube underusage factor in the FTH least integer upper bound Other, standard notation is given in “Information for Readers & Authors” at the rear of each issue. Assumptions 1. The nodes in the network fail at a constant rate of A. 2. The reliability is the same for all communication links. 3. The states of all elements are mutually statistically independent. 0018-9529/90/0800-361$1 .00@1990 IEEE

Fault-tolerant hypercube multiprocessors

  • Upload
    s

  • View
    214

  • Download
    0

Embed Size (px)

Citation preview

IEEE TRANSACTIONS ON RELIABILITY, VOL. 39, NO. 3, 1990 AUGUST

Fault-Tolerant Hypercube Multiprocessors

~

361

Shahram Latifi, Member IEEE University of Nevada, Las Vegas

Key Words - Communication-link reliability, Hypercube, Markov chain, Sheaf, Subcube reliability

Reader Aids - Purpose: Propose a new model Special math needed for explanations: Combinatorial theory,

Special math needed to use results: Same Results useful to: Hypercube-network designers

probability theory

Summary & Conclusions - This paper presents a new design called Fault Tolerunt Hypercube (FTH), obtained by augmenting the hypercube topology with some extra links. The FTH has a graceful degradation in performance with the existence of faults. The hardware (link redundancy) is small and negligible for hyper- cubes with large dimensions. A probabilistic model based on a Markov chain characterizes the FTH-subcube reliability. The mean time to failure is at least 22% better than that for the conventional hypercube. The results have been verified by Monte Carlo simula- tion. The new FTH design is simple and easy to implement. This network c m lend itself to the execution of many parallel algorithms designed to run on hypercubes. The FTH contains many more sub- cubes compared to the standard hypercube, and thus executes tasks requiring various cube sizes. Where allocation and deallocation of tasks to various subcubes is a common practice, this design achieves an excellent processor usage by efkiently and compactly assign- ing subcubes.

1. INTRODUCTION

Hypercube structures have received much attention due to the attractive properties inherent in their topology [ 11. Parallel algorithms targeted at this topology can be partitioned into many tasks, each of which runs on one node processor. A high degree of performance is achievable by running every task individual- ly and concurrently on each node processor available in the hypercube. In the n-cube, the high degree of connectivity prevents the non-faulty processors from being disconnected [2], and if the number of faulty nodes does not exceed n, the diameter of the surviving network can be reasonably bounded [3]. But despite this high resilience, the universality of the hypercube can be lost if, as a consequence of the topology changes, the fundamental algorithms must be reorganized in a nonuniform and inefficient way.

Fortunately most of the parallel algorithms developed for the hypercube networks can be formulated with the network- dimension as a parameter. More precisely, such algorithms can be performed on any nonfaulty k-dimensional substructure with a slow-down factor of 2”-k [4, 51. This motivates the question,

how many dimensions could be lost when the n-cube contains faulty processors. Clearly, failure of a single node destroys one dimension. Failure of as few as two processors could destroy two dimensions (in this case the address of one failed node is the bitwise complement of the other’s). A rough answer to the above question has been obtained by bounding the number of lost dimensions [6]. This study enhances the subcube reliabili- ty of the hypercube through adding some extra links such that fewer dimensions are destroyed as a result of some node failures.

The degradation in performance of the n-cube due to failure of a single component (link or node) is very high, eg, if one node or link fails in an n-cube, the best one can do is to extract an operational (n - 1 )-cube from the damaged n-cube. This leaves approximately half the nodes and links of the network underused. This problem can be substantially alleviated by augmenting the n-cube with more interconnection links. The resultant network is referred to as Fault Tolerant Hypercube (FTH). By offering many more available subcubes, the FTH can use the network components more efficiently in a faulty environment.

Section 2 describes the FTH topology and reveals its at- tractive properties. Section 3 addresses the communication-link reliability in the FTH. Section 4 treats the subcube reliability. A node-failure model characterizes the reliability of any sub- cube in the FTH. The results are closely compared with those obtained for the n-cube.

fault-tolerant hypercube communication-link reliability sheaf reliability communication-link reliability for the standard i-cube communication-link reliability for the FTH of size i ratio of k-cubes in the FTH to k-cubes in the standard hypercube Pr{system is in state i at time t} cumulative probability corresponding to state i mean time-to-failure corresponding to state i underusage factor in the standard cube underusage factor in the FTH least integer upper bound

Other, standard notation is given in “Information for Readers & Authors” at the rear of each issue.

Assumptions

1. The nodes in the network fail at a constant rate of A. 2. The reliability is the same for all communication links. 3. The states of all elements are mutually statistically

independent.

001 8-9529/90/0800-361$1 .00@1990 IEEE

362 IEEE TRANSACTIONS ON RELIABILITY, VOL. 39, NO. 3, 1990 AUGUST

2. STRUCTURE OF THE FTH

cube The by establishing FTH network spare is constructed links between from each a node standard and hyper- its far-

is called a complemntury link or c-link. The number of c-links I .-.- - - - - - L - - - s , - * c - - J - c I '.! , I y'; 1;- - --a:: :I

01 0 - - - - - - - c - 3 - - - - - - - 111

two neighboring nodes differ in only one bit. Let <a, xl, . . . , o o o ~ ~ ~ o l

c, ' thest node. The spare link joining one node and its farthest node

--- .. ' is 2"-' and thus the link redundancy is l/n. Nodes in the FTH are assigned binary labels from 0 to 2"-' such that labels of

I : : I ,4 : c _ _ _ - _ _ - l - d - - - b < -

x,- > represent the dimensions of an n-cube. The FTH can be visualized as a network with n + 1 dimasions <a, xl, . . . , x,,- ', x, > where x, is the dimension corresponding to c-links. Therefore, all links along dimension xi, called collectively sheufi, connect nodes whose labels differ in the ith bit (for 0 I is n-l)ornodeswhoselabelsarecomplements(fori=n). Links belonging to sheaf i are labeled i for i < n, and labeled c when i=n . Figure la shows the structure of the FTH(3), where dashed lines represent the c-links.

a) All Sheaves are Operational

Theorem I: The sheaf n (set of c-links) can replace any single sheufi, 0 I i s n - 1, in the FTH to yield an n-cube. [The proof is in the appendix.]

This attractive feature can be exploited in the FTH to tolerate certain communication-link failures. More specifical- ly, if any sheaf i is faulty, the new hypercube with dimensions < xo, X I , . . . , xi- ', x,,, xi+ ', . . . , x,- 1 > can emulate the stan- dard cube without any degradation in performance. When replacing a sheaf i by the sheaf n, the set of labels must be transformed by T. For a given node U with label U,- u , , - ~ . . . ui ... U' -

T( U,- ,u,-2.. . ui.. .U&) b) Sheaf 1 Is Replaces by Sheaf n.

Figure 1. Link Replacement in FTH(3). U , , - ~ U , - ~ . . U ~ . . ~ , if ui=0 U,- '.. . ai+ ' ~ ~ a ~ - '.. .&, for ui = 1. (2- 1)

= [- Figure 1 shows the structure of mH(3) in which sheaf replaced sheaf 1.

has Figure 2 shows the link reliability VS Cube dimension for both standard and the FTH, and for p=O.99, and indicates the superiority of the FTH over the standard cube.

3. ANALYSIS OF COMMUNICATION LINK RELIABILITY IN FTH 4. SUBCUBE RELIABILITY OF THE FTH

The Standard n-CUbe is composed Of n sheaves each sheaf A fadt-free hypercube can be into suhuks that can be allocated to various tasks [8,9]. A faulty structure having 2"-' links. Thus - degrades gracefidly if it can be partitioned into subcubes of ade- quate size allowing the execution of various tasks. Intuitively, (3-1) the FTH should offer a better degradation in performance mm- pared to the n-cube, since the robustness of the FTH is improved

works to embed subcubes of various dimensions in a fault-free

On the other hand, all n sheaves must be operational in the stan-

(out of n+ 1) will do. dard hypercube in the ?I Operational sheaves due to c - w . Lemmas 1 & 2 quantify the ability of these net-

environment, (3-2a)

RF' = tzl) R:h (I-&) + R$+' Lemma I: The number of k-cubes in any n-cube is ( t ) 2"-&, where 0 s k I n. [The proof is in the appendix.] (3-2b)

LATIFI: FAULT-TOLERANT HYPERCUBE MULTIPROCESSORS 363

ncube : 0

FTH(n) : A

0 4 8 12 16 20 Cube Dimension (n)

Figure 2. Collective Link Reliability ( p =0.99).

Lemma 2: The number of k-cubes in any FTH (n) is (":I)

2n-k, where 0 5 k I n. [The proof is in the appendix.]

From lemmas 1 & 2, the number of k-cubes in the FTH ( n ) i sak = ( n + l ) / ( n - k + l ) timesthenumberofk-cubesinthe n-cube. Observe that a,, = n + 1. This implies that there exist (n + 1 ) n-cubes in the FTH (n). While all these n-cubes share the same set of nodes, they are distinguished by the n sheaves they possess. More formally, n-cube i contains all the dimen- sions < xo, x l , ..., x,, > except for the dimension xi , where 0 I i I n. Figure 3 shows all the existing 3-cubes in FTH(3). The character below each network representation is the absent dimension in the structure. In the FTH, the network performance does not degrade if any one link fails. For a node failure, degradation is unavoidable for there is no spare to replace the failed nodes. Observe that ak increases with k, eg, aYn=n+ 1 > a,, - = (n + 1 ) /2. The existence of an abundance of sub- cubes in the FTH allows this structure to degrade to a cube of lower dimension more gracefully than does the n-cube.

4.1 Subcube Communication Lmk Reliability

Under the link-failure model in section 3, the collective link reliability of an i-cube for both standard n-cube and FTH is to be analyzed. An operational i-cube contains at least i opera- tional sheaves. Thus -

(4- 1 a)

(4- 1 b) j = i \ J /

Eq (4- 1) yields conservative values for link reliability since in an i-cube not all 2"-' links have to be fault free in order to

100

110

000 001

101

111

010 01 1

(1 1 (2)

Figure 3. The FTH(3) and Its Embedded 3-cubes.

have an operational sheaf. Obtaining an exact value for the Rsh is extremely difficult due to the many possibilities that result in an operational sheaf. Examining (4-1) reveals the improve- ment of the subcube link reliability in the FTH network.

Next a failure model is applied to both n-cube and FTH, and the subcube reliability of these structures is compared.

4.2 An n-Cube Node-Failure Model

A single node failure in an n-cube destroys one dimension. Subsequent node failures can, but need not, destroy further dimensions - depending on the fault sites. In the worst case, the node at the opposite corner of the first faulty node fails, and 2 dimensions are lost. Given an arbitrary set of node failures, a fault free (n - 1)-cube exists if all the faulty nodes can fit in an i-cube, i<n.

The subcube reliability of an n-cube has been derived by Abraham & Padmana [ 101 using a node-failure model based on Markov chain. (The same model is applied to the FTH in sec- tion 4.3 to determine the subcube reliability). The derivation of the reliability as well as the mean time-to-failure [lo] are briefly reviewed.

Let Si be the system state in which all node failures in the cube are contained in an i-cube, but not in an (i- 1)-cube for 0 5 i 5 n. This state corresponds to the existence of exactly n - i disjoint subcubes of order n - 1, n - 2,. . ,i in the system.

364 IEEE TRANSACTIONS ON RELJABILJTY, VOL. 39, NO. 3, 1990 AUGUST

For example, the functional subcubes existing in an 8-cube are shown in figure 4, where a cube is represented as a rectangle; the set of faulty nodes is contained in a 4-cube and, therefore, the system is in state S4.

7

(Each box and each number represent a cube and its dimension

- - a p i ( t ) - - h ( N - 2 ' ) P , ( t ) at

i - 1

+ A 2j t;!) P j ( t ) , O<isn . (4-2~) j = O

The initial conditions are P G ( 0 ) = 1, andPi(0) = 0 for all i.

The solution to (4-2) is [lo]:

The cumulative probabilities are:

RG(t) = PG(t) (4-4a)

& ( t ) = p o ( t ) + RG(t) (4-4b)

(4-4c) R i ( t ) = P i ( t ) + R i - l ( t ) , O<i<n

respectively). From (4-4) a closed formula can be obtained for Ri ( t ) : Figure 4. Functional Subcubes Existing in an 8-Cube.

In state S, there exists no functional cube of size n - 1. This state is conventionally defined as the fatal case since no operational (n - 1 )-cube can be obtained in this state. State SG represents the initial fault-free state of the n-cube. All nodes have failure rate A.

The transition from Si to Si+j (0 < j 5 n - i) occurs when an additional node has failed outside the faulty i-cube; the new faulty cube is size i+j . To determine the transition rate, the n-cube can be envisaged as split across i dimensions such that each node in the original damaged i-cube is in a separate parti- tion [ 101. Each of the resultant 2' partitions is an (n - i)-cube containing exactly one node from the damaged i-cube. The new failure must be in one of these partitions, say A; and no path within partition A crosses any of the i dimensions used during the splitting process. Therefore, if the new failure A is distance j away from the node in A that belongs to the damaged i-cube, all faulty nodes in the system are contained in an ( i +j)-cube. There are ( ",Ti ) such nodes in each A, leading to a transition rate of h 2' ("7) [IO].

The state equations for the system are [lo]:

(4-5a)

R , ( t ) = Pk( t ) + RG(t ) , Osi<n . (4-5b) k=O

Thus, Ri ( t ) is the probability that all node failures (if any) up to time c are contained within an i-cube, leaving n - i func- tional subcubes of order n - 1,. . . ,i.

The & is:

Ti = Ri( t )d t , OSi<n. so 1 1

TG = x, To = TG + - ( N - 1)h

& = & - I + ( - l ) i + l t) & 2"-m

O<i<n.

(4-6b) m=O

(4-6a)

(4-2a) 4.3 Transition Rates for FTH The transition rates for the FTH can be obtained by deriv-

ing the related expressions for an FTH(3) and generalizing them to the case of FTH (n) . But first, a definition and a lemma are = - h ( N - I ) P O ( t ) + w p G ( t ) (4-2b)

at in order.

LATIFI: FAULT-TOLERANT HYPERCUBE MULTIPROCESSORS 365

DeBnition: Consider an n-cube and an integer k, 0 < k < n. This cube can be visualized as a k-cube whose vertices are each a cube of dimension (n - k) . The conversion of dimension from n to k in this manner is cube compression, and the resulting k-cube is the compressed cube. The nodes and links of the com- pressed cube are vertices and bundles, respectively.

Lemma 3: An FTH (n) can be compressed to an FTH (k) by replacing each (n -k)-cube by a single vertex, and the set of 2k links between any two adjacent vertices by a single bundle. [The proof is obvious.]

Lemma 4: Consider any transition Si - Si+j in FTH(n) for Os i<n . Then j~ l iub{(n- i ) /2} . [The proof is in the appendix.]

Now consider the transition SO - S' for a given FTH (n) . There are Nj nodes at distance j from the first failed node, where Nj can be obtained as follows [ll]:

= (;'?, for O<j<liub{n/2}

NnI2 = c::>, for even n

(4-7a)

(4-7b)

(4-7c)

Failure of any one of these nodes can result in the above transi- tion. Therefore, the transition rate is Wj. To derive the rates for i>O, consider lemma 5 .

Lemma 5: The rate for transition Si - S i + j in an FTH(n) is 2' times that for transition So - Sj in an FTH(n-i). [The proof is in the appendix.]

The transition rates for any FTH (n) can be obtained us- ing lemmas 4 and 5 . Figure 5 shows the state transition for both 3-cube and FTH(3). As an example, suppose nodes OOO and 001 in FTH(3) are faulty (see figure la). Thus, the network is in state S, since the faulty nodes can be embedded in a 1-cube. Subsequent failure of any one of the remaining nodes brings the network to state S2. Transition SI - S3 is impossi- ble according to lemma 4.

Table 1 summarizes the transition rates for the FTH (n) , 4 5 n I 8. As shown in table 1, it is not feasible to obtain a closed formula for the transition rates in general. Since the reliability and mean time-to-failure depend on the transition rates, no clos- ed formula can be obtained for them either. The Ri(t) for FTH(n) can be obtained by applying the linear Laplace transform to the transition matrix obtained from table 1, and following the same approach used for the n-cube. Figure 6 shows the cumulative probabilities for both the n-cube and FTH,

for various n, under the node-failure model. The T, (0 I i I n - 1 ) are listed in table 2 for both structures. The To is the same for both structures since it corresponds to the first node-failure in the network, and nodes have a constant failure rate. As the i-cube (which contains the faulty nodes) becomes larger, so does Ti improve in the FTH network. The Tn- l , in particular, are compared for both structures in table 3. The im- provement of T, - in FTH is at least 22 % for a wide range of n (4 5 n I 12). These results have been verified by Monte Carlo simulation [ 121.

3-cube

3 h

Figure 5 . State Transition under Node-Failure Model.

0 - 0

0

0) D as J i 0

(I)

0

2 i \ \

z oz 100 200 300 400 500 600

Time (Hrs.)

Figure 6. Cumulative Probabilities.

4.3 Underusage of the Nodes

An i-cube is faulty if the faulty nodes could fit in an i-cube but not in an ( i - 1 )-cube. However, this does not necessarily

366 IEEE TRANSACTIONS ON RELIABILITY, VOL. 39, NO. 3, 1990 AUGUST

TABLE 1 Transition Rates for the Node Failure Model

Transition n = 3 n=4 n = 5 n = 6 n=7 n = 8

(9 (9 (9 (9

21 (!) 2 1 (;) 21 (!) 2 1 6) 22 6) 22 (:) 22 (:>

23 @) 23 6) 23 6) 24 (:) 24 (:>

25 6) 25 (;)

26 ( 7 ) 27 (:>

2 1 (3) 21 (:)

22 6) 22 (:) 22 6) 22 6)

23 6) 23 (;)

24 (:) 24 (:)

25 (:)

LATIFI: FAULT-TOLERANT HYPERCUBE MULTIPROCESSORS 367

TABLE 2 Subcube Mean Time-&Failure (A = 10-s/hour).

Now suppose: 1) an i-cube has been declared faulty, and 2) ~II 2' nodes of this subcube are faultv. ~n the worst case. the

cause this transition. n u s , TG TO T2 T3 T4 TS T6 Tl

6250 12917 14821 19107 28155 2"-' - (2 '+1) UUFs (4-8a)

2" 6250 12917 15298 22441 34941

3125 6351 6888 8194 10460 14982 3125 6351 6996 8955 12142 18392

1563 3150 3303 3726 4468 5611 7861 the most underused nodes (by lemma 4). Therefore, Similarly in FTH, the transition Si - Sr(n-1)/21 results in

1563 3150 3329 3920 5220 6579 9592

781 1569 1612 1750 2014 2400 2968 4094 U U F ~ = p i + r(n-0121 - ( 2 ' f 1 )]/2" (4-8b) 781 1569 1619 1802 2224 2750 3528 5091

(4-9)

TABLE 3 Improvement in Mean Time-to-Failure in FTH(n).

[The numbers in parenthesis are the simulation results.] = 2-"12, for n > > i .

It is easy to see that FTH uses the fault-free nodes much more A T ~ - ~ n T,-I Tn-1 (n-cube) F W (%) efficiently than does the n-cube.

4 28155 34941 24.1 (27718) (34200) (23.4) 4.4 Incorporation of Link Failures

14982 18392 (14153) (18531)

7861 9592

5

6 (7923) ( l o o m

(4234) (5555)

21 17 2663 (2288) (2860)

1090 1363 (1 190) (1438)

4094 5091 7

8

9

10

11

22.8 (30.9)

22.0 (26.4)

24.3 (31.1)

25.7 (25.0)

25.0 (20.8)

25.2 (26.9)

25.1 (27.4)

25.6 (28.8)

The node-failure model can be extended to incorporate the effect of link failures. More precisely, under the node-failure model, a link failure can be modeled as the failure of one of its terminal nodes. As suggested in [lo], a supemode can be defined as a node plus half its incident links (thus, each link exclusively belongs to one node). The failure of any of 1 + n / 2 components of the supemode causes it to fail. Let A,, and AI be the node and link failure rates respectively. The failure rate of each supemode is A = A,, + ( n / 2 ) A,. This A can then be applied to the node failure-model to obtain the reliability and mean time-to-failure.

The approximation is clearly conservative (an upper bound on actual failure rate) since in reality, not every link failure would result in the failure of one of its terminal nodes. However, as was shown in [lo], the accuracy of this approach consider- ing its simplicity is excellent.

APPENDIX

A . l Proof of Theorem 1. mean that all nodes of the faulty i-cube are faulty. Some of these nodes, even though fault free, are assumed don't care and re- main underused. To characterize this underusage of fault-free nodes for the n-cube and the FTH (n) , define an Underusage Factor:

UUF

Consider an n-cube whose nodes are assigned binary numbers from 0 to 2" - 1. A link is labeled i if it connects two nodes whose labels differ only in bit i . The n-cube can be visualized as two subcubes of dimension n - 1 joined by sheaf i ( O l i l n - 1 ) . Each node of one subcube is connected to a unique node in the other subcube via a link labeled i .

Let some links of sheaf i be broken (faulty). The n-cube can be split into two ( n - 1)cubes by removing all links of sheaf = [number of fault free nodes in the subcube (worst case)]/2"

368 IEEE TRANSACTIONS ON RELIABILITY, VOL. 39, NO. 3, 1990 AUGUST

i. Now introduce the set of c-links to the structure. Each link is incident on two nodes, say U and U ‘, whose labels differ in all bits including bit i. Therefore each of the 2”-’ c-links con- nects one node of one (n - 1 )-cube to a unique node of the other (n - 1)-cube. On the other hand, an n-cube (denoted by Q,)

duct operation x as follows [7], where K2 = Q, is the com- plete 2-node graph:

than those spanned by the faulty i-cube, are destroyed result- ing in a faulty ( i + j) -cube in the FTH (n) . This resembles the transition So - Sj caused by the failure of one vertex, j hops away from the first failed vertex in the FTH (n - i ) . But there are 2’ nodes in each vertex which can make that vertex faulty.

0 for n 1 2 can be defined recursively in terms of the graph pro-

REFERENCES Qn = K2 x Qn-1.

Since the 2”-’ c-links establish a one-to-one correspondence between each node of one (n - 1 )-cube and a unique node of the other ( n - 1 )-cube, the resultant network is the product of K2 and and thus is an n-cube. This n-cube has sheaf i replaced by sheaf n. 0

A.2 Proof of Lemma 1

Each k-cube can be represented by an n-tuple where (n - k) bits are fixed and the remaining k bits specify the dimensions spanned by the k-cube. For a given set of k bit positions, there exist 2n-k different ways to label the remaining (n-k) bits. On the other hand, there are ( E ) ways to choose k bits out of n bits. o A.3 Proof of Lemma 2

By theorem 1, the set of spare links or sheaf n can replace any set of links in a given dimension. This means that in the FTH, there exist (n + 1 ) dimensions from which one can choose

0 the k dimensions required to construct a k-cube.

A.4 Proof of Lemma 4

Proof is by contradiction. Assumej > liub{n - i / 2 } , and compress FTH (n) to obtain an FTH (n - i) . Clearly, prior to the transition, one of the vertices in the (n - +cube is faulty (this corresponds to the faulty i-cube in the original cube). Therefore, the transition Si - Si+j in the n-cube is similar to the transition So - Si in the (n - i) -cube (failure of one node of the n-cube is sufficient to make the vertex containing it faul- ty). But the latter transition must have occurred due to the failure of a vertex, j hops away from the first failed vertex. This is impossible since the diameter of the FTH(n-i) is at most liub {n - i / 2 } . 0

A.5 Proof of Lemma 5

The FTH (n) can be compressed to form an FTH (n - i) . State Si (a faulty i-cube) in the FTH (n) corresponds to state So (a faulty vertex) in FTH(n-i). The transition Si - Si+j occurs if a node fails in such a way that j dimensions, other

[I] Y. Saad, M. H. Schultz, “Topological pruperties of hypembes”, Research Rep., YALEUIDCSIRR-389, 1985.

[2] J. R. Armstrong, F. G. Gray, “Fault diagnosis in a Boolean n-cube ar- ray of multiprocessors”, ZEEE Trans. Computers, vol c-30, 1981 Aug,

[3] M. S. Krishnamoorthy, B. Krishnammorthy, “Fault diameter of inter- connection networks”, Comput. Math. Applications, vol 13, nun 5/6,

[4] H. J. Siege], “Interconnection networks for scale parallel processing”, Theory and Case Studies, Lexington, 1985.

[5] F. P. Preparata, J. Vuillemin, “The cube-connected cycles: A versatile network for parallel computation”, C o r n . ACM, vol24, 1981 May, pp 300-309.

[6] B. Becker, H. U. Simon, “How robust is the n-cube?”, ZEEE2fh Ann. Symp. Foundation of Computer Science, 1986, pp 283-291.

[7] F. Hatary, Graph Iheory, Addison Wesley, 1969. [8] M. S. Chen, K. G. Shin, “Embedment of interacting task modules into

a hypercube multiprocessor”, Proc. Second Hypercube Conf., 1986 Oct,

[9] M. S. Chen, K. G. Shin, “Processor allocation in an n-cube multipmsor using gray codes”, ZEEE Trans. Computers, vol C-36, 1987 Dec, pp

[lo] S. Abraham, K. Padmanabhan, “Reliability of the hypercube multicom- puters”, P m . 19881nt’l. a n $ Parallel Processing, 1988 Aug, pp90-93.

[ l l ] S. Latifi, A. El-Amawy, “On folded hypercubes”, Proc. 1989Znt’l C o f Parallel Processing, vol I, 1989, pp 180-187.

[I21 S. Latifi, “Hypercube-based topologies with incremental link redundan- cy”, PhD Dissertntion, 1989 Aug.

pp 587-590.

1987, p~ 577-582.

pp 121-129.

1396- 1407.

AUTHOR

Dr. Shahram Latifi, Assistant Professor; Electrical Engineehg Dept.; University of Nevada; 4505 Maryland Parkway; Las Vegas, Nevada 89154 USA.

Shahram Latifi (S ’88, M ’89) received the MS in Electrical Engineering from the University of Teheran, Iran in 1980. He received the MS and PhD, both in Electrical & Computer Engineering from Louisiana State University, Baton Rouge in 1986 and 1989. He is an Assistant Professor of Electrical EnguKering at University of Nevada at Las Vegas. From 1980 to 1984 he worked as a Technical Manager in Khazar Company, Iran. From 1984 to 1989, he held a teaching assistantship at LSU. His research interests include computer ar- chitecture, fault-tolerant computing, parallel processing, and hypercube net- works. Dr. Latifi is a member of IEEE Computer Society.

Manuscript TR89-108 received 1989 July 21; revised 1989 October 10.

IEEE Log Number 33645 4TRb