Copyright © 1971 American Telephone and Telegraph Company THE BELL SYSTEM TECHNICAL JOURNAL Vol. 50, No. 5, May-June, 1971 Printed in U.S.A. ## On a Class of Rearrangeable Switching Networks # Part I: Control Algorithm By D. C. OPFERMAN and N. T. TSAO-WU (Manuscript received December 1, 1970) An algorithm is developed to control a class of rearrangeable switching networks, particularly with the base-2 structure. Various methods of implementing this algorithm are also described. System organization and processing time for rearranging the network are studied and are shown to be practical. #### I. INTRODUCTION One type of a switching network which has drawn considerable interest lately is the class of rearrangeable switching networks (RSN). With these networks, any idle input terminal of the network can always be connected to any idle output terminal by rerouting the existing connections if necessary. These networks can be used where one-to-one full access and nonblocking features are required, and rerouting is feasible, e.g., main distribution frames<sup>1</sup> and facility switches<sup>2</sup> in telephone systems and data transfer networks in a multiprocessor computer system.<sup>3</sup> Most of the earlier efforts, notably by C. Clos, V. E. Beneš, and A. E. Joel, Jr., have been made in the context of telephone switching networks. Their emphasis has been on the network structure, on its combinatorial properties and on bounds on the number of connections that require rerouting. Recently, this type of network has been of interest in such computer areas as data-sorting systems and self-repairing multiprocessors. The network structure is also applicable for cellular arrays. However, very few reports have been made on the control aspect of these networks. This paper will begin with a brief discussion of the general structure of RSN's, followed by the development of a method for the control of these networks and its practical implementation. The relationship between the network structure and the ease (or difficulty) with which it can be controlled will also be discussed. #### II. THE NETWORK STRUCTURE Discussion will be limited to a class of rearrangeable switching networks connecting N input terminals and N output terminals [abbreviated as $(N \times N)$ networks]. Extension to the more general case for $(N \times M)$ networks, $N \neq M$ , can be readily made. Let N=dq, where d and q are integer factors of N. A $(N\times N)$ network can be decomposed into an input stage and an output stage having altogether 2N/d $(d\times d)$ networks (one of which can be eliminated) and a middle stage having d $(N/d\times N/d)$ networks, as shown in Fig. 1. This network is said to have a base-d structure, and the smaller networks are called subnetworks. This type of network structure falls into the general class considered by Clos and Beneš. The network with base-2 structure is of great importance, for two primary reasons. First, it yields the most efficient network (or the least number of two-state switching elements) and, secondly, its control is relatively simple. It consists of $(2 \times 2)$ networks in the input and output stages, altogether (N-1) in number where N is even or odd. Two $(N/2 \times N/2)$ networks are in the middle stage if N is even, or one $[(N-1)/2 \times (N-1)/2]$ network and one $[(N+1)/2 \times (N+1)/2]$ network are in the middle stage if N is odd, as shown in Fig. 2a and b. Clearly, further decomposition of the networks in the middle stage is possible, and if the base-2 structure is carried throughout, one may show by an iterative process that the total number of basic switch- Fig. 1—Rearrangeable $(N \times N)$ network of a general base-d structure. Fig. 2—An $(N \times N)$ network with base-2 structure for: (a) N even; (b) N odd. ing elements or $(2 \times 2)$ networks, named $\beta$ -elements by Joel, is given by \* $$N\langle \log_2 N \rangle - 2^{\langle \log_2 N \rangle} + 1.$$ It can easily be seen that the number of these $\beta$ -elements is bounded by $\langle \log_2 N! \rangle$ . A $(11 \times 11)$ network consisting entirely of $\beta$ -elements is shown in Fig. 3. It is a very efficient network, since there are 29 such elements and one must have $26 \ (>\log_2(11!)>25)$ two-state devices to accommodate all possible permutations. Some of the enumeration studies given in Part II will account for the additional states. ### III. THE NETWORK CONTROL The control algorithm is first developed for the general $(N \times N)$ network, having a base-d structure, as it is shown in Fig. 1. The special case where d=2 will then be considered, and practical implementations of the control algorithm will be given in Section IV. First, some definitions are needed. <sup>\*</sup> $\langle z \rangle$ is the smallest integer greater than z. Fig. 3—An $(11 \times 11)$ network with base-2 structure. ### 3.1 Definitions (i) An input-output pair $\binom{x}{\pi(x)}$ defines a connection between input terminal x and output terminal $\pi(x)$ of a $(N \times N)$ network, where $1 \le x \le N$ , $1 \le \pi(x) \le N$ (ii) A connection set C is a collection of m such input-output pairs, $m \leq N$ , expressed as follows: $$C = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \\ \pi(x_1) & \pi(x_2) & \cdots & \pi(x_m) \end{bmatrix}.$$ If m = N, C is then in the form of the familiar permutation. This may be denoted by P, with $C \subseteq P$ , describing all the connections (or the traffic pattern) through the $(N \times N)$ network. (iii) Let $I = \{1, 2, \dots, N\}$ be a set of positive distinct integers; a subset J(l, d) is defined by $$J(l, d) = \left\{ a \mid \left[ \frac{a+d-1}{d} \right]^* = l \right\}, \text{ and } a \in I$$ where d divides N, and l is some constant integer, $1 \le l \le N/d$ . J(l, d) is called an integer set of order d and of characteristic l, and any element belonging to it is said to have characteristic l relative to the base-d. This is merely a formal way of grouping all those terminals associated with the same subnetwork in the input (or output) stage. (iv) A connection set C having m input-output pairs is said to be <sup>\* [</sup>w] denotes the integral value of w. reducible if and only if on replacing every integer by its characteristic, C becomes a permutation on m distinct integers. ### 3.2 The Generalized Control Algorithm Any given set of connections through the network can be described as: $$P = \begin{pmatrix} x_1 & x_2 & \cdots & x_N \\ \pi(x_1) & \pi(x_2) & \cdots & \pi(x_N) \end{pmatrix}.$$ The objective of the control algorithm is to derive from P permutations to be realized by each of the subnetworks. This is accomplished by first decomposing P into reducible connection sets $C_1$ , $C_2$ , $\cdots$ , $C_d$ . Permutations for the middle subnetworks are defined by using the characteristics of the elements in these sets, and permutations for the input and output stages are determined directly from these elements. 3.2.1 To Decompose a Permutation into Reducible Connection Sets Let the output terminals be partitioned into sets denoted by $S_l$ where $$S_{t} = \{\pi(x_{i}) \mid x_{i} \in J(l, d)\}, \qquad 1 \leq l \leq \frac{N}{d}.$$ The reducible connection sets $C_i$ , $1 \le i \le d$ , are constructed by grouping N/d input-output pairs of which the output integers are selected, one from each $S_l$ , such that no two output integers have the same characteristic. Let $C_i$ be expressed as $$C_{i} = \begin{bmatrix} x_{i,1} & x_{i,2} & \cdots & x_{i,N/d} \\ \pi(x_{i,1}) & \pi(x_{i,2}) & \cdots & \pi(x_{i,N/d}) \end{bmatrix}, \qquad 1 \leq i \leq d;$$ when it is reduced, one has the permutation $$P_{i} = \begin{pmatrix} p_{i,1} & p_{i,2} & \cdots & p_{i,N/d} \\ \pi(p_{i,1}) & \pi(p_{i,2}) & & \pi(p_{i,N/d}) \end{pmatrix}$$ where $p_{i,j}$ and $\pi(p_{i,j})$ are the characteristics of the integers $x_{i,j}$ and $\pi(x_{i,j})$ , respectively, $1 \le i \le d$ , $1 \le j \le N/d$ . Each of these d permutations $P_i$ can be realized by any of the d $(N/d \times N/d)$ subnetworks. However, if each $C_i$ is assigned to a particular $(N/d \times N/d)$ subnetwork, one of the $(d \times d)$ subnetworks, say the one at the upper-left corner of the input stage (Fig. 1), can be eliminated. This elimination implies that the connection set $C_i$ , and hence $P_i$ , is assigned to the first $(N/d \times N/d)$ subnetwork in the middle stage if and only if $x_{i,j} = 1$ for some $j, 1 \le j \le N/d$ . In general, $C_i$ , and hence $P_i$ , is assigned to the $k^{\text{th}}$ $(N/d \times N/d)$ subnetwork in the middle stage, $1 \le k \le d$ , (counting from the top) if and only if $x_{i,j} = k$ for some $j, 1 \le j \le N/d$ . One may then reorder the indices i such that $C_i$ contains the input-output pair $\binom{x_{i,j}=i}{\pi(x_{i,j})}$ , and it follows that the permutations $P_1, P_2, \cdots, P_d$ are similarly ordered for the d $(N/d \times N/d)$ subnetworks in the middle stage. 3.2.2 To Obtain Permutations for Subnetworks in the Input and Output Stages Let $P_{I,1}$ , $P_{I,2}$ , $\cdots$ , $P_{I,N/d}$ and $P_{0,1}$ , $P_{0,2}$ , $\cdots$ , $P_{0,N/d}$ be the sets of permutations to be realized by the 1<sup>st</sup>, 2<sup>nd</sup>, $\cdots$ , (N/d)<sup>th</sup> $(d \times d)$ subnetworks in the input and output stage, respectively. Moreover, let the reducible connection set $C_i$ be written as $$C_{i} = \begin{bmatrix} x_{i,1} & x_{i,2} & \cdots & x_{i,N/d} \\ \pi(x_{i,1}) & \pi(x_{i,2}) & \cdots & \pi(x_{i,N/d}) \end{bmatrix}, \quad 1 \leq i \leq d$$ such that $x_{i, 1} < x_{i, 2} \cdots < \cdots < x_{i, N/d}$ . Then, from the network structure, one can see simply that the permutations $$P_{I,i} = \begin{pmatrix} \pi_i^{-1}(1) & \pi_i^{-1}(2) & \cdots & \pi_i^{-1}(d) \\ 1 & 2 & \cdots & d \end{pmatrix}$$ and $$P_{0,k} = \begin{pmatrix} 1 & 2 & \cdots & d \\ \pi_k(1) & \pi_k(2) & \cdots & \pi_k(d) \end{pmatrix}$$ can be obtained from $C_i$ with $$\pi_i^{-1}(a) = x_{a,i} - (j-1)d, \quad 1 \le j \le N/d, \quad 1 \le a \le d$$ where $\pi^{-1}(a)$ denotes the input terminal to be connected to the output terminal a, and $\pi_k(a) = \pi(x_{a,t}) - (k-1) d$ , $1 \le k \le N/d$ , for some t such that $\pi(x_{a,t}) \subset J(k,d)$ . 3.2.3 An Example on Decomposing a Permutation Consider a $(15 \times 15)$ network, having a base d = 5 structure. Such a network is shown in Fig. 4, with two $(5 \times 5)$ subnetworks in the input stage, three $(5 \times 5)$ subnetworks in the output stage, and five Fig. 4—A $(15 \times 15)$ network with permutations assignment. $(3 \times 3)$ subnetworks in the middle stage. Let the connections through the network be described by the following permutation: $$P = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\ 11 & 15 & 4 & 2 & 6 & 1 & 7 & 5 & 8 & 9 & 12 & 14 & 3 & 13 & 10 \end{pmatrix}.$$ The output integers are partitioned as follows: $$S_1 = \{11, 15, 4, 2, 6\},$$ $S_2 = \{1, 7, 5, 8, 9\},$ $S_3 = \{12, 14, 3, 13, 10\}.$ From these, the reducible connection sets $C_1$ , $C_2$ , $C_3$ , $C_4$ , and $C_5$ are constructed as shown in Table I and, in general, they are not unique. The corresponding permutations $(P_i)$ , which are also shown in Table I, are obtained from $C_i$ by replacing each element with its characteristic. Note that connection sets are ordered according to the input integers (1, 2, 3, 4, 5) and that $x_{i,1} < x_{i,2} < x_{i,3}$ for each i. Table II shows the permutations derived from $C_i$ for subnetworks in the input and output stages using the relations for $\pi_i^{-1}(a)$ and $\pi_k(a)$ as given in Section 3.2.2. TABLE I—REDUCED CONNECTION SETS AND THEIR CORRESPONDING PERMITATIONS | i | $C_i$ $P_i$ | | |---|---------------------------------------------------------------------------------------------------------------------------------------------------------|---| | 1 | | ) | | 2 | $ \begin{pmatrix} 2 & 9 & 13 \\ 15 & 8 & 3 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix} $ | ) | | 3 | | ) | | 4 | | ) | | 5 | | | TABLE II—PERMUTATIONS FOR SUBNETWORKS IN THE INPUT AND OUTPUT STAGES | $\boldsymbol{j}$ | $\mathbf{P}_{I,j}$ | $P_{O,j}$ | | | | | | | |------------------|--------------------|------------------------------------------------------------------------|--|--|--|--|--|--| | 1 | | $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 3 & 4 & 2 & 1 \end{pmatrix}$ | | | | | | | | 2 | | $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 5 & 3 & 4 & 2 & 1 \end{pmatrix}$ | | | | | | | | , 3 | | $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 5 & 2 & 4 & 3 \end{pmatrix}$ | | | | | | | #### IV. THE CONTROL ALGORITHM FOR THE BASE-2 NETWORK The selection of output integers from the set $S_l$ , $1 \le l \le N/d$ , to form the connection sets $C_i$ , $1 \le i \le d$ , is by no means simple. One procedure has been reported by V. I. Neiman.<sup>10</sup> By modifying the previous example, it can be shown that the selection can *not* be made on a strictly sequential basis. Let the given permutation be modified such that the sets $S_1$ , $S_2$ , and $S_3$ are as follows: $$S_1 = \{11, 10, 4, 2, 6\},$$ $S_2 = \{1, 7, 5, 8, 9\},$ $S_3 = \{12, 14, 3, 13, 15\}.$ If one had chosen $\binom{1}{11}$ and $\binom{6}{1}$ input-output pairs to form $C_1$ , where the output integers 11 and 1 are from $S_1$ and $S_2$ respectively, there is no output integer in $S_3$ which would have a characteristic different from that of the integer 11 or 1. Thus, in general, simultaneous selections must be made in the construction of the connection sets. For networks with base-2 structure, however, the selection is reduced to a mere binary choice, resulting in a much simpler algorithm. For the other extreme case, where d=N/2, the difficulty described above will not arise because there are only two sets $S_1$ and $S_2$ , each containing exactly d integers. For a $(N \times N)$ network with a base-2 structure, the control algorithm for setting the $\beta$ -elements to realize a given permutation P consists of three parts: (i) decomposing of P into reducible connection sets $C_1$ and $C_2$ ; (ii) reducing $C_1$ and $C_2$ to $P_1$ and $P_2$ respectively; and (iii) setting the $\beta$ -elements in the input and output stages. Since the network has an iterative structure, the same procedure is applied to each of the $(N/2 \times N/2)$ , $(N/4 \times N/4)$ , $\cdots$ , $(2 \times 2)$ subnetworks. There are $\log_2 N$ levels of an $(N \times N)$ network with the last level of $(2 \times 2)$ subnetworks being a trivial case, assuming N is a power of 2. A coding scheme for the input and output integers which facilitates the required operations and two methods for decomposing P are described in the following sections. ### 4.1 Coding Scheme It is clear from what has been discussed so far that the control algorithm essentially accepts the connection requirements as input data and, after processing, generates a set of output data which are used to rearrange the network. It is necessary, therefore, to have an input/output (I/O) memory, which stores the output terminal $\pi(x_i)$ at the address determined by the numerical value of the input terminal $x_i$ . A simple coding scheme for these integers proves to simplify the implementation of the algorithm. Referring again to the network, one can, of course, use the set of integers $(0, 1, 2, \dots, N-1)$ to number the input and output ter- minals, without loss of generality in all the previous discussions.\* Then, the familiar binary code can be used directly, both for the input $x_i$ as address and for the output $\pi(x_i)$ as the contents at $x_i$ . We shall now show how this code can be used at all $\log_2 N$ levels. Let the binary representation be $$b_{n-1}b_{n-2}\cdots b_1b_0,$$ where $n=\log_2 N$ , and assuming N a power of 2. Beginning at the first level of the network, the least significant bit of the address is used to set the input $\beta$ -element defined by the remaining n-1 bits, and that of the contents to set the output $\beta$ -element defined by the remaining n-1 bits of the contents. Moreover, the conversion from $C_1$ (or $C_2$ ) to $P_1$ (or $P_2$ ) is accomplished by merely eliminating $b_0$ for each coded output integer. Finally, for an output integer $\pi(x_i)$ , the bit $b_0$ is set to identify whether the particular $\pi(x_i)$ belongs to $P_1$ or $P_2$ . However, for an input integer, the most significant bit of the address, $b_{n-1}$ , indicates whether $x_i$ belongs to $P_1$ or $P_2$ . The same coding procedure is applied at each subnetwork, and the I/O memory is partitioned (part of algorithm) in the appropriate manner. In general, at the $i^{\text{th}}$ level of the network, $i < \log_2 N$ , the (i-1) least significant bits of a word in memory define the particular subnetwork of size $N(2^{1-i})$ , and the $\log_2 N - (i-1)$ most significant bits define the output integer. The (i-1) most significant bits, however, of the address designate the subnetwork, and the remaining bits define the input integer. An example will be given in detail to illustrate this in Section 4.2.1. ### 4.2 Decomposition by Looping With d=2, an integer set is reduced to an integer pair, consisting of only two elements, and one is said to be the dual of the other. If one continues to use the integers $(0, 1, \dots, N-1)$ to number the input and output terminals of an $(N \times N)$ network, then the integers a and b constitute an integer pair if $$\left[\frac{a}{2}\right] = \left[\frac{b}{2}\right] = l$$ for some integer l. The dual of a (or b) is denoted by $\hat{a}$ (or $\hat{b}$ ), and, $$J(l, d) = \{a | [a/d] = l\}, \quad 0 \le l \le (N/d) - 1$$ <sup>\*</sup>Except that the definition of the integer set J(l, d) needs to be slightly modified, as follows: therefore, $$\hat{a} = b$$ and $\hat{b} = a$ . Moreover, any permutation is decomposed into only two reducible connection sets $C_1$ and $C_2$ , and if $\begin{bmatrix} x \\ \pi(x) \end{bmatrix} \in C_1$ , then $\begin{bmatrix} x \\ \pi(x) \end{bmatrix}$ and $\begin{bmatrix} x \\ \pi(x) \end{bmatrix} \in C_2$ for some $x_i$ , where $\pi(x_i)$ is the dual of $\pi(x)$ . In coded form, the dual is obtained merely by complementing the least significant bit. One method, called the looping procedure, of constructing $C_1$ and $C_2$ from P is to search the I/O memory for the required outputs. The sequence starts by selecting the output $\pi(0)$ in the first location of the memory (or $\binom{0}{\pi(0)}$ ). The first (n-1) bits $(a_{n-1}a_{n-2}\cdots a_1)$ of the address which are all zeros define the first $\beta$ -element $(\beta_{i1})$ in the input stage, and the last bit $(a_0)$ which is also zero defines $\beta_{i1}$ to be set to the "straight through" state (see Fig. 5). The bits $m_{n-1}m_{n-2}\cdots m_1$ and $m_0$ at this address define one of the $\beta$ -elements $(\beta_{oi})$ in the output stage and its setting respectively. Bit $m_0$ is now reset to zero to designate that this particular output $(m_{n-1}m_{n-2}\cdots m_1)$ is for $P_1$ . There is also another bit $m_n$ which is set to one when that particular output has been placed in $P_1$ or $P_2$ so that an unused output can be selected when it is necessary. The example in Section 4.2.1 will clarify this. The memory is next scanned for the dual of $\pi(0)$ ; this output and its address $x_k$ define $\binom{x_k}{\pi(x_k)}$ for $C_2$ . For the input-output pairs in $C_2$ Fig. 5—Base-2 $(N \times N)$ rearrangeable switching network. it is only necessary to set $m_n$ and $m_0$ of the memory word to '1' to signify that this word (output) has been used and that it is for $P_2$ . The output $\pi(\hat{x}_k)$ at the address $\hat{x}_k$ defines $\begin{bmatrix} \hat{x}_k \\ \tau(\hat{x}_k) \end{bmatrix}$ and is designated for $C_1$ . The same looping procedure is continued until all $\pi(x_i)$ , $1 \leq i \leq N$ are assigned to either $C_1$ or $C_2$ . In most cases, however, the looping will end before all output integers are used. The procedure is started again by arbitrarily selecting an unassigned output for $P_1$ , by examining $m_n$ . Because of this arbitrariness, $C_1$ is, in general, not unique. This fact is used to advantage in the other procedure to be described in Section 4.3. After the looping procedure is completed, the memory is reorganized to have $P_1$ and $P_2$ in locations 0 to N/2-1 and N/2 to N-1, respectively. (A small scratch pad memory may be necessary.) Then the same procedure is applied to each of these $(N/2 \times N/2)$ subnetworks, and it is continued until all of the $\beta$ -elements in the $(N \times N)$ network are set. The searching can be eliminated by employing two memories, one of which is the I/O memory. The additional one is an output/input (O/I) memory that stores the input $x_i$ at the address corresponding to the numerical value of the output $\pi(x_i)$ . The decomposition of P into $P_1$ and $P_2$ is achieved by crisscrossing between the two memories. For example, output $\pi(x_i)$ and its corresponding address $x_i$ in the I/O memory define $\binom{\pi^i}{\pi(x_i)}$ for $C_1$ . Now the dual of $\pi(x_i)$ , say $\pi(x_i)$ , is the address for the O/I memory and the corresponding word $x_i$ defines $\binom{\pi^i}{\pi(x_i)}$ for $C_2$ . Then $\hat{x}_i$ is used for the address in the I/O memory. If the I/O memory is a content addressable memory (CAM), 12 the required $\pi(x_i)$ 's for $C_1$ and $C_2$ are determined directly in the content addressable mode, without the use of a second memory. ### 4.2.1 An Example of the Looping Procedure In order to illustrate in a meaningful way the looping procedure using the coding scheme at various levels, a permutation for a $(32 \times 32)$ network, as given in Table III, will be utilized. The looping procedure begins with $\pi(0)=14$ and continues to select input-output pairs for the connection set $C_1$ . (The sequence of this selection is indicated by the number in the "looping sequence" column.) As each output integer is selected for $C_1$ , the last bit $m_0$ is used to set the output $\beta$ -element designated by $(m_4m_3m_2m_1)$ (see Fig. 6). Then the bit $m_0$ is set to '0' to indicate that it is for $C_1$ , and the bit $m_5$ is set to '1' to indicate that it has been used. The bits $m_0$ and $m_5$ of the output integers for $C_2$ are merely set to '1'. In this particular | Input<br>Terminals | Outputs<br>Terminals | | Coded Output Integers $m_5$ $m_4$ $m_3$ $m_2$ $m_1$ $m_0$ | | | | | Looping<br>Sequence | |--------------------------------------|----------------------|-------|-----------------------------------------------------------|------|------|------|------|---------------------| | rermmais | Terminais | -1165 | 1164 | 1163 | 1162 | 7701 | 7700 | - Dequence | | 0 | 14 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | | | 23 | 0 | 1 | 0 | 1 | 1 | 1 | | | 1<br>2<br>3<br>4<br>5<br>6<br>7<br>8 | 3 | 0 | 0 | 0 | 0 | 1 | 1 | 7 | | 3 | 20 | 0 | 1 | 0 | 1 | 0 | 0 | | | 4 | 28 | 0 | 1 | 1 | 1 | 0 | 0 | 8 | | 5 | 2 | 0 | 0 | 0 | 0 | 1 | 0 | | | 6 | 16 | 0 | 1 | 0 | 0 | 0 | 0 | | | 7 | 11 | 0 | 0 | 1 | 0 | 1 | 1 | | | 8 | 31 | 0 | 1 | 1 | 1 | 1 | 1 | 9 | | 9 | 29 | 0 | 1 | 1 | 1 | 0 | 1 | | | 10 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | | | 11 | 27 | 0 | 1 | 1 | 0 | 1 | 1 | 5<br>3 | | $\overline{12}$ | 12 | 0 | 0 | 1 | 1 | 0 | 0 | 3 | | 13 | 18 | 0 | 1 | 0 | 0 | 1 | 0 | | | $\overline{14}$ | 5 | 0 | 0 | 0 | 1 | 0 | 1 | | | $\overline{15}$ | 24 | 0 | 1 | 1 | 0 | 0 | 0 | 11 | | 16 | 26 | 0 | 1 | 1 | 0 | 1 | 0 | | | 17 | 21 | 0 | 1 | 0 | 1 | 0 | 1 | 6 | | 18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | | 19 | 13 | 0 | 0 | 1 | 1 | 0 | 1 | | | 20 | 8 | 0 | 0 | 1 | 0 | 0 | 0 | | | 21 | 6 | 0 | 0 | 0 | 1 | 1 | 0 | 13 | | $\overline{22}$ | 19 | 0 | 1 | 0 | 0 | 1 | 1 | 2 | | $\overline{23}$ | 15 | 0 | 0 | 1 | 1 | 1 | 1 | | | $ar{24}$ | 4 | 0 | 0 | 0 | 1 | 0 | 0 | 10 | | $\overline{25}$ | 30 | 0 | 1 | 1 | 1 | 1 | 0 | | | 26 | 7 | 0 | 0 | 0 | 1 | 1 | 1 | | | 27 | 22 | 0 | 1 | 0 | 1 | 1 | 0 | 14 | | $\overline{28}$ | 17 | 0 | 1 | 0 | 0 | 0 | 1 | | | 29 | 10 | 0 | 0 | 1 | 0 | 1 | 0 | | | 30 | 25 | 0 | 1 | 1 | 0 | 0 | 1 | | | 31 | 9 | 0 | 0 | 1 | 0 | 0 | 1 | 12 | | | | | | | | | | | Table III—Memory Contents on the Interconnections example, the looping procedure ends prematurely and leaves the connection set $$\begin{bmatrix} 6 & 7 & 28 & 29 \\ 16 & 11 & 17 & 10 \end{bmatrix}$$ as indicated on Table IV. One then repeats the looping procedure by starting arbitrarily at some remaining output integers (as indicated by $m_5 = 0$ ). After every output integer has been assigned to $C_1$ (or $C_2$ ), one rearranges the memory such that all the integers with $m_0 = 0$ occupy the upper half of the memory, corresponding to the connections through the upper network, with the remaining output integers for the lower network. This is shown in Table V. Fig. 6—A (32 $\times$ 32) rearrangeable network with partial setting of $\beta$ -elements. Table VI shows the memory contents after another looping procedure is applied to the output integers (for both the upper and lower networks) and subsequent rearranging. At this level, $m_1m_0$ denotes the $(8 \times 8)$ subnetwork, and if one reverses the order to $m_0m_1$ , it is just the binary representation of natural ascending numbers, 0 being the upmost $(8 \times 8)$ subnetwork and 3 being the lowest $(8 \times 8)$ subnetwork. One could also obtain the same information from the addresses, since the contents are rearranged into this order. ### 4.3 Decomposition with a Trial-Partition Procedure A second method for decomposing P which incorporates a trial-anderror procedure is presented. Although this method is practical only for small networks, it is very important because of the simple implementation, and intelligence can be included to reduce the average processing time by taking advantage of the fact that the decomposition may not be unique. From the derivation of the reducible connection sets $C_1$ and $C_2$ , it is seen that they must contain one and only one $\pi(x_i)$ from each $S_l$ , $0 \le l \le (N/2) - 1$ , and the corresponding input $x_i$ . Since $\binom{0}{\tau(0)}$ and $\binom{1}{\tau(1)}$ are defined to be in $C_1$ and $C_2$ , respectively, only (N/2-1) additional input-output pairs must be selected for $C_1$ ; the remaining pairs are for $C_2$ . After $C_1$ and $C_2$ are determined, $P_1$ and $P_2$ and the rearranging of the I/O memory are derived in the same manner as in the looping procedure. Let $\Psi$ be the set of connection sets; each includes (N/2-1) input-output pairs formed by having one $\pi(x_i)$ from TABLE IV—MEMORY CONTENTS AFTER SEQUENCING THROUGH ONE LOOP | $m_5$ | $m_4$ | $m_3$ | $m_2$ | $m_1$ | $m_0$ | | | | | | | | |-------|-------|-------|-------|-------|-------|---------------|------------------|-----|-------------|-------|----|---| | 1 | 0 | 1 | 1 | 1 | 0 | | | | | | | | | 1 | 1 | Ô | i | î | 1 | | | | | | | | | 1 | ō | ŏ | Ô | î | ō | | | | | | | | | 1 | 1 | ŏ | ĭ | ō | ĭ | | $^{\mathrm{th}}$ | ese | bel | ong | | | | î | ĩ | ĭ | 1 | 0 | 0 | | to | and | $_{ m oth}$ | er lo | ор | | | î | Õ | ō | 0 | 1 | 1 | | _ | | | | | _ | | Õ | ĭ | Ō | 0 | 0 | 0 | $\rightarrow$ | ĺ | 1 | 0 | 0 | 0 | Ò | | ő | Ō | 1 | 0 | 1 | 1 | $\rightarrow$ | 1 | 0 | 1 | 0 | 1 | 1 | | 1 | 1 | 1 | 1 | 1 | 0 | | | | | | | | | 1 | 1 | 1 | 1 | 0 | 1 | | | | | | | | | 1 | 0 | 0 | 0 | 0 | 1 | | | | | | | | | 1 | 1 | 1 | 0 | 1 | 0 | | | | | | | | | 1 | 0 | 1 | 1 | 0 | 0 | | | | | | | | | 1 | 1 | 0 | 0 | 1 | 1 | | | | | | | | | 1 | 0 | 0 | 1 | 0 | 1 | | | | | | | | | 1 | 1 | 1 | 0 | 0 | 0 | | | | | | | | | 1 | 1 | 1 | 0 | 1 | 1 | | | | | | | | | 1 | 1 | 0 | 1 | 0 | 0 | | | | | | | | | 1 | 0 | 0 | 0 | 0 | 0 | | | | | | | | | 1 | 0 | 1 | 1 | 0 | 1 | | | | | | | | | 1 | 0 | 1 | 0 | 0 | 1 | | | | | | | | | 1 | 0 | 0 | 1 | 1 | 0 | | | | | | | | | 1 | 1 | 0 | 0 | 1 | 0 | | | | | | | | | 1 | 0 | 1 | 1 | 1 | 1 | | | | | | | | | 1 | 0 | 0 | 1 | 0 | 0 | | | | | | | | | 1 | 1 | 1 | 1 | 1 | 1 | | | | | | | | | 1 | 0 | 0 | 1 | 1 | 1 | | | | | | | | | 1 | 1 | 0 | 1 | 1 | 0 | | | _ | | | | | | 0 | 1 | 0 | 0 | 0 | 1 | <b>→</b> | 1 | 1 | 0 | 0 | 0 | 1 | | 0 | 0 | 1 | 0 | 1 | 0 | <b>→</b> | 1 | 0 | 1 | 0 | 1 | 0 | | 1 | 1 | 1 | 0 | 0 | 1 | | | | | | | | | 1 | 0 | 1 | 0 | 0 | 0 | | | | | | | | | | | | | | | | | | | | | | | ١ | | | | | | | ¬ | |---|-------|-------|-------|-------|-------|-------|---------------------| | | $m_5$ | $m_4$ | $m_3$ | $m_2$ | $m_1$ | $m_0$ | | | | 0 | 0 | 1 | 1 | 1 | 0 | | | | 0 | 0 | 0 | 0 | 1 | 0 | | | | 0 | 1 | 1 | 1 | 0 | 0 | | | | 0 | 1 | 0 | 0 | 0 | 0 | ) | | | 0 | 1 | 1 | 1 | 1 | 0 | | | | 0 | 1 | 1 | 0 | 1 | 0 | | | | 0 | 0 | 1 | 1 | 0 | 0 | upper | | | 0 | 1 | 1 | 0 | 0 | 0 | upper<br>subnetwork | | | 0 | 1 | 0 | 1 | 0 | 0 | | | | 0 | 0 | 0 | 0 | 0 | 0 | | | ı | 0 | 0 | 0 | 1 | 1 | 0 | | | | 0 | 1 | 0 | 0 | 1 | 0 | J | | | 0 | 0 | 0 | 1 | 0 | 0 | | | ı | 0 | 1 | 0 | 1 | 1 | 0 | | | ı | 0 | 0 | 1 | 0 | 1 | 0 | | | ı | 0 | 0 | 1 | 0 | 0 | 0 | | | | 0 | 1 | 0 | 1 | 1 | 1 | | | ı | 0 | 1 | 0 | 1 | 0 | 1 | | | ١ | ő | 0 | 0 | 0 | 1 | 1 | <u> </u> | | ı | ő | Ö | 1 | Ö | i | i | | | ı | ő | 1 | 1 | 1 | ō | 1 | | | ı | ŏ | ō | 0 | ō | ŏ | 1 | lower | | İ | ŏ | ĭ | ŏ | ŏ | ĭ | î | subnetwork | | I | ŏ | Ô | ŏ | ĭ | ō | î | Sublictwork | | I | ŏ | 1 | ĭ | ō | ĭ | î | H | | I | ŏ | õ | î | ĭ | Õ | î | | | I | ŏ | ŏ | ĩ | Õ | ŏ | î | | | | ŏ | ŏ | î | ĭ | ĭ | î | ľ | | | ŏ | ĭ | ī | ī | ī | ī | | | ١ | ŏ | õ | ō | ĩ | ī | ī | | | ١ | Ö | ĭ | Ō | Ō | 0 | ī | | | | 0 | 1 | 1 | 0 | 0 | 1 | | | I | | | | | | | | TABLE V-MEMORY CONTENTS AFTER REARRANGING each $S_l$ , l > 0. Since there are two elements in each $S_l$ , $\Psi$ consists of $2^{(N/2)-1}$ connection sets. For any arbitrarily selected $C \subseteq \Psi$ , the test will be only on the output integers. Therefore, C defines $C_1$ if and only if for any $\pi(x_i) \subseteq C$ , its dual is not in C. These ideas lead to the application of a finite state machine with $2^{(N/2)-1}$ states which are used to generate $\Psi$ . This machine, called the Trial-Partition Machine (TPM), is composed of (N/2-1) two-state storage devices (flip-flops), each of which represent an input integer pair. The "0" or "1" state of the flip-flop designates that the odd or even input, respectively, of the input integer pair and the corresponding output is in C. If $\sigma_i$ is a state of the TPM, then it defines C, and the $\pi(x_i) \in C$ are tested for an integer pair (two outputs with the same characteristic). If C contains no pairs, then it is reducible and can be used as $C_1$ . However, if there is at least one output integer pair in C, the TPM is advanced to $\sigma_{i+1}$ . Since the outputs are serially stored in the I/O memory, the memory must be sequenced in order to perform the test for each $\sigma_i$ . This type of a TPM can be easily implemented with a (N/2-1) bit binary counter. However, it may be desired to have a more intelligent machine so that $C_1$ can be determined in less time (with fewer trials). Of course, the complexity and, consequently, the cost of the TPM will increase as the intelligence of the machine grows. One way to TABLE VI—MEMORY CONTENTS AT THE THIRD LEVEL (AFTER TWO LOOPING PROCEDURES) | $m_5$ | $m_4$ | $m_3$ | $m_2$ | $m_1$ | $m_0$ | |-------|--------|----------|---------------|--------|--------| | 0 | 0 | 1 | $\frac{1}{0}$ | 0 | 0 | | 0 | 1<br>1 | 0<br>1 | 1 | 0 | 0 | | 0 | 1 | 1 | 0 | 0 | 0 | | ŏ | Ô | 0 | ŏ | ŏ | ő | | ŏ | ŏ | ŏ | ĭ | ŏ | ŏ | | 0 | 1 | 0 | 1 | 0 | 0 | | 0 | 0 | 1 | 0 | 0 | 0 | | | | | | | | | 0 | 0 | 0 | 0 | 1 | 0 | | 0 | 1 | 1 | 1 | 1 | 0 | | 0 | 1 | 1 | 0 | 1 | 0 | | 0 | 0 | 1 | 1 | 1 | 0 | | 0 | 1<br>1 | 0 | $\frac{1}{0}$ | 1<br>1 | 0 | | ő | 0 | 0 | 1 | 1 | 0 | | Ö | ő | 1 | Ô | 1 | ő | | | | | | | | | 0 | 1 | 0 | 1 | 0 | 1 | | ő | 0 | ő | Ō | ő | 1 | | ŏ | ĭ | 1 | ĭ | ŏ | ĩ | | 0 | 1 | 0 | 0 | 0 | 1 | | 0 | 0 | 1 | 1 | 0 | 1 | | 0 | 0 | 1 | 0 | 0 | 1 | | 0 | 0 | 0 | 1 | 0 | 1 | | 0 | 1 | 1 | 0 | 0 | 1 | | | | | | | | | 0 | 1 | 0 | 1 | 1 | 1 | | 0 | 0 | 1 | 0 | 1 | 1 | | 0 | 0 | $0 \\ 0$ | $\frac{0}{1}$ | 1<br>1 | 1<br>1 | | 0 | 1 | 1 | 0 | 1 | 1 | | 0 | 0 | 1 | 1 | 1 | 1 | | ő | 1 | i | 1 | 1 | 1 | | ŏ | 1 | Õ | Ō | ĺ | ī | | | | | | | | enhance the intelligence is to count the number of output integer pairs in C. This number (Z) specifies how many $\begin{bmatrix} x_i \\ \pi(x_i) \end{bmatrix}$ must be changed in order to have all $\pi(x_i)$ from different output integer pairs. Then the next set (C') is selected from those sets which contain at least Z different pairs and have not been previously tested. Another method is to observe the input-output pairs $\begin{pmatrix} x_i \\ \pi(x_i) \end{pmatrix}$ and $\begin{pmatrix} x_i \\ \pi(x_i) \end{pmatrix}$ that form an output integer pair, and select a C' that contains either $\hat{x}_i$ or $\hat{x}_i$ . For an example of a TPM, see Ref. 11. The TPM is applicable only for small networks because the number of tests becomes prohibitive as N increases. There are $2^{(N/2)-1}$ connection sets in $\Psi$ and, on the average, half of these must be tried. A TPM with intelligence, however, will reduce the number of sets to be tested, but for networks larger than $(64 \times 64)$ this will still be too time consuming. ### 4.4 Using Combinational Logic The control algorithm for the $(4 \times 4)$ network can be implemented with combinational logic. This is achieved by assigning the states of the five $\beta$ -elements for each of the 24 possible input-output permutations. The combinational logic, which consists of 13 nand gates, determines the correct setting from the outputs which are coded in the binary code. This method is practical only for very small networks because the number of permutations grows very rapidly. The $(4 \times 4)$ network is important, however, because it could be used as a building block to construct larger networks, and the combinational logic and the $\beta$ -elements could be on the same semiconductor chip. This same idea also applies to an $(8 \times 8)$ or $(16 \times 16)$ network using the TPM. ### 4.5 System Description The block diagram of a rearrangeable switching system with only one memory (I/O Memory) is shown in Fig. 7. The Network Control realizes the control algorithm by employing one or more of the above methods. For example, it may be advantageous to use the combinational logic and TPM for the small networks (or subnetworks) because they are relatively inexpensive to implement. However, as the size of the network grows, the processing time becomes critical. Then the looping procedure with one and two memories (or a content addressable memory) may be used for less than 1000 terminals and more than 1000 terminals, respectively. The timing considerations for these methods are discussed in the next section. Fig. 7—System block diagram. The Scratch Pad Memory temporarily stores the outputs during decomposition of P and also while the I/O Memory is being partitioned. The System Control generates the timing and control sequences for all of the operations. This unit, as well as the Network Control, could be implemented with stored program techniques if it is economical and if there is sufficient real time. The other important unit, called the Network Buffer, isolates the network from the Network Control so that the existing traffic will not be affected during the processing time of the control algorithm. As the settings of the $\beta$ -elements are determined, they are stored in the Network Buffer. After the control algorithm is completed, the states of the $\beta$ -elements are set within the time required to insure the quality of the transmission. This time is dictated by the switching transients of the $\beta$ -elements, network terminations, and the application. Serial shift-registers can provide economical buffers, if the $\beta$ -elements can be set in a stage-by-stage sequence. #### V. TIMING CONSIDERATIONS In this section, the processing time and the necessary equipment for the various methods of implementing the control algorithm are discussed. The processing time for the algorithm must be sufficiently short to accommodate the traffic changes. The combinational logic method is the fastest; however, it is applicable only for very small networks. The TPM has the next order of complexity, and it is applicable for $(64 \times 64)$ networks or smaller. The processing time, which increases exponentially, is derived as follows: For an $(N \times N)$ network, there are $2^{(N/2)-1}$ states of the TPM. If the TPM is a binary counter (no intelligence), on the average about half of the states or $2^{(N/2)-2}$ must be tested before the given P is decomposed to $C_1$ and $C_2$ . Also, on the average, one-half of the I/O memory is scanned before an output integer pair is detected. Therefore, the number $(A_T)$ of times that the I/O memory is accessed for $\log_2 N$ levels [the $\log_2 N^{\text{th}}$ level is the trivial $(2 \times 2)$ network, and the number of times the memory is accessed is simply N/2] is: $$A_T = \frac{N}{8} \sum_{i=1}^{\log_2 N - 1} 2^{N/2^i} + \frac{N}{2}. \tag{1}$$ The processing time can be greatly reduced by using the looping procedure. If a random-access I/O memory is employed, it is necessary to search for one-half of the outputs (the remaining outputs are obtained directly from memory) on each level of the network. For an $(N \times N)$ network, N/2 outputs are determined by accessing the I/O memory, on the average, N/2 times. Then $$A_T = N^2 \sum_{i=1}^{\log_2 N} \left(\frac{1}{2}\right)^{i+1} = \frac{1}{2}N(N-1).$$ (2) In addition, the access time required to partition the memory is $2N \log_2 N$ and should be included in equations (1) and (2) to give the total processing time. If two memories (in the crisscross manner) or a CAM is utilized, no searching is necessary, and access times for the decomposition of *P* and partitioning of memories are: $$A_T = 4N \log_2 N$$ and $A_T = 3N \log_2 N$ respectively. For $N=16{,}384$ and a CAM with 1- $\mu$ sec access time, the control algorithm, implemented with wired logic and a 10-MHz basic clock, can be accomplished in approximately 750 msec. If two memories (random access) with 1- $\mu$ sec cycle time are used, a processor with an instruction execution time of 3 $\mu$ sec can implement the control algorithm in approximately 50 seconds. #### VI. CONCLUSION In this paper, the network structure and control algorithm for certain $(N \times N)$ rearrangeable switching networks are described. The algorithm consists of decomposing a given permutation into d (where d is the base of the network) permutations for the d $(N/d \times N/d)$ subnetworks, and determining the connections for the N/d $(d \times d)$ networks in the input and output stages. The same procedure is applied to the subnetworks in an iterative manner until all of the connections in the $(N \times N)$ network are defined. Although the network can be constructed with various building blocks (bases), the base-2 structure is the most important because it requires the least number of two-state devices ( $\beta$ -elements) and the control algorithm is relatively simple. The algorithm is implemented by performing the decomposition either by the looping procedure or by a Trial-Partition Machine. Also, an efficient coding scheme is defined to facilitate the decomposition of the permutation and the partitioning of the memory. With the base-2 structure, the control algorithm and the coding scheme can be used in a consistent manner at each level of the network. There are other classes of network structures with the same number of $\beta$ -elements, such as the nested-tree networks, that do not have this property. The processing time and equipment complexity vary with the methods of implementation. The combinational logic is the fastest and least expensive; however, it is only applicable for very small networks. The Trial-Partition Machine is economical, but it is too slow for large networks; however, intelligence can be designed into the machine taking advantage of the fact that for a large number of permutations there are more ways than one of setting the network. Consequently, the processing time is dependent on the permutations given, as well as the amount of intelligence built in. The most suitable method for networks larger than (64 × 64) is the looping procedure with a content addressable memory to store the outputs. The processing time is independent of the permutations given. With this method, it is possible to determine the setting of all the $\beta$ -elements for a (16,384 $\times$ 16,384) network in less than one second for any number of new connections or terminations. During the processing time, the new settings for $\beta$ -elements are stored in a buffer; then their states are changed. The memory required for this system is about 300k bits or approximately 20 bits per input terminal. This paper has described a control algorithm for a rearrangeable switching network that is practical from both the system and processing time viewpoints. The application of this network should be considered where full access and nonblocking is required, and rerouting is possible. #### REFERENCES 1. Joel, A. E., Jr., unpublished work. 2. Case, C. C., unpublished work. 3. Levitt, K. N., et al., "A Study of the Data Communication Problems in a Self-Repairable Multiprocessor," Spring 1968 Joint Computer Conf., AFIPS Proc., 32, pp. 515-527. 4. Clos, C., "A Study of Non-Blocking Switching Networks," B.S.T.J., 32, No. 2 (March 1953), pp. 406-424. 5. Beneš, V. E., "Optimal Rearrangeable Multistage Connecting Networks," B.S.T.J., 43, No. 4, Part 2 (July 1964), pp. 1641-1656. 6. Joel, A. E., Jr., "On Permutation Switching Networks," B.S.T.J., 47, No. 5 (June 1968), pp. 813–822. 7. Batcher, K. E., "Sorting Networks and Their Applications," Spring 1968 Joint Computer Conf., AFIPS Proc., 32, pp. 307–314. 8. Kautz, W. H., et al., "Cellular Interconnection Arrays," IEEE Trans. Com- puters, EC-17 (May 1968), pp. 443-451. 9. Waksman, A., "A Permutation Network," JACM, 15, No. 1 (January 1968), - pp. **159–163**. - 10. Neiman, V. I., "Structure et Commande Optimales de Réseaux de Connexion Sans Blocage," Annales des Tèlècommunications, July/August 1969, pp. - 11. Tsao-Wu, N. T., and Opferman, D. C., "On Permutation Algorithms for Rearrangeable Switching Networks," IEEE Int. Conf. Commun., 1969, Conf. Record, pp. 10-29-10-34. 12. Koo, J., "Integrated Circuit Content Addressable Memories," IEEE Int. Solid-State Circuit Conf., 1970, Digest of Technical Papers, pp. 72-73.