Simulation of Large Communication Networks on the Connection Machine

Honsing Cheng and Lamine Diatta

May 1995


1. Introduction

Suppose we have a stochastic model for a real-life discrete event dynamic system (DEDS) whose performance we can measure as a function of a parameter w. That is, we want to find J(w), where J is the performance function of the system. In particular, the area of stochastic optimization deals with how to find a value w* that yields the "best" performance, both analytically and experimentally through simulation. In cases where solving for w* analytically is hard or impossible (due to the complexity of the model or the limitations of mathematical tools), simulation must be used to obtain performance measurements J(w) for a finite set of w's; w* is then picked to be the w for which J(w*) >= J(w) for all w's tested.

Many systems require such optimization techniques. It is therefore very important to be able to simulate large, complicated systems in as little time as possible, so as to speed up the evaluation of J(w) for each w in the candidate set. This project attempts to provide some guidelines on how to use massively parallel computers for efficient simulation of DEDS, which is a crucial step in experimental stochastic optimization.

We first present a high-level picture of how parallel computers fit into the overall framework of optimization. Two models will be presented, corresponding to two different levels at which parallelism is applied. We will then argue that parallel simulation can improve the confidence of optimality of J(w*), which is the probability that J(w*) belongs to the top p-th percentile of the entire population (from which only a finite set is drawn for simulation). We then present an example that applies the technique to the capacity assignment problem of a communication network modeled as a DEDS.
We will also discuss how parallelism allows efficient evaluation of aggregation techniques, which are frequently used to find simpler representations of large queueing networks.

Throughout this paper, when we say "a design" of a system, we mean "a value for a parameter" on which the performance function depends. When we say "a set of designs," we mean "a set of values for some parameter"; the specific parameter is not of concern.


2. Use of Parallelism in Performance Evaluation

Efficient evaluation of designs is critical in experimental optimization, both in terms of accuracy and solution time (the time it takes to find the solution). There are two levels at which parallelism can be applied for improvement in both respects. Consider a system as made up of different subsystems. If two or more processors deal with a _common_ subsystem, we say they operate at the _local_ level. On the other hand, if two or more processors operate on _different_ subsystems, they operate at the _global_ level. In Figure 1a, P1 and P2 are engaged in a global level optimization, while they work at the local level in Figure 1b, where the local subsystem is the entire system itself.

    |-------------|            |--------|
    | subsystem 1 |<---P1      |        |<---P1
    |-------------|            | System |
    | subsystem 2 |<---P2      |        |<---P2
    |-------------|            |--------|

       Figure 1a                Figure 1b
2.1. Local Optimization

We have a subsystem for which we want to find the best design for one of its parameters. Suppose we have designs w1, w2, ..., wN. Each of them affects the subsystem in some way, which in turn affects the performance of the entire system. The problem is to find the w* which yields the best performance J(w*). That is:

    J(w*) >= J(w1), J(w2), ..., J(wN)

If we have N processors, we can have each of them simulate the system under one of the w's and therefore produce one of the J(w)'s in parallel. This means we can find w* in a fixed amount of time, regardless of N.

What is the implication of this? Consider a system that allows no practical analytical approach to its optimization, and for which there is no useful heuristic to guide the search for the optimum design by simulation. Suppose we find, through simulation, that w' is the best design in a set of N designs randomly chosen from the design space. Then we would expect w' to get closer to the true optimum as N gets larger; this is simply saying that the sample optimum gets closer to the true optimum as the sample size gets larger. The fact that, by applying parallelism, we can obtain a sample of performance values in a time independent of the sample size implies that we can obtain a better sample optimum without increasing the solution time. To obtain J(w1), J(w2), ..., J(wN) on a sequential machine and then determine the sample optimum, we need time N*T, where T is the time of a single simulation. With parallelism, we need only time T for any sample size.

The saving in solution time can also be used in another way to improve the approximation by simulation. Put differently, when a simulation runs for a finite length of time, the resulting performance measure is only an approximation of the true system performance for that parameter. That is, we observe:

    J'(w) = J(w) + X

where J(w) is the true performance and X is a noise term resulting from the finiteness of the simulation.
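The scheme above can be sketched in a few lines. This is a toy illustration, not the project's CM-5 code: the "true" performance J(w) = -(w - 3)^2 and the noise model (Gaussian, shrinking with simulation length) are hypothetical stand-ins, and Python worker threads play the role of processors.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def simulate_performance(w, sim_length=10_000):
    """One simulation run, returning J'(w) = J(w) + X.

    Hypothetical stand-in: the 'true' performance is -(w - 3)^2, and
    the noise term X shrinks as sim_length grows, mimicking the
    finite-run approximation error described in the text."""
    rng = random.Random(hash(w))           # reproducible per-design noise
    true_J = -(w - 3.0) ** 2
    noise = rng.gauss(0.0, 1.0) / sim_length ** 0.5
    return true_J + noise

def parallel_local_optimum(designs, sim_length=10_000):
    """Evaluate every design concurrently -- one worker per design,
    playing the role of one processor -- and return the sample
    optimum w* together with its observed performance J'(w*)."""
    with ThreadPoolExecutor(max_workers=len(designs)) as pool:
        scores = list(pool.map(
            lambda w: (simulate_performance(w, sim_length), w), designs))
    best_score, best_w = max(scores)       # sample optimum
    return best_w, best_score

designs = [0.5 * k for k in range(10)]    # N = 10 candidate designs
w_star, J_star = parallel_local_optimum(designs)
```

Note that the wall-clock time is governed by one call to simulate_performance, not by N, and that raising sim_length (using the saved N*T - T) shrinks the noise on every J'(w) simultaneously.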
To get rid of the noise term completely, a simulation would have to run for infinite time. The saving in time of (N*T - T) can instead be used to diminish the noise: we simply let each processor run longer, and the resulting J(w1), J(w2), ..., J(wN) will be much better approximations.

2.2. Global Optimization

A divide-and-conquer strategy is an efficient way to use parallelism to solve the global optimization problem. Each subsystem is to be optimized by itself, and each is assigned to one processor. The set of N processors proceeds in parallel, and in a time T independent of the number of subsystems N, the optimum parameter for each and every subsystem is found. The solution time is therefore independent of the number of subsystems, which in some sense reflects the complexity of the system being studied. More specifically, if the entire system consists of subsystems S1, S2, ..., SN and T(S) is the time to solve subsystem S, then:

    T = max{ T(S1), T(S2), ..., T(SN) }

is the time at which the global optimization finishes. (Some additional time is also needed for integrating the results and checking feasibility.)

The above approach works only when there are limited interactions among the subsystems, so that "locally" optimized subsystems, when put together, yield a globally optimized system. Even then, the combined system might not be feasible. For instance:

    A --->O---|
              |--->O--- X
    B --->O---|

where each '>O' represents a server in a queueing network. Choosing a buffer of size x for server A and size y for server B separately might each yield good performance; however, (x+y) might exceed some resource constraint, and the naive combination would then be infeasible.

When these conditions hold, parallelism gains a factor of N, the number of subsystems comprising the whole system. As the system gets more complicated and N gets larger, the gain from parallelism increases as well.
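The divide-and-conquer scheme, including the feasibility check on the combined result, can be sketched as follows. The per-subsystem cost and performance models and the buffer-budget constraint are hypothetical, chosen only to reproduce the A/B example above.

```python
from concurrent.futures import ThreadPoolExecutor

def optimize_subsystem(candidates, cost, performance):
    """Optimize one subsystem in isolation; `cost` and `performance`
    are hypothetical per-subsystem models."""
    best = max(candidates, key=performance)
    return best, cost(best)

def global_optimize(subsystems, budget):
    """Divide-and-conquer: one worker per subsystem runs independently,
    so the wall-clock time is max(T(S1), ..., T(SN)) rather than the
    sum.  The combined result is then checked against a shared resource
    constraint, since locally optimal choices need not be jointly
    feasible."""
    with ThreadPoolExecutor(max_workers=len(subsystems)) as pool:
        results = list(pool.map(lambda s: optimize_subsystem(*s),
                                subsystems))
    total_cost = sum(c for _, c in results)
    return [w for w, _ in results], total_cost <= budget

# Two identical servers choosing a buffer size from 1..8.  Each alone
# prefers the largest buffer, but 8 + 8 exceeds the shared budget of
# 12, so the naive combination is infeasible -- the A/B case above.
server = (list(range(1, 9)),     # candidate buffer sizes
          lambda x: x,           # cost: buffer slots consumed
          lambda x: -1.0 / x)    # toy performance: larger buffer is better
designs, feasible = global_optimize([server, server], budget=12)
```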
3. Gain of Optimality Confidence

Suppose we use some method to solve an optimization problem and find that w* gives the best J(w) among a finite set of w's we experimented with. As explained above, the performance of w* we obtain is an approximation. That is:

    J'(w*) = J(w*) + X,    X is the noise term

We define the confidence of optimality of an approximate optimum parameter w* at the q-th percentile to be:

    Prob[ J(w*) belongs to the top q-th percentile of the entire
          population of w | J'(w*) is in the top q-th percentile of
          J'(w) over all w's experimented with ]

Confidence of optimality is important because, if it is too small, we cannot expect w*, when used in the true system (as opposed to the simulated one), to give a performance really in the top q-th percentile, even though this is true for the simulated system. If the optimality confidence is high, we can expect that the best we obtain from simulation is indeed among the best in the real system.

Suppose we try out N w's in each simulation and find the best, then collect the bests of M such simulations into a set S, and call the best in S w*. On the other hand, run a single simulation that tests M*N w's, and call its best w'. Then it can be shown that:

    Prob[ J(w*) in top q% ] >= Prob[ J(w') in top q% ]

The implication is this: in solving optimization problems, taking the best of M best values, each obtained from a simulation of size N, gives a confidence of optimality at least as high as the best obtained from a single simulation of size M*N. We can therefore use parallel computers to carry out the M independent simulations in parallel and find the best of the M best results. The confidence of optimality achieved is potentially higher, and the solution time can be much reduced.

4. An Example of Local Optimization using the CM-5

In this section, we present an example of applying the parallel local optimization technique to the capacity assignment problem in communication networks modeled as a DEDS.
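Before turning to the example, the M-best-of-N scheme of Section 3 can be illustrated with a small Monte Carlo sketch. The distributions here are purely hypothetical modelling choices: true performances drawn from Uniform(0, 1) and Gaussian observation noise.

```python
import random

def confidence_of_optimality(M, N, q, noise_std, trials=2000, seed=1):
    """Monte Carlo estimate of the probability that the 'best of M
    bests' w* is truly in the top q-fraction of the population.
    Population and noise models are hypothetical illustrations."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        group_bests = []
        for _ in range(M):
            # one independent simulation of sample size N:
            # draw N true performances, observe each with noise
            obs = []
            for _ in range(N):
                true_J = rng.random()
                obs.append((true_J + rng.gauss(0.0, noise_std), true_J))
            group_bests.append(max(obs))       # noisy sample optimum
        _, true_J_star = max(group_bests)      # best of the M bests
        hits += true_J_star >= 1.0 - q         # truly in top q-fraction?
    return hits / trials

noisy = confidence_of_optimality(M=8, N=16, q=0.05, noise_std=0.2)
exact = confidence_of_optimality(M=8, N=16, q=0.05, noise_std=0.0)
```

With zero noise the selected w* is the true sample maximum, so the confidence is essentially the chance that the best of M*N uniform draws lands in the top q-fraction; observation noise degrades the selection, which the estimate makes visible.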
The capacity of a communication channel is defined to be the maximum amount of data that can be transmitted through the channel in unit time. The problem is to find the optimal capacity assignment for all channels in the network, under some constraint. The measurement criterion is average message delay. For the specifics of the model, please read Appendix A. The following are some important points:

1) Channels are modeled as queues with infinite buffer size. The only delays a message encounters are the time spent waiting to be served by a channel and the actual service time at the channel.

2) Traffic from one node to another follows a Poisson process, so the time between the creation of successive messages going between any pair of nodes is exponentially distributed. Service time at a particular channel depends only on the length of the message.

3) The routing procedure is fixed. That is, the route between any pair of nodes is both unique and fixed. The input file to the simulation provides this information.

The network being simulated is derived from the ARPANET in 1975. The data, such as traffic rates between nodes and the sum of channel capacities, are hypothetical. The topology is shown in Figure 2.
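Under assumptions 1) and 2), a single channel behaves as a FIFO queue with Poisson arrivals and length-dependent service. The following minimal sketch simulates one such channel via the Lindley recursion and checks it against the closed-form M/M/1 mean delay; the traffic rate, message length, and capacity values are hypothetical, not taken from the ARPANET data.

```python
import random

def simulate_channel_delay(arrival_rate, avg_msg_len, capacity,
                           n_messages=200_000, seed=7):
    """Average message delay on one channel, modelled (as in the text)
    as an infinite-buffer FIFO queue: Poisson message arrivals, service
    time = message length / capacity, exponentially distributed
    lengths.  Uses the Lindley waiting-time recursion
    W[k+1] = max(0, W[k] + S[k] - A[k+1])."""
    rng = random.Random(seed)
    wait, total_delay = 0.0, 0.0
    for _ in range(n_messages):
        service = rng.expovariate(1.0 / avg_msg_len) / capacity
        total_delay += wait + service          # this message's delay
        wait = max(0.0, wait + service - rng.expovariate(arrival_rate))
    return total_delay / n_messages

# With exponential lengths this is an M/M/1 queue, whose mean delay
# is 1 / (C/L - lambda); the parameters below are hypothetical.
lam, L, C = 4.0, 1.0, 8.0              # msgs/sec, bits/msg, bits/sec
sim = simulate_channel_delay(lam, L, C)
theory = 1.0 / (C / L - lam)           # mean delay of 0.25 sec
```

In the full network simulation many such channels run at once and a message's end-to-end delay is the sum along its fixed route; the per-channel agreement above is only a sanity check of the queueing model.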
Figure 2: Topology of the simulated ARPANET

The local optimization technique is applied as follows. Each of the M nodes of the CM-5 chooses N random sets of capacity assignments under the constraint and determines the best one through simulation. The solution is then determined to be the best among these M best assignments. The implementation details and output are shown in Appendix B.

5. Efficient Evaluation of Aggregation Techniques

Aggregating a queueing network means finding a simpler queueing network that captures the performance characteristics of the original system; it is therefore a form of approximation. The aggregated system (the simpler one) can then be used either for analytical optimization or for simulation. Certainly, not all characteristics of the original system can be preserved in the aggregated system; some of them will be "aggregated away," i.e., no longer preserved. Thus, if we want to optimize average time delay in a communication network, we are only concerned with preserving the average-delay characteristics of the original system when finding an aggregate for it. How good an aggregation technique is depends on how well it preserves the characteristics of concern.

If we have only a finite set of w's to experiment with, and we only want to know which w gives the best J(w), then the absolute values of the J(w)'s might not matter, as long as the best w in the original system also gives the best performance in the aggregated system. We can then define:

    Prob[ w* = w' ], where w* gives the best performance in the
    original system and w' gives the best performance in the
    aggregated system,

to be the alignment probability of the technique, which can then be used to evaluate aggregation techniques. The most frequently encountered technique, namely the equivalent server technique, is described in some detail in Appendix C. Parallel computers can also help speed up the evaluation process.
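An alignment-probability estimate might be sketched as follows. The two performance models standing in for the original and aggregated systems are hypothetical; on a parallel machine, each of the 2N evaluations inside a trial would run on its own processor.

```python
import random

def alignment_probability(J_original, J_aggregate, design_space,
                          N=10, trials=1000, seed=3):
    """Estimate Prob[the design that is best in the original system is
    also best in the aggregated system] over random size-N design
    sets.  J_original / J_aggregate are hypothetical performance
    models standing in for full simulations."""
    rng = random.Random(seed)
    aligned = 0
    for _ in range(trials):
        ws = rng.sample(design_space, N)       # one experiment's designs
        best_P = max(ws, key=J_original)       # best in original system
        best_Q = max(ws, key=J_aggregate)      # best in aggregated system
        aligned += best_P == best_Q
    return aligned / trials

space = [i / 100 for i in range(100)]
# An aggregate that distorts absolute values but preserves the
# ordering aligns perfectly; one whose optimum sits elsewhere does not.
p_good = alignment_probability(lambda w: -(w - 0.6) ** 2,
                               lambda w: -3 * abs(w - 0.6) + 1, space)
p_bad = alignment_probability(lambda w: -(w - 0.6) ** 2,
                              lambda w: -(w - 0.3) ** 2, space)
```

The contrast between p_good and p_bad shows why alignment probability, rather than agreement in absolute J(w) values, is the right yardstick for an aggregation technique used inside an optimization loop.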
Suppose we have an original system P and an aggregated system Q, and N different w's are to be experimented with. We obtain the performance of the N w's via N simulations of P in parallel, and then N simulations of Q, also in parallel. To estimate the alignment probability, we need to repeat the above procedure many times. Fortunately, with parallel simulations, the time required depends only on how accurate we want the obtained alignment probability to be, not on how many w's we are comparing. Note that this evaluation task can be prohibitively expensive on a uniprocessor machine.

6. Conclusion

In this paper, we have emphasized how parallel computers can be used to improve various aspects of stochastic optimization, including solution time and confidence of optimality. Since aggregation techniques are commonly used in optimization, we have also given a general outline of how parallel computers can speed up the evaluation of aggregation techniques. Although some of the material is quite straightforward, we hope that the clear statements of the various algorithms and the preliminary analysis provided in this paper can be helpful in forming a more solid picture of how parallelism fits into stochastic optimization.