PARALLEL MACHINES

by Andrew Boucher

(Very slightly modified from version published in Minds and Machines 7: 543-551, 1997)

Please send your comments to abo

Abstract. Because it is time-dependent, parallel computation is fundamentally different from sequential computation. Parallel programs are non-deterministic and are not effective procedures. Given the brain operates in parallel, this casts doubt on AI's attempt to make sequential computers intelligent.

Key words. Parallel, serial, sequential, Turing machine, effective procedure, time-dependent, non-determinism, McCulloch-Pitts neural network, artificial intelligence.

INTRODUCTION.

Although new concepts--such as deadlock, livelock, and non-determinism--have been required to describe parallel, as compared to sequential, programming, it is often thought that considered on a fundamental level, parallel is "just like" serial computation. For instance, Roger Penrose writes (1990, p. 48 and p. 398):

Using more than one Turing device in parallel action--which is an idea that has become fashionable in recent years, in connection with attempts to model human brains more closely--does not in principle gain anything (though there may be an improved speed of action under certain circumstances). Having two separate devices which do not directly communicate with one another achieves no more than having two which do communicate; and if they communicate, then, in effect, they are just a single device!
...There is no difference in principle between a parallel and a serial computer. Both are in effect Turing machines.

Daniel Dennett writes (1993, p. 269 and p. 217):

Given all the ink that has been spilt over this theoretical [sequential vs. parallel] issue, it is important to stress that this is a shift in the balance of power, not a shift to some "qualitatively different" mode of operation. At the heart of the most volatile pattern-recognition system ("connectionist" or not) lies a von Neumann engine, computing a computable function.
Since any computing machine at all can be imitated by a virtual machine on a von Neumann machine, it follows that if the brain is a massively parallel processing machine, it too can be perfectly imitated by a von Neumann machine.

And John Searle writes (1990, p. 22):

The parallel, "brainlike" character of the processing, however, is irrelevant to the purely computational aspects of the process. Any function that can be computed on a parallel machine can also be computed on a serial machine...

Computationally, serial and parallel systems are equivalent: any computation that can be done in parallel can be done in serial.

But parallel is not the same as sequential. Parallel programming is time-dependent; what a particular parallel program computes may depend on the speed at which the underlying processors operate. A non-deterministic or analog element is therefore fundamental to parallel processing, and parallel programs need not be effective procedures.

What is Parallel Computation?

The claim that sequential and parallel computations are the same is based on the well-known mathematical theorem that one Turing machine can simulate the workings of any finite number of Turing machines.[2] Put simply, given programs for several Turing machines, it is possible to write a single program which imitates them, by executing the first step of the first program and "saving" the machine's tape and state, then executing the first step of the second program and "saving," and so on, cycling through the steps of the several programs so that it eventually performs every action that the individual ones do. Because the result speaks of the simultaneous operation of many Turing machines, it has been thought to say something about parallel computation.[3] It does not, the difference being one of interaction. Each of the many machines in the theorem operate independently; they do not talk to one another, and each Turing machine proceeds unaffected by the others. However, a true parallel machine consists of several machines, each with the capacity to communicate with every other; they interact, and by so doing, any may influence any other's path. While apparently a small point, in fact it is crucial. Because of communication, a parallel machine's behaviour is time-dependent, and thus utterly different from the sequential variety.

To see this, model communication between two Turing machines as the ability of one to read the contents of the square where the other's read-write head happens to be at a particular moment in time. Now consider a parallel Turing machine consisting of two Turing machines A and B. A's program tells it to move rightward across its argument until it hits the first 0, and once it is there, communicate with B, by reading what B is reading at that instant in time; A then returns this value as the "answer." B meanwhile simply alternates its head between a square which contains 0 and a square which contains 1. Obviously, what the answer is depends on where B happens to be when A reaches the state of wanting to communicate; it thus depends not only on the argument (which may be defined as what is written on A's tape), but the speeds at which A and B work.

Suppose A always takes time rA to complete a step, while B takes time rB, and one clarifies such matters as exactly when communication is to occur (does A read what is on B's tape at the beginning of its step, in the middle, or at the end?). The reader may see that the program computes a function of the sort

f(n) = 1 if [z(n)r] is even, = 0 if [z(n)r] is odd


where z(n) denotes the length of n's initial non-zero string, r = rA/rB, and [x] equals the greatest integer less than or equal to x. Clearly r is critical in determining which function f exactly is; as r changes, so will f.

Is there any equivalent sequential Turing machine? For the real numbers r which are called "computable," there will indeed exist one. But there seems no reason to suppose that the ratio of operating speeds should be restricted to only certain reals. If r is permitted to be any real, it is easy to see that there are an uncountable number of such f.[4] Thus, there are non-computable f, and no equivalent sequential machine.

Consider Turing's (1936) metaphor for his machine, that of a person at a desk with a lot of paper. You give him some instructions and a beginning string of symbols, and he either goes on forever, or stops and hands you back a string. A communicating parallel computer is comparable to many people at separate desks, each with their own paper and instructions. What is more, at any moment, a person may get up and (leaving his paper on his desk) go over to another person's desk and see what's written on that person's paper, after which he returns to his desk. Obviously, what he sees will depend on where the second person happens to be in his computations, and the end result will not just depend on the instructions the people have been given, but how quickly each person works. So the process is different than that where each person works separately.

Objection. You force the real numbers into your representation of parallelism; so of course there will be non-computability. You should avoid their use.
Reply. Once communication is introduced, the time at which events occur becomes critical. So one must make some assumption about Time. Choosing the integers or rationals implicitly supposes perfect synchrony between different machines, and so is fallacious.

Objection. There is no difference between a single Turing machine which observes a real-number phenomenon and your parallel Turing machines.
Reply. Our claim is that A != B. Stating B = C is not a counter-argument, especially since A != C. (A is a normal Turing machine, B is a parallel Turing machine, and C is a Turing machine which observes a real-number phenomenon.) Moreover, there is a difference in kind. One is not directly observing any real variables, only digital ones--the parallel Turing machine's tape may only contain a finite number of symbols. It is a digital machine. The real variable makes its impact implicitly, not by being directly observed.
The idea behind the objection is that the time-dependence of parallel machines is really nothing new. Perhaps, but there are still some who believe that A = B, as evidenced by the quotes in the introduction, so it is worthwhile to state it clearly.

Although one should not make too much of this formulation, parallel machines are more powerful than sequential machines. They can do anything a sequential machine can do, and they can do things a sequential machine cannot.

Objection. There is no usable computational advantage to your parallel machines. Because you assume a non-computable input in your theory, you have not in fact proved that your machines are "more powerful." You have merely assumed that they are, via a circular argument.
Reply. One can in fact prove, for any function f : N -> N, that there exists a parallel Turing machine which computes f. But this does not assert there is a solution to the Halting Problem, because to compute the f which solves it, one would need to be able to fix the operating speed ratio appropriately, which one cannot, practically speaking, do. However, the point of this paper is not that one is able to use a machine to compute a particular f of one's choice, but that, given a particular machine, it computes in a non-computable way. Whether there is a use for this is uncertain, but some speculations will be provided below.
In any case, usability is independent of power. For X to be more powerful than Y means X is capable of doing everything Y is, and something more, usable or not. And this is the case with parallel machines equal to X, and sequential machines as Y.
Finally, it seems reasonable to allow all reals as operating speed ratios, and this assumption does not depend on the conclusion that Parallel Machines compute non-computable functions. Hence, the reasoning is not circular. On the contrary, what is circular is the following argument: All parallel Turing machine functions are computable, so the only real numbers permissible as operating speed ratios must be the computable reals.

Algorithmic Machines and Networks

Because of Turing's universal machine, it is sometimes erroneously thought that all machines are algorithmic, i.e. their behavior is precisely characterized by a sequence of finitary rules. Some machines, for instance sequential computers, are. But a parallel machine composed of algorithmic machines which interact, is no longer one, but something new entirely. Clearly there are algorithms involved, and these play a crucial role in the behavior of the parallel machine. But the over-all behavior is different from any algorithmic machine.

Networks of computers are examples of parallel machines which exist today. The input-output behavior of the whole network does not simply depend on the programs its computers are running, but also the respective speeds of the various computers involved. No single sequential computer, therefore, is equivalent to a network, which is a different entity entirely. For example, two computers attached to a single printer are capable of non-computable results, e.g. who prints out on the printer first.

Parallel Programs

Up to now the discussion has been about hardware--the machine. There are a few similar comments which can be made about parallel software. The instruction set which tells Machines A and B what they have to do (move rightward until it hits the first 0 for A, alternate back and forth for B) is an example of a parallel program. One can then ask whether there is any sequential program equivalent to it. The answer, again, is no, because the behavior of a parallel Turing machine running the program is not determined simply by the program and the input, but by the relative operating speeds of the individual processors. No sequential program has this same non-deterministic element, and so parallel programming languages are indeed necessarily different from their sequential cousins.

Unless one has perfect knowledge of the operating speeds of its various Turing machines, the behavior of a parallel Turing machine running a program is unpredictable. But it is a controlled unpredictability, not some kind of general chaos. Although one cannot be sure of the program's output, usually not all output is allowed. In fact, what output one gets gives information about the operating speeds of the underlying processors.

Objection. Sequential programming languages can have non-deterministic commands, for instance the "if" command in Gries (1983). There (p. 134) the absolute-value function is defined by the command

if x >= 0 then skip; if x <= 0 then x := -x fi,

but the programming language does not specify which branch must be selected in the case x = 0. Although the result is the same in this example, it is clearly possible to write programs where it is not. So there is nothing really new in parallel languages.
Reply. The non-determinism of a sequential programming language is about how to implement the program. For instance, one might implement the "if" command by always choosing the first branch when true, or by always choosing the second branch when true, etc. The implementation itself is deterministic. A parallel programming language's non-determinism is about something else. It is fixed by characteristics of the machine, namely the speeds of its processors. It is this machine-dependent non-determinism which distinguishes parallelism.
Put baldly, no non-deterministic sequential program is equivalent to our parallel program, because the parallel program gives information about the speeds of the underlying processors, which a sequential program, deterministic or non-deterministic, cannot.

Computation of Functions

The yardstick in comparing different methods of computation has usually been what functions the method computes, two types being equivalent if they compute exactly the same set of functions (e.g. Boolos and Jeffrey 1995, p. 20). One consequence of our discussion here is that this looks like an inadequate measure. For one thing, it assumes that every type of computation computes a function, which in the case of parallelism is a questionable assumption at best. Indeed, one can say our parallel program computes the function f. This is how I indeed phrased it above, because this is how it had to be phrased given the existing paradigm. But it is more accurate to say that it simply does not compute a function at all. On definite input there is no definite output. It is not an effective procedure. Remark that our conclusion, that parallel and sequential are different animals, remains.

One can write parallel programs so that there is no time-dependence, and so which do compute functions. Indeed, this is likely to be the direction that parallel computation takes in the immediate future. What people today want from computers are definite answers to definite questions. But this need not be, and perhaps some day will not be, the case.

Computer Intelligence

Since the development of high-speed electronic computers, there has been a widely held belief that it will be possible to construct computers which simulate intelligent human behavior. Perhaps the first major exposition for this thesis was made by Turing (1950), who would have us imagine a human interviewer asking questions of the occupant of another room, which might be a computer or a human being. The occupant would offer replies over some kind of teletype machine, and based on these answers the interviewer would have to decide whether what he was talking to was man or machine. Turing declared that if a computer could make the interviewer think it was actually a human being, then it should be held to be intelligent. Turing went on (p. 442) to predict that in roughly fifty years time--the year 2000--

an average interrogator will not have more than 70 per cent. chance of making the right identification after five minutes of questioning.

Now one can argue that Turing's Test does not accurately capture our notion of intelligence by being too weak. After all, it is possible to imagine a computer with an enormous amount of memory which has simply been programmed with all possible five-minute conversations.[5] Most would probably not consider such a computer intelligent, as the process by which the computer manages to maintain the conversation is at least as important as the fact that it can do so.

However, even disregarding this objection, it seems that Turing's confidence in machine intelligence was misplaced, because we are now almost at the year 2000, and the hard fact is that, despite flurries of enthusiasm every now and again, and a dramatic increase in computer power, we appear hardly closer to constructing a computer which could pass his Test than we were in 1950. Perhaps this is overly pessimistic, but if one accepts this as true, it is natural to try to step back and consider where Turing might have gone wrong.

In part Turing was simply continuing the attacks by Darwin and evolution against the theologians and philosophers who insisted human beings were innately special, whether by dint of a God or soul or consciousness or any other "X" factor. While one can sympathize with Turing's impatience with their articles of faith, the negative of a bad argument does not make a good argument. And while Turing in his 1950 paper went to great lengths to dispose of arguments for the conclusion that machines could not think, he did not propose a justification that computers could, a fact he himself recognized and admitted (p. 454).

If one accepts a materialist view of the world, then it is possible that human beings will be able to build an intelligent machine, because all one has to do is suppose them capable of duplicating the human brain and body. Yet there is a giant leap from this view to the one that the machines which men are able to fabricate today, can be intelligent. Part of the jump is made with the can-do optimism and self-confidence which characterizes scientists and the population at large. But again this is no argument.

Perhaps the strongest evidence which could be mustered for Turing's theories comes elsewhere, in the McCulloch-Pitts model of a neural network.[6] It is well-known that a McCulloch-Pitts network is equivalent to a finite machine (a Turing machine with a finite memory). The McCulloch-Pitts model is simple and elegant, and therein lies much of its appeal. However, while Warren McCulloch (1949, p. 492) once asserted "the brain is a logical machine," it is clear that just because a simple model of the brain is equivalent to a model of a sequential computer, there is no sound basis to conclude that the brain itself is, or acts, or can be imitated, by a computer. For of course the proof of equivalence depends essentially on the simplicity of the model, which if anything should raise suspicions concerning its accuracy.

As explained above, a parallel computer is more powerful than a sequential one. So, if one were to take the "Brain = McCulloch-Pitts Network = Serial Computer" formulation seriously, then it would appear that one could now conclude "Brain < Parallel Computer." However, as the brain operates almost exclusively as a parallel processor, it should be clear that it must be more powerful than a McCulloch-Pitts network. Indeed, there are at least five major deficiencies in the McCulloch-Pitts model: the all-or-nothing firing of a neuron; the linear summing of fixed weights and comparison with a fixed threshold; the integral time scale, and behavior always of constant duration; the inability of a network to change, e.g. by growing; and the fact that a neuron's behavior is not simply characterized by excitations, inhibitions, and firings, e.g. other chemical elements may be involved. Altering any of these elements in a realistic way--e.g. changing from an integral to a real-time scale--would alter significantly the behavior of a network, and introduce non-computability.

Von Neumann (1951, p. 10-11), writing in another context about the comparison of the brain with digital computers, argued that while all-or-nothing behavior is obviously not strictly true of a neuron, it is not true of a computer's parts either, and

the important fact is not whether an organ has necessarily and under all conditions the all-or-none character--this is probably never the case--but rather whether in its proper context it functions primarily, and appears to be intended to function primarily, as an all-or-none organ.

Of course, not understanding the nature of intelligence and specifically human intelligence--where and when it appears in the brain--, it is extremely difficult to determine just what is primary to a neuron's behavior. On the one hand, there is ample neurophysiological evidence of the importance of timing in the brain (see e.g. Churchland and Sejnowski 1992, pp. 51-53 and pp. 379-388). For instance, timing issues are present in a neuron's distinct firing pattern; when and how a neuron fires conveys information. In computational terms, what is important is not simply the result, but when the result occurs. Another example would be the fact that certain neural activity is rhythmic. But merely the fact that timing is critical to the human brain proves nothing about whether sequential computers can be made intelligent, because perhaps they can still imitate and approximate the human behavior relevant to intelligence, in the same way that they approximate our visual field with a digital representation, or indeed because perhaps it is not necessary to copy the human brain at all. Simply put, we must learn more about the brain, and more about the practical capabilities of machines, before passing judgment.

Conclusion

For the author the question of whether machines--sequential computers, parallel Turing machines, neural networks, or some other sort altogether--can be made intelligent will not be decided by philosophers but by scientists--whether or not the thing can actually be built. Still, the fact that parallel machines are strictly more powerful than sequential ones, and that the brain operates in parallel, does suggest that artificial intelligence will not be had on sequential computers. But clearly this is still only suggestion, and it would be speculation to say that the time-dependence found in parallel machines is indeed important. It would equally be speculation that it, or some similarly non-effective and non-computable phenomenon, is not important. The brain may be a logical machine, but it could well be a logic different from that of a sequential machine, and the fundamental flaw of current AI research may be that it tries to produce intelligence on machines which are incapable of it. We do not know. We probably will not know for some time. And perhaps we shall never know.

Notes

  1. This research was conducted in 1987-88 while the author was a Lecturer at the University of Southern California. The section on computer intelligence was partially supported by AFOSR Grant 88-0245. The author would like to thank Ian Rumfitt, Thomas Tymoczko, Hilary Putnam, John Searle, Marvin Minsky, and the referee Robert Cummins for their encouragement, comments, or criticisms. Any mistakes are the author's own.
  2. See e.g. p. 203, Lewis and Papadimitriou 1981 for the proof that a one-tape machine can imitate a multi-tape one, or see p. 217 of Dennett 1993.
  3. See e.g. pp. 217-8, Dennett 1993.
  4. Let r and r' be positive irrational numbers with binary representations disagreeing in the (2^-i)th place. Then [(2^i)r] is even if and only if [(2^i)r'] is odd.
  5. Ned Block has made this point.
  6. For a description of the model, see either McCulloch and Pitts 1943 or Minsky 1967.

References