[2,3 Turing Machine Universal? \$25,000 TM prize] - A New Kind of Science: The NKS Forum

Pages:1

2,3 Turing Machine Universal? \$25,000 TM prize

Posted by: Jason Cawley

On the fifth anniversary of the publication of A New Kind of Science and shortly after release of Mathematica 6, Wolfram announced a challenge question related to one of the book's conjectures.

All the details of the prize can be found here -

http://www.wolframscience.com/prizes/tm23/index.html

While that site has a lot more detail than I could give you here, if you have specific questions about the system in question I'd be happy to answer them. Or if you want to share ideas about the problem you are invited to post here.

For starters, I note that TuringMachine is a built in function in Mathematica 6, and you can simulate the system involved in the prize, from a blank tape and starting head state 1, with

data = TuringMachine[{596440, 2, 3}, {1, {{}, 0}}, 100]

To see more steps just raise that 100.

You can get nice TM graphics for any TM with a piece of Todd Rowland's code -

Options[TMGraphics] = Prepend[Options[ArrayPlot]
"States" -> True];

TMGraphics[data_, s_, k_, opts : OptionsPattern[]] :=
ArrayPlot[Last /@ data,
Sequence @@ FilterRules[{opts}, Except["States"]],
Epilog ->
If[TrueQ[OptionValue["States"]],
MapIndexed[
Translate[
Rotate[With[{r = .18}, {Disk[{0, 0}, r],
Polygon[{{r, 0}, {-r, 0}, {0, .6}, {r, 0}}]}],
2 (#[[1, 1]] - 1) [Pi]/s, {0, 0}], {#[[1, 2]] - 1/2,
Length[data] - First[#2] + 1/2}] &, data], {}]]

I hope this is useful...

Posted by: Craig Feinstein

This is a very good problem.

My guess (and it's only a guess) is that this given Turing machine is not universal. I say this because it seems that the main thing that this machine does is change red to yellow and yellow to red when the pointer is traveling left and white to yellow in the interior of a red and yellow region and white to red on the border of such a region. It eventually changes all white cells to red or yellow.

When the pointer is travelling right, the pattern is a little less predictible, since sometimes red stays red and yellow stays yellow. But I can't see how this behavior is going to lead to universality. In the long run, when looking at 200 steps of left compressed evolution here: http://www.wolframscience.com/prizes/tm23/gallery/
one witnesses behavior similar to rule 30. It's very random and chaotic.

Most people believe that rule 30 is not universal, myself included. (Although I think I read that Stephen Wolfram believes that rule 30 is universal.) A system can be computationally irreducible but not universal, as I think is the case with rule 30.

What does everyone else think? Does anyone think this Turing machine is universal? Why?

Posted by: Seth J. Chandler

There is a minor typo in the TMGraphics Code that will prevent it from executing properly. [Pi] should be \[Pi]

Posted by: MatthewF

I strongly suspect that this problem is either formally undecidable or just impractical to answer.

As far as I know, all previous TM universality proofs have started out with an "engineering model" for the simulation. The elementary rules for the machine itself are then chosen to support the model. The 2,5 Turing machine (NKS p.707) was an exception: this machine was found to be trivially equivalent to cellular automaton rule 110, for which Matthew Cook developed a universal simulation. The 2,3 machine does not appear to simulate any other system in a trivial way, so we are forced to develop a universality model for the TM itself, starting from the given TM rules. This has never been done before.

According to Wolfram's conjecture, any system with sufficient "complexity" or "pseudo-randomness" is universal. Many mathematicians doubt this hypothesis, for a number of reasons:

• As Nobel laureate Prof. Steven Weinberg (Dept. of Physics, U. Texas at Austin) explains in the New York Review of Books, we are lacking a suitable definition of complexity. "Computational irreducibility" is not adequate:

"The trouble with Wolfram's conjecture is not only that it has not been proved—a deeper trouble is that it has not even been stated in a form that could be proved. What does Wolfram mean by complex? If his conjecture is not to be a tautology, then we must have some definition of complex behavior independent of the notion of universality. The pattern produced by the rule 110 cellular automaton certainly looks complex, but what criterion for complexity can we use that would tell us that it is complex enough for Wolfram's conjecture to apply?

"There is a well-known parallel problem in defining randomness. The most common precise definition of the randomness of a string of digits or of a sequence of black and white cells on a tape is that it is random if there is no way of describing it with a string of shorter length. The trouble is that according to this definition the string of digits in a number like the square root of two would not qualify as random, because it can be described very simply —it is the square root of two—even though it surely looks random. (Mathematica gives the first thirty digits as 1.41421356237309504880168872420.) In the same way, it wouldn't do to define the output of a cellular automaton like rule 110 with a single black cell in the top row as complex only if it can't be described in simple terms, because it can be described in simple terms—it is the output of rule 110 with a single black cell in the top row. There are other definitions of randomness, such as the absence of correlations: the digits in the square root of two can be said to be random because, as far as is known, being given one digit at an unidentified decimal place tells you nothing at all about what the next digit is likely to be. Wolfram has not even begun to formulate a similar definition of complexity.

"In fact, as he admits, for Wolfram the real test of the complexity of a pattern is that it should look complex."

• Prof. Lawrence Gray (School of Mathematics, U. Minnesota) points out in his AMS review that the universality proof for rule 110 is a highly elaborate construction (NKS p.675-689), which "just barely works" and "shows just how tenuous the original conjecture was." The postulated universality of rule 30 is, according to Gray, "sheer wishful thinking."

This latter criticism may be imprecise, but Prof. Gray has a point. Universality for Wolfram is a fundamental property of the "simple program," and not just a feature of the constructed simulation. If Cook's method of construction had failed, but Wolfram's complexity hypothesis were valid, then the "universality of the system" would imply that some other valid construction must exist. I am not convinced. One may similarly claim that plastic, copper and silicon have an intrinsic computational universality, because you can use them to build a computer. Surely the "universality of rule 110" is in Cooke's construction, not in the rule itself or the apparent complexity of patterns generated from an arbitrary initial state.

The 2,3 Turing Machine is a very different system from the rule 110 cellular automaton. Starting a rule 110 CA from random initial conditions, we quickly see structures moving and interacting according to some localized rules (NKS p.677): rules which might form the elements of a simulation. By contrast the dynamics of the 2,3 TM are not localized. In left-compressed form (NKS p.709), the update rule for a given cell depends on all cells to its left, and none to its right. The time-evolution of every cell is eventually periodic with period 2 to the power n (in left-compressed form, with cells numbered n=0,1,... rightward from the "center" of the diagram), and so the entire future of any finite portion of the tape can be calculated in finite time. However, the progression of patterns along the tape does not appear to be periodic, and the sequence of "outward glitches" at the right-hand side (NKS p.1120) shows no obvious long-term predictability. I've extended Wolfram's plot from 350 up to 100,000 successive glitches, and found nothing interesting, except that the distribution of glitch intervals does not change much (the largest interval I found was 36 lines), and the interval is always even after the first one or two glitches.

If a universal construction exists for such an unwieldy system, there is probably no systematic way to find it; and if it doesn't exist, I can't imagine anyone will find a proof. A mathematical reduction of the machine could theoretically prove that universal constructions are impossible, although Weinberg's comments have highlighted problems defining a "simple reduction." However, the nature of the system suggests that no reduction will be found, any more than the digit sequence of the square root of two will prove to be "reducible." More likely, this Turing machine lies in a large gray area: an irreducibly complex system whose (non-)universality is either undecidable or beyond reasonable human reach.

Posted by: Craig Feinstein

If one observes for random inputs the behavior of the portion of the output of the prize Turing machine which Wolfram believes seems "complex enough that it could support universality", one always sees randomness and chaos, similar to Rule 30 and the Collatz 3n+1 sequence. But for a system to be universal, the behavior of the system cannot always be random and chaotic. All possible behaviors must be represented in the output for different inputs; therefore, why would anyone think this Turing machine is universal under any reasonable definition of universality?

And not only this, but one can argue that it is expected that simple programs will produce chaotic and random outputs like the prize Turing machine, Rule 30, and the Collatz sequence. Why? If there were no simple programs with this characteristic, then one could determine whether a string of outputs is random (according to Chaitin's definition of randomness) just by looking at it - if it looks random, then it must be random. But this would contradict Chaitin's Theorem, which says that it is impossible to verify that a string is random for sufficiently large strings. Therefore, there must be an intermediate level of complexity - computationally irreducible and random/chaotic, but not universal.

Posted by: Denis

Originally posted by Craig Feinstein
If one observes for random inputs the behavior of the portion of the output of the prize Turing machine which Wolfram believes seems "complex enough that it could support universality", one always sees randomness and chaos, similar to Rule 30 and the Collatz 3n+1 sequence. But for a system to be universal, the behavior of the system cannot always be random and chaotic. All possible behaviors must be represented in the output for different inputs; therefore, why would anyone think this Turing machine is universal under any reasonable definition of universality?

I don't understand why you think rule 30 is not universal .
The "random" behaviour as you know is not really random , under Chaitin' s definition of randomness the amount of information for each cellular automata is the same . I don't think there is relevant difference of information in the rule 110 and rule 30 the Kolmogorov complexity of the 2 rule is always 1 byte we are in a minimum limit .
I think what we define intuitively random is something related to uniformity definition , a statistical characteristic , but I think this is different from random .
Instead I think rule 30 and class 3 has 2 advantages respect to class 4 .

1) The trasmission of informations is very wide.

2) The data seems more complex than class 4 ( the kolmogorov complexity of local data for example at a given step seem bigger than local data in a class 4 )

You say "All possible behaviors must be represented in the output for different inputs" but in class 3 for example in rule 30 I think we have this . In the nks is possible to see the trasmission of information in different rules and in the class 3 we have the wider trasmission of information , the informations interact with a lot of bits in the next states so we have more differents states to identify more output.

And not only this, but one can argue that it is expected that simple programs will produce chaotic and random outputs like the prize Turing machine, Rule 30, and the Collatz sequence. Why? If there were no simple programs with this characteristic, then one could determine whether a string of outputs is random (according to Chaitin's definition of randomness) just by looking at it - if it looks random, then it must be random. But this would contradict Chaitin's Theorem, which says that it is impossible to verify that a string is random for sufficiently large strings. Therefore, there must be an intermediate level of complexity - computationally irreducible and random/chaotic, but not universal.

Why not universal? This does not require universality but does not exclude universality .

Posted by: Craig Feinstein

Denis,

You said: 'You say "All possible behaviors must be represented in the output for different inputs" but in class 3 for example in rule 30 I think we have this.'

How so? When I examine the output of Rule 30 for various inputs, I only see chaos and randomness. I never see any stability. However, Rule 110 is a different story. One sees stability, chaos and randomness, and self similarity; every possible system is emulated in its output.

Posted by: MatthewF

We have to be careful with words like "randomness", "chaos" and "stability". Try encrypting a paragraph of text using any respectable encryption algorithm, and it will immediately appear "random". Then change one letter of the original text and encrypt it again, and the entire jumble of encrypted data will change. It would be easy to convince yourself that the encryption algorithm is "random" and "chaotic" - but you know, of course, that no information has been lost, and the encryption is fully reversible. The human eye is not a reliable judge of information content.

The requirement that "all possible behaviors should be represented in the output" is also somewhat vague: I'm still waiting for a proper mathematical definition of "all possible behaviors" that doesn't depend circularly on the definition of universality. In some sense a true random number generator will display all possible behaviors, if you wait long enough. A team of monkeys will eventually type the complete works of Shakespeare, but that doesn't mean you can program them to do so.

Rule 110 was a relatively easy system to program (I say 'relatively', as there were years of work involved), because the information flow was tangible to the human mind. You could see patterns moving and interacting. Rule 30 and the 'prize' Turing machine do not display any intuitive patterns, because the information flow is not localized. However, if you look at the output of these rules with an appropriate "decryption algorithm", the patterns might become blindingly obvious, just as your web browser is capable of tranforming a 128-bit-encrypted stream of data into English. The trouble is, we will probably never know.

Posted by: Craig Feinstein

MatthewF wrote: "Rule 30 and the 'prize' Turing machine do not display any intuitive patterns, because the information flow is not localized. However, if you look at the output of these rules with an appropriate "decryption algorithm", the patterns might become blindingly obvious, just as your web browser is capable of tranforming a 128-bit-encrypted stream of data into English."

I never thought about this possibility. Thank you for bringing this to my attention!!! Therefore, I take back everything I said previously in this thread.

Posted by: Craig Feinstein

There is no universal 2-state 3-color Turing machine. The reason why is simple. In order for a computer to be universal (not just Turing machines but also cellular automata) it is necessary for the machine to be able to process at least 3 bits of information in a single step of computation. For instance, rule 110 has this characteristic, because each cell is updated based upon three bits of information:

000 -> 0
001 -> 1
010 -> 1
011 -> 1
100 -> 0
101 -> 1
110 -> 1
111 -> 0

A 2-state 3-color Turing machine does not have this characteristic. It can only process log_2 6 bits of information in a single step of computation, since 2x3=6.

The reason why 3 bits is the minimum required for a computer to be universal is in this brief note that I posted here:

http://eprintweb.org/S/article/cs/0706.4440

Update: See the thread "Universal Computation with only 6 states" started by Alastair Hewitt. We see from this that my claim that a machine must be able to process 3 bits of information to be universal does not hold true for Turing machines with a circular tape. It only holds true for Turing machines with infinite tape and cellular automata.

Another Update: I now believe that my proof that there are no 2-state 3-color Turing machines is not valid. I withdrew it from the arxiv.org. I welcome anyone to try to fix my proof.

Posted by: Jason Cawley

Since the predecessor statement is true for all TMs, and TMs as a class are the definition of universality, and certainly contain any number universal machines, including e.g. the 2,5 machine given in the book which emulates rule 110, the alleged consequence cannot be correct. The number of bits processed at a single step, as opposed to processed at all over any number of primitive operations, is simply not relevant to the question of universality.

Posted by: Craig Feinstein

Jason said: "Since the predecessor statement is true for all TMs, and TMs as a class are the definition of universality, and certainly contain any number universal machines, including e.g. the 2,5 machine given in the book which emulates rule 110, the alleged consequence cannot be correct."

The statement I made in my post certainly does not apply to all Turing machines. Specifically, it does not apply to the 2,5 Turing machine in NKS which emulates rule 110, because such a Turing machine is able to process at least 3 bits of information on its tape-head, as 2x5=10 and log_2 10>= 3. It also does not apply to 2,4 Turing machines, i.e., it is still possible that there are 2,4 Turing machines that are universal, since such a machine is able to process 3 bits of information on its tape-head, as 2x4=8 and log_2 8=3. Similarly, it is still possible that there are 3,3 universal Turing machines, since 3x3=9 and log_2 9>= 3.

However, there is no way that a 2,3 Turing machine can be universal. It's simply too dumb to be universal, as its tape-head cannot even process 3 bits of information, which is necessary for universality.

Jason also said “The number of bits processed at a single step, as opposed to processed at all over any number of primitive operations, is simply not relevant to the question of universality.”

If what Jason is saying is true, that the number of bits that are processed at a single step is not relevant to the question of universality, then why can’t there exist a 3-state 2-color universal Turing machine or a 2-state 2-color universal Turing machine? (It is already accepted that such Turing machines cannot be universal.)

Posted by: Jason Cawley

There is no requirement that an emulation use the same number of steps as the emulated system. It is perfect typical for, e.g., the simpler system to use a large block of cells to encode a single cell in the emulated system and details of its perhaps more sophisticated state. All TMs have access to a potentially infinite tape array on which they can write and read back an infinite number of bits. If an operation corresponding to a single step of the emulated system needs 5 bits or 10 bits, then the emulating system may need to encode a state and operation pair on a block of tape cells 10 wide or whatever. That is perfectly allowed, and nothing in the PDF at the archive link addresses the question. (The archive PDFs (1) or (2) need not be encoded in the head state, but might instead be encoded in whether the parity of the emulating machine's tape in a region 4 long is even or odd, or whatever).

As for that, or why, simpler TMs are not universal, we can simple enumerate them all and see what all of them do, and none of them do anything complex enough to support universal computation. Our knowledge in the matter is not a function of a theorem about the required per step information, but an empirical observation of their actual behavior. For the 2,3 machine in question, it is entirely possible that it can't support universal computation, but this has not yet been shown. It clearly does more than any of the 2,2s does.

Posted by: Craig Feinstein

Jason wrote: “There is no requirement that an emulation use the same number of steps as the emulated system. It is perfect typical for, e.g., the simpler system to use a large block of cells to encode a single cell in the emulated system and details of its perhaps more sophisticated state. All TMs have access to a potentially infinite tape array on which they can write and read back an infinite number of bits. If an operation corresponding to a single step of the emulated system needs 5 bits or 10 bits, then the emulating system may need to encode a state and operation pair on a block of tape cells 10 wide or whatever. That is perfectly allowed, and nothing in the PDF at the archive link addresses the question.”

And I respond that this is all irrelevant to the argument in my paper. The argument in my paper is independent of the specifics of how the universal Turing machine encodes the rules and tape of the Turing machine that it is emulating. My paper does not need to address this question.

Jason wrote: “(The archive PDFs (1) or (2) need not be encoded in the head state, but might instead be encoded in whether the parity of the emulating machine's tape in a region 4 long is even or odd, or whatever).”

I respond to Jason, consider the following scenario:

I want to teach a dog calculus. The dog does not have the intellectual capacity to learn calculus; however, if I attach to the dog’s collar a tape with lots of instructions written on it encoding the rules for calculus, then the dog should then be able to understand calculus or at least perform calculus calculations.

Does this make sense? Of course not! Yet, this is what you are arguing one can do in order to teach a 2,3 Turing machine to be universal, even though its brain, the tape head, clearly doesn’t have this capacity, since its tape head cannot even process 3 bits of information.

Jason wrote: “As for that, or why, simpler TMs are not universal, we can simple enumerate them all and see what all of them do, and none of them do anything complex enough to support universal computation. Our knowledge in the matter is not a function of a theorem about the required per step information, but an empirical observation of their actual behavior.”

Jason, you think it is a coincidence that there are no 3-state 2-color universal machines or 2-state 2-color universal machines? It’s an accident that just happens to be true when you empirically look at the actual behavior of such machines? You can’t be serious. Obviously, the reason why there are no 3-state 2-color or 2-state 2-color universal TMs is because the tape head of such machines does not have enough processing power to support universality. Similarly, 2-state 3-color Turing machines don’t have enough processing power to support universality.

Posted by: Craig Feinstein

Because I don't want my note describing the solution to the Wolfram prize puzzle to be misinterpreted as I believe Jason misinterpreted it, I made some changes to it here: http://arxiv.org/abs/0706.4440

See the second version for the changes.

Posted by: Alastair Hewitt

I tend to agree with Craig Feinstein about the 3 bit limit for universality. However, I have just started my own analysis of the 2,3 TM problem and noticed a potential loop-hole.

There is more than just the 2-states and 3-colors to the machine, there is also the head position. This is encoded in the rule number as another bit, but obviously is not a free parameter - you can't have all one direction or the other. The worse case scenario, which is true for the prize machine, is an equal number of left and right positions: This gives 20 combinations out of the 64 possible head placements, or 0.3125 bits.

So the machine state appears to consist of the following information:
State: 1 bit
Color: 1.585 bits
-----------------------
Total ~ 2.9 bits

Still not quite to 3 bits, but this was a "worse case". If it is universal, then it must be right on the edge. An interesting problem indeed!

Posted by: Craig Feinstein