Scientific American bioinformatics article - A New Kind of Science: The NKS Forum
Posted by: Jason Cawley
Evolution Encoded
Stephen J. Freeland and Laurence D. Hurst have an article in Scientific American on bioinformatics that looks at some aspects of base-layer DNA encoding, compared to the large space of possible alternate encodings. I thought some might find it interesting as a problem potentially subject to an NKS-style attack. Some relevant passages from the article -
"First we took a quantitative measure of the 20 amino acids' hydrophobicity. Next we used those values to calculate the genetic code's error value, which we define as the average change in the resulting amino acids' hydrophobicity caused by all possible single-letter changes to all 64 codons of the code."
"...Using these technical assumptions, we generated alternatives by randomizing the 20 meanings among the 20 codon blocks. This still defined some 2.5 X 10^18 possible configurations. So we took large random samples of these possibilities and found that from a sample of one million alternative codes only about 100 had a lower error value than the natural code."
The full article can be found here -
http://www.sciam.com/article.cfm?ch...umber=5&catID=2
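For anyone who wants to experiment, the procedure in the quoted passages can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' actual method: the Kyte-Doolittle hydropathy scale stands in for whatever hydrophobicity measure the article used, changes to or from a stop codon are skipped, "change" is taken as an absolute difference, and the names `error_value`, `random_code`, and `SAMPLE_SIZE` are made up for illustration.

```python
import math
import random

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
NATURAL = dict(zip(CODONS, AMINO))

# ASSUMPTION: Kyte-Doolittle hydropathy values, used here as a stand-in measure.
HYDRO = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def error_value(code):
    """Average absolute hydrophobicity change over all single-letter codon changes.

    ASSUMPTION: changes to or from a stop codon are ignored.
    """
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                new_aa = code[codon[:pos] + base + codon[pos + 1:]]
                if new_aa == "*":
                    continue
                total += abs(HYDRO[aa] - HYDRO[new_aa])
                count += 1
    return total / count


def random_code(rng):
    """Shuffle the 20 amino-acid meanings among the 20 synonymous codon sets."""
    aas = sorted(HYDRO)
    shuffled = aas[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(aas, shuffled))
    relabel["*"] = "*"  # stop codons stay where they are
    return {codon: relabel[aa] for codon, aa in NATURAL.items()}


# Size of the space of alternatives: 20! is roughly 2.4 x 10^18.
print("possible codes:", math.factorial(20))

SAMPLE_SIZE = 2000  # hypothetical; the article reports a sample of one million
rng = random.Random(0)
natural_error = error_value(NATURAL)
better = sum(error_value(random_code(rng)) < natural_error for _ in range(SAMPLE_SIZE))
print(f"natural code error value: {natural_error:.3f}")
print(f"{better} of {SAMPLE_SIZE} random codes scored lower")
```

The count this prints depends on the stand-in scale and the averaging choice, so it should not be expected to match the article's figure of roughly 100 in a million.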
Posted by: Mohammed AlQuraishi
This is rather interesting work, and reminds me of the Rafiki code (http://www.codefun.com/Genetic_see.htm) although the authors of this article are a lot more focused.
I particularly like the fact that they only searched the code-space containing "wobble" and still came out with very impressive numbers. There's definitely something to be said about the code having evolved. Furthermore, I don't think this necessarily has to contradict the "frozen accident" hypothesis. Given the prevalence of the standard genetic code, it does seem to have "frozen." What is likely to have transpired during the early stages of the code's development is an accelerated period of evolution, where the overall basic layout was quickly and randomly set in stone (the first letter position) with subsequent modifications becoming more and more difficult, and restricted to the second and third positions.
In the context of NKS, I think this supports the idea that nature is essentially doing a random search (we were still able to beat it after a million random samples) but that some form of selective pressure is also at work. If this were an entirely random process, the results of this article could not have been obtained. On the other hand, if evolution were tantamount to an optimal search, we couldn't have beaten it by searching through only a tiny fraction of the code-space.
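To put rough numbers on "a tiny fraction of the code-space", here is the back-of-the-envelope arithmetic using only the figures quoted from the article. The extrapolation in the last line is a naive assumption (that the random sample is representative of the whole space), not something the article states.

```python
import math

space = math.factorial(20)   # ways to assign 20 meanings to 20 blocks, about 2.4e18
sampled = 1_000_000          # size of the random sample quoted from the article
better = 100                 # "only about 100" codes beat the natural one

fraction_better = better / sampled      # share of sampled codes that beat nature
fraction_searched = sampled / space     # share of the whole space that was sampled
# ASSUMPTION: the sample is representative, so the same share holds everywhere.
estimated_better_codes = fraction_better * space

print(f"fraction of sampled codes that are better: {fraction_better:.0e}")
print(f"fraction of code-space sampled:           {fraction_searched:.1e}")
print(f"naive estimate of better codes overall:   {estimated_better_codes:.1e}")
```

So the natural code sits near the top one-hundredth of a percent of the sample, yet under this naive extrapolation there would still be on the order of 10^14 codes with a lower error value.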
Posted by: Jason Cawley
That sounds likely to me. I am wondering about places the same sort of analysis might be extended, with a bit less ensemble thinking and a bit more of the search of a possibility space idea. Obviously they did quite a bit of that on the code possibility side, which is great, and why I like the article so much as NKS-like. But what about doing it on the error value side (i.e. "unpack" the measure)?
That is, instead of asking of each possible code what its numerically averaged value is over the single-letter changes to all 64 codons, take some more restricted set of possible codes and look at how much the protein structure is altered by each of those one-move changes, and by combined pairs. So the "wiring" of the code is semi-fixed (a small set: the natural one and a few others, say one more "optimized" and one seemingly random), and the 1 and maybe 2 switch "errors" are enumerated rather than averaged. Then you look at base pair sequences as the big possibility space, and get lots of raw data rather than just an average for each code. Start from short amino acid chains and plow up as high (in chain length) as it stays practical to look at a fair portion of the possibilities.
See the idea? Instead of a big space of codes (what they did, useful, but they already did it) and averaging of possible errors, a big space of (all pretty short) proteins and a significant, but tractable, space of explicit possible errors.
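A minimal sketch of what that enumeration might look like, assuming the standard genetic code as the fixed "wiring" and covering only one-letter errors. The function names are made up for illustration, and the hard part (scoring how much each mutant peptide's structure is actually altered) is deliberately left out; the sketch only produces the raw list of mutants.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
NATURAL = dict(zip(CODONS, AMINO))


def translate(seq, code=NATURAL):
    """Translate a coding sequence (length divisible by 3) into a peptide string."""
    return "".join(code[seq[i:i + 3]] for i in range(0, len(seq), 3))


def single_letter_errors(seq, code=NATURAL):
    """Yield (position, new_base, mutant_peptide) for every one-letter change."""
    for pos, old in enumerate(seq):
        for base in BASES:
            if base != old:
                mutant = seq[:pos] + base + seq[pos + 1:]
                yield pos, base, translate(mutant, code)


def enumerate_errors(n_codons, code=NATURAL):
    """Walk every stop-free coding sequence of n_codons codons and list its errors."""
    sense = [c for c in CODONS if code[c] != "*"]
    for codons in product(sense, repeat=n_codons):
        seq = "".join(codons)
        yield seq, translate(seq, code), list(single_letter_errors(seq, code))


# Tally the explicit outcomes for all two-codon sequences.
tally = {"unchanged": 0, "changed": 0, "stop introduced": 0}
for seq, peptide, errors in enumerate_errors(2):
    for _pos, _base, mutant_peptide in errors:
        if "*" in mutant_peptide:
            tally["stop introduced"] += 1
        elif mutant_peptide == peptide:
            tally["unchanged"] += 1
        else:
            tally["changed"] += 1
print(tally)
```

The space grows as 61^n sequences with 9n one-letter errors each, which is why this only stays practical for short chains.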
The other thing the "pretty good but not optimal" result they found reminds me of is the stuff in the NKS book on approaching satisfaction of complicated discrete constraints by an iterative procedure. Where one typically sees rapid improvement, then it slows, and eventually it crawls. The last bits do not "fall" to iterative improvement, when the constraint set is complicated, rather than e.g. some one dimensional local minimum. Those experiments are typically set up to ask in effect whether some measure of fit improves or not with discrete flips.
So, one could imagine trying to set up a toy version of a possible evolutionary path to a code. Does this one assignment change make the resulting coding score higher or lower on an error sensitivity measure (like the one they used), than the code before that one switch? To try to get a handle on how easy it is to "walk" to good error scores when you take only one step at a time and only detect "improvement" locally.
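A toy version of such a walk might look like the sketch below. Everything here is an assumption made for illustration: the step is a swap of two amino acids' codon sets, the score is an average absolute hydrophobicity change on the Kyte-Doolittle scale (a stand-in for the article's measure), stop codons are ignored, and a step is kept only if the score drops.

```python
import random

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
NATURAL = dict(zip(CODONS, AMINO))

# ASSUMPTION: Kyte-Doolittle hydropathy values, used here as a stand-in measure.
HYDRO = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AAS = sorted(HYDRO)


def error_value(code):
    """Average absolute hydrophobicity change over single-letter changes (stops skipped)."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                new_aa = code[codon[:pos] + base + codon[pos + 1:]]
                if base != codon[pos] and new_aa != "*":
                    total += abs(HYDRO[aa] - HYDRO[new_aa])
                    count += 1
    return total / count


def relabel(code, mapping):
    """Return a new code with amino acids renamed according to mapping."""
    return {codon: mapping.get(aa, aa) for codon, aa in code.items()}


def greedy_walk(start, steps, rng):
    """One assignment swap per step, kept only if the error value improves."""
    code, score = start, error_value(start)
    history = [score]
    for _ in range(steps):
        a, b = rng.sample(AAS, 2)
        candidate = relabel(code, {a: b, b: a})
        candidate_score = error_value(candidate)
        if candidate_score < score:  # only local "improvement" is detected
            code, score = candidate, candidate_score
        history.append(score)
    return code, history


rng = random.Random(0)
shuffled = AAS[:]
rng.shuffle(shuffled)
start = relabel(NATURAL, dict(zip(AAS, shuffled)))  # a random starting code
final, history = greedy_walk(start, 300, rng)
print(f"start {history[0]:.3f}  end {history[-1]:.3f}  natural {error_value(NATURAL):.3f}")
```

Plotting `history` against the step number would show whether this toy walk has the fast-then-slow-then-crawl shape described above.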
Posted by: Mohammed AlQuraishi
Your ideas are interesting and merited, but I think there are feasibility issues involved (if I understand you correctly.)
Firstly, while enumerating the space of all possible small proteins may not be very difficult, measuring whether the resultant proteins are catastrophically altered or not is definitely non-trivial. If simple measures are used, such as hydrophobicity (as they did), then I'm not sure much will be gained by producing those protein datasets. The exercise makes sense when the result is more realistic, i.e. when one is able to infer more than by simply looking at individual amino-acid substitutions, and doing so would involve protein-level analysis. While not necessitating a "proteomics" degree of accuracy, this is still likely to be very computationally intensive. That's not to say it's impossible, but it will be quite an undertaking.
I like the second idea even more, but again it may be difficult to implement just yet. To truly capture a meaningful "walk" through the code-mutation space, the effects of single mutations must somehow be simulated on a hypothetical early organism. One could then observe the plausibility of a random walk given the biological context, and determine at what level of biocomplexity it becomes essentially infeasible to "walk", i.e. the point at which the code becomes frozen. Given that we have yet to fully model a single prokaryotic cell, or even a part of a cell, this again presents computational difficulties, but ones that may become tractable within a few years.
Posted by: Alexandre Ismail
Cool article. Of course, the aspect I like most about it is that basing the "fitness" of the sequence on preservation of hydrophobicity gave good results on another level. (My project? I'm working on it, I'm working on it.)
Posted by: Vasily Shirin
They showed that hydrophobicity is an important criterion, but not the only one. There can be a number of secondary criteria, which can also be slightly different for different organisms. That's a possible reason why the encoding in E. coli differs from that in humans: not everything that is good for humans is equally good for E. coli.
Posted by: Vasily Shirin
It's instructive to analyze the logic behind this piece of research:
1. Somebody guessed that the encoding rules are not just arbitrary: they minimize the effect of errors on the shape of proteins.
2. Somebody observed that the shape of proteins depends very much on hydrophobicity.
3. These two guys did a numerical analysis and found that the encoding is indeed pretty optimal, but not QUITE optimal.
4. From here, they draw some far-reaching conclusions about how evolution works.
My question is: on what grounds do they believe that their idea of what is optimal for E. coli is really 100% accurate? To make it 100% accurate, they have to define a 100%-accurate criterion of "distance" between two shapes (due to premise 1), which may not be (and certainly is not) linearly dependent on hydrophobicity. This is not all, however, because E. coli is interested not in preserving the overall shape of its proteins per se, but in survival (according to the central tenet of Darwinism), and it's certainly not possible to accurately predict the variation in survival from the variation in "shape". Therefore, all the calculations they make are good only within a certain margin of error, which they don't (and can't) evaluate theoretically. So, when they claim to have found some encoding that is "more optimal" than the real one, it can very well be due to the margin of error. However, they don't even consider this possibility. (I find it quite plausible that had E. coli implemented their "more optimal" encoding, the whole species would have died out pretty soon.)
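One crude way to probe the margin-of-error worry is to ask how often a "better" code stays better when the hydrophobicity values themselves are perturbed. The sketch below does that under loud assumptions: Kyte-Doolittle values stand in for the article's measure, the noise is Gaussian with a hypothetical width `SIGMA`, and the names `find_better_code` and `survival_rate` are invented here. It says nothing about survival of the organism, which is the deeper objection.

```python
import random

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
NATURAL = dict(zip(CODONS, AMINO))

# ASSUMPTION: Kyte-Doolittle hydropathy values, used here as a stand-in measure.
HYDRO = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AAS = sorted(HYDRO)


def error_value(code, scale):
    """Average absolute change in `scale` over single-letter changes (stops skipped)."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                new_aa = code[codon[:pos] + base + codon[pos + 1:]]
                if base != codon[pos] and new_aa != "*":
                    total += abs(scale[aa] - scale[new_aa])
                    count += 1
    return total / count


def find_better_code(rng, tries):
    """Return a random code that beats the natural one on HYDRO, or None."""
    target = error_value(NATURAL, HYDRO)
    for _ in range(tries):
        shuffled = AAS[:]
        rng.shuffle(shuffled)
        mapping = dict(zip(AAS, shuffled))
        code = {c: mapping.get(aa, aa) for c, aa in NATURAL.items()}
        if error_value(code, HYDRO) < target:
            return code
    return None


def survival_rate(alt, sigma, trials, rng):
    """Share of noisy scales on which `alt` still beats the natural code."""
    wins = 0
    for _ in range(trials):
        noisy = {aa: v + rng.gauss(0, sigma) for aa, v in HYDRO.items()}
        if error_value(alt, noisy) < error_value(NATURAL, noisy):
            wins += 1
    return wins / trials


SIGMA = 0.5  # hypothetical measurement uncertainty, in scale units
rng = random.Random(0)
alt = find_better_code(rng, 2000)
if alt is None:
    print("no better code found in this sample")
else:
    print(f"still better on {survival_rate(alt, SIGMA, 200, rng):.0%} of noisy scales")
```

If the "better" code wins on only about half of the perturbed scales, its advantage is within the noise; if it wins on nearly all of them, the objection loses some force, at least for this one measure.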