My micro-eureka moment of today.

August 15, 2006

Well, make it nano (or pico (or femto)) if you want. I have not discovered a new principle as Archimedes did, but I have come to realize a mathematical beauty that exists in nature’s design.

I have realized that the genetic code is a many-to-one function that transforms a sequence of nucleotides into a sequence of amino acids. Because this function is many-to-one, it cannot be inverted. In terms of genetics, this means that if you know the nucleotide sequence of a particular gene, you can always say what protein it would produce; but it is not possible to know the exact gene sequence that produced a particular protein (which is a sequence of amino acids).

Sounds foreign? Let me explain.

The right question to ask is: How does a gene control protein manufacture?

Genes are particular sequences of nucleotides (A, T, G, C) arranged in a defined pattern. Every triplet of nucleotides (termed a codon) in this sequence encodes an amino acid, or a signal to stop. Proteins are nothing but a specific series of amino acids. When a gene is activated, its sequence is copied into mRNA (messenger RNA), with U (uracil) taking the place of T. Messenger RNA then carries this genetic information to the ribosome, where the codons are read and the amino acids they encode are strung together. Once strung together, the amino acid chain takes on particular chemical and structural properties, which in turn govern its function. Proteins are the work-horses of cells: they do most of the work.

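
The transcription and translation steps described above can be sketched in a few lines of Python. This is only a toy model, assuming the standard genetic code (written compactly as a 64-letter string in the conventional U, C, A, G order, with one-letter amino acid codes and '*' for stop), a DNA coding strand as input, and a sequence that already starts in the correct reading frame. The helper names `transcribe` and `translate` and the gene itself are made up for illustration.

```python
from itertools import product

# Assumption: the standard genetic code (NCBI translation table 1), written as
# 64 one-letter amino acid codes in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Copy a DNA coding strand into mRNA: same letters, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read mRNA one codon (3 bases) at a time and string the amino acids together.

    Assumes the sequence starts in frame; stops at the first stop codon and
    ignores any trailing partial codon.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy gene: ATG GCT TGG TAA
gene = "ATGGCTTGGTAA"
mrna = transcribe(gene)
print(mrna)             # AUGGCUUGGUAA
print(translate(mrna))  # MAW  (Met, Ala, Trp, then stop)
```

Real cells also splice, edit and regulate their transcripts; none of that is modelled here.
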
The code that matches each codon with a particular amino acid is uniform, degenerate and unambiguous. What this means is:

  • Uniformity: All organisms use (almost) the same genetic code. (Isn’t it amazing? It really shows that we have grown out of amoeba!)
  • Degeneracy: Most amino acids can be represented by more than one codon (by the way, there are 64 codons but only 20 standard amino acids)
  • Unambiguity: Every codon represents one and only one amino acid (apart from the three stop codons, which mark the end of a protein instead).
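
Degeneracy and unambiguity are easy to see by tallying the codon table. A small sketch, assuming the standard code (NCBI translation table 1) with one-letter amino acid codes and '*' for the three stop codons:

```python
from collections import Counter
from itertools import product

# Assumption: standard genetic code (NCBI table 1), U/C/A/G order, '*' = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Unambiguity is built in: a dict holds exactly one value per key, so each
# codon maps to one and only one symbol.
# Degeneracy: count how many codons land on each amino acid (stops excluded).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE))                                   # 64
print(len(degeneracy), sum(degeneracy.values()))          # 20 61
print(degeneracy["L"], degeneracy["S"], degeneracy["R"])  # 6 6 6
print(degeneracy["M"], degeneracy["W"])                   # 1 1

# Many-to-one: far fewer distinct outputs than inputs.
print(len(set(CODON_TABLE.values())) < len(CODON_TABLE))  # True
```

Only methionine (M) and tryptophan (W) get a single codon; leucine, serine and arginine each get six.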

So if you consider the genetic code to be a function (call it ‘g’) mapping a set of codons (call it ‘C’) to a set of amino acids (call it ‘A’), you can see that g: C -> A maps every member of C to exactly one member of A, but several members of C can map to the same member of A. Thus, g is a many-to-one function.
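
Written out in the same notation, with one concrete collision (taking C to be the 61 codons that specify an amino acid, leaving the three stop codons aside, and A to be the 20 standard amino acids):

```latex
g : C \to A, \qquad |C| = 61, \qquad |A| = 20

g(\mathrm{GCU}) = g(\mathrm{GCC}) = g(\mathrm{GCA}) = g(\mathrm{GCG}) = \mathrm{Ala}

|C| > |A| \implies \exists\, c_1, c_2 \in C,\ c_1 \neq c_2,\ g(c_1) = g(c_2)
```

The last line is just the pigeonhole principle: 61 codons cannot be spread over 20 amino acids without some sharing. By the same principle at least one amino acid must have at least ⌈61/20⌉ = 4 codons, and the table agrees (leucine, serine and arginine each have six).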

Now, it is a mathematical truth that a many-to-one function has no inverse: going backwards, an amino acid leads to a whole set of codons rather than a single one. Hence, you can never recover the exact nucleotide sequence from a protein’s amino acid sequence. (Edited thanks to Johan’s comment.) Even though you can infer which nucleotide sequences may give rise to a certain amino acid sequence, you can never be sure which one actually did.
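
Going the other way (sometimes called back-translation) only works if you accept a set of answers instead of one. The sketch below, again assuming the standard code and ignoring which stop codon ends the gene, builds the preimage of each amino acid (the set of all codons that encode it) and enumerates every mRNA coding sequence that could produce a given protein. The helper names `count_mrnas` and `possible_mrnas` are hypothetical.

```python
from itertools import product
from math import prod

# Assumption: standard genetic code (NCBI table 1), U/C/A/G order, '*' = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Preimage of g: for each amino acid, the list of every codon that encodes it.
PREIMAGE = {}
for codon, aa in CODON_TABLE.items():
    PREIMAGE.setdefault(aa, []).append(codon)


def count_mrnas(protein: str) -> int:
    """Number of mRNA coding sequences (stop codon left out) that give this protein."""
    return prod(len(PREIMAGE[aa]) for aa in protein)


def possible_mrnas(protein: str):
    """Yield every mRNA coding sequence (stop codon left out) that gives this protein."""
    for codons in product(*(PREIMAGE[aa] for aa in protein)):
        yield "".join(codons)


print(count_mrnas("MW"))          # 1    (Met and Trp have one codon each)
print(count_mrnas("LSR"))         # 216  (6 * 6 * 6)
print(list(possible_mrnas("MA")))
# ['AUGGCU', 'AUGGCC', 'AUGGCA', 'AUGGCG']
```

The number of candidates is the product of the degeneracies, so even a modest protein has an enormous set of possible sequences. The code can list them all, but nothing in the protein tells you which one the cell actually used.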