DNA Isn't Code

02 November 2021

Before I was a software developer, I originally planned on attending med school. While that never really panned out (and given the current state of the world, I’m sort of glad!), I did graduate from university with a degree in molecular biology. Recently I was thinking about the topic of DNA expression after reading an article in Popular Mechanics related to cloning rhinoceros. It’s not a topic I’ve thought about a whole lot in a while, so when I was going through a refresher from some of my coursework I found some resources that compared DNA to computer programming.

I found that model of thinking overly simplistic and somewhat confusing given my experience in both fields.

Coding for Codons

It’s a common thing to hear when you’re first learning about the structure of DNA that it’s the “code” of all living things. They call it your genetic code for a reason.

The obvious analog is machine language, which is binary. DNA is quaternary, where each individual digit can be one of four values: A (adenine), T (thymine), G (guanine) or C (cytosine). Machine language is bundled into bytes of 8 individual values (or bits), usually represented in hexadecimal, which looks something like 0x17 or 0xF2. DNA is bundled into codons of 3 individual values each and is simply represented as the three bases it’s made of, ATT or GCA for instance. Each byte can encode a chunk of data or a machine language instruction, whereas each codon encodes a single amino acid, and amino acids are chained together to create proteins. The largest structure in the biological example is the gene, which is the collection of codons required for a protein.
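To make the byte/codon comparison concrete, here’s a minimal Python sketch. The bit assignment for each base is an arbitrary choice of mine, the function names are made up for illustration, and the lookup table is only a five-entry excerpt of the standard genetic code, so treat it as a toy rather than a real translation tool.

```python
# Toy illustration of the byte/codon comparison; not a bioinformatics tool.

# Two bits are enough to label four bases (this particular mapping is arbitrary).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

# A small excerpt of the standard genetic code (DNA coding-strand codons).
CODON_TABLE = {
    "ATG": "Met",  # also the usual start codon
    "ATT": "Ile",
    "GCA": "Ala",
    "TGG": "Trp",
    "TAA": "Stop",
}


def codon_to_int(codon):
    """Pack a 3-base codon into a 6-bit integer (0..63)."""
    value = 0
    for base in codon:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def translate(dna):
    """Read DNA three bases at a time and look up each complete codon."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE.get(codon, "?") for codon in codons]


print(codon_to_int("ATT"))        # 15
print(translate("ATGATTGCATAA"))  # ['Met', 'Ile', 'Ala', 'Stop']
```

A codon carries 6 bits of information (64 possible values), yet those 64 codons map onto only about 20 amino acids plus stop signals, so even at this level the “encoding” is redundant in a way bytes are not.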

[Image: DNA central dogma]

You have to admit, it appears superficially to be a great metaphor for how the human body works compared to a computer.

But it’s wrong.

External Factors Make a Big Difference

Looking at DNA from such a high level is misleading because the steps involved in gene expression interact not only with each other but also with external factors. Let’s take a look at a few and clear up the misconception of DNA as code.

RNA Confounds Things

If you look at the picture above and compare it to the machine language metaphor, you’ll notice an intermediate step: mRNA. mRNA stands for messenger ribonucleic acid (as opposed to deoxyribonucleic acid). mRNA is single-stranded and is the template the protein chain itself is actually synthesized from. Notice also the new base, U (uracil), that takes the place of the thymine base from DNA. This isn’t hugely important for the DNA vs. machine language analogy, as we can simply think of it as a transpilation step or a different data encoding between the two mediums.
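If we take the “transpilation” view at face value, the T-to-U swap is about as simple as an encoding change gets. The sketch below assumes we are handed the coding strand (the one that reads like the mRNA); in a real cell the polymerase reads the complementary template strand, which this toy skips entirely. The function name is my own.

```python
def transcribe(coding_strand):
    """Return the mRNA sequence for a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGATTGCATAA"))  # AUGAUUGCAUAA
```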

What’s interesting, though, is that in biological systems, everything interacts with other things.

In our model above we’ve stated that “data” flows from DNA -> codon -> amino acid, but this isn’t really true. DNA produces some sort of RNA, in this case mRNA, which is sent to a ribosome to be built into a chain of amino acids. But this isn’t necessarily true either, because DNA can produce other types of RNA, such as tRNA (transfer RNA), which exists purely as glue “code”: it carries amino acids to the ribosome and matches them to the codons on the mRNA. Essentially, this DNA encodes something that never becomes a protein; it’s there to build an intermediate structure that doesn’t have a clean analogy in our machine language model.

There’s even a case where DNA encodes a special type of RNA that modifies the expression of the DNA that it came from! It turns out the human body is sort of editing its own source code as it goes along here.

Here’s a bit about RNA from genome.gov.

And Epigenetics, Too

Epigenetics is the ability of an organism to modify the expression of specific genes based on all kinds of different factors. One we’ve already talked about is RNA.

Proteins can do the exact same thing. They can interfere with the DNA physically or chemically, but the point of both is to change the way the cellular machinery that transcribes DNA into RNA interacts with the DNA. Confused yet? DNA is complicated!

To go back to the article I was reading that spawned this blog post, one of the considerations when cloning an animal is the epigenetics of the mother. For rhinoceros, the species in question was the northern white rhino, of which the last male has sadly died. The remaining two females are in poor health and likely cannot carry a fetus to term. Therefore, the northern white embryo is planned to be implanted into a southern white female. But as we discussed before, a surrogate mother of another species will likely provide an entirely different placental environment, and thus the calf could show major differences in gene expression compared to one born of a mother of the same species, even if the genetic “code” is identical!

As an aside, this is also a large problem I have with the Jurassic Park movies, as epigenetic differences between the cloned species and the surrogate mother would no doubt result in a subspecies of dinosaur in those movies, not an exact clone. And what the heck, they used frog DNA as their “gap-filler”??

[Image: Jurassic Park’s Mr. DNA]

DNA Is Only Self-Reproducing To a Point

The cellular machinery that copies DNA each time a cell divides (called DNA polymerases) isn’t 100% accurate. That means replication is error-prone, and, even worse, a strand can only be copied a finite number of times! It’s almost like we git cloned a repository and got a handful of random characters changed and the last 20 characters truncated.

The fact that DNA polymerases are error-prone should be no surprise to you. This is one way we get DNA mutations, genetic variation, and, yes, cancer.

There’s also the fact that DNA can only be replicated a limited number of times. Each strand of DNA ends in a stretch of repeating DNA base pairs called a telomere. DNA polymerases can’t copy the last few bases at the end of the strand, so a small part gets “chopped” off with each copy. Over time, the telomeres get shorter and shorter until the loss encroaches on parts of the DNA that are important and encode critical RNA, which hampers protein synthesis.
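The truncation half of that “lossy git clone” can be sketched in a few lines of Python. TTAGGG really is the human telomere repeat, but everything else here is an assumption for illustration: the function names, the single-ended strand, the absence of mutations and of any repair enzyme, and especially the number of bases lost per copy, which I picked to keep the example short.

```python
TELOMERE_REPEAT = "TTAGGG"  # the repeat unit found in human telomeres


def make_strand(gene, repeats):
    """A toy strand: one 'important' gene followed by a telomere buffer."""
    return gene + TELOMERE_REPEAT * repeats


def replicate(strand, lost_per_copy):
    """Copy the strand, dropping bases from the end (no mutations modeled)."""
    return strand[:max(0, len(strand) - lost_per_copy)]


def safe_divisions(gene, repeats, lost_per_copy):
    """Count how many copies can be made before the gene itself is truncated."""
    if lost_per_copy <= 0:
        raise ValueError("lost_per_copy must be positive")
    strand = make_strand(gene, repeats)
    count = 0
    while True:
        copy = replicate(strand, lost_per_copy)
        if not copy.startswith(gene):
            return count
        strand = copy
        count += 1


# 12-base gene + 30 bases of telomere, losing 10 bases per copy (made-up number).
print(safe_divisions("ATGATTGCATAA", repeats=5, lost_per_copy=10))  # 3
```

The telomere acts like padding at the end of a file: the copies stay functionally identical until the padding runs out, and then every further copy eats into real content.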

Here’s an article related to telomeric shortening and aging/cancer.

Proteins Are Analog, Not Digital

But barring all that, once we make a protein, we’re all in the clear, right? The system should behave identically every time, surely!

Wrong.

There are myriad ways that proteins can do different things, but let’s cover two that I think are interesting.

Prions

First up is prions. All proteins must be “folded” in a certain way in order to be effective. Misfolded proteins at best do nothing and at worst cause serious diseases. A misfolded protein can also cause other proteins to misfold, at which point it is designated a prion. The DNA will keep doing its thing, being transcribed into mRNA that is synthesized into a protein, but all the prions floating around will convert the new proto-protein into a prion (almost like zombies!).
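To see why the zombie comparison fits, here’s a deliberately crude Python sketch of that runaway conversion. The rules are my own assumptions, not measured biology: the cell makes a fixed number of healthy proteins per step, and each existing prion converts at most one healthy protein per step.

```python
def simulate(steps, produced_per_step=10, initial_prions=1):
    """Return a list of (healthy, prions) counts after each step."""
    healthy, prions = 0, initial_prions
    history = []
    for _ in range(steps):
        healthy += produced_per_step    # the DNA keeps doing its thing
        converted = min(healthy, prions)  # each prion converts at most one protein
        healthy -= converted
        prions += converted
        history.append((healthy, prions))
    return history


for healthy, prions in simulate(5):
    print(healthy, prions)
# 9 2
# 17 4
# 23 8
# 25 16
# 19 32
```

Nothing about the “source code” changes during the run, yet the prion count doubles each step while production stays flat, so the output of the system is decided by the state of the proteins already floating around.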

Examples of diseases caused by prions are Creutzfeldt-Jakob disease (commonly mis-referred to as “mad cow” disease) and, arguably, Alzheimer’s disease (in which amyloid beta and tau proteins misfold and spread in a prion-like way).

Analog Expression

Let me gush for just a minute about my favorite protein of all time, sonic hedgehog, and its inhibitor robotnikinin (yes, these are their real names).

Sonic hedgehog is heavily involved in central nervous system development and is considered essential for normal development in vertebrates. It is expressed on a gradient (meaning in some places of the body it is present in no- or low-level amounts vs high-level amounts elsewhere). Robotnikinin, strictly speaking, is not a protein but a small synthetic molecule that binds sonic hedgehog and inhibits its signaling. It is precisely this gradient that allows the developing nervous system to “orient” itself in space and know where the “front” of the brain should be, for instance.

This is incredibly interesting because it indicates a complex interaction: gene expression creates sonic hedgehog, which signals cells to grow, which in turn influences which genes are expressed in the future.
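The “analog” part is easier to see with numbers. In this Python sketch, a signal such as sonic hedgehog is strongest at its source and fades with distance, and each cell responds according to how much signal it sees. The exponential fall-off, the threshold values, and the response labels are all assumptions I made up for illustration.

```python
import math


def concentration(distance, source_level=1.0, decay_length=2.0):
    """Signal level at a given distance from the source (assumed exponential)."""
    return source_level * math.exp(-distance / decay_length)


def cell_response(level):
    """Pick a response from the local signal level (thresholds are made up)."""
    if level >= 0.5:
        return "high"
    if level >= 0.1:
        return "medium"
    return "none"


responses = [cell_response(concentration(d)) for d in range(8)]
print(responses)
# ['high', 'high', 'medium', 'medium', 'medium', 'none', 'none', 'none']
```

Every cell in that list carries the same DNA. What differs is a continuous quantity in its surroundings, which is exactly the kind of input a purely digital model of “code” has no place for.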

Nature Versus Nurture

I also have a minor in psychology so I can’t resist throwing this in here as well.

Even if everything I’ve already talked about behaves identically between two organisms, we still have to consider each organism’s environment, upbringing, and other factors. In psychology, this is the classic “nature versus nurture”, but it applies just as well here. Even physical environment differences can cause gene expression changes! To go back to the rhino story, there is a strong push for a rhino calf to be cloned before the two remaining females die because it will need to learn and gain experience socially (but also undoubtedly at other lower, more unseen levels) from its own species.

Wrap Up

I hope you can see that while considering DNA “code” can be useful to understand how it functions at the most basic level, the metaphor really breaks down if you give it any serious consideration. Living things are wondrously complicated biological machines that interact with themselves and each other in completely unexpected ways, which makes the science around them incredibly complicated.

DNA no doubt can be considered code in that it can be read as a simple quaternary language, but there is so much more going on under the hood.

That’s all for now. Thanks for reading!

/> ty porter