On the Age of Humanity

Introduction

My research shows unequivocally that archaic humans are still alive today, in that many living humans carry archaic mtDNA. The obvious question is how archaic humans survived for so long. The answer is that they probably didn’t, but their mtDNA did, consistent with the widely accepted fact that many living humans carry archaic DNA generally. What makes mtDNA unique is that it is so stable: it is passed from a mother to her offspring with basically no mutations at all, even over thousands of years. One estimate claims that one mutation occurs roughly every 7,990 years, though this estimate is qualified and plainly subject to doubt. I show below that, assuming this estimate is correct, Denisovan mtDNA existed about 38,000,000 years ago.

This is obviously far earlier than anyone thinks, but it’s not totally absurd, especially in light of relatively recent finds, including Graecopithecus, which was dated to 7.2 million years ago and was found in Greece, not Africa, which of course implies it’s possible the species emerged much earlier in Africa itself. Also note that we’re only discussing mtDNA, so the claim is limited to the existence of Denisovan mtDNA, not the full genome. The discussion below of course considers the case that the estimate of 7,990 years per mutation is simply wrong, which is arguably the point of this note. Specifically, not all systems have stable averages over time, and a system as complex as the human genome of course might not behave in a predictable, stable manner.

Alignment, Insertions, and Deletions

Assume you have two copies of the exact same genome, and call them A and B. Note that mtDNA is N = 16,579 bases long, and as a result, the match count between genomes A and B is 16,579 bases, or 100% of the genome. Now insert a random base in genome B, at index 2. This will shift every base after the first index in B by one position. The remaining N-1 bases should then match genome A about 25% of the time. That is, because we’ve shifted one of the otherwise identical genomes by one base, whatever bases happen to match after the insertion should match by chance, and because there are four possible bases, the probability of a match is 1/4. Note that a deletion will cause an analogous reduction to chance. As a result, a single insertion or deletion will cause the match rate to drop to around chance for every base after the index of the insertion or deletion.
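The effect described above can be checked with a small simulation. This is only a sketch under a loud assumption: the genome here is a uniformly random sequence of the four bases, not real mtDNA, and the name `global_match_count` is hypothetical, not taken from the software in [1].

```python
import random

BASES = "ACGT"
N = 16579  # mtDNA length used in this note


def global_match_count(a, b):
    """Count positions where the two sequences carry the same base."""
    return sum(x == y for x, y in zip(a, b))


rng = random.Random(0)  # fixed seed, purely so the run is repeatable

# ASSUMPTION: a uniformly random sequence stands in for a real genome.
genome_a = [rng.choice(BASES) for _ in range(N)]
genome_b = list(genome_a)  # exact copy, so the match count starts at N

# Insert one random base at index 2 (1-based), i.e. right after the first base.
genome_b.insert(1, rng.choice(BASES))

matches = global_match_count(genome_a, genome_b)

# The first base still matches; the other N-1 positions match only by chance.
fraction_after_insertion = (matches - 1) / (N - 1)
print(matches, round(fraction_after_insertion, 3))
```

On random data the printed fraction should land close to 0.25. Real mtDNA is not uniformly random, so the chance level for actual genomes may differ somewhat.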

The work I present in “A New Model of Computational Genomics” [1] makes use of a global alignment, which means that when comparing two genomes, you assign each base an index, and the comparisons are made by testing whether the bases are equal at each index. The match count is simply the total number of matching bases. See [1] generally. In contrast, local alignments take segments from a given genome A (e.g., bases 1 through 100), and attempt to find the highest match count anywhere in genome B (e.g., bases 100 through 200). A local alignment therefore effectively ignores insertions and deletions. In the example above, a local alignment would search all of genome A for the best match, which would produce a match count of N (i.e., 100% of the genome), with one “gap” to account for the insertion. In contrast, a global alignment (i.e., just counting matching corresponding bases) would produce a match count of 1 + approximately 0.25*(N-1) (i.e., the first matching base, plus approximately 25% of the remaining N-1 bases).
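The contrast between the two counts can be illustrated with a toy comparison. This is a hedged sketch and not a real local alignment algorithm: `best_single_gap_match_count` is a hypothetical helper that only allows one skipped base in B, which is enough to reproduce the single-insertion example. The random genome is again an assumption standing in for real data.

```python
import random
from itertools import accumulate

BASES = "ACGT"
N = 16579  # mtDNA length used in this note


def global_match_count(a, b):
    """Index-for-index match count, with no gaps allowed."""
    return sum(x == y for x, y in zip(a, b))


def best_single_gap_match_count(a, b):
    """Best match count when exactly one base of b may be skipped.

    Hypothetical helper: b must be one base longer than a. For each candidate
    skipped position k in b, bases before k are compared index for index and
    bases from k onward are compared against b shifted back by one.
    """
    n = len(a)
    assert len(b) == n + 1
    # prefix[k] = matches between a[0:k] and b[0:k]
    prefix = [0] + list(accumulate(int(a[i] == b[i]) for i in range(n)))
    # shifted[k] = matches between a[k:n] and b[k+1:n+1]
    shifted = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        shifted[i] = shifted[i + 1] + int(a[i] == b[i + 1])
    return max(prefix[k] + shifted[k] for k in range(n + 1))


rng = random.Random(0)
genome_a = [rng.choice(BASES) for _ in range(N)]  # ASSUMPTION: random stand-in
genome_b = list(genome_a)
genome_b.insert(1, rng.choice(BASES))  # one insertion at index 2 (1-based)

global_count = global_match_count(genome_a, genome_b)
gapped_count = best_single_gap_match_count(genome_a, genome_b)
expected_global = 1 + 0.25 * (N - 1)
print(global_count, round(expected_global), gapped_count)
```

Allowing the single gap recovers all N matches, while the index-for-index count falls to roughly 1 + 0.25*(N-1), which is the difference the paragraph above describes.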

Insertions and deletions are, at least anecdotally, very impactful in terms of the effect they have, since, e.g., Williams Syndrome, Down Syndrome, and many others, are caused by insertions and deletions. As a result, it’s not surprising that local alignments don’t seem terribly useful in terms of predictive power, because they effectively ignore insertions and deletions, creating very high match counts across all human mtDNA. In contrast, the software in [1] makes use of a global alignment, which ultimately allows ethnicity to be predicted with approximately 80% accuracy.

Application to Data

As noted in [1], and in many other research notes I’ve written, there are plenty of modern living humans with archaic mtDNA, in particular, Denisovan mtDNA. Denisovans test as the common ancestor of all archaic humans, suggesting that they are in fact the first humans. Though technically the modern people of Cameroon test as the ancestors of the Denisovans, which is again possible because mtDNA is so stable, I’ll work instead with the actual Denisovan genomes in my dataset, which were all taken from the NIH database. The goal of this section is to approximate the date of the first Denisovans, given the genomes of modern living humans that carry Denisovan mtDNA, and the actual Denisovan genomes recovered from Siberia. There are 8 such Denisovan genomes in the dataset, out of a total of 664 genomes. All genomes are complete mtDNA genomes, again taken from the NIH database.

If we fix a minimum match threshold of 50% of the genome, we find that 82 non-Denisovan genomes are at least a 50% match to at least one Denisovan genome. These are living, modern humans that carry Denisovan mtDNA. The average match count over all such genomes is 11,779.32 bases, or 71.05% of the full genome. This means that since the time of the Siberian Denisovan specimens, 100% – 71.05% = 28.95% of the genome has mutated. This is 4,799.62 bases.

Though the rate at which mtDNA mutates is still a subject of discussion, as noted above, one cited figure is one mutation per 7,990 years. Applied to 4,799.62 mutated bases, this would put the age of the Siberian Denisovans at 38,348,963.80 years before the present. This is far outside the range of dates I’ve seen for these finds, the low end of which is around 300,000 years ago. As noted above, it’s at least possible that the modern living Denisovans instead carry the mtDNA of the ancestors of the Siberian Denisovans, which would again force us to reject the date of 38,348,963.80 years before the present. However, the data suggests this is not the case. See Section 6 of [1] generally.
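The arithmetic behind these figures can be reproduced in a few lines. This is only a sketch of the calculation as stated in this note: the constant names are hypothetical, and it follows the note in treating every non-matching base as one mutation at the cited rate.

```python
N = 16579                   # mtDNA length used in this note
AVG_MATCH_FRACTION = 0.7105  # average match to the Siberian Denisovan genomes
YEARS_PER_MUTATION = 7990    # cited estimate, which the note itself questions

# Share of the genome that no longer matches, expressed in bases.
mutated_fraction = 1 - AVG_MATCH_FRACTION
mutated_bases = round(mutated_fraction * N, 2)

# ASSUMPTION (as in the note): each mismatched base is one mutation.
implied_age_years = mutated_bases * YEARS_PER_MUTATION

print(f"{mutated_bases:,.2f} bases -> {implied_age_years:,.2f} years")
```

This prints 4,799.62 bases and 38,348,963.80 years, matching the figures above. Changing `YEARS_PER_MUTATION` shows how directly the implied age depends on the assumed rate.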

It could also be the case that a single insertion or deletion is causing the match count to drop to around 70% of the genome when comparing the Siberian Denisovans to modern living humans. That is, there’s a single insertion or deletion further down the genome that causes the match count over the balance of the genome to fall to chance, leaving an overall match of around 70%. This would not require that much time, since it is technically a single mutation. We can however rule this out by looking at the distribution of the matching bases along the genome. This can be done by grouping sequential bases (i.e., bases 1 through K, K+1 through 2K, etc.), and then counting the percentage of matching bases in those segments. If the matching percentage of bases in each segment is always significantly above 25%, then it simply cannot be the case that the resultant match count is due to a single insertion or deletion within a given segment. The chart below shows the average percentage of matching bases for all 8 of the Siberian Denisovan genomes when compared to all other genomes that have at least a 50% match, breaking the full genome into 100 segments of 165 bases each.

You can plainly see that it’s not the result of a single insertion or deletion, since the match count is always above 40% of the bases in each segment. That said, there is still plainly a portion of the genome, from around segment 5 to segment 40, that seems to have been impacted by insertions and deletions, but this is distinct from a single trivial insertion or deletion. As a result, we have an enormous amount of change to account for when comparing Siberian Denisovan mtDNA to the mtDNA carried by some modern, living humans. This again implies that either the estimated rate of mutation is wrong (the more likely explanation) or the dates associated with the Siberian cave are way off (not as convincing). The software for this is below, and the balance of the software can be found in [1].
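As an independent illustration, and not the software from [1], the segment-wise check described above can be sketched in Python. The function name `segment_match_percentages` is hypothetical, the genomes are random stand-ins rather than the NIH data, and the sketch simply ignores the 79 bases left over after 100 segments of 165.

```python
import random

BASES = "ACGT"
N = 16579  # mtDNA length used in this note


def segment_match_percentages(a, b, num_segments=100, segment_len=165):
    """Percentage of index-for-index matches in each consecutive segment."""
    percentages = []
    for s in range(num_segments):
        start = s * segment_len
        end = start + segment_len
        matches = sum(x == y for x, y in zip(a[start:end], b[start:end]))
        percentages.append(100.0 * matches / segment_len)
    return percentages


rng = random.Random(1)
genome_a = [rng.choice(BASES) for _ in range(N)]  # ASSUMPTION: random stand-in
genome_b = list(genome_a)

# One insertion at the start of segment 51 (1-based), i.e. 0-based index 8250.
genome_b.insert(8250, rng.choice(BASES))

pcts = segment_match_percentages(genome_a, genome_b)
before = pcts[:50]   # segments entirely before the insertion
after = pcts[51:]    # segments entirely after the insertion
mean_after = sum(after) / len(after)
print(min(before), round(mean_after, 1))
```

In this toy case, segments before the insertion stay at 100% and segments after it average roughly 25%. That step down to chance is the signature of a single insertion, and it is the pattern the per-segment percentages reported above do not show.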
