Testing Ancestry Algorithmically

In my paper, “A New Model of Computational Genomics” [1], I presented an algorithm that allows you to test for ancestry given three genomes A, B, and C. In short, if genomes B and C descend from genome A, then genomes A and B, and genomes A and C, should have more bases in common than genomes B and C. You can read [1] to see why, but the idea is that it’s mathematically impossible for B and C to have meaningfully more in common with each other than each has with A, beyond what chance allows, since they both start out the same (i.e., identical to genome A) and then evolve independently.
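As a rough illustration of the test described above, here is a minimal Python sketch. It is not the code from [1]. The function names `count_matches` and `is_ancestor` are hypothetical, the genomes are assumed to be aligned strings of equal length, and the use of a strict inequality is an assumption based on the wording "more bases in common".

```python
def count_matches(x: str, y: str) -> int:
    """Count the positions at which two aligned sequences carry the same base."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(p == q for p, q in zip(x, y))


def is_ancestor(a: str, b: str, c: str) -> bool:
    """Return True if A shares more bases with B, and with C, than B shares with C.

    Assumption: strict inequality on both comparisons.
    """
    bc = count_matches(b, c)
    return count_matches(a, b) > bc and count_matches(a, c) > bc


# Toy data (made up for illustration): B and C each differ from A at two
# positions, and those positions do not overlap.
a = "ACGTACGTAC"
b = "ACGTACGTTT"  # last two bases changed
c = "TTGTACGTAC"  # first two bases changed

print(is_ancestor(a, b, c))  # A as the candidate ancestor
print(is_ancestor(b, a, c))  # B as the candidate ancestor
```

In the toy data, A matches each of B and C at 8 positions while B and C match each other at only 6, so the test passes with A as the candidate ancestor and fails with B as the candidate.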

In [1], I provide the code to implement this algorithm, but tonight I wrote a really fun algorithm that finds the best root population among entire populations A, B, and C. That is, it tests, genome by genome, whether a given combination of three genomes (one from each of populations A, B, and C) satisfies the test stated above. By doing this repeatedly, with each population taking a turn as the candidate root, it can report back the root population with the highest percentage of satisfied tests.
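The population-level search described above could be sketched along the following lines. This is an illustrative sketch under stated assumptions, not the algorithm's actual implementation. The names `root_scores` and `best_root` are hypothetical, each population is assumed to be a list of aligned strings, and the sketch enumerates every combination of one genome per population, whereas the original may sample combinations instead.

```python
from itertools import product


def count_matches(x, y):
    """Count the positions at which two aligned sequences carry the same base."""
    return sum(p == q for p, q in zip(x, y))


def is_ancestor(a, b, c):
    """Three-genome test: A must match B and C in more positions than B matches C."""
    bc = count_matches(b, c)
    return count_matches(a, b) > bc and count_matches(a, c) > bc


def root_scores(populations):
    """Map each population name to the fraction of triples it passes as the root.

    `populations` is a dict of exactly three entries: name -> list of sequences.
    Assumption: every combination of one genome per population is tested.
    """
    if len(populations) != 3:
        raise ValueError("expected exactly three populations")
    scores = {}
    for root in populations:
        first, second = [name for name in populations if name != root]
        results = [
            is_ancestor(a, b, c)
            for a, b, c in product(
                populations[root], populations[first], populations[second]
            )
        ]
        scores[root] = sum(results) / len(results) if results else 0.0
    return scores


def best_root(populations):
    """Return the name of the highest-scoring root population and all scores."""
    scores = root_scores(populations)
    return max(scores, key=scores.get), scores


# Toy data, made up for illustration only.
populations = {
    "A": ["ACGTACGTAC", "ACGTACGTAA"],
    "B": ["ACGTACGTTT"],
    "C": ["TTGTACGTAC"],
}
root, scores = best_root(populations)
print(root, scores)
```

Exhaustive enumeration grows with the product of the three population sizes, so for large datasets a random sample of combinations would be a natural substitute.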

I’ve noted in the past (including in [1]) that it’s obvious the Northern Europeans are closely related to the Ancient Egyptians. Specifically, it looks like they descend from the Ancient Egyptians. More recently, I’ve noticed that a lot of people globally are related to Northern Europeans. Applying the algorithm I wrote tonight, it looks like the flow is from West to East. For example, when you ask whether South Koreans are the ancestors of the Norwegians and Germans, you get a low metric. In contrast, when you ask whether the Norwegians are the ancestors of the South Koreans and Germans, you get a significantly higher metric. This requires a lot more testing, but it could explain why, for example, South East Asians are literally white, in the genetic sense.

Overall, my view now is that human life began in Africa, at some point turning into Denisovans, which in turn produced Neanderthals, which in turn produced modern humans. Heidelbergensis also seems to flow from Denisovans, but Heidelbergensis does not seem to be the ancestor of modern humans. You can test all of this using the code below. Interestingly, the people of Cameroon are significantly Denisovan (and so are the Finns, Danes, and Jews). In some cases, the people of Cameroon test as the ancestors of Asian Denisovans found in a cave and dated to about 50,000 years ago. This suggests at least the possibility that the people of Cameroon are the real thing, our closest link to our original ancestors, who began in Africa, moved to Asia, moved back to Africa and Europe (in particular Ancient Egypt), and then spread all over the world, including back to Asia, and South East Asia in particular. I suppose by the time they made that last journey to South East Asia, they were already white. It’s a crazy story, and it’s simply incredible that mathematics allows you to deduce all of this from just mtDNA. Note that it’s an ordinal test only, so you can’t say how long these transitions took, but you can say that one genome is the ancestor of two others.

Here’s the code, enjoy! Any missing code (and the dataset) can be found in [1].
