I’ve noted before that mtDNA must provide information about the paternal line, since I’ve written software that can predict ethnicity with about 80% accuracy, without any filtering for confidence, using mtDNA alone. See A New Model of Computational Genomics, generally. Because ethnicity is a combination of both paternal and maternal ethnicity, there’s just no argument to the contrary; the accuracy would otherwise be horrible. I’ve developed reasonable hypotheses to explain this. Specifically, the selection of particular maternal lines is probably a decent explanation for the fact that mtDNA must carry information about paternal ethnicity. That is, males in a given geography prefer particular females, for whatever reason, and that produces a unique distribution of maternal lines, which in turn identifies the paternal lines.
However, some of my results suggest more direct influence from the paternal line. Specifically, it seems at least plausible that males select females that have mtDNA bases in common with them, which would over many generations cause the two maternal lines to fuse into one. For example, a Norwegian individual, when selecting among mates in Sweden, will select the mate that has the maximum number of mtDNA bases in common with him. This behavior would, over time, cause Norwegian and Swedish mtDNA to combine, since each generation would mate on the basis of the maximum number of bases in common. This is of course a random example, but I saw some evidence of this in the Danes, who seemed to be a mix between Swedes and Norwegians.
I’ve developed an experiment and software to test this hypothesis. Specifically, some populations are mixes between modern and archaic humans, and I’ve tested whether the introduction of archaic mtDNA impacts the modern mtDNA of the population in question. The experiment I’ve come up with is to test which Mongolians are at least a 60% match to Denisovans. There are 19 complete Mongolian genomes in the dataset, 8 Denisovan genomes, and 1 Heidelbergensis genome. All genomes are complete mtDNA genomes taken from the NIH Database, complete with provenance files for each genome linking to the genome descriptions. This gives each of the Mongolian genomes 8 chances to match with a Denisovan, and if a single match occurs, the genome is included in a list of genomes that are treated as, in essence, Denisovan. Of the 19 Mongolian genomes, 4 were a match to at least 1 Denisovan. This leaves 15 genomes that did not match. The question is then, do the remaining 15 genomes have more in common with the Denisovans than a population that has no clear relationship to the Denisovans?
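The matching step described above can be sketched in a few lines of Python. This is only an illustration, not the attached code: it assumes the genomes are already aligned strings of equal length, and the names match_count, split_by_match, and threshold are hypothetical. The short sequences in the usage note are toy data, not genomes from the dataset.

```python
from typing import List, Tuple


def match_count(a: str, b: str) -> int:
    """Count the positions at which two aligned sequences carry the same base."""
    return sum(1 for x, y in zip(a, b) if x == y)


def split_by_match(
    population: List[str],
    references: List[str],
    threshold: float = 0.6,
) -> Tuple[List[str], List[str]]:
    """Split a population into genomes that match at least one reference
    genome at or above the threshold, and genomes that match none.

    Assumption: a "60% match" means the matching bases make up at least
    60% of the reference genome's length.
    """
    matched, unmatched = [], []
    for genome in population:
        # Each genome gets one chance per reference; a single hit is enough.
        is_match = any(
            match_count(genome, ref) >= threshold * len(ref)
            for ref in references
        )
        (matched if is_match else unmatched).append(genome)
    return matched, unmatched
```

Calling split_by_match(["ACGTACGTAC", "TTTTTTTTTT"], ["ACGTACGTAA", "GGGGGGGGGG"]) puts the first toy sequence in the matched list (9 of 10 bases agree with the first reference) and the second in the unmatched list. With the real data, the population would be the 19 Mongolian genomes and the references the 8 Denisovan genomes.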
This is superficially impossible, because mtDNA is inherited directly from the mother to the child, typically with no mutations at all. However, my hypothesis is that males select females on the basis of genetic similarity, that is, males attempt to maximize the number of bases in common with their female mate. This will, after generations, cause the mtDNA of the paternal line to converge with the mtDNA of the maternal line. For this experiment, it should therefore be the case that the non-Denisovan Mongolian genomes have more bases in common with Denisovans than some other population that has no clear relationship to Denisovans. As a reference population with no clear relationship to either Denisovan or Heidelbergensis, I selected the English, and there are 9 English genomes in the dataset. The results suggest that I’m correct, since the average match count between a non-Denisovan Mongolian genome and the Denisovans is 4,957.9 bases, whereas the average match count between the English and the Denisovans is 4,673.2 bases. Applying the same methods to Heidelbergensis, we have 5,003.6 matching bases for the non-Heidelbergensis Mongolians, and 4,767.4 bases for the English. The same is true of the Ashkenazi Jews, Kenyans, and Finns, all of whom have a similarly close relationship to the Denisovans. All of this is plainly consistent with the hypothesis that selection can alter mtDNA, specifically, selection by the paternal line.
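The comparison of average match counts can be sketched as follows. Again, this is a minimal illustration rather than the attached code: it assumes aligned sequences of equal length, and it assumes the average is taken over every pairing of a population genome with an archaic genome, which may not be exactly how the attached software averages. The names match_count and mean_match_count are hypothetical.

```python
from statistics import mean
from typing import List


def match_count(a: str, b: str) -> int:
    """Count the positions at which two aligned sequences carry the same base."""
    return sum(1 for x, y in zip(a, b) if x == y)


def mean_match_count(group: List[str], references: List[str]) -> float:
    """Average number of matching bases over all (group, reference) pairs.

    Assumption: every genome in the group is compared with every
    reference genome, and all pairs are weighted equally.
    """
    return mean(match_count(g, r) for g in group for r in references)
```

With the real data, the function would be called once with the 15 non-Denisovan Mongolian genomes and once with the 9 English genomes, each against the 8 Denisovan genomes, and the two averages compared. In the figures reported above, the Mongolian average exceeds the English average by 284.7 bases for the Denisovans and by 236.2 bases for Heidelbergensis.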
Attached is the code and the dataset. Any missing code can be found in .