I noticed that some Japanese people seem to have a very low number of bases in common not only with the rest of the world, but also with each other. The dataset I’m using consists of 185 complete genomes, from 19 nationalities and 3 ancient species, all taken from the NIH Database.
For 2 of the 10 Japanese complete genomes, the maximum number of matching bases with any other genome in the world is about 5,000. The complete genome has a size of 16,579 bases, so this is not much better than the chance expectation of 16,579/4 ≈ 4,145 matches, suggesting that it really is just the operation of chance causing any intersection at all between those Japanese genomes and the global population generally.
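To make the chance baseline concrete, here is a minimal Python sketch. It assumes each position matches independently with probability 1/4 (uniform base frequencies), which is a simplifying assumption and not something the dataset guarantees; the helper name `chance_baseline` is only for illustration.

```python
import math

def chance_baseline(genome_length: int, p_match: float = 0.25):
    """Mean and standard deviation of the match count under a binomial model."""
    mean = genome_length * p_match
    std = math.sqrt(genome_length * p_match * (1.0 - p_match))
    return mean, std

GENOME_LENGTH = 16_579   # genome size quoted in the text
OBSERVED_MAX = 5_000     # approximate best match count quoted in the text

mean, std = chance_baseline(GENOME_LENGTH)
# How many standard deviations the observed count sits from the baseline.
z_score = (OBSERVED_MAX - mean) / std

print(f"expected matches by chance: {mean:.2f}")
print(f"standard deviation:         {std:.2f}")
print(f"z-score of observed count:  {z_score:.1f}")
```

The z-score is only as meaningful as the independence assumption behind it, since real genomes generally have unequal base frequencies.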
This view finds further support in the fact that the entire global population outside Japan has a perfectly consistent genome (i.e., no variation at all) over the first 15 bases. The probability of this being chance is 1/4^190, which is so small that it is effectively zero. That is, the sequence has a length of 15, and it is common to 175 genomes.
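Probabilities this small underflow ordinary floating-point arithmetic, so it helps to work in log space. The sketch below assumes independent, uniformly distributed bases and asks how likely it is that a group of genomes all agree with one another over a prefix. Both the model and the helper name `log10_prob_all_agree` are assumptions for illustration, and other ways of framing the event would give a different exponent.

```python
import math

def log10_prob_all_agree(prefix_length: int, n_genomes: int) -> float:
    """log10 of P(all genomes share one prefix) under independent uniform bases.

    The first genome is free; each of the other n_genomes - 1 must match it
    at every one of the prefix_length positions, each with probability 1/4.
    """
    n_constrained = prefix_length * (n_genomes - 1)
    return -n_constrained * math.log10(4)

PREFIX_LENGTH = 15   # prefix length quoted in the text
N_GENOMES = 175      # number of genomes sharing the prefix, per the text

log_p = log10_prob_all_agree(PREFIX_LENGTH, N_GENOMES)
print(f"log10(probability) = {log_p:.1f}")

# Evaluating the same quantity directly underflows in double precision.
direct = 0.25 ** (PREFIX_LENGTH * (N_GENOMES - 1))
print(f"direct evaluation  = {direct}")
```

Keeping the result as a base-10 logarithm avoids the underflow and makes it easy to compare alternative models by their exponents.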
Note this dataset includes 3 complete ancient genomes, specifically, Denisovan, Maritime Archaic, and Homo heidelbergensis, all of which also contain exactly the same globally common sequence. Homo heidelbergensis is thought to have gone extinct hundreds of thousands of years ago, suggesting there is basically zero variation in the opening prefix to human mtDNA.
Said otherwise, globally, there is no mutation at all over the first 15 bases of the human mtDNA genome, anywhere in known history.
This is not true when you include Japan: only 1 Japanese genome out of 10 is a perfect match, and therefore consistent with the global genome. For the other 9, the average number of matches over the opening prefix of 15 bases is 3.2, which is roughly what chance alone would give (15/4 = 3.75).
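The prefix comparison can be sketched as follows. The sequences here are made-up placeholders, not data from the dataset, and `prefix_matches` is a hypothetical helper. It assumes the genomes are already aligned, so that position i in one corresponds to position i in another.

```python
def prefix_matches(genome: str, reference: str, prefix_length: int = 15) -> int:
    """Count positions in the first prefix_length bases where genome equals reference."""
    return sum(
        a == b
        for a, b in zip(genome[:prefix_length], reference[:prefix_length])
    )

# Placeholder sequences for illustration only (not taken from the dataset).
reference_prefix = "ACGTACGTACGTACG"
samples = {
    "sample_a": "ACGTACGTACGTACG",  # identical to the reference
    "sample_b": "ACGTTCGAACGTACC",  # differs at a few positions
    "sample_c": "T" * 15,           # agrees only where the reference has T
}

counts = {name: prefix_matches(seq, reference_prefix) for name, seq in samples.items()}

# Average over the samples that are not a perfect match, as in the text.
imperfect = [c for c in counts.values() if c < len(reference_prefix)]
average_imperfect = sum(imperfect) / len(imperfect) if imperfect else float("nan")

print(counts)
print(f"average over non-perfect matches: {average_imperfect:.2f}")
print(f"chance expectation for 15 bases:  {15 / 4:.2f}")
```

With real data, `reference_prefix` would be the globally shared 15-base sequence and `samples` would hold the aligned Japanese genomes.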
Putting it all together, you have a global match count for 2 out of 10 Japanese people that seems to be the result of pure chance, and 9 out of 10 Japanese people have a prefix segment that is almost entirely inconsistent with a globally and historically uniform segment of mtDNA.
Has anyone noticed this before or heard other people discussing it? I think it’s consistent with one of three hypotheses:
- Japanese mtDNA has a much higher rate of mutation than typical mtDNA, for whatever reason. We could test for this by looking at the rate of change from one generation to the next.
- Japanese mtDNA descends from a totally different bacterium.
- There was an event that caused a drastic mutation to Japanese mtDNA, and then natural selection took over, and so nothing much changed, since as far as I know, the Japanese have no drastically higher rates of diseases connected to mtDNA, and in fact they have good health outcomes overall.
If either the first or the third hypothesis is true, then it suggests that DNA could have an error-correcting function, since single-base variants often produce disease, yet here we have drastically inconsistent mtDNA that doesn’t seem to have any notable problems at all. Note that natural selection would certainly kill off bad outcomes, but it doesn’t produce good outcomes. And so this particular case is at least consistent with the idea that DNA can adjust mutated sequences to avoid malfunction and disease.
In any case, this is highly unusual, since mtDNA is consistent for generations, and in some cases over possibly hundreds of thousands of years. I’ll add the caveat that it could be bad data, despite being from a reputable source, and the opening prefix being inconsistent is perhaps evidence of this.
Here’s the dataset with a ton of code you can use to analyze the data, and here’s the search string for the raw data from the NIH Database.