On Norwegian Ancestry

I wrote a short script (attached below) that lets you quickly compare the distribution of ethnicities that a given population matches to. Out of curiosity, I applied it to Norwegians and Swedes, and as I noted in A New Model of Computational Genomics [1], they’re different people that are of course nonetheless closely related. However, Norwegians are much closer to the people of the Pacific, specifically, the Thai. This is obvious when you look at Stave Churches, which are almost identical in structure and aesthetic to Thai temples, and moreover, don’t look anything like a typical church. On the left is a Norwegian Stave Church, and on the right is a Thai temple, both courtesy of Wikipedia.

In fact, it turns out that Stave Churches are concentrated almost exclusively in Norway, at least according to Google. There are others elsewhere in Europe, but within Scandinavia they seem to be generally limited to Norway. The map below is courtesy of Google, and you can generate it yourself by simply searching for “Stave Churches near Scandinavia”.


If you actually compare the distribution of associated ethnicities between Norwegians and Swedes, you get the chart below, which plainly shows that the Norwegians are much closer to the people of the Pacific, indigenous peoples generally, and some Africans. Specifically, TH stands for Thai (obviously in the Pacific), SI stands for the Solomon Islands (islands in the Pacific), SQ stands for the Saqqaq (indigenous people of Greenland), SM stands for Sami (indigenous peoples of Scandinavia and Russia), NG stands for Nigeria, and KH stands for Khoisan (a people in Southern Africa). The complete list of acronyms can be found at the end of [1].

The chart below shows a normalized rank for the Norwegians (i.e., a scale from 0 to 1), minus that same rank for the Swedes, so the values range from -1 to 1. The rank for a given column is the normalized number of genomes that match that column's population at 90% of the genome, and all genomes are complete mtDNA genomes taken from the NIH database. That is, the algorithm first counts the number of Norwegians that are, e.g., a 90% match to the Nigerians, and then normalizes that number from 0 to 1, with 0 being none of them and 1 being all of them. Then, that same data is produced for the Swedes, and the chart below shows the difference between the two. Informally, this is Norway minus Sweden, and so, e.g., column 1 shows that the Swedes are closer to the Kazakhs than the Norwegians are (note that KZ stands for Kazakh), since it is a negative number.
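
The calculation described above can be sketched roughly as follows. This is not the attached script, just a minimal Python illustration under stated assumptions: the function names (`match_fraction`, `normalized_rank`, `rank_difference`) are hypothetical, genomes are assumed to be pre-aligned strings compared base by base, and a genome is assumed to count as a "90% match" to a population if it matches at least one genome of that population at 90% or more of its positions. The actual matching rule is the one defined in [1].

```python
from typing import Dict, List

MATCH_THRESHOLD = 0.90  # the 90% figure from the text


def match_fraction(a: str, b: str) -> float:
    """Fraction of positions at which two sequences carry the same base.

    Assumption: the sequences are already aligned, and are compared
    position by position over the length of the shorter one.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / n


def normalized_rank(source: List[str], target: List[str],
                    threshold: float = MATCH_THRESHOLD) -> float:
    """Share of source genomes (0 to 1) that match the target population.

    Assumption: a source genome counts as a match if it reaches the
    threshold against at least one genome in the target population.
    """
    if not source:
        return 0.0
    hits = sum(
        1 for s in source
        if any(match_fraction(s, t) >= threshold for t in target)
    )
    return hits / len(source)


def rank_difference(pop_a: List[str], pop_b: List[str],
                    others: Dict[str, List[str]]) -> Dict[str, float]:
    """Per-population rank of pop_a minus rank of pop_b (range -1 to 1)."""
    return {
        label: normalized_rank(pop_a, genomes) - normalized_rank(pop_b, genomes)
        for label, genomes in others.items()
    }
```

Calling `rank_difference(norwegians, swedes, others)`, where `others` maps each acronym (TH, SI, NG, and so on) to its list of genomes, would return one value per column of the chart: positive where the Norwegians have the higher share of matches, negative where the Swedes do.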

All of the genomes in the dataset can be found in [1], and come from the NIH Database. Moreover, all of the genomes have been diligenced to ensure that the ethnicity label is in fact the ethnicity of the person in question. So, e.g., if a genome is classified as Norwegian in my dataset, then the notes associated with the genome either explicitly state that the person is Norwegian, or plainly indicate that the person is Norwegian (as opposed to a Swede living in Norway). The dataset contains a link to the NIH Database for every genome, where you can review the notes yourself.

Here’s the code; the dataset (and any missing code) is linked to in [1].



