Estonian genomes reveal the origins of a major ancestry component of the Finnish


Patterns of genetic connectedness between modern and medieval Estonian genomes reveal the origins of a major ancestry component of the Finnish population

The Finnish population is a unique example of a genetic isolate affected by a recent founder event. Previous studies have suggested that the ancestors of Finnic-speaking Finns and Estonians reached the circum-Baltic region by the 1st millennium BC. However, high linguistic similarity points to a more recent split of their languages. To study genetic connectedness between Finns and Estonians directly, we first assessed the efficacy of imputation of low-coverage ancient genomes by sequencing a medieval Estonian genome to high depth (23×) and evaluated the performance of its down-sampled replicas. We find that ancient genomes imputed from >0.1× coverage can be reliably used in principal-component analyses without projection. By searching for long shared allele intervals (LSAIs; similar to identity-by-descent segments) in unphased data for >143,000 present-day Estonians, 99 Finns, and 14 imputed ancient genomes from Estonia, we find unexpectedly high levels of individual connectedness between Estonians and Finns for the last eight centuries in contrast to their clear differentiation by allele frequencies. High levels of sharing of these segments between Estonians and Finns predate the demographic expansion and late settlement process of Finland. One plausible source of this extensive sharing is the 8th–10th centuries AD migration event from North Estonia to Finland that has been proposed to explain uniquely shared linguistic features between the Finnish language and the northern dialect of Estonian and shared Christianity-related loanwords from Slavic. These results suggest that LSAI detection provides a computationally tractable way to detect fine-scale structure in large cohorts.
From the paper:

"Kinship coefficients are a measure of the proportion of genome-wide IBD in a pair of individuals. The abundance of long IBD segments can be a robust indicator of close relatedness in a large unstructured population and is therefore widely used by direct-to-consumer genetic testing for inferring matches in genealogical relationships (up to 5th cousins). However, our simulations (Figures S5 and S6) show that IBD sharing patterns are strongly influenced by effective population size history. The kinship coefficients estimated here between modern and medieval samples are clearly not interpretable in terms of meaningful genealogical relationships given that the pairs are separated by more than 15 generations in time. Hence, the signals of elevated LSAI sharing (comparable in their intensity to the levels of 4th- to 5th-cousin relationships in large populations) between present-day individuals and those sampled 10–20 generations ago can be best explained, in line with our simulation results, by relatively low historic Ne and recent exponential growth in Estonia. Additionally, a large fraction of the segments we detect may not correspond to shared haplotypes because of their unphased nature and their length, and as such, represent a series of very short segments that coalesce, on average, longer ago than a true IBD segment of the same length would. Consistent with this, among the triangular cases, where a medieval genome shares LSAI with two modern individuals who are themselves closely related (grandparent-parent-offspring sequence), we observe no excess of LSAI sharing in the grandparent (Figure S8), which would be expected under genealogical relationships. Thus, it is more likely that most cases of diachronic LSAI sharing that we describe are explainable by cumulative long-term maintenance of community-specific chunks of IBD through marriages involving distant (cryptic) relatedness within the same parish- or county-level community."

This seems to apply to many peasant groups that have inhabited a small region for centuries.
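
For anyone who wants a rough feel for the numbers behind the quoted argument, here is a back-of-the-envelope sketch in Python. It is not the paper's method. It only uses the textbook pedigree expectations for the kinship coefficient, assuming an outbred population, full cousins, and a single line of descent to the ancestor. The function names are my own and purely illustrative.

```python
def cousin_kinship(degree: int) -> float:
    """Expected kinship coefficient for full n-th cousins (outbred pedigree assumed)."""
    if degree < 1:
        raise ValueError("degree must be >= 1")
    # Full siblings have phi = 1/4; each further cousin degree adds two
    # meioses to the connecting path, which multiplies phi by 1/4.
    return 0.25 ** (degree + 1)


def direct_ancestor_kinship(generations: int) -> float:
    """Expected kinship with a direct ancestor reached through one line of descent."""
    if generations < 1:
        raise ValueError("generations must be >= 1")
    # Parent-offspring phi = 1/4; every extra generation halves it.
    return 0.5 ** (generations + 1)


if __name__ == "__main__":
    for degree in range(1, 6):
        print(f"cousin degree {degree}: phi = {cousin_kinship(degree):.6f}")
    for gens in (1, 2, 10, 15, 20):
        print(f"ancestor {gens} generations back: phi = {direct_ancestor_kinship(gens):.8f}")
```

Under these simplified assumptions, a 4th- to 5th-cousin level of sharing is well above what a single genealogical link across 15 or more generations would give on expectation, which fits the paper's point that the modern-medieval signal is better read as background relatedness in a small, long-settled community than as a traceable pedigree connection.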
