Eupedia Forums

Thread: How many samples are statistically significant?

  1. #1
    JAK2
    Join Date: 25-11-10
    Location: LYON
    Age: 81
    Posts: 8
    Y-DNA haplogroup: G
    mtDNA haplogroup: H1
    Ethnic group: EASTERN EUROPE
    Country: France



    How many samples are statistically significant?

    I am a newcomer to this topic and, as a former MD, I am not really convinced that 400 to 2,400 samples can serve as a reference for a true result...
    What does a result mean, in a topic as complex as population genetics, for a country of 80 million when it is obtained from 2,450 samples, as I just read about Germany in the forum's results tables (!)? And many other places and groups are represented by far fewer, sometimes only a few hundred samples...
    I have read some criticism of attitudes seen as serving a political agenda, and of the abuse of proportions (treating results that concern only 45% of an ethnic group as a definitive majority for the whole group... What about all the other people who don't fit in the box?)
    Rapid progress in the field shows that some peremptory results turn out to be quite controversial...
    Who can give me more information along this line of questioning?
    Thanks a lot
    Warmest regards
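One way to put numbers on this question is a minimal sketch of the 95% margin of error for a haplogroup frequency p estimated from n samples, using the normal (Wald) approximation, roughly 1.96 * sqrt(p(1-p)/n). It assumes every person is drawn by simple random sampling from the whole country, which is an assumption that real studies rarely meet exactly. The 45% and 2,450 figures come from the post above; the function name is hypothetical.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error (Wald) for a proportion p estimated from n samples.

    Assumes simple random sampling; ignores regional clustering and selection bias.
    """
    if not 0.0 <= p <= 1.0 or n <= 0:
        raise ValueError("p must be in [0, 1] and n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)

# A frequency of 45% estimated from a few hundred vs. 2,450 samples.
for n in (400, 2450):
    moe = margin_of_error(0.45, n)
    print(f"n={n}: 45% +/- {moe * 100:.1f} percentage points")
```

Under these idealized assumptions the pure sampling error at 2,450 samples is only about 2 percentage points either way. The formula says nothing about bias from non-random or regionally uneven sampling.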

  2. #2
    Satyavrata Maciamo
    Join Date: 17-07-02
    Location: Lothier
    Posts: 9,554
    Ethnic group: Italo-celto-germanic
    Country: Belgium - Brussels



    Sorry for the very delayed reply. I sometimes miss some threads.

    The number of samples required to obtain statistically significant results depends chiefly on two factors:

    1) the size of the population tested. Obviously the sample size for Luxembourg can be smaller than for Germany, and Germany smaller than China.

    2) the heterogeneity of the population tested. Some modern populations have grown very fast over the last few centuries, while others have grown more steadily over the ages. I pointed out in a thread 3 years ago that in the early 19th century, Belgium was twice as populous as the Netherlands, while today the latter has a population 60% bigger than Belgium's. Within Belgium, Wallonia used to be more populous than Flanders a few centuries ago. Flanders is now nearly twice as populous, due to much faster growth in the 20th century.

    In 1350, France had a population of 20 million, only about a third of today's. If we exclude all the people with foreign surnames in France (immigration of the last few centuries), we see that the French population has grown only 2.5-fold in the last 650 years or so, which is very little. By comparison, in 1350 Britain had a population of roughly 4 million (3 million in England), Poland 2 million and Russia 8 million. These countries' populations have grown approximately 15- to 20-fold. Italy had 10 million and Spain 7 million, and each has experienced about a 6-fold increase.
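The growth comparison above is simple arithmetic (fold increase = current population / population in 1350). Here is a minimal sketch of it. The 1350 figures are the ones given in the post. The modern figures are approximate populations around 2011, which are assumed round numbers and not taken from the post.

```python
# Populations in 1350, in millions, as given in the post.
MEDIEVAL_1350 = {"France": 20, "Britain": 4, "Poland": 2, "Russia": 8, "Italy": 10, "Spain": 7}

# Approximate populations around 2011, in millions (assumed round figures, not from the post).
MODERN_2011 = {"France": 65, "Britain": 62, "Poland": 38, "Russia": 143, "Italy": 60, "Spain": 46}

def growth_factor(before: float, after: float) -> float:
    """Fold increase between two population sizes."""
    if before <= 0:
        raise ValueError("starting population must be positive")
    return after / before

for country, then in MEDIEVAL_1350.items():
    print(f"{country}: {growth_factor(then, MODERN_2011[country]):.1f}-fold")
```

The raw figure for France comes out a little above 3-fold. The post's 2.5-fold figure additionally excludes people with foreign surnames, which this sketch does not attempt to model.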

    So it's only natural that the genetic diversity should be higher in countries like France, Belgium, Italy and Spain than in northern or eastern Europe. In fact, the size of the historical population since the Middle Ages is fairly well reflected by the diversity of surnames. Italy, France and Belgium have the highest number of surnames per capita in Europe, while Scandinavia, the British Isles and most Slavic countries have among the lowest.

    The second factor is the most important, yet also the most overlooked.

    So what is the minimum sample size necessary to be relevant? In northern and eastern Europe, where the medieval population density was much lower than in the former Roman Empire, I would say that 50 samples per million (current) inhabitants already give a pretty good idea. This means 3,000 samples for Britain, 2,000 for Poland, or 250 for Denmark or Finland. Countries like Ireland clearly have more than enough Y-DNA samples to give a fairly accurate picture. For countries like France, Belgium, Italy or Greece, 250 samples per million inhabitants are necessary. They also need to be selected carefully to cover every region, as there are often major disparities even between small adjacent regions (e.g. Cantabria vs the Basque Country, Crete vs the Peloponnese, or Auvergne vs Rhône-Alpes). In other words, Belgium and Greece would need 2,500 samples, and France and Italy 15,000.

    Spain and Portugal are a bit different because a large part of the medieval (Muslim and Jewish) population was expelled between the late 15th and early 17th centuries. The modern population therefore grew from a smaller portion of the medieval population, which also explains why surname diversity is lower. I would place them in an intermediate category, along with Germany, and estimate that 100 samples per million inhabitants is representative enough (so 1,000 samples for Portugal, 4,000 for Spain, and 8,000 for Germany).
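The rule of thumb in this post boils down to "samples = rate per million × current population in millions", with three rates. Here is a minimal sketch of that heuristic. It is not a statistical formula, and it ignores the post's requirement to spread samples over every region. The category labels and the example populations are hypothetical or approximate assumptions; only the rates 50, 100 and 250 come from the post.

```python
# Samples per million current inhabitants, per the rule of thumb in the post.
# The category labels are illustrative names, not terms used by the author.
RATE_PER_MILLION = {
    "northern_eastern": 50,    # e.g. Britain, Poland, Denmark, Finland
    "intermediate": 100,       # e.g. Spain, Portugal, Germany
    "former_roman_core": 250,  # e.g. France, Belgium, Italy, Greece
}

def recommended_samples(population_millions: float, category: str) -> int:
    """Minimum sample size under the per-million heuristic (ignores regional coverage)."""
    if population_millions <= 0:
        raise ValueError("population must be positive")
    return round(RATE_PER_MILLION[category] * population_millions)

# Approximate current populations in millions: assumed round figures, not from the post.
examples = [("Denmark", 5.5, "northern_eastern"),
            ("Germany", 82, "intermediate"),
            ("France", 65, "former_roman_core")]
for name, pop, cat in examples:
    print(f"{name}: about {recommended_samples(pop, cat):,} samples")
```

With an assumed population of about 82 million, Germany comes out at 8,200, close to the post's rounded 8,000.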
    Last edited by Maciamo; 03-10-11 at 13:14.

  3. #3
    Regular Member MarTyro
    Join Date: 06-08-11
    Posts: 41
    Country: Austria



    Quote Originally Posted by Maciamo
    minimum sample size necessary to be relevant
    France and Italy 15,000 samples
    8000 for Germany
    4000 for Spain
    3000 samples for Britain
    Belgium and Greece 2500 samples
    2000 for Poland
    1000 samples for Portugal
    250 for Denmark or Finland
    Interesting. I guess France, Italy, and also Germany and Spain, are far from reaching that size. I would also add Switzerland and Austria as important Alpine refugia, the Balkans as an important old melting pot, and Hungary/Czechia/Slovakia as indicators of some immigration. The haplogroup definitions and nomenclature also seem important to me: older studies can cause problems. So we must expect some news (subclades, haplogroup enclaves, etc.). A central haplogroup-distribution database should be built (maybe an EU project?), where every scientist and interested researcher could run their own calculations. But there is none?
    Last edited by MarTyro; 05-10-11 at 07:42. Reason: regroup

  4. #4
    Junior Member realdealt
    Join Date: 08-06-12
    Posts: 8
    Y-DNA haplogroup: R-DF81+
    mtDNA haplogroup: I2
    Ethnic group: Mexican American
    Country: USA - Texas



    So you are saying a sample size of 1,000 for Portugal and 4,000 for Spain... hmm... I have a map of R1b-M269 distribution at IberianRoots based on 1,608 samples for Portugal and 2,032 for Spain (3,640 in total across Iberia). I believe another key factor in sampling bias is the geographic distribution of the samples taken, not just heterogeneity (as I understand it). The samples have heterogeneity because of all the other haplogroups they are grouped with, but they also need a fairly good geographic distribution for the map to be "accurate".

    What if I have 11,000 samples from lineages originating in Iberia but now scattered primarily across the Americas? Wouldn't this be regarded as a decent, unbiased sample set?
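The point about geographic distribution can be made concrete with post-stratification. Each region's haplogroup frequency is weighted by that region's share of the population, instead of lumping all samples together. Here is a minimal sketch under that assumption. The regions, shares and counts are purely hypothetical numbers chosen to show the effect of over-sampling one region; they are not real data.

```python
from dataclasses import dataclass

@dataclass
class RegionSample:
    name: str                # hypothetical region label
    population_share: float  # fraction of the country's population living there
    n_sampled: int           # number of people tested in the region (must be > 0)
    n_carriers: int          # how many of them carry the haplogroup of interest

def pooled_frequency(regions: list) -> float:
    """Naive estimate: all samples lumped together, regardless of where they come from."""
    total = sum(r.n_sampled for r in regions)
    return sum(r.n_carriers for r in regions) / total

def stratified_frequency(regions: list) -> float:
    """Post-stratified estimate: each regional frequency weighted by its population share."""
    total_share = sum(r.population_share for r in regions)
    weighted = sum(r.population_share * r.n_carriers / r.n_sampled for r in regions)
    return weighted / total_share

# Purely hypothetical numbers: region A is heavily over-sampled relative to its population.
regions = [
    RegionSample("A", population_share=0.2, n_sampled=800, n_carriers=560),
    RegionSample("B", population_share=0.5, n_sampled=150, n_carriers=60),
    RegionSample("C", population_share=0.3, n_sampled=50, n_carriers=20),
]
print(f"pooled:     {pooled_frequency(regions):.3f}")
print(f"stratified: {stratified_frequency(regions):.3f}")
```

This correction only works if each sample's region of origin is recorded. That is one reason the sampling design can matter as much as the raw sample count.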

  5. #5
    Regular Member
    Join Date: 17-12-11
    Location: Sofia
    Posts: 339
    Y-DNA haplogroup: J-L70
    mtDNA haplogroup: H2a2a1
    Ethnic group: Bulgarian
    Country: Bulgaria



    Quote Originally Posted by JAK2
    I am a newcomer to this topic and, as a former MD, I am not really convinced that 400 to 2,400 samples can serve as a reference for a true result...
    Everything below 500 is a joke. So at least 500 :)

  6. #6
    JAK2
    Join Date: 25-11-10
    Location: LYON
    Age: 81
    Posts: 8
    Y-DNA haplogroup: G
    mtDNA haplogroup: H1
    Ethnic group: EASTERN EUROPE
    Country: France



    How many samples are necessary...

    Quote Originally Posted by realdealt
    So you are saying a sample size of 1,000 for Portugal and 4,000 for Spain... hmm... I believe another key factor in sampling bias is the geographic distribution of the samples taken, not just heterogeneity (as I understand it).

    Thank you all for your very interesting answers, which confirm my view that most results in this field lack seriousness...
    "Grandmother's tales", as the Jews call fanciful stories...!
    Jak2

  7. #7
    Elite member
    Join Date: 25-10-11
    Location: Brittany
    Age: 72
    Posts: 4,969
    Y-DNA haplogroup: R1b - L21/S145*
    mtDNA haplogroup: H3c
    Ethnic group: more celtic
    Country: France



    I agree that the samples are, for now, a bit small for such big countries, but we already have some sketches that do not look like the pure effect of chance when we know the history...
    Some regions in Europe are pretty well sampled, while others (mainly in France, I think) are very scarce.
    With Maciamo, we had the opportunity to see some initially surprising percentages become less dubious (for instance in Denmark, Iceland, Norway and Sweden). We also saw some surprising new results about Italy showing that we need more data.
    On the whole, the very dominant HGs do not show big variations, but sample size is very important for minor HGs. (I have already made mistakes or built unfounded theories on tiny numbers, but they were bets! And it helps keep my brain at work.)
    For autosomal DNA, I think we don't need such big samples.
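The remark about minor HGs can be illustrated with a binomial model. This is a minimal sketch assuming independent random sampling. For a haplogroup at true frequency p, the chance that it never shows up among n samples is (1-p)^n. The standard error of its estimated frequency, relative to p itself, is sqrt((1-p)/(n*p)). The function names and the frequencies and sample sizes in the loop are illustrative assumptions.

```python
import math

def prob_not_observed(p: float, n: int) -> float:
    """Chance that a haplogroup at true frequency p never appears among n random samples."""
    return (1.0 - p) ** n

def relative_standard_error(p: float, n: int) -> float:
    """Standard error of the estimated frequency, divided by the frequency itself."""
    if not 0.0 < p <= 1.0 or n <= 0:
        raise ValueError("p must be in (0, 1] and n must be positive")
    return math.sqrt((1.0 - p) / (n * p))

# Dominant vs. minor haplogroups at two sample sizes (illustrative values).
for p in (0.50, 0.05, 0.01):
    for n in (100, 500):
        print(f"p={p:.2f} n={n}: relative error {relative_standard_error(p, n):.0%}, "
              f"P(missed entirely)={prob_not_observed(p, n):.3f}")
```

For a dominant haplogroup the relative error is small even at 100 samples. For a 1% haplogroup at the same sample size, it is close to 100%, and there is roughly a one-in-three chance of not seeing it at all.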

    have a good evening

Similar Threads

  1. European Authors or Books that were significant to you?
    By Sirius2b in forum Literature & Theatre
    Replies: 33
    Last Post: 01-02-21, 20:41
  2. old DNA samples (Y-DNA/mtDNA) 500 BC to 1500 AD
    By MarTyro in forum Paleogenetics
    Replies: 4
    Last Post: 17-02-12, 15:47
